CIS 6930 Spring 26


Data Engineering at the University of Florida

CIS 6930 SP26 - Lecture Reading List

This document maps required and optional readings to each lecture in the course.


Module 1: Foundations (Weeks 1-4)

Week 1: Course Setup

No readings - infrastructure focus


Week 2: Model Context Protocol (MCP)

Lecture: MCP Fundamentals, Building MCP Servers, Multi-agent Pipelines

| Type | Paper | Link | Summary |
|------|-------|------|---------|
| Required | Model Context Protocol Specification | MCP Docs | Official specification for MCP, covering core concepts, architecture, and protocol design. Essential for understanding how MCP enables communication between LLMs and external tools/data sources. |
| Required | MCP Quickstart Guide | MCP Quickstart | Hands-on guide to building your first MCP server. Covers server creation, tool registration, and client integration. |
| Optional | Building MCP Servers Tutorial | MCP Servers | Detailed tutorial on implementing custom MCP servers, with examples. |
| Optional | Multi-Agent Orchestration Patterns | MCP Patterns | Architectural patterns for building multi-agent systems with MCP. |
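The specification readings describe MCP's message layer, which is built on JSON-RPC 2.0. A minimal sketch of the `tools/call` request/response shape follows; the `add` tool and its handler are made up for illustration, and real servers register tools through an MCP SDK rather than hand-rolling JSON:

```python
import json

# Sketch of the JSON-RPC 2.0 message shape MCP uses for tool calls.
# The "add" tool is illustrative; real servers use an MCP SDK.

def make_tool_call(call_id, tool_name, arguments):
    """Build a JSON-RPC request for MCP's tools/call method."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": call_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

def handle_request(raw, registry):
    """Dispatch a tools/call request to a registered tool handler."""
    req = json.loads(raw)
    result = registry[req["params"]["name"]](**req["params"]["arguments"])
    return {"jsonrpc": "2.0", "id": req["id"], "result": result}

registry = {"add": lambda a, b: {"content": [{"type": "text", "text": str(a + b)}]}}
response = handle_request(make_tool_call(1, "add", {"a": 2, "b": 3}), registry)
print(response["result"]["content"][0]["text"])  # "5"
```

The point to notice is the separation the spec enforces: the client only ever sees the JSON messages, never the handler behind them.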

Week 3: Prompt Engineering Basics

Lecture: Prompt engineering fundamentals, Chain-of-Thought, Structured Outputs

| Type | Paper | Link | Summary |
|------|-------|------|---------|
| Required | Chain-of-Thought Prompting Elicits Reasoning (Wei et al., 2022) | arXiv:2201.11903 | Demonstrates that including reasoning steps in prompts enables LLMs to solve complex arithmetic, commonsense, and symbolic reasoning tasks. A 540B-parameter model with 8 CoT examples achieved SOTA on math word problems. Foundational work for understanding prompting techniques. |
| Optional | The Prompt Report: A Systematic Survey of Prompting Techniques | arXiv:2406.06608 | Comprehensive taxonomy of 58 prompting techniques and 33 vocabulary terms. Use as a reference guide. If reading, focus on Sections 1-3 (Introduction, Taxonomy, Core Techniques) only - the full survey is extensive. |
| Optional | Large Language Models are Zero-Shot Reasoners (Kojima et al., 2022) | arXiv:2205.11916 | Shows that simply adding "Let's think step by step" improves reasoning performance dramatically (+61 percentage points on MultiArith). |
| Optional | Tree of Thoughts: Deliberate Problem Solving with LLMs | arXiv:2305.10601 | Extends CoT by allowing exploration of multiple reasoning paths with backtracking. Achieved 74% success on Game of 24 compared to 4% with standard CoT prompting. |
| Optional | Graph of Thoughts: Solving Elaborate Problems with LLMs | arXiv:2308.09687 | Models reasoning as an arbitrary graph with aggregation and refinement. Achieves 62% better quality than ToT on sorting while reducing costs by 31%. |
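The technique in Wei et al. is purely a prompting change: few-shot exemplars include their reasoning before the final answer, and the model continues the pattern. A minimal sketch of assembling such a prompt (the format and the follow-up question are illustrative):

```python
# Sketch of a few-shot chain-of-thought prompt in the style of Wei et al.
# (2022): each exemplar shows its reasoning before the final answer.

EXEMPLARS = [{
    "question": ("Roger has 5 tennis balls. He buys 2 more cans of 3 tennis "
                 "balls each. How many tennis balls does he have now?"),
    "reasoning": ("Roger started with 5 balls. 2 cans of 3 tennis balls each "
                  "is 6 tennis balls. 5 + 6 = 11."),
    "answer": "11",
}]

def build_cot_prompt(exemplars, question):
    parts = [
        f"Q: {ex['question']}\nA: {ex['reasoning']} The answer is {ex['answer']}."
        for ex in exemplars
    ]
    parts.append(f"Q: {question}\nA:")  # the model continues from here
    return "\n\n".join(parts)

print(build_cot_prompt(EXEMPLARS, "A farm has 3 pens with 4 pigs each. How many pigs?"))
```

The zero-shot variant from Kojima et al. drops the exemplars entirely and appends only "Let's think step by step" after the question.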

Discussion: How to read research papers

| Type | Paper | Link | Summary |
|------|-------|------|---------|
| Required | How to Read a Paper (Keshav) | PDF | Classic 3-pass method for reading research papers efficiently: first pass for overview (5 min), second for understanding (1 hour), third for deep comprehension. |
| Required | LinkTransformer: A Unified Package for Record Linkage with Transformer Language Models (Arora & Dell, 2024) | ACL 2024 Demo | Open-source package making transformer-based record linkage accessible without deep learning expertise. Treats linkage as text retrieval using sentence embeddings. Used as the in-class 3-pass reading exercise. |

Week 4: Data Integration & Entity Resolution

Lecture: Data integration (reuse lecture7)
Lecture: Entity resolution (reuse lecture9)

| Type | Paper | Link | Summary |
|------|-------|------|---------|
| Required | LinkTransformer: Record Linkage with Transformer LMs | ACL 2024 Demo | Open-source package making transformer-based record linkage accessible without deep learning expertise. Outperforms string matching methods by a wide margin and supports multiple languages. |
| Optional | Entity Resolution in Voice Interfaces (EMNLP 2024 Industry) | EMNLP 2024 | Industry application of entity resolution in voice assistant systems. |
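For context on what LinkTransformer improves over, this is the kind of string-matching baseline it is compared against, sketched with the stdlib `difflib` on made-up company records:

```python
from difflib import SequenceMatcher

# A classic string-matching baseline for record linkage, the kind of
# method LinkTransformer reports outperforming. Records are made up.

left = ["Intl. Business Machines", "Apple Computer Inc"]
right = ["International Business Machines Corp", "Apple Inc.", "Alphabet Inc."]

def best_match(query, candidates):
    """Link a record to the candidate with the highest character-level similarity."""
    return max(candidates,
               key=lambda c: SequenceMatcher(None, query.lower(), c.lower()).ratio())

links = {q: best_match(q, right) for q in left}
print(links)
```

Character-level matching breaks down on abbreviations, transliterations, and cross-lingual records, which is where embedding-based linkage pulls ahead.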

Module 2: LLM for Data (Weeks 5-8)

Week 5: LLMs for Data Tasks & Guest Speakers

Guest Speaker: Dr. Shiree Hughes (Monday, Feb 9) - Big Data at General Motors
Guest Speaker: Mikhail Sinanan (Friday, Feb 13) - Data Engineering at Spotify

Guest Speaker Preparation: Big Data Fundamentals

| Type | Resource | Link | Summary |
|------|----------|------|---------|
| Required | Apache Kafka Introduction | Kafka Intro | Official introduction to Kafka's distributed event streaming platform. Covers producers, consumers, topics, and how Kafka handles real-time data feeds at scale. |
| Required | Apache Spark Quick Start | Spark Quick Start | Official getting started guide for Spark. Introduces the API through the interactive shell and shows how Spark's in-memory processing can run workloads up to 100x faster than MapReduce. |
| Required | Hadoop Tutorial Overview | GeeksforGeeks Hadoop | Overview of the Hadoop ecosystem: HDFS for distributed storage, MapReduce for processing, and YARN for resource management. |
| Optional | Apache Kafka for Beginners | DataCamp Kafka | Comprehensive beginner guide covering Kafka architecture, brokers, partitions, and consumer groups. |
| Optional | Spark By Examples | Spark By Examples | Hands-on Spark tutorials with code examples in Scala and PySpark. |
| Optional | Building Blocks of Hadoop | Pluralsight | Deep dive into HDFS, MapReduce, and YARN architecture. |
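The MapReduce model covered in the Hadoop readings can be sketched in a few lines of pure Python; Hadoop distributes these same map, shuffle, and reduce stages across a cluster instead of one process:

```python
from collections import defaultdict
from itertools import chain

# Pure-Python sketch of the canonical word-count MapReduce job.

def map_phase(line):
    # map: emit (key, value) pairs for each word
    return [(word, 1) for word in line.lower().split()]

def shuffle(pairs):
    # shuffle: group values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # reduce: aggregate each key's values
    return {key: sum(values) for key, values in groups.items()}

lines = ["the quick brown fox", "the lazy dog", "the fox"]
counts = reduce_phase(shuffle(chain.from_iterable(map_phase(l) for l in lines)))
print(counts)  # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

Spark's speedup over this model comes largely from keeping intermediate results like the shuffled groups in memory rather than writing them to disk between stages.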

Guest Speaker Preparation: Spotify Data Platform

| Type | Resource | Link | Summary |
|------|----------|------|---------|
| Required | Spotify's Data Platform Explained | Spotify Engineering | Overview of Spotify's data infrastructure processing 1.4 trillion data points daily. Covers data collection, processing, and management architecture. |
| Required | NerdOut@Spotify: A Trillion Events | Spotify Podcast | 38-minute podcast on handling 50M events/second, the Kafka to cloud transition, and data quality at scale. |

Week 6: ML Fundamentals & Embeddings

Lecture: ML fundamentals (reuse lecture11 - PyTorch)
Lecture: Embeddings (reuse lecture14)

| Type | Paper | Link | Summary |
|------|-------|------|---------|
| Optional | Word2Vec (Mikolov et al., 2013) | arXiv:1301.3781 | Introduces efficient architectures for learning word vectors from large datasets. Achieved SOTA on syntactic and semantic word similarity while training on 1.6B words in less than a day. Foundational work for modern embeddings. |
| Optional | MTEB: Massive Text Embedding Benchmark | arXiv:2210.07316 | Comprehensive benchmark spanning 8 embedding tasks, 58 datasets, and 112 languages. Reveals that no single embedding method dominates across all tasks. Essential for understanding embedding evaluation. |
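Both readings come down to comparing vectors, and cosine similarity is the standard measure. A toy sketch with hand-made 3-dimensional vectors (learned embeddings have hundreds of dimensions, but the arithmetic is the same):

```python
import math

# Toy cosine-similarity computation, the comparison underlying
# embedding methods like Word2Vec. Vectors are hand-made.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

king, queen, apple = [0.9, 0.8, 0.1], [0.85, 0.75, 0.2], [0.1, 0.2, 0.9]
print(cosine(king, queen) > cosine(king, apple))  # related words score higher: True
```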

Week 7: RAG Systems

NEW Lecture: RAG architecture overview

| Type | Paper | Link | Summary |
|------|-------|------|---------|
| Required | RAG: Retrieval-Augmented Generation for Knowledge-Intensive NLP (Lewis et al., 2020) | arXiv:2005.11401 | Foundational RAG paper combining parametric and non-parametric memory using Wikipedia retrieval. Achieved SOTA on open-domain QA and generates more factual, specific content than pure parametric models. |
| Required | Enhancing RAG: A Study of Best Practices | COLING 2025 | Systematic study of RAG best practices and optimization strategies. |
| Optional | Knowledge Graph-Guided RAG (KG2RAG) | NAACL 2025 | Integrates knowledge graphs with RAG for improved retrieval. |
| Optional | GRAG: Graph Retrieval-Augmented Generation | NAACL 2025 Findings | Graph-based approach to retrieval-augmented generation. |
| Optional | GraphRAG (Microsoft) | arXiv:2404.16130 | Addresses RAG limitations for global queries by building entity knowledge graphs with community summaries. Shows substantial improvements in answer comprehensiveness and diversity for analytical questions. |
| Optional | Multi-Agent Filtering RAG (MAIN-RAG) | ACL 2025 | Multi-agent approach to filtering and refining RAG outputs. |
| Optional | Towards Omni-RAG | ACL 2025 | Explores unified RAG approaches across modalities. |
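The core loop in the Lewis et al. paper is retrieve-then-generate: score passages against the query, then prepend the best ones to the generator's prompt. A minimal sketch with a word-overlap scorer standing in for a dense retriever (corpus and query are illustrative):

```python
# Sketch of the retrieve-then-generate RAG loop. A word-overlap
# scorer stands in for a real dense retriever; the corpus is made up.

CORPUS = [
    "Gainesville is home to the University of Florida.",
    "RAG combines a retriever with a generator.",
    "Kafka is a distributed event streaming platform.",
]

def tokens(text):
    return {w.strip(".,?!").lower() for w in text.split()}

def retrieve(query, corpus, k=1):
    # rank passages by token overlap with the query, keep the top k
    return sorted(corpus, key=lambda p: len(tokens(query) & tokens(p)), reverse=True)[:k]

def build_prompt(query, corpus):
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("What does RAG combine?", CORPUS))
```

Swapping the overlap scorer for embedding similarity over a vector index gives the architecture the optional papers above build on.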

NEW Demo: Vector databases (Chroma, FAISS)

| Type | Paper | Link | Summary |
|------|-------|------|---------|
| Optional | Introduction to Information Retrieval (Ch. 6-7: Vector Space Model) | Stanford IR Book | Classic textbook chapters on vector space models and similarity search. |

Week 8: Evaluation

NEW Lecture: LLM evaluation fundamentals

| Type | Paper | Link | Summary |
|------|-------|------|---------|
| Required | RAGAS: Automated Evaluation of Retrieval Augmented Generation | EACL 2024 | Reference-free framework for evaluating RAG systems across three dimensions: retrieval relevance, LLM faithfulness, and generation quality. Enables faster evaluation cycles without ground truth annotations. |
| Required | In Benchmarks We Trust… Or Not? | EMNLP 2025 | Critical examination of benchmark reliability and limitations. |
| Optional | MMLU: Measuring Massive Multitask Language Understanding | arXiv:2009.03300 | Comprehensive benchmark covering 57 academic and professional domains. Found that even the best models fall short of expert-level performance, especially on socially critical subjects like law and morality. |
| Optional | Evaluation of LLMs Should Not Ignore Non-Determinism | NAACL 2025 | Examines the impact of LLM non-determinism on evaluation reliability. |
| Optional | Examining Robustness of LLM Evaluation | ACL 2024 | Studies robustness issues in LLM evaluation methodologies. |
| Optional | HalluLens: LLM Hallucination Benchmark | ACL 2025 | Benchmark specifically designed for measuring LLM hallucinations. |
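To make the non-determinism point concrete: scoring the same benchmark over several sampled runs yields a spread of scores, so a single run can over- or under-state quality. A sketch with fabricated run outputs:

```python
from statistics import mean, pstdev

# Exact-match accuracy over several sampled runs of the "same" model.
# Gold answers and run outputs are fabricated for illustration.

GOLD = ["paris", "4", "blue"]
RUNS = [
    ["paris", "4", "red"],
    ["paris", "5", "blue"],
    ["paris", "4", "blue"],
]

def exact_match(preds, gold):
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)

scores = [exact_match(run, GOLD) for run in RUNS]
print(f"mean={mean(scores):.3f} sd={pstdev(scores):.3f}")
```

Reporting the mean and spread across runs, rather than one sampled score, is the practice the NAACL 2025 paper argues for.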

Module 3: Advanced Topics (Weeks 9-11)

Week 9: SPRING BREAK

No readings


Week 10: Ethics and Human-AI

Lecture: AI fairness and bias (reuse lecture18)

| Type | Paper | Link | Summary |
|------|-------|------|---------|
| Required | Bias and Fairness in Large Language Models: A Survey | Computational Linguistics 2024 | Comprehensive 80+ page review proposing taxonomies for evaluation metrics, datasets, and mitigation techniques for LLM bias. Organizes mitigation by intervention stage: pre-processing, in-training, intra-processing, and post-processing. |
| Optional | A Trip Towards Fairness: Bias and De-Biasing in LLMs | *SEM 2024 | Explores approaches to identifying and mitigating bias in LLMs. |
| Optional | Addressing Statistical and Causal Gender Fairness in NLP | NAACL 2024 Findings | Examines gender fairness from statistical and causal perspectives. |

NEW Lecture: Human-in-the-loop systems

| Type | Paper | Link | Summary |
|------|-------|------|---------|
| Required | Building Effective Agents (Anthropic, 2024) | Anthropic Research | Practical guide distinguishing workflows (predefined orchestration) from agents (dynamic LLM control). Presents 6 design patterns from augmented LLM to orchestrator-workers. Emphasizes simplicity, transparency, and careful tool design. |
| Optional | ReAct: Synergizing Reasoning and Acting in LLMs | arXiv:2210.03629 | Interleaves reasoning traces with actions for better task-solving. Overcomes hallucination by grounding in external knowledge, achieving 34% and 10% absolute improvements on decision-making benchmarks. |
| Optional | Efficient Agents: Building Effective Agents While Reducing Cost | arXiv:2508.02694 | Strategies for building cost-efficient agentic systems. |
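The ReAct pattern interleaves thoughts, actions, and observations in a loop. A minimal sketch with a scripted stand-in for the LLM and a single calculator tool (everything here is illustrative; a real agent conditions each step on the history):

```python
# Minimal sketch of a ReAct-style thought -> action -> observation loop.

def calculator(expr):
    # toy tool: arithmetic only; never eval untrusted input in real systems
    return str(eval(expr, {"__builtins__": {}}))

SCRIPT = iter([
    ("think", "I need to multiply the quantities."),
    ("act", "17 * 3"),
    ("finish", None),
])

def fake_llm(history):
    return next(SCRIPT)  # a real agent would condition on the history

def react_loop():
    history = []
    while True:
        kind, payload = fake_llm(history)
        if kind == "finish":
            return history
        if kind == "act":
            history.append(("observation", calculator(payload)))
        else:
            history.append(("thought", payload))

trace = react_loop()
print(trace[-1])  # ('observation', '51')
```

The Anthropic guide's workflow/agent distinction maps onto this loop: a workflow fixes the script in advance, while an agent lets the model choose each step.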

Week 11: Paper Writing Sprint

NEW Lecture: Paper writing workshop

| Type | Paper | Link | Summary |
|------|-------|------|---------|
| Required | How to Write a Great Research Paper (Simon Peyton Jones) | Video | Classic talk on research paper writing: start with the idea, write early and often, structure for clarity. Emphasizes that writing is thinking. |
| Optional | The Science of Scientific Writing | PDF | Cognitive principles for clear scientific writing: put old information before new, keep subjects and verbs close together. |

Module 4: Completion (Weeks 12-15)

Week 12: Crowdsourcing

Lecture: Crowdsourcing and annotation (reuse lecture15)

| Type | Paper | Link | Summary |
|------|-------|------|---------|
| Required | HumEval 2024 Workshop Proceedings | ACL Anthology | Collection of papers on human evaluation of NLP systems. |
| Required | Capturing Perspectives of Crowdsourced Annotators | NAACL 2024 | Proposes AART to learn individual annotator representations rather than majority voting. Addresses fairness concerns for underrepresented perspectives in subjective classification tasks. |
| Optional | On Crowdsourcing Task Design for Discourse Annotation | COLING 2025 | Best practices for designing crowdsourcing annotation tasks. |
| Optional | Evaluating Saliency Explanations by Crowdsourcing | LREC-COLING 2024 | Using crowdsourcing to evaluate model explanations. |
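The AART reading argues against majority-vote aggregation for subjective tasks. The baseline it critiques is simple to state; a sketch with fabricated annotations shows how minority views are erased:

```python
from collections import Counter

# Majority-vote label aggregation, the baseline AART argues against
# for subjective tasks. Items and labels are fabricated.

annotations = {
    "post_1": ["toxic", "toxic", "ok"],
    "post_2": ["ok", "toxic", "ok"],
}

def majority_vote(labels):
    return Counter(labels).most_common(1)[0][0]

gold = {item: majority_vote(labels) for item, labels in annotations.items()}
print(gold)  # the dissenting vote on each item is discarded
```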

Week 13: Paper Review

NEW Lecture: How to write good reviews

| Type | Paper | Link | Summary |
|------|-------|------|---------|
| Required | Advice for Peer Reviewers (ACL) | ACL Reviewer Guidelines | Official ACL guidelines for writing constructive, fair peer reviews. |
| Optional | NIPS Experiment (reviewing consistency) | arXiv:2109.09774 | Famous study showing 25% of papers had inconsistent accept/reject decisions when reviewed by different committees. Highlights subjectivity in peer review. |

Week 14-15: Final Push

Lecture: Visualizations (reuse lecture19)
Focus on project completion - minimal new readings


Foundational Papers (Reference List)

These are classic papers students should be aware of, organized by topic:

Language Models

| Paper | Link | Summary |
|-------|------|---------|
| GPT-3: Language Models are Few-Shot Learners | arXiv:2005.14165 | Landmark paper demonstrating that scaling to 175B parameters enables strong few-shot performance without fine-tuning. Introduced the in-context learning paradigm. |
| Scaling Laws for Neural Language Models | arXiv:2001.08361 | Discovered predictable power-law relationships between model size, data, compute, and loss. Showed optimal training uses large models on modest data with early stopping. |
| LLaMA: Open and Efficient Foundation Language Models | arXiv:2302.13971 | Demonstrated SOTA models can be trained using only public data. LLaMA-13B outperforms GPT-3 on most benchmarks. Made models available to researchers, catalyzing open-source LLM development. |

Benchmarks & Evaluation

| Paper | Link | Summary |
|-------|------|---------|
| MMLU Pro | arXiv:2406.01574 | More challenging MMLU with 10 options instead of 4 and reasoning-focused questions. Accuracy drops 16-33% vs original; reduced prompt sensitivity. Better tracks AI progress. |
| SWE-Bench: Evaluating LLMs on Real-World Software Issues | arXiv:2310.06770 | 2,294 real GitHub issues requiring multi-file code changes. Best model (Claude 2) solved only 1.96%, revealing gap between code generation and software engineering capabilities. |
| IFEval: Instruction-Following Evaluation | arXiv:2311.07911 | Benchmark using verifiable instructions (word count, keywords) for objective evaluation. 25 instruction types across 500 prompts. Avoids biases of LLM-based evaluation. |

Code Generation

| Paper | Link | Summary |
|-------|------|---------|
| The Stack: Code Dataset | arXiv:2211.15533 | 3.1 TB of permissively licensed code in 30 languages. Shows deduplication improves model performance. Includes opt-out mechanism for developers. |
| HumanEval: Evaluating Code Generation | arXiv:2107.03374 | Introduces Codex and the HumanEval benchmark for code synthesis from docstrings. Codex achieved 28.8% pass@1; repeated sampling solves 70% of problems. |

Fine-tuning

| Paper | Link | Summary |
|-------|------|---------|
| LoRA: Low-Rank Adaptation of LLMs | arXiv:2106.09685 | Efficient fine-tuning by freezing base weights and training low-rank matrices. Reduces trainable parameters 10,000x and GPU memory 3x while matching or exceeding full fine-tuning. |
| DPO: Direct Preference Optimization | arXiv:2305.18290 | Simplifies RLHF to a classification loss by deriving the optimal policy in closed form. More stable and efficient training while achieving comparable or better alignment than RLHF. |
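The LoRA idea fits in one line: freeze the base weight W and train only a low-rank update BA, so the effective weight is W + BA. A tiny pure-Python sketch (the matrices are illustrative; real LoRA applies this inside transformer layers):

```python
# The LoRA idea in miniature: effective weight = W + B @ A, where only
# the low-rank factors B and A are trained. Matrices are illustrative.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (2x2)
B = [[0.5], [0.0]]            # trainable, shape 2x1 (rank r = 1)
A = [[0.0, 1.0]]              # trainable, shape 1x2 (rank r = 1)

delta = matmul(B, A)          # low-rank update, expands back to 2x2
W_adapted = [[W[i][j] + delta[i][j] for j in range(2)] for i in range(2)]
print(W_adapted)  # [[1.0, 0.5], [0.0, 1.0]]
```

The parameter saving comes from the shapes: for a d x d weight, B and A together hold 2dr parameters instead of d^2, which for small r is a tiny fraction.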


Additional Resources

Textbooks

Reading Lists

Workshops & Tutorials



Last updated: January 2026
Source: latent.space 2025 reading list + ACL Anthology 2024-2025