Data Engineering at the University of Florida
| Assignment | Due Date | Points | Description |
|---|---|---|---|
| HiPerGator Training | Jan 23, 8:30 AM | 5 | Complete UF RC training |
| Assignment 0 | Jan 28, 11:59 PM | 50 | Python, GitHub, LLM basics |
| Assignment 1.5 | Feb 11, In-Class | 12 | MCP on HiPerGator (in-class) |
| Assignment 1 | Feb 18, 11:59 PM | 100 | MCP data pipeline |
| Quiz: ML Fundamentals | Feb 18, 8:30 AM | 10 | sklearn API, metrics |
| Assignment 2 | Mar 10, 11:59 PM | 60 | RAG system implementation |
Repository: cis6930sp26-project (individual)
| Milestone | Due Date | Points | Description |
|---|---|---|---|
| Project Proposal | Feb 25, 11:59 PM | 100 | Research question + initial design |
| Design Review | Mar 2, 11:59 PM | 100 | Detailed architecture |
| Code Checkpoint | Mar 23, 11:59 PM | 150 | Working prototype |
| Draft Paper | Mar 30, 11:59 PM | 100 | Complete draft |
| Final Paper | Apr 13, 11:59 PM | 400 | Full evaluation |
| Presentation | Week of Apr 20 | 150 | 10-min presentation |
More assignments will be released as the semester progresses.
| Day | Topic | Activity |
|---|---|---|
| Mon | Course Introduction | Syllabus, expectations |
| Wed | Git/GitHub Crash Course | Live demo |
| Fri | Topics Overview | Data engineering + LLMs |
Assigned: Assignment 0
Monday Jan 19 - MLK Day, no class
| Day | Topic | Activity |
|---|---|---|
| Wed | MCP Fundamentals | Architecture, primitives |
| Fri | HiPerGator Setup | Matt Gitzendanner guest talk |
Due: Assignment 0 (Jan 28), HiPerGator Training (Jan 23)
Assigned: Assignment 0.5 (Skipped)
Readings: MCP Specification, MCP Quickstart Guide (see lecture readings)
| Day | Topic | Activity |
|---|---|---|
| Mon | Prompting Fundamentals | Zero-shot, few-shot, prompt structure |
| Wed | Chain-of-Thought | CoT theory, Zero-shot CoT, Tree/Graph of Thoughts |
| Fri | Reading Papers + Structured Outputs | 3-pass method, JSON mode |
Due: Assignment 0 (Jan 28)
Assigned: Assignment 1, Assignment 2
Readings: Chain-of-Thought Prompting, How to Read a Paper (see lecture readings)
Tools: Navigator CLI - Command-line tool for querying NavigatorAI
| Day | Topic | Activity |
|---|---|---|
| Mon | Prompt Engineering Lab | Structured outputs |
| Wed | Data Integration | Schema mapping |
| Fri | Entity Resolution | Traditional methods |
Assigned: Assignment 1 (due Feb 18)
Readings: LinkTransformer (Arora & Dell, ACL 2024) (see lecture readings)
| Day | Topic | Activity |
|---|---|---|
| Mon | Guest Speaker: Dr. Shiree Hughes | Big Data at General Motors |
| Wed | Assignment 1.5: MCP on HiPerGator | In-class hands-on activity |
| Fri | Guest Speaker: Mikhail Sinanan | Data Engineering at Spotify |
Due: Assignment 1.5 (Feb 11, In-Class)
Readings: Apache Kafka, Spark, Hadoop introductions; Spotify Data Platform (see lecture readings)
Lecture: Slides from Dr. Hughes are available here.
| Day | Topic | Activity |
|---|---|---|
| Mon | ML Fundamentals | Classification, regression, clustering; sklearn API |
| Wed | Evaluation Metrics | ROC/AUC, cross-validation, hyperparameter tuning |
| Fri | Project Proposal Workshop | Example proposals, research question refinement |
Due: Assignment 1 (Feb 18), Quiz: ML Fundamentals (Feb 18, 8:30 AM), Peer Reviews (Feb 21)
Assigned: Project Proposal (due Feb 25)
| Day | Topic | Activity |
|---|---|---|
| Mon | RAG Architecture | Components, retrieval methods, augmentation strategies |
| Wed | Vector Databases | Chroma, FAISS hands-on demo |
| Fri | Chunking Strategies | Design Review discussion |
Due: Project Proposal (Feb 25), Design Review (Mar 2)
Assigned: Assignment 2 (RAG System, 60 pts, due Mar 10)
Readings: Lewis et al. (2020) RAG paper, COLING 2025 RAG Best Practices (see lecture readings)
Supplementary: Embeddings Primer - Background on text embeddings for students who need a refresher
More weeks will be revealed as the semester progresses.