Data Engineering at the University of Florida
Due: Monday, March 2, 2026 at 11:59 PM
Points: 100
Submission: Push to the cis6930sp26-project repository, in the design/ directory
The design review demonstrates that you have a detailed, implementable plan. You should have moved beyond the proposal stage to a concrete architecture with preliminary evidence of feasibility. This milestone ensures you are ready to begin full implementation.
Submit a design document (4-6 pages) that includes:
```
cis6930sp26-project/
├── proposal/
│   └── proposal.md
├── design/
│   └── design.md (or design.pdf)
├── src/
│   └── (preliminary code)
└── ...
```
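Before pushing, a quick local check can confirm the files sit where the layout above expects them. The helper below is a hypothetical sketch and not part of any course tooling. It only tests for the paths shown in the tree and says nothing about the document's content.

```python
from pathlib import Path


def check_design_submission(repo_root: str) -> list[str]:
    """Return problems found in the expected layout (an empty list means OK)."""
    root = Path(repo_root)
    problems: list[str] = []
    if not (root / "proposal" / "proposal.md").is_file():
        problems.append("missing proposal/proposal.md")
    design = root / "design"
    # The design document may be Markdown or PDF.
    if not ((design / "design.md").is_file() or (design / "design.pdf").is_file()):
        problems.append("missing design/design.md (or design/design.pdf)")
    if not (root / "src").is_dir():
        problems.append("missing src/ (preliminary code)")
    return problems


if __name__ == "__main__":
    # Run from the repository root; prints one warning per missing path.
    for issue in check_design_submission("."):
        print("WARNING:", issue)
```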
| Criterion | Weight | Description |
|---|---|---|
| System Architecture | 25% | Is the overall system design clear and appropriate? |
| Data Pipeline | 25% | Is the data flow well-defined and appropriate? |
| Implementation Plan | 20% | Is there a realistic plan to complete the project? |
| Preliminary Results | 20% | Is there evidence of progress and feasibility? |
| Presentation Quality | 10% | Is the document clear and professional? |
Scoring Scale:
| Score | Meaning |
|---|---|
| 5 | Excellent - Ready for implementation |
| 4 | Good - Minor clarifications needed |
| 3 | Satisfactory - Some gaps to address |
| 2 | Needs Work - Significant redesign required |
| 1 | Incomplete - Not ready for implementation |
System Architecture (25%):
| Score | Description |
|---|---|
| 5 | Elegant architecture; components well-defined; clear interfaces |
| 4 | Good architecture; appropriate component breakdown |
| 3 | Architecture understandable but some ambiguity |
| 2 | Architecture unclear or inappropriate for the problem |
| 1 | No coherent architecture |
Guiding Questions:
Data Pipeline (25%):
| Score | Description |
|---|---|
| 5 | Complete data flow; appropriate transformations; handles edge cases |
| 4 | Good data pipeline; clear transformations |
| 3 | Pipeline understandable but some steps unclear |
| 2 | Data flow unclear or problematic |
| 1 | No data pipeline defined |
Guiding Questions:
Implementation Plan (20%):
| Score | Description |
|---|---|
| 5 | Detailed plan; realistic milestones; risk mitigation |
| 4 | Good plan with clear milestones |
| 3 | Plan present but lacks detail or realism |
| 2 | Vague plan or unrealistic timeline |
| 1 | No implementation plan |
Guiding Questions:
Preliminary Results (20%):
| Score | Description |
|---|---|
| 5 | Substantial progress; demonstrates feasibility; interesting findings |
| 4 | Good progress; basic functionality working |
| 3 | Some progress; proof of concept exists |
| 2 | Minimal progress; feasibility uncertain |
| 1 | No evidence of implementation work |
Guiding Questions:
Presentation Quality (10%):
| Score | Description |
|---|---|
| 5 | Excellent diagrams; clear explanations; professional quality |
| 4 | Good visuals; well-organized |
| 3 | Adequate presentation; some improvements possible |
| 2 | Poor organization or confusing presentation |
| 1 | Unprofessional or incomprehensible |
System Architecture:
```
┌─────────────────────────────────────────────────────┐
│                  LLM Orchestrator                   │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  │
│  │   Planner   │  │  Executor   │  │  Validator  │  │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘  │
└─────────┼────────────────┼────────────────┼─────────┘
          │                │                │
    ┌─────▼────────────────▼────────────────▼─────┐
    │              MCP Server Layer               │
    │  ┌─────────┐    ┌─────────┐    ┌─────────┐  │
    │  │ Transit │    │Utilities│    │   311   │  │
    │  │ Server  │    │ Server  │    │ Server  │  │
    │  └────┬────┘    └────┬────┘    └────┬────┘  │
    └───────┼──────────────┼──────────────┼───────┘
            │              │              │
      ┌─────▼─────┐  ┌─────▼─────┐  ┌─────▼─────┐
      │Transit API│  │ Utilities │  │  311 API  │
      │ (Socrata) │  │    API    │  │ (Socrata) │
      └───────────┘  └───────────┘  └───────────┘
```
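The orchestrator's division of labor can be sketched without an LLM or a live MCP connection. This is a minimal sketch under three assumptions:

- A plain dictionary of Python callables stands in for the MCP client sessions.
- The Planner is hard-coded instead of asking an LLM.
- Every name (`Step`, `ToolRegistry`, `plan`, `execute`, `validate`, `run`) is hypothetical and not part of the MCP SDK.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical stand-in for MCP client sessions: server name -> tool name -> callable.
ToolRegistry = dict[str, dict[str, Callable[..., Any]]]


@dataclass
class Step:
    server: str  # e.g. "transit", "utilities", "311"
    tool: str    # tool name exposed by that server
    args: dict[str, Any] = field(default_factory=dict)


def plan(question: str) -> list[Step]:
    """Planner: an LLM would derive steps from the question; hard-coded here."""
    return [Step("transit", "get_route_ridership", {"route_id": "R1", "date_range": "2026-01"})]


def execute(steps: list[Step], registry: ToolRegistry) -> list[Any]:
    """Executor: dispatch each step to the named tool on the named server."""
    return [registry[step.server][step.tool](**step.args) for step in steps]


def validate(results: list[Any]) -> bool:
    """Validator: here, just require every tool to return a non-empty dict."""
    return all(isinstance(r, dict) and bool(r) for r in results)


def run(question: str, registry: ToolRegistry) -> list[Any]:
    results = execute(plan(question), registry)
    if not validate(results):
        # A fuller orchestrator might re-plan or retry instead of failing.
        raise ValueError("validation failed")
    return results


if __name__ == "__main__":
    # Fake transit tool that echoes the route; a real one would call the MCP server.
    fake_registry: ToolRegistry = {
        "transit": {"get_route_ridership": lambda route_id, date_range: {"route_id": route_id}},
    }
    print(run("How busy is route R1?", fake_registry))
```

Keeping the three roles as separate functions makes each one replaceable on its own. For example, the hard-coded `plan` can be swapped for an LLM call without touching dispatch or validation.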
MCP Server Interface Example:
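```python
# Transit MCP Server Tools (interface only; tool bodies omitted), using the MCP Python SDK
from mcp.server.fastmcp import FastMCP
from pydantic import BaseModel

mcp = FastMCP("transit")

class ValidationResult(BaseModel):  # illustrative fields
    valid: bool
    errors: list[str] = []

@mcp.tool()
def get_bus_routes() -> list[dict]:
    """Fetch all active bus routes with stops and schedules."""

@mcp.tool()
def get_route_ridership(route_id: str, date_range: str) -> dict:
    """Get ridership statistics for a specific route."""

@mcp.tool()
def validate_transit_record(record: dict) -> ValidationResult:
    """Validate a transit record against schema."""
```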
Preliminary Results:
Data Pipeline:
```
  ┌─────────────┐  ┌─────────────┐
  │   Abt-Buy   │  │   Ground    │
  │   Dataset   │  │    Truth    │
  └──────┬──────┘  └──────┬──────┘
         │                │
         ▼                ▼
┌──────────────────────────────────┐
│  Preprocessing                   │
│  - Tokenization                  │
│  - Blocking (shared tokens)      │
└────────────────┬─────────────────┘
                 │
        ┌────────┴────────┐
        ▼                 ▼
 ┌──────────────┐  ┌──────────────┐
 │   Magellan   │  │    LLM-ER    │
 │   Pipeline   │  │   Pipeline   │
 │              │  │              │
 │  - Features  │  │  - Prompts   │
 │  - RF Model  │  │  - GPT-4     │
 │  - Threshold │  │  - Parsing   │
 └──────┬───────┘  └──────┬───────┘
        │                 │
        ▼                 ▼
┌──────────────────────────────────┐
│  Evaluation                      │
│  - Precision, Recall, F1         │
│  - Token cost per pair           │
│  - Runtime comparison            │
└──────────────────────────────────┘
```
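The preprocessing and evaluation stages are simple enough to sketch directly:

- A token-based blocker builds an inverted index over one table, so only pairs sharing at least `min_shared` tokens become candidates.
- Pairwise precision, recall, and F1 are computed against a ground-truth pair set.

Whitespace tokenization and all function names are assumptions for illustration. The toy records in the usage example are invented, not drawn from Abt-Buy.

```python
from collections import defaultdict

Pair = tuple[str, str]


def tokenize(text: str) -> set[str]:
    """Lowercased whitespace tokens (a deliberately simple stand-in)."""
    return set(text.lower().split())


def block_by_shared_tokens(left: dict[str, str], right: dict[str, str],
                           min_shared: int = 1) -> set[Pair]:
    """Return (left_id, right_id) pairs whose texts share >= min_shared tokens."""
    # Inverted index over the right table: token -> ids containing it.
    index: defaultdict[str, set[str]] = defaultdict(set)
    for rid, text in right.items():
        for tok in tokenize(text):
            index[tok].add(rid)
    candidates: set[Pair] = set()
    for lid, text in left.items():
        shared: defaultdict[str, int] = defaultdict(int)  # right id -> shared-token count
        for tok in tokenize(text):
            for rid in index.get(tok, ()):
                shared[rid] += 1
        candidates.update((lid, rid) for rid, n in shared.items() if n >= min_shared)
    return candidates


def precision_recall_f1(predicted: set[Pair], truth: set[Pair]) -> tuple[float, float, float]:
    """Pairwise precision, recall and F1 of predicted matches against ground truth."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    left = {"a1": "Sony Bravia 40in LCD TV", "a2": "Canon PowerShot camera"}
    right = {"b1": "sony bravia lcd television", "b2": "canon powershot sd1000"}
    truth = {("a1", "b1"), ("a2", "b2")}
    candidates = block_by_shared_tokens(left, right, min_shared=2)
    print(sorted(candidates), precision_recall_f1(candidates, truth))
```

Scoring the blocker's own candidate set against the ground truth is a useful sanity check. Recall there is the fraction of true matches that survive blocking. That value caps the recall either matcher can reach, since neither sees pairs the blocker discarded.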
Preliminary Results:
Architecture Layers:
| Layer | Components | Responsibility |
|---|---|---|
| Monitoring | Log collector, metric aggregator | Continuous observation |
| Diagnosis | LLM agent, pattern matcher | Failure classification |
| Remediation | Fix suggester, auto-recovery | Resolution execution |
| Learning | Incident database, feedback loop | Improvement over time |
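The Diagnosis layer's pattern matcher can start as a small rule table, with anything it cannot classify escalated to the LLM agent. This sketch assumes log lines arrive as plain strings. The regexes, class labels, and the `IncidentLog` counter (a toy stand-in for the Learning layer's incident database) are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical rule set: failure class -> regex over a single log line.
PATTERNS: dict[str, re.Pattern[str]] = {
    "rate_limit": re.compile(r"\b429\b|too many requests", re.IGNORECASE),
    "timeout": re.compile(r"timed? ?out", re.IGNORECASE),
    "schema_drift": re.compile(r"unexpected column|missing column|schema mismatch", re.IGNORECASE),
    "data_quality": re.compile(r"null check failed|outlier", re.IGNORECASE),
}


def diagnose(line: str) -> str:
    """Return the first matching failure class, or 'unknown' (escalate to the LLM agent)."""
    for label, pattern in PATTERNS.items():
        if pattern.search(line):
            return label
    return "unknown"


class IncidentLog:
    """Toy incident store: counts how often each failure class has been seen."""

    def __init__(self) -> None:
        self.counts: Counter[str] = Counter()

    def record(self, line: str) -> str:
        label = diagnose(line)
        self.counts[label] += 1
        return label
```

Cheap rules handle the frequent, well-understood failures. The share of lines ending up as `"unknown"` then measures how much work the LLM agent actually receives.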
Failure Injection Framework:
| Failure Type | Injection Method | Expected Detection |
|---|---|---|
| Schema drift | Modify source schema | Schema validator alert |
| API rate limit | Reduce quota | HTTP 429 pattern |
| Data quality | Inject nulls/outliers | Quality check failure |
| Timeout | Add network delay | Timeout exception |
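Failure injection is easiest to test when each fault is reproducible under a fixed seed. The sketch below covers only the data-quality row. It replaces a fraction of values with nulls or inflated outliers, and a quality check raises an alert for each kind of fault. The 100× outlier multiplier, the null threshold, and the valid range are assumptions, not values from any real dataset.

```python
import random
from typing import Optional


def inject_quality_faults(values: list[float], null_rate: float, outlier_rate: float,
                          seed: int = 0) -> list[Optional[float]]:
    """Replace a fraction of values with None (nulls) or 100x the value (outliers)."""
    rng = random.Random(seed)  # seeded so an injected batch can be replayed exactly
    faulty: list[Optional[float]] = []
    for v in values:
        r = rng.random()
        if r < null_rate:
            faulty.append(None)
        elif r < null_rate + outlier_rate:
            faulty.append(v * 100)
        else:
            faulty.append(v)
    return faulty


def quality_check(values: list[Optional[float]], max_null_frac: float = 0.01,
                  valid_range: tuple[float, float] = (0.0, 500.0)) -> list[str]:
    """Return alert messages; an empty list means the batch passed."""
    if not values:
        return ["empty batch"]
    alerts: list[str] = []
    nulls = sum(v is None for v in values)
    if nulls / len(values) > max_null_frac:
        alerts.append(f"null fraction {nulls / len(values):.2f} exceeds {max_null_frac}")
    lo, hi = valid_range
    outliers = [v for v in values if v is not None and not lo <= v <= hi]
    if outliers:
        alerts.append(f"{len(outliers)} value(s) outside [{lo}, {hi}]")
    return alerts
```

Using rates of exactly 0 or 1 in tests gives fully predictable batches. Fractional rates with a fixed seed reproduce the same faulty batch on every run.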
Preliminary Results:
Your design document should show evolution from the proposal:
| Proposal | Design |
|---|---|
| High-level architecture | Detailed component diagram with interfaces |
| Planned dataset | Data accessed and explored |
| Evaluation plan | Preliminary baseline results |
| Timeline | Detailed task breakdown |
| “I will build…” | “I have built…” (for core components) |