CIS 6930 Spring 26


Data Engineering at the University of Florida

Design Review

Due: Monday, March 2, 2026 at 11:59 PM
Points: 100
Submission: Push to the cis6930sp26-project repository, in the design/ directory


Overview

The design review demonstrates that you have a detailed, implementable plan. You should have moved beyond the proposal stage to a concrete architecture with preliminary evidence of feasibility. This milestone ensures you are ready to begin full implementation.

Deliverables

Submit a design document (4-6 pages) that includes:

  1. Refined Research Question - Updated based on proposal feedback
  2. System Architecture - Detailed component diagram with interfaces
  3. Data Pipeline - Complete data flow from sources to outputs
  4. Implementation Plan - Task breakdown with weekly milestones
  5. Preliminary Results - Evidence that core components work
  6. Risk Assessment - Potential blockers and mitigation strategies

File Structure

cis6930sp26-project/
├── proposal/
│   └── proposal.md
├── design/
│   └── design.md (or design.pdf)
├── src/
│   └── (preliminary code)
└── ...

Rubric

| Criterion | Weight | Description |
| --- | --- | --- |
| System Architecture | 25% | Is the overall system design clear and appropriate? |
| Data Pipeline | 25% | Is the data flow well-defined and appropriate? |
| Implementation Plan | 20% | Is there a realistic plan to complete the project? |
| Preliminary Results | 20% | Is there evidence of progress and feasibility? |
| Presentation Quality | 10% | Is the document clear and professional? |

Scoring Scale

| Score | Meaning |
| --- | --- |
| 5 | Excellent - Ready for implementation |
| 4 | Good - Minor clarifications needed |
| 3 | Satisfactory - Some gaps to address |
| 2 | Needs Work - Significant redesign required |
| 1 | Incomplete - Not ready for implementation |

Detailed Criteria

System Architecture (25%)

| Score | Description |
| --- | --- |
| 5 | Elegant architecture; components well-defined; clear interfaces |
| 4 | Good architecture; appropriate component breakdown |
| 3 | Architecture understandable but some ambiguity |
| 2 | Architecture unclear or inappropriate for the problem |
| 1 | No coherent architecture |

Guiding Questions:

Data Pipeline (25%)

| Score | Description |
| --- | --- |
| 5 | Complete data flow; appropriate transformations; handles edge cases |
| 4 | Good data pipeline; clear transformations |
| 3 | Pipeline understandable but some steps unclear |
| 2 | Data flow unclear or problematic |
| 1 | No data pipeline defined |

Guiding Questions:

Implementation Plan (20%)

| Score | Description |
| --- | --- |
| 5 | Detailed plan; realistic milestones; risk mitigation |
| 4 | Good plan with clear milestones |
| 3 | Plan present but lacks detail or realism |
| 2 | Vague plan or unrealistic timeline |
| 1 | No implementation plan |

Guiding Questions:

Preliminary Results (20%)

| Score | Description |
| --- | --- |
| 5 | Substantial progress; demonstrates feasibility; interesting findings |
| 4 | Good progress; basic functionality working |
| 3 | Some progress; proof of concept exists |
| 2 | Minimal progress; feasibility uncertain |
| 1 | No evidence of implementation work |

Guiding Questions:

Presentation Quality (10%)

| Score | Description |
| --- | --- |
| 5 | Excellent diagrams; clear explanations; professional quality |
| 4 | Good visuals; well-organized |
| 3 | Adequate presentation; some improvements possible |
| 2 | Poor organization or confusing presentation |
| 1 | Unprofessional or incomprehensible |

Example Design Documents

Example 1: Smart City Data Pipeline

System Architecture:

┌──────────────────────────────────────────────────────────────────┐
│                        LLM Orchestrator                          │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
│  │   Planner   │  │  Executor   │  │  Validator  │              │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘              │
└─────────┼────────────────┼────────────────┼─────────────────────┘
          │                │                │
    ┌─────▼────────────────▼────────────────▼─────┐
    │              MCP Server Layer                │
    │  ┌─────────┐  ┌─────────┐  ┌─────────┐      │
    │  │ Transit │  │Utilities│  │   311   │      │
    │  │ Server  │  │ Server  │  │ Server  │      │
    │  └────┬────┘  └────┬────┘  └────┬────┘      │
    └───────┼────────────┼────────────┼───────────┘
            │            │            │
    ┌───────▼────┐ ┌─────▼─────┐ ┌────▼─────┐
    │ Transit API│ │Utilities  │ │ 311 API  │
    │ (Socrata)  │ │  API      │ │(Socrata) │
    └────────────┘ └───────────┘ └──────────┘

MCP Server Interface Example:

# Transit MCP Server tools (interface only; bodies elided).
# Assumes the MCP Python SDK's FastMCP helper.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("transit")

@mcp.tool()
def get_bus_routes() -> list[dict]:
    """Fetch all active bus routes with stops and schedules."""
    ...

@mcp.tool()
def get_route_ridership(route_id: str, date_range: str) -> dict:
    """Get ridership statistics for a specific route."""
    ...

@mcp.tool()
def validate_transit_record(record: dict) -> "ValidationResult":  # project-defined type
    """Validate a transit record against the schema."""
    ...
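As a hedged sketch of what the validation tool's internals might look like: check each required field for presence and type, and return a structured result. The schema and field names below are hypothetical, not taken from an actual Socrata transit feed.

```python
# Minimal sketch of record validation for the Transit server.
# REQUIRED_FIELDS is a hypothetical schema; a real server would
# derive it from the dataset's published metadata.
from dataclasses import dataclass, field

REQUIRED_FIELDS = {"route_id": str, "stop_id": str, "riders": int}

@dataclass
class ValidationResult:
    valid: bool
    errors: list[str] = field(default_factory=list)

def validate_transit_record(record: dict) -> ValidationResult:
    """Check that required fields exist and have the expected types."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            errors.append(f"wrong type for {name}: expected {expected.__name__}")
    return ValidationResult(valid=not errors, errors=errors)
```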

Preliminary Results:


Example 2: Entity Resolution Comparison

Data Pipeline:

┌─────────────┐     ┌─────────────┐
│  Abt-Buy    │     │  Ground     │
│  Dataset    │     │  Truth      │
└──────┬──────┘     └──────┬──────┘
       │                   │
       ▼                   ▼
┌──────────────────────────────────┐
│         Preprocessing            │
│  - Tokenization                  │
│  - Blocking (shared tokens)      │
└──────────────┬───────────────────┘
               │
       ┌───────┴───────┐
       ▼               ▼
┌──────────────┐ ┌──────────────┐
│   Magellan   │ │   LLM-ER     │
│   Pipeline   │ │   Pipeline   │
│              │ │              │
│ - Features   │ │ - Prompts    │
│ - RF Model   │ │ - GPT-4      │
│ - Threshold  │ │ - Parsing    │
└──────┬───────┘ └──────┬───────┘
       │                │
       ▼                ▼
┌──────────────────────────────────┐
│           Evaluation             │
│  - Precision, Recall, F1         │
│  - Token cost per pair           │
│  - Runtime comparison            │
└──────────────────────────────────┘
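The "blocking (shared tokens)" step in the preprocessing box can be sketched as follows: two records become a candidate pair only if they share at least one token, which shrinks the comparison space far below the full cross product. The records here are illustrative, not drawn from the actual Abt-Buy data.

```python
# Token-based blocking sketch: invert records into a token -> ids
# index, then emit every pair of ids co-occurring under some token.
from collections import defaultdict
from itertools import combinations

def tokenize(name: str) -> set[str]:
    return set(name.lower().split())

def block_by_shared_tokens(records: dict[str, str]) -> set[tuple[str, str]]:
    """Return candidate id pairs that share at least one token."""
    index = defaultdict(set)
    for rid, name in records.items():
        for tok in tokenize(name):
            index[tok].add(rid)
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs
```

Both the Magellan and LLM-ER pipelines would then score only these candidate pairs, keeping token cost per pair comparable across the two systems.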

Preliminary Results:


Example 3: Self-Healing Pipeline

Architecture Layers:

| Layer | Components | Responsibility |
| --- | --- | --- |
| Monitoring | Log collector, metric aggregator | Continuous observation |
| Diagnosis | LLM agent, pattern matcher | Failure classification |
| Remediation | Fix suggester, auto-recovery | Resolution execution |
| Learning | Incident database, feedback loop | Improvement over time |
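One way the Diagnosis layer's pattern matcher might triage a log line before escalating to the LLM agent: cheap regex rules classify common failures, and anything unmatched is handed to the model. The patterns and labels below are illustrative assumptions, not a fixed taxonomy.

```python
# Rule-based first pass for the Diagnosis layer. Patterns and labels
# are illustrative; unmatched lines would go to the LLM agent.
import re

FAILURE_PATTERNS = [
    (re.compile(r"HTTP 429|rate limit", re.I), "api_rate_limit"),
    (re.compile(r"schema (mismatch|drift)|unexpected column", re.I), "schema_drift"),
    (re.compile(r"timed? ?out|TimeoutError", re.I), "timeout"),
    (re.compile(r"null ratio|outlier|quality check", re.I), "data_quality"),
]

def classify_failure(log_line: str) -> str:
    """Return a failure label, or 'unknown' to trigger LLM diagnosis."""
    for pattern, label in FAILURE_PATTERNS:
        if pattern.search(log_line):
            return label
    return "unknown"
```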

Failure Injection Framework:

| Failure Type | Injection Method | Expected Detection |
| --- | --- | --- |
| Schema drift | Modify source schema | Schema validator alert |
| API rate limit | Reduce quota | HTTP 429 pattern |
| Data quality | Inject nulls/outliers | Quality check failure |
| Timeout | Add network delay | Timeout exception |
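The injection methods in the table could be realized as thin wrappers around a fetch function. The sketch below covers the "data quality" and "API rate limit" rows, nulling out a fraction of records and simulating a rate-limit failure; all names are hypothetical.

```python
# Failure-injection sketch: corrupt records or raise a simulated
# HTTP 429 so the monitoring/diagnosis layers can be exercised.
import random

class RateLimitError(Exception):
    """Stands in for an HTTP 429 response."""

def inject_nulls(records: list[dict], fields: list[str], rate: float,
                 rng: random.Random) -> list[dict]:
    """Null out the given fields in roughly `rate` of the records."""
    out = []
    for rec in records:
        rec = dict(rec)  # leave originals untouched
        if rng.random() < rate:
            for f in fields:
                rec[f] = None
        out.append(rec)
    return out

def with_rate_limit(fetch, every_n: int):
    """Wrap `fetch` so every n-th call raises RateLimitError."""
    calls = {"n": 0}
    def wrapped(*args, **kwargs):
        calls["n"] += 1
        if calls["n"] % every_n == 0:
            raise RateLimitError("simulated HTTP 429")
        return fetch(*args, **kwargs)
    return wrapped
```

Driving the pipeline through these wrappers gives a repeatable way to measure whether each injected failure triggers the expected detection in the table.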

Preliminary Results:


What Changed from Proposal

Your design document should show evolution from the proposal:

| Proposal | Design |
| --- | --- |
| High-level architecture | Detailed component diagram with interfaces |
| Planned dataset | Data accessed and explored |
| Evaluation plan | Preliminary baseline results |
| Timeline | Detailed task breakdown |
| “I will build…” | “I have built…” (for core components) |

Submission Checklist


Resources

