Data Engineering at the University of Florida
Due: Monday, April 13, 2026 at 11:59 PM
Points: 400
Submission: Push to the cis6930sp26-project repository, in the paper/ directory, with the tag final
The final paper is the primary deliverable for your course project. It should be a polished research paper that presents your work with rigorous evaluation. This paper represents 40% of your total project grade.
Submit a complete, polished paper (8-10 pages) that:
cis6930sp26-project/
├── paper/
│   ├── paper.pdf       # Compiled paper
│   ├── paper.tex       # Source (or paper.md)
│   ├── figures/
│   └── references.bib
└── ...
git tag -a final -m "Final paper submission"
git push origin final
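Before creating the tag, it can help to confirm that the paper/ directory matches the layout shown above. The following is a minimal sketch, not part of the course tooling: the helper name `missing_paper_files` is hypothetical, and treating every entry in the tree (including figures/ and references.bib) as required is an assumption.

```python
from pathlib import Path

# File names come from the directory tree above. Treating all of them as
# required is an assumption made for this sketch.
REQUIRED_FILES = ["paper.pdf", "references.bib"]
SOURCE_ALTERNATIVES = ["paper.tex", "paper.md"]  # either source format is accepted


def missing_paper_files(repo_root):
    """Return the expected paper/ entries that are absent under repo_root."""
    paper_dir = Path(repo_root) / "paper"
    if not paper_dir.is_dir():
        return ["paper/"]

    missing = [name for name in REQUIRED_FILES if not (paper_dir / name).is_file()]
    if not any((paper_dir / name).is_file() for name in SOURCE_ALTERNATIVES):
        missing.append("paper.tex or paper.md")
    if not (paper_dir / "figures").is_dir():
        missing.append("figures/")
    return missing


if __name__ == "__main__":
    # Run from the repository root before tagging.
    problems = missing_paper_files(".")
    if problems:
        print("Missing from paper/:", ", ".join(problems))
    else:
        print("paper/ layout looks complete")
```

An empty list means every entry from the tree was found; the check says nothing about page count or paper quality.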
The final paper is evaluated using conference-style peer review criteria.
| Criterion | Weight | Points | Description |
|---|---|---|---|
| Originality | 20% | 80 | Does the paper make a novel contribution? |
| Technical Quality | 25% | 100 | Is the methodology sound and evaluation rigorous? |
| Clarity | 20% | 80 | Is the paper well-written and easy to understand? |
| Significance | 20% | 80 | Does this work address an important problem? |
| Reproducibility | 15% | 60 | Can the results be reproduced by others? |
| Total | 100% | 400 | |
| Score | Meaning | Conference Equivalent |
|---|---|---|
| 5 | Excellent | Strong Accept |
| 4 | Good | Accept |
| 3 | Satisfactory | Weak Accept / Borderline |
| 2 | Below Average | Weak Reject |
| 1 | Poor | Reject |
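To get a rough sense of how 1-5 scores might translate into the 400 points, the sketch below combines the rubric's point values with per-criterion scores. The guidelines do not state the conversion rule, so the linear scaling used here (a score of 5 earns full points for the criterion) is purely an assumption, and `estimate_points` is a hypothetical helper.

```python
# Maximum points per criterion, copied from the rubric table above.
RUBRIC_POINTS = {
    "Originality": 80,
    "Technical Quality": 100,
    "Clarity": 80,
    "Significance": 80,
    "Reproducibility": 60,
}
MAX_SCORE = 5


def estimate_points(scores):
    """Estimate total points from 1-5 scores.

    ASSUMPTION: points scale linearly with the score. The actual
    conversion used for grading is not given in the guidelines.
    """
    total = 0.0
    for criterion, max_points in RUBRIC_POINTS.items():
        score = scores[criterion]
        if not 1 <= score <= MAX_SCORE:
            raise ValueError(f"{criterion}: score must be between 1 and {MAX_SCORE}")
        total += max_points * score / MAX_SCORE
    return total


example_scores = {name: 4 for name in RUBRIC_POINTS}  # "Good" on every criterion
print(estimate_points(example_scores))
```

Under that assumption, Technical Quality moves the total the most per score step because it carries the largest weight.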
| Score | Description (Originality) |
|---|---|
| 5 | Highly original; significant new insights or methods |
| 4 | Good novelty; clear contribution beyond prior work |
| 3 | Some novelty; incremental contribution |
| 2 | Limited novelty; mostly replicates existing work |
| 1 | No apparent novelty |
What Makes a Strong Contribution:
| Score | Description (Technical Quality) |
|---|---|
| 5 | Rigorous methodology; comprehensive evaluation; solid results |
| 4 | Sound methodology; good evaluation |
| 3 | Reasonable approach; evaluation has some gaps |
| 2 | Methodology has flaws; evaluation insufficient |
| 1 | Fundamentally flawed approach |
What Makes Strong Technical Quality:
| Score | Description (Clarity) |
|---|---|
| 5 | Exceptionally clear; well-organized; engaging |
| 4 | Clear writing; good organization |
| 3 | Understandable but could be clearer |
| 2 | Difficult to follow; organizational issues |
| 1 | Incomprehensible |
What Makes Strong Clarity:
| Score | Description (Significance) |
|---|---|
| 5 | Addresses critical problem; high potential impact |
| 4 | Important problem; good potential impact |
| 3 | Moderately important; some practical value |
| 2 | Limited significance; narrow scope |
| 1 | Trivial problem or no clear value |
What Makes Strong Significance:
| Score | Description (Reproducibility) |
|---|---|
| 5 | Fully reproducible; code/data available; detailed methods |
| 4 | Mostly reproducible; minor details missing |
| 3 | Partially reproducible; some gaps |
| 2 | Difficult to reproduce; key details missing |
| 1 | Not reproducible |
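One small step toward the "detailed methods" end of the scale above is to record the random seed and environment alongside each experiment run. This is a minimal sketch using only the Python standard library; the function name `build_run_manifest`, the manifest fields, and the dataset names are illustrative assumptions, and a real project would also need to seed any other libraries it uses.

```python
import json
import platform
import random
import sys


def build_run_manifest(seed, dataset_names):
    """Seed the stdlib RNG and describe the run as a JSON-serializable dict."""
    random.seed(seed)  # only covers the standard-library random module
    return {
        "seed": seed,
        "datasets": sorted(dataset_names),
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
    }


# Placeholder dataset names, used here only for illustration.
manifest = build_run_manifest(seed=42, dataset_names=["Transit", "Utilities", "311"])
print(json.dumps(manifest, indent=2))
```

Committing such a manifest next to the results gives readers the details they need to rerun an experiment under the same settings.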
What Makes Strong Reproducibility:
Your final paper should address the feedback from peer reviews of your draft.
Review Comment: “The evaluation only uses one dataset. How do we know the results generalize?”
Response in Paper:
To evaluate generalization, we conduct additional experiments on two supplementary datasets: the NYC 311 complaint data and the Chicago building permit data. Results (Table 3) show consistent performance across all three datasets, with F1 scores ranging from 0.91 to 0.94.
Your abstract should be a standalone summary that:
A strong introduction:
Organize related work into categories:
Include:
Present:
A strong conclusion:
| System | Dataset | Precision | Recall | F1 | Tokens | Cost | Time |
|---|---|---|---|---|---|---|---|
| Baseline | Transit | 0.96 | 0.94 | 0.95 | - | $0.00 | 0.2s |
| Baseline | Utilities | 0.94 | 0.91 | 0.93 | - | $0.00 | 0.3s |
| Baseline | 311 | 0.93 | 0.90 | 0.91 | - | $0.00 | 0.2s |
| TransitLLM | Transit | 0.94 | 0.92 | 0.93 | 1,240 | $0.02 | 3.8s |
| TransitLLM | Utilities | 0.92 | 0.90 | 0.91 | 1,180 | $0.02 | 4.1s |
| TransitLLM | 311 | 0.91 | 0.89 | 0.90 | 1,320 | $0.02 | 4.5s |
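The precision, recall, and F1 columns in a table like the one above are derived from true-positive, false-positive, and false-negative counts. The sketch below shows the standard formulas; the counts in the example are hypothetical values chosen for illustration and are not the data behind the table.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Compute precision, recall, and F1 from raw counts."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    # Guard against empty denominators (no predictions or no positives).
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical counts for one system on one dataset.
p, r, f1 = precision_recall_f1(true_positives=94, false_positives=4, false_negatives=6)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")
```

Reporting all three values, rather than F1 alone, lets readers see whether a system trades precision for recall.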
| Configuration | F1 | Description |
|---|---|---|
| Full system | 0.93 | All components enabled |
| No validation | 0.88 | Remove data quality checks |
| No schema hints | 0.85 | Remove schema metadata from prompts |
| Rule-based mapping | 0.79 | Replace LLM with string matching |
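An ablation table is often easier to read when each configuration is expressed as a drop relative to the full system. The sketch below computes those differences from the F1 values in the table above; the helper name `f1_drops` is hypothetical.

```python
# F1 values copied from the ablation table above.
ABLATION_F1 = {
    "Full system": 0.93,
    "No validation": 0.88,
    "No schema hints": 0.85,
    "Rule-based mapping": 0.79,
}


def f1_drops(results, reference="Full system"):
    """Return the F1 lost by each configuration relative to the reference."""
    base = results[reference]
    return {
        name: round(base - f1, 2)
        for name, f1 in results.items()
        if name != reference
    }


for name, drop in f1_drops(ABLATION_F1).items():
    print(f"{name}: -{drop:.2f} F1")
```

Sorting configurations by drop makes it clear which component contributes most to the final result.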
| Error Type | Count | % of Errors | Example |
|---|---|---|---|
| Ambiguous fields | 9 | 64% | “datetime” mapped to wrong timestamp |
| Nested structures | 3 | 21% | Array not flattened correctly |
| Format mismatch | 2 | 14% | Date format not detected |
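The percentage column of an error analysis follows directly from the counts, so it is worth computing rather than typing by hand. A minimal sketch using the counts from the table above (the helper name `error_shares` is hypothetical):

```python
# Counts copied from the error analysis table above.
ERROR_COUNTS = {
    "Ambiguous fields": 9,
    "Nested structures": 3,
    "Format mismatch": 2,
}


def error_shares(counts):
    """Return each error type's share of all errors as a whole-number percentage."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no errors to analyze")
    return {name: round(100 * count / total) for name, count in counts.items()}


for name, share in error_shares(ERROR_COUNTS).items():
    print(f"{name}: {share}%")
```

Because each share is rounded independently, the percentages may not sum to exactly 100; here they sum to 99.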