CIS 6930 Spring 26

Data Engineering at the University of Florida

Draft Paper

Due: Monday, March 30, 2026 at 11:59 PM
Points: 100
Submission: Push to cis6930sp26-project repository in paper/ directory


Overview

The draft paper is a complete first version of your research paper. All sections should be present with substantive content. This draft receives peer review feedback that you will address in the final paper.

Deliverables

Submit a complete draft paper (6-8 pages) with:

  1. Abstract - Summary of problem, approach, and findings
  2. Introduction - Problem statement and contributions
  3. Related Work - Context and prior approaches
  4. Methodology - System design and implementation details
  5. Evaluation - Experiments, results, and analysis
  6. Conclusion - Summary and future work
  7. References - Properly formatted citations

File Structure

cis6930sp26-project/
├── paper/
│   ├── paper.tex (or paper.md)
│   ├── figures/
│   │   ├── architecture.pdf
│   │   └── results.pdf
│   └── references.bib
└── ...

Paper Structure

Abstract (150-200 words)

The abstract should summarize the problem, your approach, and your key findings (see the example abstract below).

1. Introduction (1-1.5 pages)

2. Related Work (0.5-1 page)

3. Methodology (1.5-2 pages)

4. Evaluation (2-2.5 pages)

5. Conclusion (0.5 page)


Rubric

For the draft, reviewers evaluate completeness and direction. The final paper uses the full conference-style rubric.

Criterion            Weight  Description
Completeness         30%     Are all sections present with substantive content?
Technical Soundness  25%     Is the methodology appropriate and the evaluation reasonable?
Clarity              25%     Is the writing clear and well-organized?
Progress             20%     Does the paper reflect significant project progress?

Scoring Scale

Score  Meaning
5      Excellent - Ready for final polish
4      Good - On track with minor gaps
3      Satisfactory - Needs work but salvageable
2      Below Average - Significant sections incomplete
1      Incomplete - Major revision needed

Example Paper Sections

Example Abstract

We present TransitLLM, an LLM-augmented data pipeline for integrating heterogeneous smart city transit data. Current approaches to transit data integration require extensive manual schema mapping and custom ETL code for each data source. Our system uses MCP servers to expose transit APIs and an LLM orchestrator to perform automatic schema mapping and data validation. We evaluate TransitLLM on three Gainesville data portals, comparing against hand-coded baseline pipelines. Results show that our approach achieves 94% schema mapping accuracy while reducing development effort by 60%. Our findings suggest that LLM-orchestrated pipelines offer a promising alternative for data integration tasks with moderate complexity.

Example Introduction Contributions

This paper makes the following contributions:

  • A system architecture for LLM-orchestrated data integration using MCP servers as a modular abstraction layer
  • An empirical comparison of LLM-based schema mapping against manual approaches on real smart city data
  • A cost-benefit analysis examining the trade-off between LLM token cost and development time savings
  • An open-source implementation with MCP servers for three Gainesville data portals

Example Results Table

System                 Precision  Recall  F1    Tokens  Time (s)
Baseline (hand-coded)  0.96       0.94    0.95  -       0.2
TransitLLM (GPT-4)     0.94       0.92    0.93  1,240   3.8
TransitLLM (Claude)    0.93       0.91    0.92  1,180   4.1
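In LaTeX, a results table like the one above can be typeset with the booktabs package (loaded in the template later in this handout). The numbers here are just the example values from this handout, and the caption and label names are placeholders:

```latex
% Example results table using booktabs rules
\begin{table}[t]
  \centering
  \caption{Schema mapping performance (example values).}
  \label{tab:results}
  \begin{tabular}{lccccc}
    \toprule
    System & Precision & Recall & F1 & Tokens & Time (s) \\
    \midrule
    Baseline (hand-coded) & 0.96 & 0.94 & 0.95 & --      & 0.2 \\
    TransitLLM (GPT-4)    & 0.94 & 0.92 & 0.93 & 1{,}240 & 3.8 \\
    TransitLLM (Claude)   & 0.93 & 0.91 & 0.92 & 1{,}180 & 4.1 \\
    \bottomrule
  \end{tabular}
\end{table}
```

Booktabs tables use only horizontal rules (\toprule, \midrule, \bottomrule) and no vertical lines, which is the convention in most conference formats.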

Example Error Analysis

We examined the 14 schema mapping errors made by TransitLLM. The majority (9/14) occurred when source fields had ambiguous names. For example, the field "datetime" in the 311 API was incorrectly mapped to "created_date" instead of "resolved_date" because the LLM lacked context about the data's semantics. Three errors occurred with nested JSON structures, where the LLM failed to flatten arrays correctly. The remaining two errors were due to inconsistent date formats that the LLM did not detect.


Peer Review Process

Your draft will receive reviews from 2-3 classmates using the paper rubric.

What Reviewers Will Evaluate

  1. Originality - Does the paper make a contribution?
  2. Technical Quality - Is the methodology sound?
  3. Clarity - Is the paper well-written?
  4. Significance - Does the problem matter?
  5. Reproducibility - Can results be replicated?

Using Review Feedback

When you receive reviews:

  1. Read all reviews carefully
  2. Identify common themes across reviewers
  3. Prioritize major concerns over minor comments
  4. Address every point in your final paper
  5. If you disagree with feedback, explain why in the paper

Writing Tips

Clarity

  • Define terms before first use, and prefer short, direct sentences.

Figures and Tables

  • Give every figure and table a caption, and reference each one in the text.

Citations

  • Cite prior work for every claim that is not your own; keep entries in references.bib.
Common Mistakes

  1. Vague claims - “Our system performs well” → “Our system achieves 94% F1”
  2. Missing baselines - Always compare against something
  3. Overclaiming - Be honest about limitations
  4. Burying the lede - State contributions early and clearly
  5. Wall of text - Use structure, headings, and whitespace

LaTeX Template

Use a standard conference format. Example using article class:

\documentclass[11pt]{article}
\usepackage[margin=1in]{geometry}
\usepackage{graphicx}
\usepackage{booktabs}
\usepackage{hyperref}
\usepackage{cleveref}

\title{Your Paper Title}
\author{Your Name}
\date{}

\begin{document}
\maketitle

\begin{abstract}
Your abstract here.
\end{abstract}

\section{Introduction}
Your introduction here.

\section{Related Work}
Prior work discussion.

\section{Methodology}
System description.

\section{Evaluation}
Experiments and results.

\section{Conclusion}
Summary and future work.

\bibliographystyle{plain}
\bibliography{references}

\end{document}
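Since the template loads graphicx and cleveref, the architecture figure from the figures/ directory can be included and cross-referenced as follows (the label name and caption text are just examples):

```latex
% In the methodology section, for example:
\begin{figure}[t]
  \centering
  \includegraphics[width=\linewidth]{figures/architecture.pdf}
  \caption{System architecture.}
  \label{fig:architecture}
\end{figure}

% Elsewhere in the text, cleveref adds the "Figure" prefix for you:
\Cref{fig:architecture} shows the overall system design.
```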

Submission Checklist

  • All seven sections present with substantive content
  • Paper is 6-8 pages
  • paper.tex (or paper.md) and references.bib are in the paper/ directory
  • Figures are in paper/figures/ and referenced in the text
  • Work is pushed to the cis6930sp26-project repository