CIS 6930 Spring 26

Data Engineering at the University of Florida

Draft Paper

Due: Monday, March 30, 2026 at 11:59 PM
Points: 100
Submission: Push to the cis6930sp26-project repository, in the paper/ directory

Overview

The draft paper is a complete first version of your research paper. All sections should be present with substantive content. This draft receives peer review feedback that you will address in the final paper.

Deliverables

Submit a complete draft paper (6-8 pages) with:

Abstract - Summary of problem, approach, and findings
Introduction - Problem statement and contributions
Related Work - Context and prior approaches
Methodology - System design and implementation details
Evaluation - Experiments, results, and analysis
Conclusion - Summary and future work
References - Properly formatted citations

File Structure

cis6930sp26-project/
├── paper/
│   ├── paper.tex (or paper.md)
│   ├── figures/
│   │   ├── architecture.pdf
│   │   └── results.pdf
│   └── references.bib
└── ...

Paper Structure

Abstract (150-200 words)

The abstract should:

State the problem
Describe your approach
Summarize key results
Highlight the contribution

1. Introduction (1-1.5 pages)

Motivate the problem with concrete examples
State the research question
List your contributions (3-4 bullet points)
Outline the paper structure

2. Related Work

Discuss relevant prior work in categories
Position your work relative to existing approaches
Identify the gap you are addressing

3. Methodology (1.5-2 pages)

Describe your system architecture
Explain each component’s role
Include implementation details
Add architecture diagram

4. Evaluation (2-2.5 pages)

Describe experimental setup
Present results with tables and figures
Analyze what worked and what didn’t
Discuss limitations

5. Conclusion (0.5 page)

Summarize findings
Restate contributions
Suggest future work

Rubric

For the draft, reviewers evaluate completeness and direction. The final paper uses the full conference-style rubric.

Criterion	Weight	Description
Completeness	30%	Are all sections present with substantive content?
Technical Soundness	25%	Is the methodology appropriate and evaluation reasonable?
Clarity	25%	Is the writing clear and well-organized?
Progress	20%	Does the paper reflect significant project progress?

Scoring Scale

Score	Meaning
5	Excellent - Ready for final polish
4	Good - On track with minor gaps
3	Satisfactory - Needs work but salvageable
2	Below Average - Significant sections incomplete
1	Incomplete - Major revision needed

Example Paper Sections

Example Abstract

We present TransitLLM, an LLM-augmented data pipeline for integrating heterogeneous smart city transit data. Current approaches to transit data integration require extensive manual schema mapping and custom ETL code for each data source. Our system uses MCP servers to expose transit APIs and an LLM orchestrator to perform automatic schema mapping and data validation. We evaluate TransitLLM on three Gainesville data portals, comparing against hand-coded baseline pipelines. Results show that our approach achieves 94% schema mapping accuracy while reducing development effort by 60%. Our findings suggest that LLM-orchestrated pipelines offer a promising alternative for data integration tasks with moderate complexity.

Example Introduction Contributions

This paper makes the following contributions:

A system architecture for LLM-orchestrated data integration using MCP servers as a modular abstraction layer

An empirical comparison of LLM-based schema mapping against manual approaches on real smart city data

A cost-benefit analysis examining the trade-off between LLM token cost and development time savings

An open-source implementation with MCP servers for three Gainesville data portals

Example Results Table

System	Precision	Recall	F1	Tokens	Time (s)
Baseline (hand-coded)	0.96	0.94	0.95	-	0.2
TransitLLM (GPT-4)	0.94	0.92	0.93	1,240	3.8
TransitLLM (Claude)	0.93	0.91	0.92	1,180	4.1
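
As a sketch of how a table like the one above might be typeset, the fragment below uses the booktabs package that the LaTeX template later on this page already loads. The label tab:results and the caption wording are assumptions chosen for illustration, not required names. The closing sentence applies the standard F1 definition (the harmonic mean of precision and recall) to the baseline row as a consistency check on the F1 column.

```latex
% Sketch only: the label and caption text are illustrative assumptions.
% Requires \usepackage{booktabs}, which the template loads.
\begin{table}[t]
  \centering
  \caption{Schema mapping quality, token usage, and run time per system.
           A dash marks a column that does not apply.}
  \label{tab:results}
  \begin{tabular}{lrrrrr}
    \toprule
    System & Precision & Recall & F1 & Tokens & Time (s) \\
    \midrule
    Baseline (hand-coded) & 0.96 & 0.94 & 0.95 & --    & 0.2 \\
    TransitLLM (GPT-4)    & 0.94 & 0.92 & 0.93 & 1,240 & 3.8 \\
    TransitLLM (Claude)   & 0.93 & 0.91 & 0.92 & 1,180 & 4.1 \\
    \bottomrule
  \end{tabular}
\end{table}

% F1 is the harmonic mean of precision P and recall R.
\[
  F_1 = \frac{2PR}{P + R}
\]
For the baseline row,
$F_1 = \frac{2 \times 0.96 \times 0.94}{0.96 + 0.94} \approx 0.95$.
```

Right-aligning the numeric columns and using only the three booktabs rules (no vertical lines) keeps the table readable at print size.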

Example Error Analysis

We examined the 14 schema mapping errors made by TransitLLM. The majority (9/14) occurred when source fields had ambiguous names. For example, the field “datetime” in the 311 API was incorrectly mapped to “created_date” instead of “resolved_date” because the LLM lacked context about the data semantics. Three errors occurred with nested JSON structures where the LLM failed to flatten arrays correctly. The remaining two errors were due to inconsistent date formats that the LLM did not detect.
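
The counts in the paragraph above could also be shown as a small table, which makes the breakdown easier to scan. The following is a sketch assuming the same template packages. The category names are paraphrased from the paragraph, the share column is derived from the counts (n/14), and the label tab:errors is a hypothetical name.

```latex
% Sketch only: category wording and the label are illustrative assumptions.
% Shares are computed from the counts in the text (9/14, 3/14, 2/14).
\begin{table}[t]
  \centering
  \caption{Breakdown of the 14 schema mapping errors made by TransitLLM.}
  \label{tab:errors}
  \begin{tabular}{lrr}
    \toprule
    Error category & Count & Share \\
    \midrule
    Ambiguous source field names   & 9  & 64.3\% \\
    Nested JSON (array flattening) & 3  & 21.4\% \\
    Inconsistent date formats      & 2  & 14.3\% \\
    \midrule
    Total                          & 14 & 100.0\% \\
    \bottomrule
  \end{tabular}
\end{table}
```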

Peer Review Process

Your draft will receive reviews from 2-3 classmates using the paper rubric.

What Reviewers Will Evaluate

Originality - Does the paper make a contribution?
Technical Quality - Is the methodology sound?
Clarity - Is the paper well-written?
Significance - Does the problem matter?
Reproducibility - Can results be replicated?

Using Review Feedback

When you receive reviews:

Read all reviews carefully
Identify common themes across reviewers
Prioritize major concerns over minor comments
Address every point in your final paper
If you disagree with feedback, explain why in the paper

Writing Tips

Clarity

Use active voice: “We implemented…” not “It was implemented…”
Define terms before using them
One idea per paragraph
Start paragraphs with topic sentences

Figures and Tables

Every figure/table must be referenced in the text
Captions should be self-contained
Use consistent formatting
Make figures readable at print size
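
The first two points can be illustrated with a short fragment. This is a minimal sketch, assuming the graphicx and cleveref packages from the template on this page and the figures/architecture.pdf path from the File Structure section. The label fig:architecture and the caption wording (adapted from the example abstract) are illustrative choices, not required names.

```latex
% Sketch only: the label, caption, and referencing sentence are illustrative.
% The path is relative to the paper/ directory that contains paper.tex.
\Cref{fig:architecture} shows the main components of the system.

\begin{figure}[t]
  \centering
  % Scaling to \linewidth keeps the figure within the text block.
  \includegraphics[width=\linewidth]{figures/architecture.pdf}
  \caption{Architecture of TransitLLM. MCP servers expose transit APIs,
           and an LLM orchestrator performs schema mapping and data
           validation.}
  \label{fig:architecture}
\end{figure}
```

Placing \label after \caption ensures the reference resolves to the figure number, and \Cref supplies the word "Figure" so the reference style stays consistent throughout the paper.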

Citations

Cite work because it is relevant, not to fill space
Discuss cited work, don’t just list it
Use consistent citation style
Include recent work (last 3-5 years)

Common Mistakes

Vague claims - “Our system performs well” → “Our system achieves 94% F1”
Missing baselines - Always compare against something
Overclaiming - Be honest about limitations
Burying the lede - State contributions early and clearly
Wall of text - Use structure, headings, and whitespace

LaTeX Template

Use a standard conference format. A minimal example using the article class:

\documentclass[11pt]{article}
\usepackage[margin=1in]{geometry}
\usepackage{graphicx}
\usepackage{booktabs}
\usepackage{hyperref}
\usepackage{cleveref}

\title{Your Paper Title}
\author{Your Name}
\date{}

\begin{document}
\maketitle

\begin{abstract}
Your abstract here.
\end{abstract}

\section{Introduction}
Your introduction here.

\section{Related Work}
Prior work discussion.

\section{Methodology}
System description.

\section{Evaluation}
Experiments and results.

\section{Conclusion}
Summary and future work.

\bibliographystyle{plain}
\bibliography{references}

\end{document}

Submission Checklist

Resources

Project Overview - Full project description
Code Checkpoint - Previous milestone
Final Paper - Next milestone
Paper Rubric - Detailed evaluation criteria
