Data Engineering at the University of Florida
Due: Monday, February 2, 2026 at 11:59 PM
Late Deadline: Wednesday, February 4, 2026 at 11:59 PM
Points: 50 (part of Infrastructure Assignments - 5% of course grade)
Submission: GitHub repository + Canvas link
Peer Review: Due Friday, February 6, 2026 at 11:59 PM
Meta Review: Due Saturday, February 7, 2026 at 11:59 PM
This assignment introduces the foundational tools you will use throughout the semester: Python packaging with uv, GitHub workflows, testing with pytest, and LLM APIs. You will build a command-line tool that fetches data from a public API and uses NavigatorAI to process and summarize that data.
This is a warm-up assignment designed to ensure everyone is comfortable with the development environment before we tackle more complex MCP-based pipelines in Assignment 1.
By completing this assignment, you will:
- Set up a Python project managed with pyproject.toml and uv
- Build a Python command-line tool that fetches data from a public API and uses NavigatorAI to process and summarize it, for example:
# Fetch weather data and summarize it
uv run python -m assignment0 --source weather --location "Gainesville, FL"
# Fetch news headlines and analyze sentiment
uv run python -m assignment0 --source news --query "artificial intelligence"
# Fetch earthquake data and summarize recent activity
uv run python -m assignment0 --source usgs --days 7
| API | Description | Auth Required | Documentation |
|---|---|---|---|
| USGS Earthquakes | Recent earthquake data worldwide | No | earthquake.usgs.gov |
| Open-Meteo | Weather forecasts and historical data | No | open-meteo.com |
| NASA APOD | Astronomy Picture of the Day | Free API key | api.nasa.gov |
| News API | Headlines from various sources | Free API key | newsapi.org |
| OpenLibrary | Book information | No | openlibrary.org |
| PokeAPI | Pokemon data (for fun!) | No | pokeapi.co |
You may propose a different public API. The API must be freely accessible (free tier acceptable) and return structured data (JSON preferred).
Implementation:
- Fetch data over HTTP with the requests library

Example:
import requests
from datetime import datetime, timedelta, timezone

def calculate_start_date(days: int) -> str:
    """Return the ISO date string for `days` days before today (UTC)."""
    return (datetime.now(timezone.utc) - timedelta(days=days)).strftime("%Y-%m-%d")

def fetch_earthquake_data(days: int = 7) -> dict:
    """Fetch earthquake data from the USGS API."""
    url = "https://earthquake.usgs.gov/fdsnws/event/1/query"
    params = {
        "format": "geojson",
        "starttime": calculate_start_date(days),
        "minmagnitude": 4.0,
    }
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()
    return response.json()
Use NavigatorAI to process and summarize the API data.
Setup:
- Obtain a NavigatorAI API key and store it in your .env file as NAVIGATOR_API_KEY (see the Environment Variables and Secrets section below)

Implementation:
- Send the fetched data to NavigatorAI with a prompt asking for a summary or analysis (a sketch follows this list)

Example prompt strategies:
- Summarization: "Summarize this weather data in 2-3 sentences for a general audience"
- Sentiment: "Classify the overall sentiment of these headlines as positive, negative, or mixed"
- Analysis: "Describe notable patterns in this week's earthquake activity"
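The exact endpoint and model come from the NavigatorAI documentation; the sketch below is a minimal example, assuming an OpenAI-compatible chat-completions API. The URL and model name are placeholders, and the response shape matches the one mocked in the Testing Guide below:

```python
# assignment0/llm.py
import os

import requests

# Placeholder URL -- replace with the endpoint from the NavigatorAI docs
NAVIGATOR_API_URL = "https://navigator.example.edu/v1/chat/completions"

def summarize_with_llm(data_text: str) -> str:
    """Send the fetched data to the LLM and return its summary."""
    response = requests.post(
        NAVIGATOR_API_URL,
        headers={"Authorization": f"Bearer {os.getenv('NAVIGATOR_API_KEY')}"},
        json={
            "model": "placeholder-model",  # replace with a model NavigatorAI offers
            "messages": [
                {
                    "role": "user",
                    "content": f"Summarize the following data:\n\n{data_text}",
                }
            ],
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]
```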
Use argparse to handle command-line arguments:
- --source: The data source/API to use
- Source-specific options such as --location, --query, or --days (see the usage examples above)
- --help: Usage instructions

A minimal sketch of the argument parsing appears below.
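This sketch assumes the three sources from the usage examples; the option names and dispatch logic are illustrative, not required:

```python
# assignment0/cli.py
import argparse

def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(prog="assignment0")
    parser.add_argument("--source", required=True,
                        help="The data source/API to use (e.g., weather, news, usgs)")
    parser.add_argument("--location", help="Location for weather queries")
    parser.add_argument("--query", help="Search term for news queries")
    parser.add_argument("--days", type=int, default=7,
                        help="Days of earthquake history to fetch")
    args = parser.parse_args(argv)

    # Dispatch to the matching fetcher, then pass the data to the LLM
    ...
    return 0
```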
Write tests in the tests/ directory. See the Testing Guide section below for details on proper test structure.

Required tests:
Configure GitHub Actions to run tests on every push.
Create .github/workflows/pytest.yml:
name: PyTest
on: push

jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    env:
      NAVIGATOR_API_KEY: ${{ secrets.NAVIGATOR_API_KEY }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - uses: astral-sh/setup-uv@v5
      - run: uv sync
      - run: uv run pytest -v
See the Environment Variables and Secrets section for how to add secrets to your repository.
This course uses uv for Python package management. Do not use pip directly.
When you need a new package, use uv add:
# Add a runtime dependency
uv add requests
# Add a development dependency (testing, linting)
uv add --dev pytest
# Add multiple packages
uv add requests python-dotenv
The uv add command:
- Adds the package to pyproject.toml under [project.dependencies]
- Updates uv.lock with the exact resolved versions
- Installs the package into the project's virtual environment

When cloning a repository or after pulling changes:
# Install all dependencies from pyproject.toml
uv sync
Always use uv run to execute Python:
# Run a script
uv run python script.py
# Run as a module (preferred for packages)
uv run python -m assignment0 --help
# Run pytest
uv run pytest -v
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[project]
name = "assignment0"
version = "1.0.0"
description = "CIS 6930 Assignment 0 - API Data Collection with LLM"
authors = [{name = "Your Name", email = "your.email@ufl.edu"}]
requires-python = ">=3.11"
dependencies = [
"requests>=2.31",
"python-dotenv>=1.0",
]
[project.optional-dependencies]
dev = [
"pytest>=8.0",
]
[tool.pytest.ini_options]
testpaths = ["tests"]
pythonpath = ["."]
API keys and other sensitive credentials must never be committed to git. This section explains how to manage secrets locally and in GitHub Actions.
Use a .env file to store your API keys locally:
# .env (DO NOT COMMIT THIS FILE)
NAVIGATOR_API_KEY=your-api-key-here
NEWS_API_KEY=your-news-api-key-here
Create a .env.example file to document required variables (commit this file):
# .env.example (safe to commit - no real values)
NAVIGATOR_API_KEY=your-navigator-api-key
NEWS_API_KEY=your-news-api-key-if-using-news-api
Use python-dotenv to load your .env file:
# assignment0/config.py
import os
from dotenv import load_dotenv
# Load .env file if it exists
load_dotenv()
# Access environment variables
NAVIGATOR_API_KEY = os.getenv("NAVIGATOR_API_KEY")
if not NAVIGATOR_API_KEY:
    raise ValueError("NAVIGATOR_API_KEY environment variable is required")
Add python-dotenv to your project:
uv add python-dotenv
Your .gitignore must include .env to prevent accidental commits:
# .gitignore
# Environment variables - NEVER COMMIT
.env
.env.local
.env.*.local
# Python
__pycache__/
*.py[cod]
.venv/
*.egg-info/
# IDE
.vscode/
.idea/
For tests that require API keys, use GitHub repository secrets.
Adding a secret to your repository:
1. On GitHub, open your repository's Settings → Secrets and variables → Actions
2. Click New repository secret
3. Name the secret NAVIGATOR_API_KEY and paste your key as the value

Using secrets in GitHub Actions:
Update your workflow to pass secrets as environment variables:
# .github/workflows/pytest.yml
name: PyTest
on: push

jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    env:
      NAVIGATOR_API_KEY: ${{ secrets.NAVIGATOR_API_KEY }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - uses: astral-sh/setup-uv@v5
      - run: uv sync
      - run: uv run pytest -v
Best practice for tests:
Most tests should mock API calls rather than use real credentials. Only use real API keys in CI/CD when you need integration tests:
# tests/test_integration.py
import os
import pytest
# Skip if no API key available
pytestmark = pytest.mark.skipif(
    not os.getenv("NAVIGATOR_API_KEY"),
    reason="NAVIGATOR_API_KEY not set",
)

def test_real_api_call():
    """Integration test using real API."""
    # This test only runs if NAVIGATOR_API_KEY is set
    ...
| File | Commit? | Contains |
|---|---|---|
| .env | NO | Real API keys |
| .env.example | Yes | Variable names only |
| .gitignore | Yes | Patterns to ignore |
| Code files | Yes | os.getenv() calls |
Tests must be runnable from the project root without modifying sys.path:
cis6930sp26-assignment0/
├── assignment0/ # Your package directory
│ ├── __main__.py # Entry point for python -m assignment0
│ ├── api.py # API fetching functions
│ ├── llm.py # LLM processing functions
│ └── cli.py # Command-line interface
├── tests/
│ ├── test_api.py
│ ├── test_llm.py
│ └── test_cli.py
├── pyproject.toml
└── ...
Note: No __init__.py files are needed in Python 3.3+ due to namespace packages (PEP 420).
Tests import from your package directly:
# tests/test_api.py
from assignment0.api import parse_earthquake_data
def test_parse_earthquake_data():
    sample_data = {
        "features": [
            {"properties": {"mag": 5.2, "place": "California"}}
        ]
    }
    result = parse_earthquake_data(sample_data)
    assert len(result) == 1
    assert result[0]["magnitude"] == 5.2
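For reference, a parse_earthquake_data consistent with this test might look like the following; the exact fields you extract are your choice:

```python
# assignment0/api.py
def parse_earthquake_data(data: dict) -> list[dict]:
    """Flatten USGS GeoJSON features into simple magnitude/place records."""
    return [
        {
            "magnitude": feature["properties"]["mag"],
            "place": feature["properties"]["place"],
        }
        for feature in data.get("features", [])
    ]
```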
Always run pytest from the project root:
# Run all tests
uv run pytest
# Run with verbose output
uv run pytest -v
# Run a specific test file
uv run pytest tests/test_api.py
# Run tests matching a pattern
uv run pytest -k "earthquake"
The pyproject.toml configuration ensures pytest can find your code:
[tool.pytest.ini_options]
testpaths = ["tests"]
pythonpath = ["."]
Never do this:
# BAD - Do not modify sys.path in your code
import os
import sys

sys.path.insert(0, os.path.dirname(__file__))
Do this instead:
# GOOD - Use standard imports
from assignment0.api import fetch_data
Use unittest.mock to avoid calling real APIs in tests. This is essential because:
- Tests run without network access or API keys
- Tests are fast and deterministic
- CI runs do not consume API quota or hit rate limits

MagicMock

MagicMock creates a mock object that accepts any attribute or method call:
from unittest.mock import MagicMock
# Create a mock that simulates a requests.Response object
mock_response = MagicMock()
mock_response.status_code = 200
mock_response.json.return_value = {"data": "test"}
# The mock accepts any method call
print(mock_response.json()) # Returns {"data": "test"}
print(mock_response.anything()) # Returns another MagicMock
patch as a Context Manager

patch temporarily replaces a function or object during a test:
# tests/test_api.py
from unittest.mock import patch, MagicMock
from assignment0.api import fetch_earthquake_data
def test_fetch_earthquake_data():
    # Create mock response
    mock_response = MagicMock()
    mock_response.status_code = 200
    mock_response.json.return_value = {
        "features": [{"properties": {"mag": 5.0, "place": "Test"}}]
    }

    # Patch requests.get in the module where it's used
    with patch("assignment0.api.requests.get", return_value=mock_response) as mock_get:
        result = fetch_earthquake_data(days=7)

    # Verify the function was called
    mock_get.assert_called_once()

    # Verify the result
    assert len(result["features"]) == 1
patch as a Decorator

from unittest.mock import patch, MagicMock
from assignment0.llm import summarize_with_llm

@patch("assignment0.llm.requests.post")
def test_summarize_with_llm(mock_post):
    # Configure the mock
    mock_response = MagicMock()
    mock_response.json.return_value = {
        "choices": [{"message": {"content": "This is a summary"}}]
    }
    mock_post.return_value = mock_response

    # Call the function
    result = summarize_with_llm("Some data to summarize")

    # Assertions
    assert result == "This is a summary"
    mock_post.assert_called_once()
from unittest.mock import patch, MagicMock
from assignment0.cli import main

@patch("assignment0.cli.fetch_data")
@patch("assignment0.cli.summarize_with_llm")
def test_main_flow(mock_summarize, mock_fetch):
    # Decorators are applied bottom-up, so the bottom patch
    # (summarize_with_llm) maps to the first argument
    mock_fetch.return_value = {"data": "test"}
    mock_summarize.return_value = "Summary of test data"

    result = main(["--source", "test"])

    mock_fetch.assert_called_once()
    mock_summarize.assert_called_once_with({"data": "test"})
# Was the mock called?
mock.assert_called()
# Was it called exactly once?
mock.assert_called_once()
# Was it called with specific arguments?
mock.assert_called_with(arg1, arg2, kwarg=value)
# Was it called once with specific arguments?
mock.assert_called_once_with(arg1, arg2)
# How many times was it called?
assert mock.call_count == 3
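Beyond the assertion helpers above, you can inspect a mock's recorded arguments directly; a small sketch (the call shown is illustrative):

```python
from unittest.mock import MagicMock

mock = MagicMock()
mock("Gainesville, FL", days=7)

# call_args holds the most recent call as (positional args, keyword args)
args, kwargs = mock.call_args
assert args == ("Gainesville, FL",)
assert kwargs == {"days": 7}
```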
cis6930sp26-assignment0/
├── .github/
│ └── workflows/
│ └── pytest.yml
├── assignment0/
│ ├── __main__.py # Entry point for `python -m assignment0`
│ ├── api.py # API data fetching
│ ├── llm.py # LLM processing
│ └── cli.py # Argument parsing
├── tests/
│ ├── test_api.py
│ ├── test_llm.py
│ └── test_cli.py
├── .env.example # Example environment variables (no secrets!)
├── .gitignore
├── COLLABORATORS.md
├── LICENSE
├── README.md
└── pyproject.toml
Note: No __init__.py files are required. Python 3.3+ supports namespace packages (PEP 420).
The __main__.py File

The __main__.py file makes your package runnable as a module using python -m:
# Instead of: python assignment0/some_script.py
# You can run: python -m assignment0
uv run python -m assignment0 --help
How it works:
- When you run python -m assignment0, Python looks for assignment0/__main__.py and executes it
- The if __name__ == "__main__": guard ensures the code only runs when the module is executed directly

Example __main__.py:
# assignment0/__main__.py
from assignment0.cli import main
if __name__ == "__main__":
    main()
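If your main() returns an exit code (as in the CLI sketch earlier), you can propagate it to the shell; a small variant, assuming main() returns an int:

```python
# assignment0/__main__.py
import sys

from assignment0.cli import main

if __name__ == "__main__":
    sys.exit(main())
```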
Why use -m instead of running a script directly?
- Running with -m adds the current directory (your project root) to sys.path automatically, so imports of your package resolve correctly

Submission steps:
1. Name your GitHub repository cis6930sp26-assignment0
2. Add the instructor (cegme) as a collaborator
3. Tag your final commit and push the tag:

git tag v1.0
git push origin v1.0
Due date: Monday, February 2, 2026 at 11:59 PM
Late deadline: Wednesday, February 4, 2026 at 11:59 PM
You may submit after the due date until grading begins (typically 1-3 days after due date). The exact grading start time will not be announced. No submissions accepted after grading begins.
I strongly encourage submitting by the due date to avoid the risk that grading begins before you submit.
You will be assigned 2 classmates’ repositories to review. Completing your reviews is worth 5 points of your assignment grade.
Review assignments will be posted on Canvas by Thursday, February 5.
git clone <assigned-repo-url>
cd <repo-name>
uv sync
cp .env.example .env
# Add your NavigatorAI API key to .env
uv run python -m assignment0 --help
uv run pytest -v
Check GitHub Actions: Look for the green checkmark (or red X) on the repository’s Actions tab.
Score using the rubric: Use the Assignment 0 Peer Review Rubric to evaluate each criterion.
For each repository you review:
- Verify that uv sync installed dependencies without errors
- Run the test suite with uv run pytest -v
- Check that no .env file or other secrets are committed
- Give specific, actionable feedback (e.g., "consider extracting main.py:45-60 into a separate function")

Submit your peer reviews through Canvas by Friday, February 6 at 11:59 PM.
After peer reviews are submitted, you will evaluate the quality of reviews you received. This helps ensure reviewers provide thoughtful, constructive feedback.
For each review you received, rate:
Submit your meta reviews through Canvas by Saturday, February 7 at 11:59 PM.
See the full Peer Review Rubric for detailed scoring criteria and examples.
| Component | Points | Graded By |
|---|---|---|
| API Data Collection | 15 | Peers |
| LLM Processing | 15 | Peers |
| Command-Line Interface | 10 | Peers |
| Testing | 5 | Peers |
| CI/CD & Project Structure | 5 | Peers |
| Implementation Subtotal | 45 | |
| Completing Peer Reviews | 5 | Instructor |
| Total | 50 | |
Your final grade is the median of your peer review scores (for the 45-point implementation portion), plus points for completing your assigned reviews.
This assignment prepares you for the MCP Data Pipeline assignment where you will:
The skills you learn here (API calls, LLM integration, testing, packaging) will be essential.
This is an individual assignment. You may discuss concepts with classmates, but all code must be your own. Document any external resources or AI assistance in COLLABORATORS.md.