CIS 6930 Spring 26

Data Engineering at the University of Florida

Assignment 0: API Data Collection with LLM Processing

Due: Monday, February 2, 2026 at 11:59 PM
Late Deadline: Wednesday, February 4, 2026 at 11:59 PM
Points: 50 (part of Infrastructure Assignments - 5% of course grade)
Submission: GitHub repository + Canvas link
Peer Review: Due Friday, February 6, 2026 at 11:59 PM
Meta Review: Due Saturday, February 7, 2026 at 11:59 PM


Overview

This assignment introduces the foundational tools you will use throughout the semester: Python packaging with uv, GitHub workflows, testing with pytest, and LLM APIs. You will build a command-line tool that fetches data from a public API and uses NavigatorAI to process and summarize that data.

This is a warm-up assignment designed to ensure everyone is comfortable with the development environment before we tackle more complex MCP-based pipelines in Assignment 1.


Learning Objectives

By completing this assignment, you will:

  1. Set up a Python project with pyproject.toml and uv
  2. Fetch and parse data from a public API
  3. Use LLMs to process and summarize real-world data
  4. Write and run tests using pytest (from the project root)
  5. Configure GitHub Actions for continuous integration
  6. Participate in peer code review

Task

Build a Python command-line tool that:

  1. Fetches data from a public API of your choice (see the suggested APIs below)
  2. Processes the data using NavigatorAI
  3. Displays a summary or analysis of the data

Example Usage

# Fetch weather data and summarize it
uv run python -m assignment0 --source weather --location "Gainesville, FL"

# Fetch news headlines and analyze sentiment
uv run python -m assignment0 --source news --query "artificial intelligence"

# Fetch earthquake data and summarize recent activity
uv run python -m assignment0 --source usgs --days 7

Suggested APIs (Choose One or Propose Your Own)

API              | Description                           | Auth Required | Documentation
USGS Earthquakes | Recent earthquake data worldwide      | No            | earthquake.usgs.gov
Open-Meteo       | Weather forecasts and historical data | No            | open-meteo.com
NASA APOD        | Astronomy Picture of the Day          | Free API key  | api.nasa.gov
News API         | Headlines from various sources        | Free API key  | newsapi.org
OpenLibrary      | Book information                      | No            | openlibrary.org
PokeAPI          | Pokemon data (for fun!)               | No            | pokeapi.co

You may propose a different public API. The API must be freely accessible (free tier acceptable) and return structured data (JSON preferred).


Requirements

1. API Data Collection (15 points)

Implementation:

Example:

import os
import requests

def fetch_earthquake_data(days: int = 7) -> dict:
    """Fetch earthquake data from USGS API."""
    url = "https://earthquake.usgs.gov/fdsnws/event/1/query"
    params = {
        "format": "geojson",
        "starttime": calculate_start_date(days),
        "minmagnitude": 4.0
    }
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()
    return response.json()
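
The calculate_start_date helper referenced above is not shown. A minimal sketch is given below; the helper's name and its ISO-8601 date format are assumptions, so adapt the format to whatever your chosen API expects.

from datetime import datetime, timedelta, timezone

def calculate_start_date(days: int) -> str:
    """Return the UTC date `days` days ago as an ISO-8601 string, e.g. '2026-01-26'."""
    start = datetime.now(timezone.utc) - timedelta(days=days)
    return start.strftime("%Y-%m-%d")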

2. LLM Processing (15 points)

Use NavigatorAI to process and summarize the API data.

Setup:

  1. Log in to NavigatorAI with your UF credentials
  2. Access the API through the Navigator Toolkit
  3. Store your API key in an environment variable

Implementation:
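
A minimal sketch of the LLM call is shown below. It assumes NavigatorAI exposes an OpenAI-compatible chat-completions endpoint; the base URL and model name are placeholders you must replace with values from the Navigator Toolkit documentation. The response shape matches the mocked responses used in the Testing Guide later in this document.

# assignment0/llm.py  (illustrative sketch)
import os

import requests

# ASSUMPTION: NavigatorAI provides an OpenAI-compatible chat-completions
# endpoint. Replace the base URL and model name with the values documented
# in the Navigator Toolkit.
NAVIGATOR_BASE_URL = os.getenv("NAVIGATOR_BASE_URL", "https://<navigator-toolkit-host>/v1")

def summarize_with_llm(data, model: str = "<model-name>") -> str:
    """Ask the LLM for a short, plain-language summary of the API data."""
    response = requests.post(
        f"{NAVIGATOR_BASE_URL}/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['NAVIGATOR_API_KEY']}"},
        json={
            "model": model,
            "messages": [
                {"role": "system",
                 "content": "You summarize structured API data for a general audience."},
                {"role": "user",
                 "content": f"Summarize the following data in a few sentences:\n\n{data}"},
            ],
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]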

Example prompt strategies:

3. Command-Line Interface (10 points)

Use argparse to handle command-line arguments:
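
A minimal sketch is shown below. The flags mirror the example usage earlier in this document; the fetch_data and summarize_with_llm helpers are placeholders for whatever your own api.py and llm.py expose.

# assignment0/cli.py  (illustrative sketch)
import argparse

from assignment0.api import fetch_data          # your API-fetching function
from assignment0.llm import summarize_with_llm  # your LLM-processing function

def main(argv: list[str] | None = None) -> int:
    """Parse arguments, fetch data from the chosen source, and print a summary."""
    parser = argparse.ArgumentParser(
        prog="assignment0",
        description="Fetch data from a public API and summarize it with an LLM.",
    )
    parser.add_argument("--source", required=True,
                        help="Which API to query (e.g. usgs, news)")
    parser.add_argument("--days", type=int, default=7,
                        help="Look-back window for time-based sources")
    parser.add_argument("--query",
                        help="Search term, for sources that support one")
    args = parser.parse_args(argv)

    data = fetch_data(args.source, days=args.days, query=args.query)
    print(summarize_with_llm(data))
    return 0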

4. Testing with pytest (5 points)

Write tests in the tests/ directory. See the Testing Guide section below for details on proper test structure.

Required tests:

5. CI/CD Setup (5 points)

Configure GitHub Actions to run tests on every push.

Create .github/workflows/pytest.yml:

name: PyTest
on: push

jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    env:
      NAVIGATOR_API_KEY: ${{ secrets.NAVIGATOR_API_KEY }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - uses: astral-sh/setup-uv@v5
      - run: uv sync
      - run: uv run pytest -v

See the Environment Variables and Secrets section for how to add secrets to your repository.


Package Management with uv

This course uses uv for Python package management. Do not use pip directly.

Adding Dependencies

When you need a new package, use uv add:

# Add a runtime dependency
uv add requests

# Add a development dependency (testing, linting)
uv add --dev pytest

# Add multiple packages
uv add requests python-dotenv

The uv add command:

  1. Adds the package to pyproject.toml under [project.dependencies]
  2. Updates uv.lock with the exact resolved versions
  3. Installs the package in your virtual environment

Installing from pyproject.toml

When cloning a repository or after pulling changes:

# Install all dependencies from pyproject.toml
uv sync

Running Python Code

Always use uv run to execute Python:

# Run a script
uv run python script.py

# Run as a module (preferred for packages)
uv run python -m assignment0 --help

# Run pytest
uv run pytest -v

Example pyproject.toml

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "assignment0"
version = "1.0.0"
description = "CIS 6930 Assignment 0 - API Data Collection with LLM"
authors = [{name = "Your Name", email = "your.email@ufl.edu"}]
requires-python = ">=3.11"
dependencies = [
    "requests>=2.31",
    "python-dotenv>=1.0",
]

[dependency-groups]
dev = [
    "pytest>=8.0",
]

[tool.pytest.ini_options]
testpaths = ["tests"]
pythonpath = ["."]

Environment Variables and Secrets

API keys and other sensitive credentials must never be committed to git. This section explains how to manage secrets locally and in GitHub Actions.

Local Development with .env Files

Use a .env file to store your API keys locally:

# .env (DO NOT COMMIT THIS FILE)
NAVIGATOR_API_KEY=your-api-key-here
NEWS_API_KEY=your-news-api-key-here

Create a .env.example file to document required variables (commit this file):

# .env.example (safe to commit - no real values)
NAVIGATOR_API_KEY=your-navigator-api-key
NEWS_API_KEY=your-news-api-key-if-using-news-api

Loading Environment Variables in Python

Use python-dotenv to load your .env file:

# assignment0/config.py
import os

from dotenv import load_dotenv

# Load .env file if it exists
load_dotenv()

# Access environment variables
NAVIGATOR_API_KEY = os.getenv("NAVIGATOR_API_KEY")
if not NAVIGATOR_API_KEY:
    raise ValueError("NAVIGATOR_API_KEY environment variable is required")

Add python-dotenv to your project:

uv add python-dotenv

Protecting Secrets with .gitignore

Your .gitignore must include .env to prevent accidental commits:

# .gitignore

# Environment variables - NEVER COMMIT
.env
.env.local
.env.*.local

# Python
__pycache__/
*.py[cod]
.venv/
*.egg-info/

# IDE
.vscode/
.idea/

GitHub Secrets for CI/CD

For tests that require API keys, use GitHub repository secrets.

Adding a secret to your repository:

  1. Go to your repository on GitHub
  2. Click Settings > Secrets and variables > Actions
  3. Click New repository secret
  4. Name: NAVIGATOR_API_KEY
  5. Value: Your actual API key
  6. Click Add secret

Using secrets in GitHub Actions:

Update your workflow to pass secrets as environment variables:

# .github/workflows/pytest.yml
name: PyTest
on: push

jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    env:
      NAVIGATOR_API_KEY: ${{ secrets.NAVIGATOR_API_KEY }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - uses: astral-sh/setup-uv@v5
      - run: uv sync
      - run: uv run pytest -v

Best practice for tests:

Most tests should mock API calls rather than use real credentials. Only use real API keys in CI/CD when you need integration tests:

# tests/test_integration.py
import os

import pytest

# Skip if no API key available
pytestmark = pytest.mark.skipif(
    not os.getenv("NAVIGATOR_API_KEY"),
    reason="NAVIGATOR_API_KEY not set"
)

def test_real_api_call():
    """Integration test using real API."""
    # This test only runs if NAVIGATOR_API_KEY is set
    ...

Summary: What to Commit

File         | Commit? | Contains
.env         | NO      | Real API keys
.env.example | Yes     | Variable names only
.gitignore   | Yes     | Patterns to ignore
Code files   | Yes     | os.getenv() calls

Testing Guide

Project Structure for Tests

Tests must be runnable from the project root without modifying sys.path:

cis6930sp26-assignment0/
├── assignment0/              # Your package directory
│   ├── __main__.py          # Entry point for python -m assignment0
│   ├── api.py               # API fetching functions
│   ├── llm.py               # LLM processing functions
│   └── cli.py               # Command-line interface
├── tests/
│   ├── test_api.py
│   ├── test_llm.py
│   └── test_cli.py
├── pyproject.toml
└── ...

Note: No __init__.py files are needed in Python 3.3+ due to namespace packages (PEP 420).

Writing Tests

Tests import from your package directly:

# tests/test_api.py
from assignment0.api import parse_earthquake_data

def test_parse_earthquake_data():
    sample_data = {
        "features": [
            {"properties": {"mag": 5.2, "place": "California"}}
        ]
    }
    result = parse_earthquake_data(sample_data)
    assert len(result) == 1
    assert result[0]["magnitude"] == 5.2

Running Tests

Always run pytest from the project root:

# Run all tests
uv run pytest

# Run with verbose output
uv run pytest -v

# Run a specific test file
uv run pytest tests/test_api.py

# Run tests matching a pattern
uv run pytest -k "earthquake"

Why This Structure Works

The pyproject.toml configuration ensures pytest can find your code:

[tool.pytest.ini_options]
testpaths = ["tests"]
pythonpath = ["."]

Never do this:

# BAD - Do not modify sys.path in your code
import sys
sys.path.insert(0, os.path.dirname(__file__))

Do this instead:

# GOOD - Use standard imports
from assignment0.api import fetch_data

Mocking External Services

Use unittest.mock to avoid calling real APIs in tests. This is essential because your tests must pass without network access or valid API keys (for example, on a reviewer's or grader's machine), should run quickly and deterministically, and should not consume rate limits or LLM credits on every CI run.

Using MagicMock

MagicMock creates a mock object that accepts any attribute or method call:

from unittest.mock import MagicMock

# Create a mock that simulates a requests.Response object
mock_response = MagicMock()
mock_response.status_code = 200
mock_response.json.return_value = {"data": "test"}

# The mock accepts any method call
print(mock_response.json())  # Returns {"data": "test"}
print(mock_response.anything())  # Returns another MagicMock

Using patch as a Context Manager

patch temporarily replaces a function or object during a test:

# tests/test_api.py
from unittest.mock import patch, MagicMock
from assignment0.api import fetch_earthquake_data

def test_fetch_earthquake_data():
    # Create mock response
    mock_response = MagicMock()
    mock_response.status_code = 200
    mock_response.json.return_value = {
        "features": [{"properties": {"mag": 5.0, "place": "Test"}}]
    }

    # Patch requests.get in the module where it's used
    with patch("assignment0.api.requests.get", return_value=mock_response) as mock_get:
        result = fetch_earthquake_data(days=7)

        # Verify the function was called
        mock_get.assert_called_once()

        # Verify the result
        assert len(result["features"]) == 1

Using patch as a Decorator

from unittest.mock import patch, MagicMock
from assignment0.llm import summarize_with_llm

@patch("assignment0.llm.requests.post")
def test_summarize_with_llm(mock_post):
    # Configure the mock
    mock_response = MagicMock()
    mock_response.json.return_value = {
        "choices": [{"message": {"content": "This is a summary"}}]
    }
    mock_post.return_value = mock_response

    # Call the function
    result = summarize_with_llm("Some data to summarize")

    # Assertions
    assert result == "This is a summary"
    mock_post.assert_called_once()

Mocking Multiple Functions

from unittest.mock import patch, MagicMock
from assignment0.cli import main

@patch("assignment0.cli.fetch_data")
@patch("assignment0.cli.summarize_with_llm")
def test_main_flow(mock_summarize, mock_fetch):
    # Note: stacked @patch decorators are applied bottom-up, so the bottom
    # patch (summarize_with_llm) maps to the first parameter and the top
    # patch (fetch_data) maps to the second
    mock_fetch.return_value = {"data": "test"}
    mock_summarize.return_value = "Summary of test data"

    result = main(["--source", "test"])

    mock_fetch.assert_called_once()
    mock_summarize.assert_called_once_with({"data": "test"})

Common Mock Assertions

# Was the mock called?
mock.assert_called()

# Was it called exactly once?
mock.assert_called_once()

# Was it called with specific arguments?
mock.assert_called_with(arg1, arg2, kwarg=value)

# Was it called once with specific arguments?
mock.assert_called_once_with(arg1, arg2)

# How many times was it called?
assert mock.call_count == 3
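
Your code also reads configuration from environment variables; when a test needs to control those, unittest.mock.patch.dict can temporarily modify os.environ. A small illustrative sketch follows (the get_api_key helper is hypothetical, just for demonstration):

# tests/test_env.py  (illustrative: controlling environment variables in tests)
import os
from unittest.mock import patch

def get_api_key() -> str:
    """Stand-in for code that reads the key at call time (e.g. in llm.py)."""
    return os.environ["NAVIGATOR_API_KEY"]

def test_api_key_can_be_faked():
    # patch.dict temporarily modifies os.environ and restores it afterwards
    with patch.dict(os.environ, {"NAVIGATOR_API_KEY": "fake-key"}):
        assert get_api_key() == "fake-key"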

Project Structure

cis6930sp26-assignment0/
├── .github/
│   └── workflows/
│       └── pytest.yml
├── assignment0/
│   ├── __main__.py          # Entry point for `python -m assignment0`
│   ├── api.py               # API data fetching
│   ├── llm.py               # LLM processing
│   └── cli.py               # Argument parsing
├── tests/
│   ├── test_api.py
│   ├── test_llm.py
│   └── test_cli.py
├── .env.example             # Example environment variables (no secrets!)
├── .gitignore
├── COLLABORATORS.md
├── LICENSE
├── README.md
└── pyproject.toml

Note: No __init__.py files are required. Python 3.3+ supports namespace packages (PEP 420).

The __main__.py File

The __main__.py file makes your package runnable as a module using python -m:

# Instead of: python assignment0/some_script.py
# You can run: python -m assignment0
uv run python -m assignment0 --help

How it works:

  1. When you run python -m assignment0, Python looks for assignment0/__main__.py
  2. Python executes that file as the entry point
  3. The if __name__ == "__main__": guard ensures code only runs when executed directly

Example __main__.py:

# assignment0/__main__.py
from assignment0.cli import main

if __name__ == "__main__":
    main()
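
If your main() accepts an argument list and returns an exit code (as in the hypothetical CLI sketch earlier), a slightly fuller variant forwards both:

# assignment0/__main__.py  (variant that forwards arguments and the exit code)
import sys

from assignment0.cli import main

if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))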

Why use -m instead of running a script directly? Running the package as a module keeps imports consistent: the project root is on the import path, so `from assignment0.api import ...` resolves the same way on the command line, in your tests, and in CI, without any sys.path manipulation. It also matches exactly how reviewers and graders will invoke your tool.


Submission

  1. Create a private repository named cis6930sp26-assignment0
  2. Add collaborators: cegme (and TAs to be announced)
  3. Tag your final submission:
    git tag v1.0
    git push origin v1.0
    
  4. Submit the repository URL to Canvas

Late Policy

Due date: Monday, February 2, 2026 at 11:59 PM
Late deadline: Wednesday, February 4, 2026 at 11:59 PM

You may submit after the due date until grading begins (typically 1-3 days after due date). The exact grading start time will not be announced. No submissions accepted after grading begins.

I strongly encourage submitting by the due date: the exact moment grading begins is not announced, and keeping the late window as a buffer for last-minute CI or tagging problems is far safer than counting on it.


Peer Review

You will be assigned 2 classmates’ repositories to review. Completing your reviews is worth 5 points of your assignment grade.

Review assignments will be posted on Canvas by Thursday, February 5.

How to Review

  1. Clone the repository:
    git clone <assigned-repo-url>
    cd <repo-name>
    
  2. Install and test:
    uv sync
    cp .env.example .env
    # Add your NavigatorAI API key to .env
    uv run python -m assignment0 --help
    uv run pytest -v
    
  3. Check GitHub Actions: Look for the green checkmark (or red X) on the repository’s Actions tab.

  4. Score using the rubric: Use the Assignment 0 Peer Review Rubric to evaluate each criterion.

  5. Submit your review: Copy the review template from the rubric, fill it out, and submit to Canvas.

Review Checklist

For each repository you review:

Writing Good Reviews

Submit your peer reviews through Canvas by Friday, February 6 at 11:59 PM.


Meta Review

After peer reviews are submitted, you will evaluate the quality of reviews you received. This helps ensure reviewers provide thoughtful, constructive feedback.

For each review you received, rate:

Submit your meta reviews through Canvas by Saturday, February 7 at 11:59 PM.


Grading

See the full Peer Review Rubric for detailed scoring criteria and examples.

Point Breakdown

Component                 | Points | Graded By
API Data Collection       | 15     | Peers
LLM Processing            | 15     | Peers
Command-Line Interface    | 10     | Peers
Testing                   | 5      | Peers
CI/CD & Project Structure | 5      | Peers
Implementation Subtotal   | 45     |
Completing Peer Reviews   | 5      | Instructor
Total                     | 50     |

Your final grade is the median of your peer review scores (for the 45-point implementation portion), plus points for completing your assigned reviews.


Tips


Preparing for Assignment 1

This assignment prepares you for the MCP Data Pipeline assignment where you will:

The skills you learn here (API calls, LLM integration, testing, packaging) will be essential.


Resources


Academic Integrity

This is an individual assignment. You may discuss concepts with classmates, but all code must be your own. Document any external resources or AI assistance in COLLABORATORS.md.