Assignment 0: API Data Collection with LLM Processing

Due: Monday, February 2, 2026 at 11:59 PM Late Deadline: Wednesday, February 4, 2026 at 11:59 PM Points: 50 (part of Infrastructure Assignments - 5% of course grade) Submission: GitHub repository + Canvas link Peer Review: Due Friday, February 6, 2026 at 11:59 PM Meta Review: Due Saturday, February 7, 2026 at 11:59 PM

Overview

This assignment introduces the foundational tools you will use throughout the semester: Python packaging with uv, GitHub workflows, testing with pytest, and LLM APIs. You will build a command-line tool that fetches data from a public API and uses NavigatorAI to process and summarize that data.

This is a warm-up assignment designed to ensure everyone is comfortable with the development environment before we tackle more complex MCP-based pipelines in Assignment 1.

Learning Objectives

By completing this assignment, you will:

Set up a Python project with pyproject.toml and uv
Fetch and parse data from a public API
Use LLMs to process and summarize real-world data
Write and run tests using pytest (from the project root)
Configure GitHub Actions for continuous integration
Participate in peer code review

Task

Build a Python command-line tool that:

Fetches data from a public API of your choice. You may not use a weather API.
Processes the data using NavigatorAI
Displays a summary or analysis of the data

Example Usage

# Fetch weather data and summarize it
uv run python -m assignment0 --source weather --location "Gainesville, FL"

# Fetch news headlines and analyze sentiment
uv run python -m assignment0 --source news --query "artificial intelligence"

# Fetch earthquake data and summarize recent activity
uv run python -m assignment0 --source usgs --days 7

Suggested APIs (Choose One or Propose Your Own)

API	Description	Auth Required	Documentation
USGS Earthquakes	Recent earthquake data worldwide	No	earthquake.usgs.gov
Open-Meteo	Weather forecasts and historical data	No	open-meteo.com
NASA APOD	Astronomy Picture of the Day	Free API key	api.nasa.gov
News API	Headlines from various sources	Free API key	newsapi.org
OpenLibrary	Book information	No	openlibrary.org
PokeAPI	Pokemon data (for fun!)	No	pokeapi.co

You may propose a different public API. The API must be freely accessible (free tier acceptable) and return structured data (JSON preferred).

Requirements

1. API Data Collection (15 points)

Implementation:

Fetch data from your chosen API using the requests library
Handle network errors gracefully (timeouts, connection errors, HTTP errors)
Parse the JSON response and extract relevant fields
Store API keys securely using environment variables (never hardcode)

Example:

import os
import requests

def fetch_earthquake_data(days: int = 7) -> dict:
    """Fetch earthquake data from USGS API."""
    url = "https://earthquake.usgs.gov/fdsnws/event/1/query"
    params = {
        "format": "geojson",
        "starttime": calculate_start_date(days),
        "minmagnitude": 4.0
    }
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()
    return response.json()

2. LLM Processing (15 points)

Use NavigatorAI to process and summarize the API data.

Setup:

Log in to NavigatorAI with your UF credentials
Access the API through the Navigator Toolkit
Store your API key in an environment variable

Implementation:

Send the collected data to NavigatorAI with a well-crafted prompt
Request a meaningful summary or analysis (not just reformatting)
Handle API errors gracefully
Format and display the LLM response

Example prompt strategies:

“Summarize the key trends in this earthquake data…”
“Analyze the sentiment of these news headlines and explain…”
“Compare these weather forecasts and identify patterns…”

3. Command-Line Interface (10 points)

Use argparse to handle command-line arguments:

--source: The data source/API to use
Additional arguments specific to your chosen API
--help: Usage instructions

4. Testing with pytest (5 points)

Write tests in the tests/ directory. See the Testing Guide section below for details on proper test structure.

Required tests:

Test argument parsing
Test data parsing functions (use sample data, not live API calls)
Test output formatting
Mock API responses for LLM calls

5. CI/CD Setup (5 points)

Configure GitHub Actions to run tests on every push.

Create .github/workflows/pytest.yml:

name: PyTest
on: push

jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    env:
      NAVIGATOR_API_KEY: $
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - uses: astral-sh/setup-uv@v5
      - run: uv sync
      - run: uv run pytest -v

See the Environment Variables and Secrets section for how to add secrets to your repository.

Package Management with uv

This course uses uv for Python package management. Do not use pip directly.

Adding Dependencies

When you need a new package, use uv add:

# Add a runtime dependency
uv add requests

# Add a development dependency (testing, linting)
uv add --dev pytest

# Add multiple packages
uv add requests python-dotenv

The uv add command:

Adds the package to pyproject.toml under [project.dependencies]
Updates uv.lock with the exact resolved versions
Installs the package in your virtual environment

Installing from pyproject.toml

When cloning a repository or after pulling changes:

# Install all dependencies from pyproject.toml
uv sync

Running Python Code

Always use uv run to execute Python:

# Run a script
uv run python script.py

# Run as a module (preferred for packages)
uv run python -m assignment0 --help

# Run pytest
uv run pytest -v

Example pyproject.toml

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "assignment0"
version = "1.0.0"
description = "CIS 6930 Assignment 0 - API Data Collection with LLM"
authors = [{name = "Your Name", email = "your.email@ufl.edu"}]
requires-python = ">=3.11"
dependencies = [
    "requests>=2.31",
    "python-dotenv>=1.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=8.0",
]

[tool.pytest.ini_options]
testpaths = ["tests"]
pythonpath = ["."]

Environment Variables and Secrets

API keys and other sensitive credentials must never be committed to git. This section explains how to manage secrets locally and in GitHub Actions.

Local Development with .env Files

Use a .env file to store your API keys locally:

# .env (DO NOT COMMIT THIS FILE)
NAVIGATOR_API_KEY=your-api-key-here
NEWS_API_KEY=your-news-api-key-here

Create a .env.example file to document required variables (commit this file):

# .env.example (safe to commit - no real values)
NAVIGATOR_API_KEY=your-navigator-api-key
NEWS_API_KEY=your-news-api-key-if-using-news-api

Loading Environment Variables in Python

Use python-dotenv to load your .env file:

# assignment0/config.py
import os

from dotenv import load_dotenv

# Load .env file if it exists
load_dotenv()

# Access environment variables
NAVIGATOR_API_KEY = os.getenv("NAVIGATOR_API_KEY")
if not NAVIGATOR_API_KEY:
    raise ValueError("NAVIGATOR_API_KEY environment variable is required")

Add python-dotenv to your project:

uv add python-dotenv

Protecting Secrets with .gitignore

Your .gitignore must include .env to prevent accidental commits:

# .gitignore

# Environment variables - NEVER COMMIT
.env
.env.local
.env.*.local

# Python
__pycache__/
*.py[cod]
.venv/
*.egg-info/

# IDE
.vscode/
.idea/

GitHub Secrets for CI/CD

For tests that require API keys, use GitHub repository secrets.

Adding a secret to your repository:

Go to your repository on GitHub
Click Settings > Secrets and variables > Actions
Click New repository secret
Name: NAVIGATOR_API_KEY
Value: Your actual API key
Click Add secret

Using secrets in GitHub Actions:

Update your workflow to pass secrets as environment variables:

# .github/workflows/pytest.yml
name: PyTest
on: push

jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    env:
      NAVIGATOR_API_KEY: $
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - uses: astral-sh/setup-uv@v5
      - run: uv sync
      - run: uv run pytest -v

Best practice for tests:

Most tests should mock API calls rather than use real credentials. Only use real API keys in CI/CD when you need integration tests:

# tests/test_integration.py
import os

import pytest

# Skip if no API key available
pytestmark = pytest.mark.skipif(
    not os.getenv("NAVIGATOR_API_KEY"),
    reason="NAVIGATOR_API_KEY not set"
)

def test_real_api_call():
    """Integration test using real API."""
    # This test only runs if NAVIGATOR_API_KEY is set
    ...

Summary: What to Commit

File	Commit?	Contains
`.env`	NO	Real API keys
`.env.example`	Yes	Variable names only
`.gitignore`	Yes	Patterns to ignore
Code files	Yes	`os.getenv()` calls

Testing Guide

Project Structure for Tests

Tests must be runnable from the project root without modifying sys.path:

cis6930sp26-assignment0/
├── assignment0/              # Your package directory
│   ├── __main__.py          # Entry point for python -m assignment0
│   ├── api.py               # API fetching functions
│   ├── llm.py               # LLM processing functions
│   └── cli.py               # Command-line interface
├── tests/
│   ├── test_api.py
│   ├── test_llm.py
│   └── test_cli.py
├── pyproject.toml
└── ...

Note: No __init__.py files are needed in Python 3.3+ due to namespace packages (PEP 420).

Writing Tests

Tests import from your package directly:

# tests/test_api.py
from assignment0.api import parse_earthquake_data

def test_parse_earthquake_data():
    sample_data = {
        "features": [
            {"properties": {"mag": 5.2, "place": "California"}}
        ]
    }
    result = parse_earthquake_data(sample_data)
    assert len(result) == 1
    assert result[0]["magnitude"] == 5.2

Running Tests

Always run pytest from the project root:

# Run all tests
uv run pytest

# Run with verbose output
uv run pytest -v

# Run a specific test file
uv run pytest tests/test_api.py

# Run tests matching a pattern
uv run pytest -k "earthquake"

Why This Structure Works

The pyproject.toml configuration ensures pytest can find your code:

[tool.pytest.ini_options]
testpaths = ["tests"]
pythonpath = ["."]

Never do this:

# BAD - Do not modify sys.path in your code
import sys
sys.path.insert(0, os.path.dirname(__file__))

Do this instead:

# GOOD - Use standard imports
from assignment0.api import fetch_data

Mocking External Services

Use unittest.mock to avoid calling real APIs in tests. This is essential because:

Tests should be fast and not depend on network
Tests should be deterministic (same input = same output)
You don’t want to hit rate limits or incur API costs

Using `MagicMock`

MagicMock creates a mock object that accepts any attribute or method call:

from unittest.mock import MagicMock

# Create a mock that simulates a requests.Response object
mock_response = MagicMock()
mock_response.status_code = 200
mock_response.json.return_value = {"data": "test"}

# The mock accepts any method call
print(mock_response.json())  # Returns {"data": "test"}
print(mock_response.anything())  # Returns another MagicMock

Using `patch` as a Context Manager

patch temporarily replaces a function or object during a test:

# tests/test_api.py
from unittest.mock import patch, MagicMock
from assignment0.api import fetch_earthquake_data

def test_fetch_earthquake_data():
    # Create mock response
    mock_response = MagicMock()
    mock_response.status_code = 200
    mock_response.json.return_value = {
        "features": [{"properties": {"mag": 5.0, "place": "Test"}}]
    }

    # Patch requests.get in the module where it's used
    with patch("assignment0.api.requests.get", return_value=mock_response) as mock_get:
        result = fetch_earthquake_data(days=7)

        # Verify the function was called
        mock_get.assert_called_once()

        # Verify the result
        assert len(result["features"]) == 1

Using `patch` as a Decorator

from unittest.mock import patch, MagicMock
from assignment0.llm import summarize_with_llm

@patch("assignment0.llm.requests.post")
def test_summarize_with_llm(mock_post):
    # Configure the mock
    mock_response = MagicMock()
    mock_response.json.return_value = {
        "choices": [{"message": {"content": "This is a summary"}}]
    }
    mock_post.return_value = mock_response

    # Call the function
    result = summarize_with_llm("Some data to summarize")

    # Assertions
    assert result == "This is a summary"
    mock_post.assert_called_once()

Mocking Multiple Functions

from unittest.mock import patch, MagicMock
from assignment0.cli import main

@patch("assignment0.cli.fetch_data")
@patch("assignment0.cli.summarize_with_llm")
def test_main_flow(mock_summarize, mock_fetch):
    # Note: decorators are applied bottom-up, so mock_fetch is the inner patch
    mock_fetch.return_value = {"data": "test"}
    mock_summarize.return_value = "Summary of test data"

    result = main(["--source", "test"])

    mock_fetch.assert_called_once()
    mock_summarize.assert_called_once_with({"data": "test"})

Common Mock Assertions

# Was the mock called?
mock.assert_called()

# Was it called exactly once?
mock.assert_called_once()

# Was it called with specific arguments?
mock.assert_called_with(arg1, arg2, kwarg=value)

# Was it called once with specific arguments?
mock.assert_called_once_with(arg1, arg2)

# How many times was it called?
assert mock.call_count == 3

Project Structure

cis6930sp26-assignment0/
├── .github/
│   └── workflows/
│       └── pytest.yml
├── assignment0/
│   ├── __main__.py          # Entry point for `python -m assignment0`
│   ├── api.py               # API data fetching
│   ├── llm.py               # LLM processing
│   └── cli.py               # Argument parsing
├── tests/
│   ├── test_api.py
│   ├── test_llm.py
│   └── test_cli.py
├── .env.example             # Example environment variables (no secrets!)
├── .gitignore
├── COLLABORATORS.md
├── LICENSE
├── README.md
└── pyproject.toml

Note: No __init__.py files are required. Python 3.3+ supports namespace packages (PEP 420).

The `main.py` File

The __main__.py file makes your package runnable as a module using python -m:

# Instead of: python assignment0/some_script.py
# You can run: python -m assignment0
uv run python -m assignment0 --help

How it works:

When you run python -m assignment0, Python looks for assignment0/__main__.py
Python executes that file as the entry point
The if __name__ == "__main__": guard ensures code only runs when executed directly

Example __main__.py:

# assignment0/__main__.py
from assignment0.cli import main

if __name__ == "__main__":
    main()

Why use -m instead of running a script directly?

Ensures proper package imports work correctly
Adds the current directory to sys.path automatically
Standard way to run Python packages

Submission

Create a private repository named cis6930sp26-assignment0
Add collaborators: cegme ~~(and TAs to be announced)~~
Tag your final submission:
```
git tag v1.0
git push origin v1.0
```
Submit the repository URL to Canvas

Late Policy

Due date: Monday, February 2, 2026 at 11:59 PM Late deadline: Wednesday, February 4, 2026 at 11:59 PM

You may submit after the due date until grading begins (typically 1-3 days after due date). The exact grading start time will not be announced. No submissions accepted after grading begins.

I strongly encourage submitting by the due date to avoid:

Missing the grading window
Mounting assignment burden
Delayed peer review assignment

Peer Review

You will be assigned 2 classmates’ repositories to review. Completing your reviews is worth 5 points of your assignment grade.

Review assignments will be posted on Canvas by Thursday, February 5.

How to Review

Clone the repository:

git clone <assigned-repo-url>
cd <repo-name>

Install and test:

uv sync
cp .env.example .env
# Add your NavigatorAI API key to .env
uv run python -m assignment0 --help
uv run pytest -v

Check GitHub Actions: Look for the green checkmark (or red X) on the repository’s Actions tab.
Score using the rubric: Use the Assignment 0 Peer Review Rubric to evaluate each criterion.
Submit your review: Copy the review template from the rubric, fill it out, and submit to Canvas.

Review Checklist

For each repository you review:

Successfully cloned the repository
uv sync installed dependencies without errors
Application runs and fetches data from an API
LLM processing produces meaningful output
Tests pass when running uv run pytest -v
GitHub Actions shows passing tests (green checkmark)
README explains how to run the code
Code is organized as a proper Python package
No hardcoded API keys or committed .env file

Writing Good Reviews

Be specific: Instead of “code is messy,” say “consider extracting the API logic from main.py:45-60 into a separate function”
Be constructive: Focus on how to improve, not just what’s wrong
Be thorough: Actually run the code, don’t just read it
Be fair: Use the rubric consistently

Submit your peer reviews through Canvas by Friday, February 6 at 11:59 PM.

Meta Review

After peer reviews are submitted, you will evaluate the quality of reviews you received. This helps ensure reviewers provide thoughtful, constructive feedback.

For each review you received, rate:

Was the review thorough? (Did they actually run your code?)
Was the feedback specific and actionable?
Was the scoring fair and consistent with the rubric?

Submit your meta reviews through Canvas by Saturday, February 7 at 11:59 PM.

Grading

See the full Peer Review Rubric for detailed scoring criteria and examples.

Point Breakdown

Component	Points	Graded By
API Data Collection	15	Peers
LLM Processing	15	Peers
Command-Line Interface	10	Peers
Testing	5	Peers
CI/CD & Project Structure	5	Peers
Implementation Subtotal	45
Completing Peer Reviews	5	Instructor
Total	50

Your final grade is the median of your peer review scores (for the 45-point implementation portion), plus points for completing your assigned reviews.

Tips

Start early - API setup and NavigatorAI configuration may take time
Use environment variables - Never commit API keys to git
Test incrementally - Get API fetching working before adding LLM processing
Keep prompts focused - Clear prompts produce better LLM outputs
Check the project structure - Proper packaging prevents import errors

Preparing for Assignment 1

This assignment prepares you for the MCP Data Pipeline assignment where you will:

Build MCP servers that expose tools to LLMs
Use LLMs to orchestrate multi-step data processing
Compare LLM-driven vs. traditional approaches

The skills you learn here (API calls, LLM integration, testing, packaging) will be essential.

Resources

Academic Integrity

This is an individual assignment. You may discuss concepts with classmates, but all code must be your own. Document any external resources or AI assistance in COLLABORATORS.md.