CIS 6930 Spring 25


This is the web page for Data Engineering at the University of Florida.

Assignment 1 - Gainesville Crime Reports

CIS 6930 Spring 2025

In this assignment, you will practice extracting data from an online source and reformatting the data. You will create a Python package that takes a search parameter and outputs information about incidents in Gainesville, Florida.



About the City of Gainesville Data Portal

The City of Gainesville’s Open Data Portal, known as dataGNV, is a comprehensive platform designed to provide public access to a wide range of datasets and information about the city. Managed by the Office of Strategic Initiatives, this portal aims to enhance transparency, encourage civic engagement, and support data-driven decision-making.

The Gainesville Police Department (GPD) provides a dataset called Crime Responses that documents initial details of incidents from 2011 to the present. Due to “Marsy’s Law,” incident locations are rounded off. In 2021, Florida transitioned from the Summary Reporting System (SRS) to the National Incident-Based Reporting System (NIBRS), which captures more detailed crime data. This change can cause an apparent increase in crime statistics, particularly property crimes, due to the more comprehensive reporting in NIBRS compared to SRS. The increase is usually not greater than 2.7%. More details on the impact of NIBRS on crime statistics can be found here.

For developers and data enthusiasts, dataGNV offers robust API capabilities. Users can access raw data programmatically, enabling them to integrate city data into their own applications. In particular, we are interested in the Crime Responses API. The API gives access to 14 different attributes of the crime incidents that have been recorded by the police department. The first 1000 records in JSON format can be retrieved through a RESTful request to the resource endpoint https://data.cityofgainesville.org/resource/gvua-xt9q.json.
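
As a rough illustration of the request described above, the sketch below downloads the endpoint with the standard library and checks that the payload is a JSON array. The function names fetch_records and decode_records and the User-Agent string are hypothetical, and this is only one possible approach, not the required implementation.

```python
# -*- coding: utf-8 -*-
import json
import urllib.request


def decode_records(raw: bytes) -> list:
    """Parse a JSON payload and check that it is a list of records."""
    data = json.loads(raw.decode("utf-8"))
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of incident records")
    return data


def fetch_records(url: str, timeout: float = 30.0) -> list:
    """Download the resource endpoint and return the decoded records."""
    # The User-Agent value is an arbitrary, hypothetical identifier.
    request = urllib.request.Request(url, headers={"User-Agent": "cis6930-assignment1"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return decode_records(response.read())
```

Keeping the decoding step separate from the network call makes it possible to test the parsing logic without touching the live endpoint.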

Assignment Task

Your assignment is to build a Python function that collects elements from a given page and represents them in the requested format.

In your Python package, you should create a command-line Python file that takes --url, --offset, and --limit parameters. In return, the package should output the incident type, report date, offense date, latitude, and longitude. Alternatively, you can supply a --file flag for testing. The information should be printed to STDOUT in a thorn-separated format.

{incident_type}{thorn}{report_date}{thorn}{offense_date}{thorn}{latitude}{thorn}{longitude}

In the output, you will use the lowercase thorn character (þ) to separate data fields. Fields with multiple entries should be separated by commas. Fields with null or empty entries should remain blank. Below is an example execution followed by example output.

pipenv run python main.py \
    --url https://data.cityofgainesville.org/resource/gvua-xt9q.json \
    --offset 0 \
    --limit 5
Drug Violationþ2025-01-20T20:35:26.000þ2025-01-20T19:35:25.000þ29.66545þ-82.3245
Theft Petit - Retailþ2025-01-20T20:23:15.000þ2025-01-20T19:53:00.000þ29.67054þ-82.33914
Stolen Vehicle (auto)þ2025-01-20T18:59:32.000þ2025-01-20T18:15:31.000þ29.68845þ-82.30986
Domestic Aggravated Assaultþ2025-01-20T18:19:00.000þ2025-01-20T18:18:00.000þ29.66116þ-82.33311
Informationþ2025-01-20T17:58:52.000þ2025-01-20T17:58:51.000þ29.6604þ-82.41168
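
The formatting rules above could be sketched as follows. This assumes each record is a dictionary whose keys match the names in the template line (incident_type, report_date, and so on); the helper names format_field and format_record are hypothetical.

```python
# -*- coding: utf-8 -*-
THORN = "\u00fe"  # lowercase thorn character

# Output order taken from the template line in the assignment.
# Assumption: the JSON keys use exactly these names.
FIELDS = ("incident_type", "report_date", "offense_date", "latitude", "longitude")


def format_field(value) -> str:
    """Render one field: blank for null, comma-joined for multiple entries."""
    if value is None:
        return ""
    if isinstance(value, (list, tuple)):
        return ",".join(str(item) for item in value)
    return str(value)


def format_record(record: dict) -> str:
    """Join the selected fields of one incident with the thorn character."""
    return THORN.join(format_field(record.get(name)) for name in FIELDS)


if __name__ == "__main__":
    sample = {
        "incident_type": "Drug Violation",
        "report_date": "2025-01-20T20:35:26.000",
        "offense_date": "2025-01-20T19:35:25.000",
        "latitude": "29.66545",
        "longitude": "-82.3245",
    }
    print(format_record(sample))
```

Using record.get(name) means a missing key is treated the same as a null value, so the output line always contains four separators.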

You will also create simple test cases to show that you have tested your features.

Offset and Limit

Suppose this is the current data contained in the endpoint.

[ { "record": "0"},
  { "record": "1"},
  { "record": "2"},
    "...",
  { "record": "N"} ]

A request with offset=0 means a cursor is placed before the initial record. Passing in offset=0 and limit=3 will produce a result based on the first three records:

[ { "record": "0"},
  { "record": "1"},
  { "record": "2"} ]

When given the parameters offset=2 and limit=3, the cursor is placed just before the third record, and the result is as shown below:

[ { "record": "2"},
  { "record": "3"},
  { "record": "4"} ]
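
The cursor behavior above can be sketched in two ways: slicing a list that is already in memory (for example, one loaded with --file), or asking the server for the window. The second part assumes the endpoint accepts Socrata-style $limit and $offset query parameters, which this page does not state; the names window and build_url are hypothetical.

```python
# -*- coding: utf-8 -*-
from urllib.parse import urlencode


def window(records: list, offset: int, limit: int) -> list:
    """Skip `offset` records, then keep at most `limit` of them."""
    return records[offset:offset + limit]


def build_url(base_url: str, offset: int, limit: int) -> str:
    """Append paging parameters to the resource endpoint.

    Assumption: the portal honors $limit and $offset query parameters.
    """
    query = urlencode({"$limit": limit, "$offset": offset}, safe="$")
    return f"{base_url}?{query}"


if __name__ == "__main__":
    records = [{"record": str(i)} for i in range(6)]
    print(window(records, 2, 3))
    print(build_url("https://data.cityofgainesville.org/resource/gvua-xt9q.json", 2, 3))
```

Python slicing never raises when the window runs past the end of the list, so an offset near the last record simply returns fewer than limit records.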

Tasks Checklist

Project Submission

Create a private repository called cis6930sp25-assignment1. Please ensure you use this exact repository name, all lowercase. Add collaborators cegme, tzhan024, and abbasidaniyal by going to Settings > Collaborators and teams > add people.

Create a Python package

Please follow the packaging structure below. We explicitly use the flat layout instead of the src layout because we will be running the submission as top-level code through main.py. The docs folder can remain empty or contain only a placeholder file. This directory could be used to hold autogenerated docs; autogenerated docs are not required for this project. Please include your Pipfile, but you do not need to commit a Pipfile.lock. The entry point for your code should be the main.py file.

cis6930sp25-assignment1/
├── COLLABORATORS.md
├── LICENSE
├── Pipfile
├── README.md
├── main.py
├── docs
├── pyproject.toml
└── tests
    ├── test_download.py
    ├── ...
    └── test_randompage.py

pyproject.toml

This is a template for the standard pyproject.toml file. Please adjust as needed.

[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
authors = [{name = "Your name", email = "your.name@ufl.edu"}]
name = "assignment1"
description = "Assignment One from <github id> -- Spring 2025"
version = "1.0"
readme = "README.md"

[project.urls]
Task = "https://ufdatastudio.com/cis6930sp25/assignments/1"
repository = "https://github.com/<userid>/cis6930sp25-assignment1"

[tool.setuptools]
py-modules = []

[tool.pytest.ini_options]
testpaths = ["tests"]

README.md

The README.md file name should be all uppercase with the .md extension. Write your name in it, along with an example of how to run the package and any bugs that should be expected. Describe all features of your code and give directions on how to install and use the Python package. The file should also list any bugs or assumptions made while writing the program; we know your code will not be perfect, so be sure to include any assumptions you make for your solution. Note: You should not copy code from any website not provided by the instructor.

Below is an example template:

# cis6930sp25 -- Assignment1

Name:

## Assignment Description

In your own words...

## How to install

pipenv install -e .

## How to run

pipenv run python ...

## Example

![video](video)

## Features and functions

#### main.py

downloaddata() - this function...
...other functions

#### parsefile.py

dojsonparse() - this function ...

## Bugs and Assumptions

...

COLLABORATORS.md

This file should contain a pipe-separated list describing who you worked with and a short description of the nature of the collaboration. If you visited a website for inspiration, include the website. This information should be listed in three fields, as in the example below:

Katherine Johnson | kj@nasa.gov | Helped me understand calculations
Dorothy Vaughan | doro@dod.gov | Helped me with multiplexed time management
Stackoverflow | https://example | helped me with a compilation of python test

The collaborator file is mainly used to ensure that code similarities are coincidental. Be sure to abide by the academic integrity guidelines outlined in the syllabus. Generative AI tools may produce code that is very similar to other students' submissions and should be avoided.

main.py

Here is an example main.py file; we expect that yours will be different. This snippet outlines the expected functionality and shows how to use argparse to pass parameters to the code. Calling the main function should initiate the project functionality. You will also need to ensure that your code works on Linux-based systems. We will use the pipenv environment to run your code.

# -*- coding: utf-8 -*-
# Abridged example main.py
import argparse
import json
import sys

def main(page=None, thefile=None, offset=None, limit=None):
    # Load the data from the web or from a local file
    if page is not None:
        # TODO Download data function
        pass
    elif thefile is not None:
        # TODO Retrieve the file contents
        pass

    # TODO call formatting data function

    # TODO print the thorn-separated file


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("--url", type=str, required=False, help="The source location on the web.")
    parser.add_argument("--file", type=str, required=False, help="The source location locally.")
    parser.add_argument("--offset", type=int, required=False, help="The offset to jump forward.")
    parser.add_argument("--limit", type=int, required=False, help="The number of records you want to retrieve.")
    # ...

    args = parser.parse_args()
    if args.url or args.file:
        # Pass the parsed arguments through to main()
        main(page=args.url, thefile=args.file, offset=args.offset, limit=args.limit)
    else:
        parser.print_help(sys.stderr)

Tests

Tests should be runnable by using pipenv run python -m pytest -v. The tests should show that all the functionality works. We are not necessarily looking for bulletproof code. Visit the pytest docs for details.

All tests should go in the tests/ folder. The names of files containing test functions should be prefixed with the word test. For example, data size tests could go in a file named test_download.py. Functions in the test file that should run as tests must be prefixed with the string test. We will run your tests from the root directory with the command pipenv run python -m pytest -v . (note the trailing dot for the current directory). Running pytest this way adds the current path to sys.path, so you do not have to hack the run path in your test files.

An example test file is below

# -*- coding: utf-8 -*-
# Example test_first_sanity.py
import main

def test_sanity() -> None:
    assert main.somefunction() == EXPECTED_RESULT  # replace EXPECTED_RESULT with the value you expect

...
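
As one possible way to test without hitting the network, the sketch below writes a small JSON fixture into pytest's built-in tmp_path directory and reads it back. The load_records function is a hypothetical stand-in defined inline so the example is self-contained; in a real submission the function under test would be imported from your own package instead.

```python
# -*- coding: utf-8 -*-
# Example tests/test_file_input.py (hypothetical file name)
import json
from pathlib import Path


def load_records(path) -> list:
    """Stand-in for a project function that reads records from a local JSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def test_load_records_reads_every_record(tmp_path: Path) -> None:
    # tmp_path is a pytest fixture that provides a fresh temporary directory.
    fixture = tmp_path / "incidents.json"
    expected = [{"record": "0"}, {"record": "1"}]
    fixture.write_text(json.dumps(expected), encoding="utf-8")
    assert load_records(fixture) == expected
```

A test like this exercises the same code path as the --file flag, which keeps the test suite independent of the live data portal.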

Submitting your code

When ready to submit, create a tag on your repository using git tag on the latest commit:

git tag v1.0
git push origin v1.0

Version v1.0 lets us know when and what version of the code you would like us to grade. If you would like to submit a second version before the deadline, use the tag v2.0. If you need to update a tag, view the commands in the following StackOverflow post.

You will also submit your repository to Gradescope. Submit the whole repository; do not upload individual files. A link will appear on Canvas when submissions are available. Consider also adding a YAML file to your repository for continuous integration in your GitHub project.

Grading

Grades will be assessed according to the following distribution:

Addenda

