CIS 6930 Spring 25


This is the web page for Data Engineering at the University of Florida.

Assignment 2 - Exploring a Dataset

CIS 6930 Spring 2025

The U.S. Senate Commerce Committee Chairman uploaded a spreadsheet that highlights many National Science Foundation-funded projects. In this assignment, your job is to perform exploratory analysis on the data in the spreadsheet.



In this assignment, you will use leading tools to look for statistics, trends, duplicate entries, and other interesting artifacts and information in the data. Prepare a PDF submission that includes your name and a thorough description of the steps you took to perform the analysis. Upload your submission to Canvas.
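As a rough illustration of what looking for duplicate entries and missing values can mean in practice, here is a minimal sketch that uses only the Python standard library. The column names (award_id, title, amount) and the sample rows are made up for the example; the real spreadsheet's columns will differ, and the tools listed below automate checks like these.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the real spreadsheet's columns and values will differ.
SAMPLE_CSV = """award_id,title,amount
1001,Example project A,250000
1002,Example project B,100000
1001,Example project A,250000
1003,Example project C,
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def find_duplicates(rows, key_fields):
    """Return the key tuples that occur more than once, with their counts."""
    counts = Counter(tuple(row[f].strip() for f in key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


def missing_counts(rows):
    """Count empty cells per column."""
    missing = Counter()
    for row in rows:
        for name, value in row.items():
            if value is None or value.strip() == "":
                missing[name] += 1
    return dict(missing)


rows = load_rows(SAMPLE_CSV)
duplicates = find_duplicates(rows, ["award_id"])
missing = missing_counts(rows)
print("duplicate keys:", duplicates)
print("missing cells per column:", missing)
```

Which columns to treat as the duplicate key is a judgment call that depends on the data; describing that choice is part of the write-up.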

Below are the tools you can use to perform the analysis.

DataPrep

An open-source tool called DataPrep is available for you to use in Python: https://dataprep.ai. Only a pip install is needed to get started. The video below is an introduction to the tool.
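A minimal sketch of getting started might look like the following, assuming the spreadsheet has been exported to a CSV file and that the dataprep and pandas packages are installed (for example with pip install dataprep). The function name build_report and the file name nsf_projects.csv are hypothetical; consult the DataPrep documentation for the options your installed version supports.

```python
def build_report(csv_path, title="NSF projects: exploratory report"):
    """Load a CSV export of the spreadsheet and build a DataPrep EDA report."""
    # Imported inside the function so the file can be loaded
    # even before the packages are installed.
    import pandas as pd
    from dataprep.eda import create_report

    df = pd.read_csv(csv_path)
    # create_report profiles each column (distributions, missing values,
    # correlations) and returns a report object.
    return create_report(df, title=title)


# Example usage (hypothetical file name):
# report = build_report("nsf_projects.csv")
# report.show_browser()
```

The generated report is a starting point; the screenshots and observations you draw from it still need to be explained in your own words in the PDF.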

Trifacta/Google Cloud Dataprep

Your Google Cloud credits give you access to Google Cloud Dataprep, also called Trifacta: https://console.cloud.google.com/dataprep. You can enable the tool once you are logged in to your Google Cloud account. The video below is an introduction to the tool.

Submission

The goal of the PDF is to demonstrate that you were able to use one of the tools above. Create visualizations of your analyzed data and include a description of the steps you took to perform the analysis. You are expected to make heavy use of each tool’s documentation to understand its capabilities. Add a section describing your review of the data preparation tool that you used. Submit the PDF in Canvas.

Grading

The points below are weighted equally (15 points each) for a total of 45 points.

Addenda

