This is the web page for Data Engineering at the University of Florida.
The U.S. Senate Commerce Committee Chairman uploaded a spreadsheet that highlights many National Science Foundation-funded projects. In this assignment, your job is to perform exploratory analysis on the data in the spreadsheet.
In this assignment, you will use leading tools to look for statistics, trends, duplicate entries, and other interesting artifacts and information in the data. Prepare a PDF submission that includes your name and a thorough description of the steps you took to perform the analysis. Upload your submission to Canvas.
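The duplicate-entry and statistics checks described above can be done by hand as a cross-check on whatever tool you use. The following is a minimal standard-library sketch. It assumes the spreadsheet has been exported to CSV. The column names (award_id, title, amount) and the "$50,000" amount format are hypothetical stand-ins, not taken from the actual spreadsheet. Replace them with the real headers.

```python
import csv
import io
import statistics
from collections import Counter


def find_duplicates(rows, key_columns=None):
    """Count rows sharing the same values in key_columns (all columns if None).

    Returns {key_tuple: count} only for keys that appear more than once.
    """
    counts = Counter(
        tuple(row[c] for c in (key_columns or row.keys())) for row in rows
    )
    return {key: n for key, n in counts.items() if n > 1}


def summarize_numeric(rows, column):
    """Basic statistics for a numeric column, skipping blank or non-numeric cells."""
    values = []
    for row in rows:
        # Assumption: amounts may be written like "$50,000", so strip "$" and ",".
        raw = (row.get(column) or "").replace("$", "").replace(",", "").strip()
        try:
            values.append(float(raw))
        except ValueError:
            continue  # skip blanks and text such as "N/A"
    return {
        "count": len(values),
        "mean": statistics.mean(values) if values else None,
        "median": statistics.median(values) if values else None,
        "min": min(values, default=None),
        "max": max(values, default=None),
    }


# Hypothetical sample data standing in for a CSV export of the spreadsheet.
SAMPLE = """award_id,title,amount
1001,Study A,"$50,000"
1002,Study B,"$120,000"
1001,Study A,"$50,000"
1003,Study C,N/A
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
duplicates = find_duplicates(rows)                 # exact duplicate rows
duplicate_ids = find_duplicates(rows, ["award_id"])  # repeated IDs only
stats = summarize_numeric(rows, "amount")
print(duplicates)
print(duplicate_ids)
print(stats)
```

Checking duplicates by a key column as well as by whole rows can surface entries that repeat an ID but differ in other fields. Such rows are worth describing in the PDF.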
Below are the tools you can use to perform the analysis.
An open-source tool called DataPrep is available for you to use in Python (https://dataprep.ai). Only a pip install is needed to get started. The video below is an introduction to the tool.
Your Google Cloud credits give you access to Google Cloud Dataprep, also called Trifacta (https://console.cloud.google.com/dataprep). You can enable the tool when you are logged in to your Google Cloud account. The video below is an introduction to the tool.
The goal of the PDF is to demonstrate that you were able to use one of the tools above. Create visualizations of your analyzed data and include a description of the steps you took to perform the analysis. You are expected to make heavy use of each tool’s documentation to understand its capabilities. Add a section describing your review of the data preparation tool that you used. Submit the PDF in Canvas.
The grading points below are weighted equally (15 points each), for a total of 45 points.
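Both tools above produce their own charts. For example, DataPrep's dataprep.eda module includes functions such as plot and create_report. As a tool-independent sketch of turning a trend into a visualization, the following counts projects per year and renders the counts as a plain-text bar chart. The start_date column name and the MM/DD/YYYY date format are assumptions about the spreadsheet, not facts from it. The chart is only a quick check; the visualizations in the PDF should come from the tool you chose.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def projects_per_year(rows, date_column="start_date", date_format="%m/%d/%Y"):
    """Count rows per year; skip cells that are missing or don't match date_format."""
    years = Counter()
    for row in rows:
        try:
            # Assumption: dates look like "01/15/2019" (hypothetical format).
            when = datetime.strptime((row.get(date_column) or "").strip(), date_format)
        except ValueError:
            continue  # blank or unparseable date
        years[when.year] += 1
    return dict(sorted(years.items()))


def text_bar_chart(counts, width=40, symbol="#"):
    """Render {label: count} as text bars scaled so the largest count spans `width`."""
    if not counts:
        return ""
    largest = max(counts.values())
    lines = []
    for label, n in counts.items():
        bar = symbol * max(1, round(n / largest * width)) if n else ""
        lines.append(f"{label} | {bar} {n}")
    return "\n".join(lines)


# Hypothetical sample data; one row has an unparseable date on purpose.
SAMPLE = """award_id,start_date
1,01/15/2019
2,06/01/2019
3,03/10/2020
4,not a date
5,09/30/2021
6,11/11/2021
7,02/02/2021
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
counts = projects_per_year(rows)
chart = text_bar_chart(counts, width=30)
print(chart)
```

Rows whose dates fail to parse are silently skipped here. In a real analysis it is worth counting them separately, since they are themselves a data-quality finding.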
Run pyenv install 3.10.16 to install the needed version. Set the version with pyenv local 3.10.16 in the directory where you are working. Then install pipenv with pip install pipenv. You can then install the tool with pipenv install dataprep ipython.