This is the web page for Data Engineering at the University of Florida.
In your final task as a member of the MIB, you have been tasked with creating a system that can intercept, process, classify, and respond to potential alien communications being broadcast across Earth. You will build a containerized application that connects to a central UFO Sighting Detection Network. It should take signals from other agents earlier in the pipeline, process the incoming data, and collaborate with other agents (your classmates) to identify patterns that might indicate extraterrestrial activity. (Given the time crunch, you will not be working directly with other agents.)
In this project, you will develop the end stages of a data engineering pipeline that spans from data ingestion to final analysis. You will implement asynchronous message processing using RabbitMQ and apply machine learning techniques to classify data in real time. The project also emphasizes building containerized applications using Apptainer and incorporating structured logging with Loguru. As with the previous project, you will manage dependencies using the uv Python package manager and create data visualizations to effectively communicate your findings.
Following your successful management of the Extra Terrestrial incidents (Project 1) and your analysis of UFO sighting records (Project 2), you have been promoted to a field agent in the Extraterrestrial Data Analysis Department (EDAD). Recent intelligence suggests that extraterrestrial entities are attempting to communicate with each other using Earth’s communication infrastructure, disguising their messages as regular network traffic. The central EDAD command (Dr. Grant) has set up a network monitoring station (RabbitMQ server) that captures potential alien communications and broadcasts them to all field agents. Your mission is to develop a system that can:
Command central will be hosted on a machine in the NVIDIA DGX Cloud block.
The possible machines will be cpu[001-002],gpu[001-022] with a host name ending in cm.cluster.
Authentication should not be necessary, but you can use the super secret user name and password pair guest/guest to access the RabbitMQ server.
RabbitMQ is a message broker that facilitates distributed communication. We will host a RabbitMQ server on a machine in the NVIDIA DGX Cloud to act as a Pub/Sub system. It is your task to subscribe to the message queue in order to receive and process the messages.
We will provide the host for the cluster; you will need to create a consumer for the content.
We will have an exchange named ufo and the exchange_type is fanout.
Because we are using fanout, we will not be using a routing key.
channel.exchange_declare(exchange='ufo', exchange_type='fanout')
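A minimal consumer sketch, assuming the pika client library (the exchange_declare call above matches its API). With a fanout exchange, each consumer binds its own temporary queue and supplies no routing key. The names make_callback, consume, and handler are illustrative and not part of the assignment.

```python
import json


def make_callback(handler):
    """Wrap a handler so it receives each message body as a decoded dict."""
    def on_message(channel, method, properties, body):
        # pika delivers the body as bytes; the messages are JSON strings.
        handler(json.loads(body.decode("utf-8")))
    return on_message


def consume(host, port, handler):
    """Subscribe to the 'ufo' fanout exchange and block while consuming."""
    import pika  # third-party client, assumed installed (e.g. via `uv add pika`)

    credentials = pika.PlainCredentials("guest", "guest")
    connection = pika.BlockingConnection(
        pika.ConnectionParameters(host=host, port=port, credentials=credentials)
    )
    channel = connection.channel()
    channel.exchange_declare(exchange="ufo", exchange_type="fanout")

    # An empty name asks the broker for a fresh queue; exclusive=True removes
    # it when this connection closes.
    result = channel.queue_declare(queue="", exclusive=True)
    queue_name = result.method.queue

    # Fanout exchanges ignore routing keys, so none is supplied here.
    channel.queue_bind(exchange="ufo", queue=queue_name)
    channel.basic_consume(
        queue=queue_name,
        on_message_callback=make_callback(handler),
        auto_ack=True,
    )
    channel.start_consuming()
```

Calling consume("cpu002.cm.cluster", 5672, print) would print each decoded message as it arrives, provided that host is the one running the server.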
Command messages will be sent as strings in JSON format. The table below lists each key in the JSON message with a description of its content. Messages that are intelligible are non-extraterrestrial messages.
Key | Description |
---|---|
id | (str) The id of the message |
lat | (numeric) The latitude of a sighting |
lon | (numeric) The longitude of a sighting |
time | (str) The time of the sighting in Unix timestamp format |
frequency | (numeric) Number of times the object was seen before the message was sent |
shape | (str) circle, square, triangle, light, etc. |
msg | (str) The associated intercepted message, which could be an alien communication |
{ "id": "msg-001", "lat": 37.7749, "lon": -122.4194, "time": "1712841600", "frequency": 3, "shape": "triangle", "msg": "zorblax nenu flargh!" }
{ "id": "msg-002", "lat": 40.7128, "lon": -74.0060, "time": "1712938000", "frequency": 1, "shape": "light", "msg": "🛸 bleep bloop! Initiating K99 protocol" }
{ "id": "msg-003", "lat": 34.0522, "lon": -118.2437, "time": "1713024000", "frequency": 5, "shape": "circle", "msg": "⚠️ H'gro thwak norz plen!" }
{ "id": "msg-004", "lat": 41.8781, "lon": -87.6298, "time": "1713110400", "frequency": 2, "shape": "square", "msg": "Enc0ded#msg_778w: gl!zn@" }
{ "id": "msg-005", "lat": 29.7604, "lon": -95.3698, "time": "1713196800", "frequency": 7, "shape": "unknown", "msg": "!!!—zrg quoo zrg quoo" }
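One possible way to decode a message and normalize its fields, assuming every message carries the seven keys listed in the table. The names parse_message and REQUIRED_KEYS are hypothetical, and rejecting incomplete messages with an exception is a design choice rather than a requirement.

```python
import json
from datetime import datetime, timezone

REQUIRED_KEYS = ("id", "lat", "lon", "time", "frequency", "shape", "msg")


def parse_message(raw):
    """Decode one JSON message and convert fields to convenient Python types."""
    record = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if key not in record]
    if missing:
        raise ValueError(f"message is missing keys: {missing}")
    return {
        "id": str(record["id"]),
        "lat": float(record["lat"]),
        "lon": float(record["lon"]),
        # The timestamp arrives as a string holding Unix seconds.
        "time": datetime.fromtimestamp(int(record["time"]), tz=timezone.utc),
        "frequency": record["frequency"],
        "shape": str(record["shape"]),
        "msg": str(record["msg"]),
    }
```

Converting the timestamp to a timezone-aware datetime up front makes later grouping by hour or day simpler when building the visualizations.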
Collect and store all of the data received from the RabbitMQ server. In addition to storage, log the progress of your code with Loguru.
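A sketch of one storage approach: append each message to a JSON Lines file and log the write. The file name messages.jsonl and the function name store_message are assumptions. The fallback to the standard logging module exists only so the sketch runs where Loguru is missing; the project itself is expected to use Loguru.

```python
import json
from pathlib import Path

try:
    from loguru import logger
except ImportError:
    # Fallback so this sketch still runs where Loguru is not installed.
    import logging
    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("edad")


def store_message(record, path=Path("output") / "messages.jsonl"):
    """Append one message to a JSON Lines file and log that it was stored."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        # ensure_ascii=False keeps emoji in the msg field readable on disk.
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    logger.info(f"stored message {record.get('id')} in {path}")
```

Appending one line per message means a crash loses at most the message being written, and the file can be reloaded line by line for analysis.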
Using a method of your choosing, use the msg parameter to identify if the intercepted message is alien text or not.
If the message is not completely intelligible, it is likely an alien message.
It is up to you to decide how to measure intelligibility, but typically words that can be spoken in English are considered intelligible.
Log the labels that you give to the collected messages.
The labels should be alien👽 or human.
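As a baseline illustration of the labeling step, the sketch below marks a message as human only when every word in it appears in an English vocabulary. This is a simple heuristic chosen for illustration, not a prescribed method. The tiny ENGLISH_WORDS set is a placeholder; a real run would load a full word list or use a trained model.

```python
import re

# Tiny illustrative vocabulary; a real run would load a full English word list.
ENGLISH_WORDS = {"we", "come", "in", "peace", "the", "lights", "were", "bright"}


def classify(msg, vocabulary=ENGLISH_WORDS):
    """Return 'human' only if every word in msg is in the vocabulary."""
    # Digits, emoji, and punctuation are ignored by this tokenizer.
    words = re.findall(r"[a-z']+", msg.lower())
    if words and all(word in vocabulary for word in words):
        return "human"
    # No recognizable words, or at least one unknown word.
    return "alien👽"
```

The returned label can be logged together with the message id so that every classification decision is traceable.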
Generate four visualizations and write them to PDF files. The first two visualizations are mandatory:
You may design the other two visualizations yourself.
Use your creativity to consider the types of graphs that would be useful and appropriate to show the data.
Save the visualizations as report{1,2,3,4}.pdf.
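A sketch of how one of the free-choice charts could be written to a PDF, assuming matplotlib is available and that stored messages are dicts with a shape key. The chart type, the file name report3.pdf, and the function names are illustrative choices only.

```python
from collections import Counter
from pathlib import Path


def shape_counts(records):
    """Count how many stored messages reported each shape."""
    return Counter(record["shape"] for record in records)


def save_shape_chart(records, path=Path("output") / "report3.pdf"):
    """Write a bar chart of messages per shape to a PDF file."""
    import matplotlib  # third-party, assumed installed
    matplotlib.use("Agg")  # render without a display, e.g. on a cluster node
    import matplotlib.pyplot as plt

    counts = shape_counts(records)
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)

    fig, ax = plt.subplots()
    ax.bar(list(counts.keys()), list(counts.values()))
    ax.set_xlabel("Reported shape")
    ax.set_ylabel("Number of messages")
    ax.set_title("Messages per reported shape")
    fig.savefig(path)  # the .pdf suffix selects PDF output
    plt.close(fig)
```

Keeping the counting step separate from the plotting step lets the aggregation be unit tested without rendering a figure.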
Create a private repository called cis6930sp25-project3.
Please ensure you use this exact repository name, all lowercase.
Add collaborators cegme, tzhan024, and abbasidaniyal by going to Settings > Collaborators and teams > add people.
When ready to submit, create a tag on your repository using git tag on the latest commit:
git tag v1.0
git push origin v1.0
The version v1.0 lets us know when and what version of code you would like us to grade.
If you need to submit an updated version, you can use the tag v1.1.
We will also ask you to submit all code files on Gradescope.
Use uv and pyproject.toml to create a Python package for your project.
Follow the standards of the previous assignments.
We expect main.py to accept the command location and the command port as arguments.
E.g.,
uv run python main.py --command 'cpu002.cm.cluster' --port 5672
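The two flags in the command above could be parsed with argparse along these lines. The function name parse_args and the default port of 5672 (the value used in the example) are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the command host and port flags used in the example invocation."""
    parser = argparse.ArgumentParser(description="EDAD message consumer")
    parser.add_argument("--command", required=True,
                        help="host name of the command (RabbitMQ) server")
    parser.add_argument("--port", type=int, default=5672,
                        help="port of the RabbitMQ server")
    return parser.parse_args(argv)
```

Passing argv explicitly, as in parse_args(["--command", "cpu002.cm.cluster"]), makes the parser easy to exercise from a test; with no argument it reads the real command line.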
In addition, include an apptainer.def file that will allow execution of the queue program from a container.
# Build local container
apptainer build edad.sif apptainer.def
apptainer run edad.sif --command 'cpu002.cm.cluster' --port 5672
Write all appropriate files to a folder called output/.
The README.md file name should be all uppercase with a .md extension.
You should write your name in it, and an example of how to run it including any bugs that should be expected.
You should describe all features of your code.
The README.md file should contain a list of any bugs or assumptions made while writing the program.
You should include directions on how to install and use the Python package.
We know your code will not be perfect; be sure to include any assumptions you make for your solution.
Note: You should not be copying code from any website not provided by the instructor.
The collaborator file should contain a pipe-separated list describing who you worked with and a short description of the nature of the collaboration. If you visited a website for inspiration, include the website. This information should be listed in three fields, as in the example below:
Katherine Johnson | kj@nasa.gov | Helped me understand calculations
Dorothy Vaughan | doro@dod.gov | Helped me with multiplexed time management
Stackoverflow | https://example | helped me with a compilation of python test
The collaborator file is mainly used to ensure that code similarities are coincidental. Be sure to abide by the academic integrity guidelines outlined in the syllabus. Generative AI tools may result in code that is very similar to other student submissions and should be avoided.
You should have your own test data set that you can use to test your code.
Add test flags as appropriate for you.
Tests should be runnable by using uv run python -m pytest -v.
The tests should show that all the functionality works.
We are not necessarily looking for bullet proof code.
Visit the pytest docs for details.
All tests should go in the tests/ folder.
The names of files containing test functions should be prefixed with the word test.
For example, data size tests could go in a file with the name test_download.py.
Functions in the test file that should run as tests must be prefixed with the string test.
We will run your tests from the root directory with the line: uv run python -m pytest -v .
It is important to know that running pytest using the method in the previous sentence adds the current path to the sys.path and so you do not have to hack the run path in your test files.
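A minimal test file following these naming rules might look like the sketch below. The file name and the helper has_required_keys are hypothetical; in a real project the helper would be imported from your package instead of being defined inside the test file.

```python
# tests/test_messages.py (hypothetical name; the "test" prefix is what matters)

REQUIRED_KEYS = {"id", "lat", "lon", "time", "frequency", "shape", "msg"}


def has_required_keys(record):
    """Stand-in for project code; a real test would import this instead."""
    return REQUIRED_KEYS.issubset(record)


def test_complete_message_is_accepted():
    record = {"id": "msg-001", "lat": 37.7749, "lon": -122.4194,
              "time": "1712841600", "frequency": 3,
              "shape": "triangle", "msg": "zorblax nenu flargh!"}
    assert has_required_keys(record)


def test_incomplete_message_is_rejected():
    assert not has_required_keys({"id": "msg-002"})
```

Because pytest collects only functions whose names start with test, the helper is not run as a test even though it lives in the same file.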
Consider installing the pytest-cov package to measure the code coverage of your tests.
The code will be tested live during the final exam. The professor will pull your code and execute it on the class cluster.
Task | Percentage |
---|---|
README.md and documentation | 20% |
Code functionality: logging, container, stat gathering | 30% |
Quality and appropriateness of visualizations | 40% |
Total | 100% |