CIS 6930 Spring 25

This is the web page for Data Engineering at the University of Florida.

Project 3 - Analyzing Sighting

CIS 6930 Spring 2025

In your final task as a member of the MIB, you have been tasked with creating a system that can intercept, process, classify, and respond to potential alien communications being broadcast across Earth. You will build a containerized application that connects to a central UFO Sighting Detection Network. It should take signals from agents earlier in the pipeline, process the incoming data, and collaborate with other agents (your classmates) to identify patterns that might indicate extraterrestrial activity. (Given the time crunch, you will not be working directly with other agents.)



Project Overview

In this project, you will develop the end stages of a data engineering pipeline that spans from data ingestion to final analysis. You will implement asynchronous message processing using RabbitMQ and apply machine learning techniques to classify data in real time. The project also emphasizes building containerized applications using Apptainer and incorporating structured logging with Loguru. As with the previous project, you will manage dependencies using the uv Python package manager and create data visualizations to effectively communicate your findings.

Following your successful management of the Extra Terrestrial incidents (Project 1) and your analysis of UFO sighting records (Project 2), you have been promoted to a field agent in the Extraterrestrial Data Analysis Department (EDAD). Recent intelligence suggests that extraterrestrial entities are attempting to communicate with each other using Earth’s communication infrastructure, disguising their messages as regular network traffic. The central EDAD command (Dr. Grant) has set up a network monitoring station (RabbitMQ server) that captures potential alien communications and broadcasts them to all field agents. Your mission is to develop a system that can:

  1. Receive these broadcasts
  2. Process and analyze the content
  3. Identify potential alien communications using machine learning
  4. Generate a findings visualization report for central command
  5. Collaborate with other field agents to improve detection capabilities

Command Central Receiving Messages

Command central will be hosted on a machine in the NVIDIA DGX Cloud block. The possible machines will be cpu[001-002],gpu[001-022] with a host name ending in cm.cluster. Authentication should not be necessary but you can use the super secret user name and password pair guest/guest to access the RabbitMQ server.

RabbitMQ

RabbitMQ is a message broker that facilitates distributed communication. We will host a RabbitMQ server on a machine in the NVIDIA DGX Cloud to act as a Pub/Sub system. It is your task to subscribe to the message queue in order to receive and process the messages.

We will provide the host for the cluster; you will need to create a consumer for the content. The exchange is named ufo and its exchange_type is fanout. Because we are using fanout, we will not be using a routing key.

You will need to read up on how to use RabbitMQ. The exchange is declared as:
channel.exchange_declare(exchange='ufo', exchange_type='fanout')
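
As a rough illustration of the consumer side, here is a minimal sketch. It assumes the third-party pika client, which the assignment does not prescribe, and the names on_message, consume, and received are hypothetical. It binds a broker-named temporary queue to the ufo fanout exchange and collects every decoded message in memory.

```python
import json

EXCHANGE = "ufo"  # exchange name given in the assignment
received = []     # hypothetical in-memory buffer, for illustration only


def on_message(channel, method, properties, body):
    """Callback invoked by pika for each delivery; body arrives as bytes."""
    received.append(json.loads(body.decode("utf-8")))


def consume(host, port=5672):
    """Bind a temporary queue to the fanout exchange and block on deliveries."""
    import pika  # assumption: installed separately, e.g. with `uv add pika`

    params = pika.ConnectionParameters(
        host=host,
        port=port,
        credentials=pika.PlainCredentials("guest", "guest"),
    )
    connection = pika.BlockingConnection(params)
    channel = connection.channel()
    channel.exchange_declare(exchange=EXCHANGE, exchange_type="fanout")
    # An empty queue name lets the broker generate one; exclusive queues
    # are deleted when this consumer disconnects.
    queue_name = channel.queue_declare(queue="", exclusive=True).method.queue
    # A fanout exchange ignores routing keys, so none is passed here.
    channel.queue_bind(exchange=EXCHANGE, queue=queue_name)
    channel.basic_consume(
        queue=queue_name, on_message_callback=on_message, auto_ack=True
    )
    try:
        channel.start_consuming()  # blocks until interrupted
    finally:
        connection.close()
```

Calling consume("cpu002.cm.cluster") would then block and fill received as broadcasts arrive. The pika import sits inside the function only so that the callback can be exercised without a broker or the library installed.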

Command messages

Command messages will be sent as strings in JSON format. Below is a table listing each key in the JSON message with a description of its content. Messages that are intelligible are non-extraterrestrial messages.

Key        Type     Description
id         str      The id of the message
lat        numeric  The latitude of a sighting
lon        numeric  The longitude of a sighting
time       str      The time of the sighting in Unix timestamp format
frequency  numeric  Number of times the object was seen before the message was sent
shape      str      circle, square, triangle, light, etc.
msg        str      Associated intercepted message that could be an alien communication

Example messages:
{ "id": "msg-001", "lat": 37.7749, "lon": -122.4194, "time": "1712841600", "frequency": 3, "shape": "triangle", "msg": "zorblax nenu flargh!" }
{ "id": "msg-002", "lat": 40.7128, "lon": -74.0060, "time": "1712938000", "frequency": 1, "shape": "light", "msg": "🛸 bleep bloop! Initiating K99 protocol" }
{ "id": "msg-003", "lat": 34.0522, "lon": -118.2437, "time": "1713024000", "frequency": 5, "shape": "circle", "msg": "⚠️ H'gro thwak norz plen!" }
{ "id": "msg-004", "lat": 41.8781, "lon": -87.6298, "time": "1713110400", "frequency": 2, "shape": "square", "msg": "Enc0ded#msg_778w: gl!zn@" }
{ "id": "msg-005", "lat": 29.7604, "lon": -95.3698, "time": "1713196800", "frequency": 7, "shape": "unknown", "msg": "!!!—zrg quoo zrg quoo" }
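
One possible way to turn these strings into typed records is sketched below, using only the standard library. The Sighting dataclass and parse_sighting function are hypothetical names, and reading the timestamp as UTC is an assumption the specification does not confirm.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Sighting:
    id: str
    lat: float
    lon: float
    time: datetime
    frequency: float
    shape: str
    msg: str


def parse_sighting(raw):
    """Decode one JSON message string; raises KeyError if a key is missing."""
    data = json.loads(raw)
    return Sighting(
        id=str(data["id"]),
        lat=float(data["lat"]),
        lon=float(data["lon"]),
        # The timestamp arrives as a string; treating it as UTC is an assumption.
        time=datetime.fromtimestamp(int(data["time"]), tz=timezone.utc),
        frequency=float(data["frequency"]),
        shape=str(data["shape"]),
        msg=str(data["msg"]),
    )


example = (
    '{ "id": "msg-001", "lat": 37.7749, "lon": -122.4194, '
    '"time": "1712841600", "frequency": 3, "shape": "triangle", '
    '"msg": "zorblax nenu flargh!" }'
)
sighting = parse_sighting(example)
print(sighting.id, sighting.shape, sighting.time.date())
```

Converting types at the boundary means later stages (storage, classification, plotting) can rely on numeric coordinates and real datetime values.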

Analyzing messages

Collect and store all of the data received from the RabbitMQ server. In addition to storage, log the progress of your code with Loguru.
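
A minimal sketch of storage plus logging follows. The JSON Lines file name messages.jsonl and the function store_message are assumptions, not requirements. The fallback to the standard logging module exists only so the sketch runs where Loguru is not installed; the project itself should use Loguru.

```python
import json
from pathlib import Path

try:
    from loguru import logger  # the logger the assignment asks for
except ImportError:  # fallback only so this sketch runs without Loguru
    import logging

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("edad")


def store_message(raw, path=Path("output") / "messages.jsonl"):
    """Append one JSON message to a JSON Lines file and log the event."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    record = json.loads(raw)  # fails early on malformed input
    with path.open("a", encoding="utf-8") as handle:
        # ensure_ascii=False keeps emoji and other non-ASCII text readable.
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    logger.info(f"stored message {record.get('id')} in {path}")
    return record
```

Appending one JSON object per line keeps every received message recoverable even if the consumer is interrupted partway through a run.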

Alien detection

Using a method of your choosing, use the msg parameter to identify whether the intercepted message is alien text. If the message is not completely intelligible, it is likely an alien message. It is up to you to decide how to measure intelligibility, but typically words that can be spoken in English are considered intelligible. Log the labels that you give to the collected msgs. The labels should be alien👽 or human.
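
To make the labeling rule concrete, here is a deliberately simple baseline sketch. It is not the expected solution, and a trained classifier would replace it in practice. It labels a message human only when every alphabetic token appears in a vocabulary. The tiny ENGLISH_WORDS set is a placeholder for a full word list, and treating an empty message as alien is an assumption.

```python
import re

# Placeholder vocabulary; a real run would load a full English word list.
ENGLISH_WORDS = {
    "a", "the", "we", "it", "was", "saw", "bright", "light",
    "over", "lake", "moving", "fast",
}

TOKEN = re.compile(r"[a-z']+")


def label_message(msg, vocabulary=ENGLISH_WORDS):
    """Return 'human' if every token is a known word, else 'alien👽'."""
    tokens = TOKEN.findall(msg.lower())
    if not tokens:
        # Assumption: a message with no alphabetic tokens counts as alien.
        return "alien👽"
    intelligible = all(token in vocabulary for token in tokens)
    return "human" if intelligible else "alien👽"


for text in ["zorblax nenu flargh!", "We saw a bright light over the lake"]:
    print(f"{label_message(text)}: {text}")
```

The all-or-nothing check mirrors the statement that a message which is not completely intelligible is likely alien. A softer threshold on the fraction of known words is an equally reasonable design.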

Visualizations

Generate four visualizations and write each one to a PDF file. The first two visualizations are mandatory:

  1. Show the distribution of sighting locations on a map
  2. Show the frequency of the shapes over time

The other two visualizations are your choice. Use your creativity to consider the types of graphs that would be useful and appropriate for the data. Save the visualizations as report{1,2,3,4}.pdf.
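
As one way to approach the second mandatory plot, the sketch below groups messages by shape and UTC calendar day, then draws one line per shape. It assumes matplotlib, which is not prescribed. Counting one per message is also an assumption; summing the frequency field is an equally plausible reading. The function names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone
from pathlib import Path


def shape_counts_by_day(messages):
    """Count messages per shape for each UTC calendar day."""
    counts = defaultdict(Counter)
    for message in messages:
        stamp = datetime.fromtimestamp(int(message["time"]), tz=timezone.utc)
        counts[message["shape"]][stamp.date()] += 1
    return counts


def plot_shape_frequency(counts, path="output/report2.pdf"):
    """Draw one line per shape and save the figure as a PDF."""
    import matplotlib  # assumption: installed separately

    matplotlib.use("Agg")  # render without a display, e.g. in a container
    import matplotlib.pyplot as plt

    Path(path).parent.mkdir(parents=True, exist_ok=True)
    fig, ax = plt.subplots(figsize=(8, 4))
    for shape, per_day in sorted(counts.items()):
        days = sorted(per_day)
        ax.plot(days, [per_day[day] for day in days], marker="o", label=shape)
    ax.set_xlabel("Date (UTC)")
    ax.set_ylabel("Number of messages")
    ax.set_title("Shape frequency over time")
    ax.legend()
    fig.autofmt_xdate()
    fig.savefig(path)  # the .pdf extension selects the PDF backend
    plt.close(fig)
```

Keeping the aggregation separate from the drawing code makes the counting logic easy to unit test without rendering anything.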

Project Submission

Create a private repository called cis6930sp25-project3. Please ensure you use this exact repository name, all lowercase. Add collaborators cegme, tzhan024, and abbasidaniyal by going to Settings > Collaborators and teams > add people.

When ready to submit, create a tag on your repository using git tag on the latest commit:

git tag v1.0
git push origin v1.0

The version v1.0 lets us know when and what version of code you would like us to grade. If you need to submit an updated version, you can use the tag v1.1.

We will also ask you to submit all code files on Gradescope.

Create a Python Package

Use uv and pyproject.toml to create a Python package for your project. Follow the standards of the previous assignments. We expect a main.py file to be able to pass in the command location and the command port. E.g.,

uv run python main.py --command 'cpu002.cm.cluster' --port 5672
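
The two flags could be handled with argparse along these lines. This is a sketch: build_parser is a hypothetical name, and defaulting the port to 5672 (the standard AMQP port) is a choice rather than a stated requirement.

```python
import argparse


def build_parser():
    """Create the command-line parser for main.py."""
    parser = argparse.ArgumentParser(
        description="EDAD field agent: consume UFO sighting broadcasts."
    )
    parser.add_argument(
        "--command",
        required=True,
        help="host name of command central, e.g. cpu002.cm.cluster",
    )
    parser.add_argument(
        "--port",
        type=int,
        default=5672,  # assumption: fall back to the standard AMQP port
        help="RabbitMQ port",
    )
    return parser


# Parsing an explicit list here; main.py would call parse_args() with no
# arguments so that the values come from the real command line.
args = build_parser().parse_args(["--command", "cpu002.cm.cluster", "--port", "5672"])
print(f"connecting to {args.command}:{args.port}")
```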

In addition, include an apptainer.def file that will allow execution of the queue program from a container.

# Build local container
apptainer build edad.sif apptainer.def
apptainer run edad.sif --command 'cpu002.cm.cluster' --port 5672

Write all appropriate files to a folder called output/.

README.md

The README file name should be all uppercase with the .md extension. Write your name in it, along with an example of how to run the program. Describe all features of your code, and include directions on how to install and use the Python package. We know your code will not be perfect, so list any known bugs and any assumptions made while writing the program. Note: You should not be copying code from any website not provided by the instructor.

COLLABORATORS.md

This file should contain a pipe-separated list describing who you worked with and a short text description of the nature of the collaboration. If you visited a website for inspiration, include the website. This information should be listed in three fields, as in the example below:

Katherine Johnson | kj@nasa.gov | Helped me understand calculations
Dorothy Vaughan | doro@dod.gov | Helped me with multiplexed time management
Stackoverflow | https://example | helped me with a compilation of python test

The collaborator file is mainly used to ensure that code similarities are coincidental. Be sure to abide by the academic integrity guidelines outlined in the syllabus. Generative AI tools may produce code that is very similar to other students' submissions and should be avoided.

Tests

You should have your own test data set that you can use to test your code. Add test flags as appropriate for you. Tests should be runnable by using uv run python -m pytest -v. The tests should show that all the functionality works. We are not necessarily looking for bulletproof code. Visit the pytest docs for details.

All tests should go in the tests/ folder. The names of files containing test functions should be prefixed with the word test. For example, data size tests could go in a file with the name test_download.py. Functions in the test file that should run as tests must be prefixed with the string test. From the root directory, we will run your tests with the command uv run python -m pytest -v . (the trailing dot is the current directory). Running pytest through python -m adds the current directory to sys.path, so you do not have to hack the run path in your test files.
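
For illustration, a test file following these naming rules might look like the sketch below. The file name tests/test_messages.py and the helper missing_keys are hypothetical. In a real project the helper would be imported from your package rather than defined inside the test file.

```python
# tests/test_messages.py (hypothetical file name)
import json

REQUIRED_KEYS = {"id", "lat", "lon", "time", "frequency", "shape", "msg"}

SAMPLE = (
    '{ "id": "msg-001", "lat": 37.7749, "lon": -122.4194, '
    '"time": "1712841600", "frequency": 3, "shape": "triangle", '
    '"msg": "zorblax nenu flargh!" }'
)


def missing_keys(raw):
    """Stand-in for project code: report required keys absent from a message."""
    return REQUIRED_KEYS - set(json.loads(raw))


def test_sample_message_has_all_keys():
    assert missing_keys(SAMPLE) == set()


def test_missing_keys_are_reported():
    assert missing_keys('{"id": "msg-x"}') == REQUIRED_KEYS - {"id"}
```

pytest collects both functions automatically because the file name and the function names start with test.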

Consider installing the pytest-cov package to measure the code coverage of your tests.

Grading

The code will be tested live during the final exam. The professor will pull your code and execute it on the class cluster.

Component                                                Percentage
README.md and documentation                              20%
Code functionality: logging, container, stat gathering   30%
Quality and appropriateness of visualizations            40%
Total                                                    100%

Notes and Links