Data Engineering at the University of Florida
Date: Wednesday, February 11, 2026
Due: Before next class (Friday, February 13, 2026 at 8:30 AM)
Points: 12 (Infrastructure Assignment)
Submission: Canvas (GitHub repo URL + reflection)
In this hands-on activity, you will set up and run an MCP server on HiPerGator that processes data from a Hugging Face dataset. The activity is a live walkthrough during class: the instructor will demonstrate each step, share configuration details, and help troubleshoot issues in real time.
This exercise directly prepares you for Assignment 1: MCP Data Pipeline by ensuring you can run MCP servers on HiPerGator’s compute nodes.
Complete these steps before coming to class:
Create a GitHub repository named cis6930sp26-assignment1.5
Add cegme as an Admin collaborator on your repository
ssh YOUR_GATORLINK@hpg.rc.ufl.edu
cd ~
git clone https://github.com/YOUR_USERNAME/cis6930sp26-assignment1.5.git
cd cis6930sp26-assignment1.5
The specific module versions are critical. We will demonstrate the exact commands during class.
# Module commands will be provided during class walkthrough
module load ____________
module load ____________
module load ____________
In-class note: The instructor will share the specific module versions that are tested and working on HiPerGator. These versions may differ from the default modules.
# Navigate to project directory
cd ~/cis6930sp26-assignment1.5
# Create virtual environment using uv
# The exact uv installation method for HiPerGator will be demonstrated
____________
____________
Create a .env file with your credentials:
HUGGINGFACE_TOKEN=____________
In-class note: Additional configuration details will be shared during the live walkthrough.
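Before starting the server, it can help to confirm that the token is actually visible to Python. The sketch below is a minimal check using only the standard library. The helper name `describe_token` is hypothetical, and it assumes the variable name `HUGGINGFACE_TOKEN` from the `.env` template above. It reads from a mapping (by default `os.environ`), so it only sees the token after the `.env` file has been loaded, for example by `load_dotenv()`.

```python
import os
from collections.abc import Mapping


def describe_token(env: Mapping[str, str] = os.environ,
                   name: str = "HUGGINGFACE_TOKEN") -> str:
    """Report whether a token is set without revealing its value."""
    value = env.get(name, "").strip()
    if not value or set(value) == {"_"}:
        # Empty, or still the blank placeholder from the template.
        return f"{name} is not set"
    # Show only the length so the secret never reaches logs or screenshots.
    return f"{name} is set ({len(value)} characters)"


if __name__ == "__main__":
    print(describe_token())
```

Reporting only the length is a deliberate choice: the check can be run during a screen share or pasted into a help request without leaking the credential.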
We will use the dair-ai/emotion dataset from Hugging Face, which contains text samples labeled with emotions (sadness, joy, love, anger, fear, surprise).
This dataset is ideal for practicing MCP because:
| Column | Type | Description |
|---|---|---|
| text | string | The text sample |
| label | int | Emotion label (0-5) |
Labels mapping:
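As a small sketch, the mapping can be kept as a lookup table in code. This assumes the label order on the dataset card, which matches the order in which the emotions are listed above (0 = sadness through 5 = surprise). The names `EMOTION_NAMES`, `label_to_name`, and `name_to_label` are illustrative and are not part of the provided server.py.

```python
# Label ids in dataset order; index i is the name for label i (assumed order).
EMOTION_NAMES = ("sadness", "joy", "love", "anger", "fear", "surprise")


def label_to_name(label: int) -> str:
    """Translate an integer label (0-5) into its emotion name."""
    if not 0 <= label < len(EMOTION_NAMES):
        raise ValueError(f"label must be in 0-{len(EMOTION_NAMES) - 1}, got {label}")
    return EMOTION_NAMES[label]


def name_to_label(name: str) -> int:
    """Translate an emotion name (any case) into its integer label."""
    key = name.strip().lower()
    if key not in EMOTION_NAMES:
        raise ValueError(f"unknown emotion {name!r}; expected one of {EMOTION_NAMES}")
    return EMOTION_NAMES.index(key)


if __name__ == "__main__":
    for label, name in enumerate(EMOTION_NAMES):
        print(label, name)
```

Raising `ValueError` on an unknown name or out-of-range label keeps a typo in a tool argument from silently producing a zero count.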
Open server.py and examine the structure:
from mcp.server.fastmcp import FastMCP
from datasets import load_dataset
from loguru import logger
import os

# Load environment variables
from dotenv import load_dotenv
load_dotenv()

# Initialize MCP server
mcp = FastMCP("EmotionDataProcessor")

# Dataset will be loaded once when server starts
_dataset = None


def get_dataset():
    """Lazy-load the emotion dataset."""
    global _dataset
    if _dataset is None:
        logger.info("Loading emotion dataset from Hugging Face...")
        _dataset = load_dataset("dair-ai/emotion", split="train")
        logger.info(f"Loaded {len(_dataset)} samples")
    return _dataset


@mcp.tool()
def get_sample(n: int = 5) -> str:
    """Get n random samples from the emotion dataset.

    Args:
        n: Number of samples to retrieve (default: 5, max: 20)

    Returns:
        JSON string with samples including text and emotion label
    """
    # Implementation during class
    pass


@mcp.tool()
def count_by_emotion(emotion: str) -> str:
    """Count samples for a specific emotion.

    Args:
        emotion: One of 'sadness', 'joy', 'love', 'anger', 'fear', 'surprise'

    Returns:
        JSON string with count and percentage
    """
    # Implementation during class
    pass


@mcp.tool()
def search_text(query: str, limit: int = 10) -> str:
    """Search for samples containing specific text.

    Args:
        query: Text to search for (case-insensitive)
        limit: Maximum results to return (default: 10)

    Returns:
        JSON string with matching samples
    """
    # Implementation during class
    pass


@mcp.tool()
def analyze_emotion_distribution() -> str:
    """Get the distribution of emotions in the dataset.

    Returns:
        JSON string with counts and percentages for each emotion
    """
    # Implementation during class
    pass
During class, we will implement each tool together. The instructor will:
Your task: Follow along and implement the tools in your server.py file.
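For orientation, here is one possible shape for the logic behind the four tools. It is a sketch under stated assumptions, not the in-class implementation. The helper names (`sample_rows`, `count_emotion`, `search_rows`, `emotion_distribution`) are hypothetical, and the label order is assumed to follow the dataset card. The functions work on any sequence of `{"text": ..., "label": ...}` dicts, so they can be tried on a small list without loading the dataset. Each returns a JSON string, matching the tool docstrings above.

```python
import json
import random
from collections import Counter

# Assumed label order for dair-ai/emotion (index = integer label).
EMOTIONS = ("sadness", "joy", "love", "anger", "fear", "surprise")


def _with_name(row):
    """Copy a row and add the human-readable emotion name."""
    return {"text": row["text"], "label": row["label"],
            "emotion": EMOTIONS[row["label"]]}


def sample_rows(rows, n=5, seed=None):
    """Return up to n random rows; n is clamped to 1-20 and to len(rows)."""
    n = min(max(n, 1), 20, len(rows))
    picks = random.Random(seed).sample(range(len(rows)), n)
    return json.dumps([_with_name(rows[i]) for i in picks], indent=2)


def emotion_distribution(rows):
    """Count and percentage for every emotion, including zero counts."""
    counts = Counter(row["label"] for row in rows)
    total = sum(counts.values())
    result = {
        name: {"count": counts[label],
               "percentage": round(100 * counts[label] / total, 2) if total else 0.0}
        for label, name in enumerate(EMOTIONS)
    }
    return json.dumps(result, indent=2)


def count_emotion(rows, emotion):
    """Count and percentage for one emotion; unknown names give an error payload."""
    key = emotion.strip().lower()
    if key not in EMOTIONS:
        return json.dumps({"error": f"unknown emotion: {emotion}",
                           "valid": list(EMOTIONS)})
    stats = json.loads(emotion_distribution(rows))[key]
    return json.dumps({"emotion": key, **stats})


def search_rows(rows, query, limit=10):
    """Case-insensitive substring search, stopping after `limit` matches."""
    needle = query.lower()
    matches = []
    for row in rows:
        if len(matches) >= limit:
            break
        if needle in row["text"].lower():
            matches.append(_with_name(row))
    return json.dumps({"query": query, "count": len(matches), "results": matches},
                      indent=2)


if __name__ == "__main__":
    demo = [
        {"text": "i feel so happy today", "label": 1},
        {"text": "i am afraid of the dark", "label": 4},
        {"text": "Happy to be here", "label": 1},
    ]
    print(emotion_distribution(demo))
    print(search_rows(demo, "happy"))
```

Inside server.py, each decorated tool could delegate to one of these helpers and pass the result of `get_dataset()` as `rows`. That relies on the loaded dataset supporting `len()`, integer indexing, and row-by-row iteration that yields dicts, which is worth confirming in class. Returning an error payload for an unknown emotion, rather than raising, keeps the response readable in MCP Inspector.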
We will use SLURM to request compute resources:
# SLURM command with specific parameters for our class allocation
srun --account=____________ \
--qos=____________ \
--ntasks=1 \
--cpus-per-task=2 \
--mem=4gb \
--time=00:30:00 \
--pty bash -i
In-class note: The account and QOS values specific to our class allocation will be provided during the walkthrough.
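Once the interactive shell opens, it is easy to lose track of whether a terminal is on a login node or inside the allocation. The sketch below is a small, hypothetical helper (`slurm_context`) that reads the `SLURM_JOB_ID` and `SLURMD_NODENAME` environment variables, which SLURM normally sets inside a job. It makes no assumption about the class account or QOS values.

```python
import os
import socket
from collections.abc import Mapping


def slurm_context(env: Mapping[str, str] = os.environ) -> dict:
    """Summarize whether the current shell is running inside a SLURM job."""
    job_id = env.get("SLURM_JOB_ID")
    return {
        "in_job": job_id is not None,
        "job_id": job_id,
        # Fall back to the local hostname when SLURM did not set a node name.
        "node": env.get("SLURMD_NODENAME") or socket.gethostname(),
    }


if __name__ == "__main__":
    print(slurm_context())
```

If `in_job` is false, the shell is most likely still on a login node and the server should not be started there. The reported `job_id` is the value to pass to `--jobid` when attaching a second terminal later.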
# Activate environment and start server
source .venv/bin/activate
uv run python server.py
You should see output like:
2026-02-11 08:45:23.456 | INFO | Loading emotion dataset from Hugging Face...
2026-02-11 08:45:28.123 | INFO | Loaded 16000 samples
2026-02-11 08:45:28.124 | INFO | MCP server 'EmotionDataProcessor' starting...
In a new terminal (keeping the server running), connect to your running job:
# SSH to HiPerGator
ssh YOUR_GATORLINK@hpg.rc.ufl.edu
# Find your job ID and connect to it
squeue -u $USER
srun --pty --overlap --jobid ____________ bash
# Navigate to project and run inspector
cd ~/cis6930sp26-assignment1.5
source .venv/bin/activate
uv run mcp dev server.py
For submission, you need to capture the following outputs from your MCP server:
Tool List Output: Run the list_tools command in MCP Inspector and capture the output showing all 4 tools.
Sample Data Output: Call get_sample(n=3) and capture the returned samples.
Emotion Distribution: Call analyze_emotion_distribution() and capture the distribution statistics.
Custom Search: Call search_text(query="happy") and capture the results.
# Save outputs to files for submission
# Commands will be demonstrated during class
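One way to assemble the file, sketched here with the standard library only, is a hypothetical helper `append_output` that appends a labeled, timestamped section to outputs.txt for each tool result copied from MCP Inspector. The section layout is an assumption, not a required format; follow whatever format is demonstrated in class.

```python
from datetime import datetime
from pathlib import Path


def append_output(title: str, body: str, path: str = "outputs.txt") -> None:
    """Append one labeled section to the outputs file, creating it if needed."""
    stamp = datetime.now().isoformat(timespec="seconds")
    section = f"=== {title} ({stamp}) ===\n{body.rstrip()}\n\n"
    with Path(path).open("a", encoding="utf-8") as handle:
        handle.write(section)
```

A call such as `append_output("get_sample(n=3)", pasted_text)` would add one section per captured result, where `pasted_text` stands for the output you copied. Appending rather than overwriting means the four captures can be added one at a time.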
Submit the following to Canvas before Friday, February 13, 2026 at 8:30 AM:
cis6930sp26-assignment1.5 repository containing:
server.py with all 4 tools implemented
pyproject.toml with dependencies
.env.example (do NOT commit your actual .env file)
outputs.txt with your tool call outputs
outputs.txt must contain:
c0701a-s17)
git add server.py pyproject.toml .env.example outputs.txt
git commit -m "feat: complete MCP in-class activity"
git push origin main
| Criterion | Points |
|---|---|
| MCP server runs without errors | 3 |
| All 4 tools implemented and functional | 5 |
| Outputs captured and submitted | 2 |
| Reflection demonstrates understanding | 2 |
| Total | 12 |
| Issue | Solution |
|---|---|
| ModuleNotFoundError: No module named 'mcp' | ____ |
| Connection refused on MCP Inspector | ____ |
| CUDA out of memory | ____ |
| HuggingFace rate limit | ____ |
In-class note: Solutions to these common issues will be demonstrated live. The specific fixes depend on HiPerGator’s current configuration.
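Independently of the in-class fixes, a quick diagnostic for the ModuleNotFoundError row is to ask the active interpreter which packages it can see. This sketch uses `importlib.util.find_spec` from the standard library. The helper name `missing_modules` is hypothetical, and the package list is taken from the imports in server.py (`dotenv` is the import name of python-dotenv).

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    required = ["mcp", "datasets", "loguru", "dotenv"]
    print("interpreter:", sys.executable)
    print("missing:", missing_modules(required) or "none")
```

If the printed interpreter path does not point inside `.venv`, the virtual environment is probably not the one in use. That would explain a missing `mcp` module even after the dependencies were installed.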
If you completed this activity successfully, you are ready for Assignment 1. Consider:
This is an individual in-class activity. You may:
You may not:
This activity is designed as a live walkthrough. Students who attend class will receive step-by-step guidance, while those who miss class will need to figure out the HiPerGator-specific configuration details independently.
Last updated: February 2026