Data Engineering at the University of Florida
Due: Monday, March 30, 2026 at 11:59 PM
Points: 60 (50 implementation + 10 reflection)
Submission: GitHub repository + Canvas link
In this assignment, you will conduct a fairness audit of an AI system. You will train a RandomForestClassifier on a real dataset, compute fairness metrics, analyze where bias enters the pipeline, and write a reflection connecting your findings to the social implications of algorithmic bias.
This assignment combines the technical metrics from Day 23 with the social analysis from Day 24.
By completing this assignment, you will train a classifier on a real dataset, compute six fairness metrics from scratch, analyze where bias enters the pipeline, apply mitigation, and connect your findings to the social implications of algorithmic bias.
# Clone your cis6930sp26-assignment3 repository
git clone https://github.com/YOUR_USERNAME/cis6930sp26-assignment3.git
cd cis6930sp26-assignment3
# Install dependencies
uv sync
Run the tests to confirm everything is installed. All tests should fail with NotImplementedError:
uv run pytest
You will work with the Adult Income Dataset (UCI ML Repository), which predicts whether an individual earns more than $50K/year. This dataset has known biases related to gender and race.
The starter code (data.py) loads and preprocesses the dataset for you.
Features include: age, education, occupation, hours-per-week, marital-status, etc. Protected attributes: sex, race Target: income (>50K or <=50K)
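As a rough illustration of the binarization that preprocessing typically performs on this dataset (the column names sex and income come from the Adult dataset itself, but the exact encoding in the provided data.py may differ):

```python
import pandas as pd

# Toy rows standing in for the Adult dataset (illustrative only)
df = pd.DataFrame({
    "sex": ["Male", "Female", "Female", "Male"],
    "income": [">50K", "<=50K", ">50K", "<=50K"],
})

# Binarize the target and a protected attribute
y = (df["income"] == ">50K").astype(int).to_numpy()
# 1 = privileged group; which group is "privileged" is a convention
# set by the starter code, assumed here for illustration
sex = (df["sex"] == "Male").astype(int).to_numpy()
```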
The following files are already implemented for you:
- assignment3/data.py — Loads and preprocesses the Adult Income dataset
- assignment3/model.py — Trains a RandomForestClassifier (n_estimators=100, random_state=42) and evaluates it
- assignment3/mitigate.py — Reweighing and threshold adjustment mitigation techniques
- assignment3/main.py — Runs the full pipeline
- fairlearn — Installed as a dependency; you may use it to cross-check your implementations

You can also use fairlearn metrics in your code if you find them helpful, but you must implement the 6 metrics yourself in fairness_metrics.py.
Follow these steps in order. Each step builds on the previous one.
File: assignment3/fairness_metrics.py
Start here because these functions have no dependencies on other modules and the tests are the most straightforward.
Implement all 6 metrics:
- statistical_parity_difference — Difference in positive prediction rates between groups
- disparate_impact_ratio — Ratio of positive prediction rates between groups
- equal_opportunity_difference — Difference in true positive rates between groups
- average_odds_difference — Average of TPR and FPR differences between groups
- predictive_parity_difference — Difference in precision between groups
- theil_index — Measures inequality in correct predictions across individuals

Use the _split_by_group helper (provided) to separate arrays into privileged and unprivileged groups.
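As a minimal numpy sketch of the first three metrics, assuming protected is a 0/1 array with 1 marking the privileged group — the signatures and sign conventions here are guesses, so match whatever _split_by_group and the provided tests actually expect:

```python
import numpy as np

def statistical_parity_difference(y_pred, protected):
    # P(y_hat=1 | unprivileged) - P(y_hat=1 | privileged)
    return y_pred[protected == 0].mean() - y_pred[protected == 1].mean()

def disparate_impact_ratio(y_pred, protected):
    # Ratio of positive prediction rates; 1.0 means parity,
    # values below ~0.8 are the classic "four-fifths rule" red flag
    return y_pred[protected == 0].mean() / y_pred[protected == 1].mean()

def equal_opportunity_difference(y_true, y_pred, protected):
    # TPR(unprivileged) - TPR(privileged)
    def tpr(mask):
        positives = y_true[mask] == 1
        return y_pred[mask][positives].mean()
    return tpr(protected == 0) - tpr(protected == 1)
```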
Verify: uv run pytest tests/test_fairness_metrics.py
File: assignment3/audit.py
Implement run_audit(y_true, y_pred, protected_attributes) which:

- Computes all 6 metrics for each protected attribute
- Flags each metric value against FAIR_THRESHOLDS using the provided is_fair() helper

The return value must follow this structure:
```python
{
    "sex": {
        "statistical_parity_difference": {"value": -0.19, "fair": False},
        "disparate_impact_ratio": {"value": 0.27, "fair": False},
        # ... other metrics
    },
    "race": {
        # ... same metrics for race
    },
    "theil_index": {"value": 0.12, "fair": False},
}
```
Verify: uv run pytest tests/test_audit.py
Try it: uv run python -m assignment3.audit
The mitigation code is provided in mitigate.py. Run it and record the before/after metrics:
uv run python -m assignment3.mitigate
Compare the baseline audit results with the mitigated results. Fill in the mitigation table in your README.md.
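To build intuition for what the provided reweighing does, here is a hedged sketch of the classic Kamiran-Calders weighting scheme — each (group, label) cell is weighted by expected over observed frequency. This is an illustration of the idea, not necessarily the starter code's exact implementation:

```python
import numpy as np

def reweighing_weights(y, group):
    """Illustrative Kamiran-Calders weights: w = P(group) * P(label) / P(group, label).

    `y` and `group` are binary numpy arrays.
    """
    w = np.empty(len(y), dtype=float)
    for g in (0, 1):
        for lbl in (0, 1):
            mask = (group == g) & (y == lbl)
            if mask.any():
                expected = (group == g).mean() * (y == lbl).mean()
                observed = mask.mean()
                # Under-represented cells get weight > 1, over-represented < 1
                w[mask] = expected / observed
    return w
```

Weights like these can be passed as sample_weight to RandomForestClassifier.fit, which is how reweighing influences training without altering the data itself.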
Confirm all tests pass and the full pipeline runs:
uv run pytest # All tests pass
uv run python -m assignment3.main # Full pipeline runs
Update README.md with your audit results tables (baseline and mitigated).
File: REFLECTION.md (minimum 500 words total)
Metric Conflicts (3 points): Show a specific example from your audit where improving one fairness metric worsened another. Explain why this happens using the impossibility theorem.
Social Context (4 points): The Adult Income dataset was collected from the 1994 Census. Discuss how historical and structural factors (e.g., occupational segregation, educational access, wealth gaps) are encoded in this dataset. Reference at least one reading from class (Birhane, Mitchell, or Aspen Digital).
Metric Selection (3 points): If this model were deployed for real loan decisions, which fairness metric would you prioritize and why? What harms does your chosen metric fail to capture? Who benefits and who bears the cost of your choice?
# Run tests
uv run pytest
# Train baseline model and view accuracy
uv run python -m assignment3.model
# Run fairness audit
uv run python -m assignment3.audit
# Apply mitigation and compare
uv run python -m assignment3.mitigate
# Run all steps
uv run python -m assignment3.main
| Component | Points | Description |
|---|---|---|
| Fairness metrics (fairness_metrics.py) | 20 | All 6 metrics implemented correctly |
| Audit report (audit.py) | 10 | Returns correct dictionary structure with metrics and fair flags |
| Mitigation comparison | 5 | Before/after results recorded in README |
| Tests pass | 5 | All provided tests pass |
| README results filled in | 10 | Audit tables and key findings completed |
| Component | Points | Description |
|---|---|---|
| Metric conflicts | 3 | Concrete example with impossibility theorem explanation |
| Social context | 4 | Structural factors analysis with reading references |
| Metric selection | 3 | Justified choice with trade-off analysis |
| Total | 60 | |
Your README must include:
How to install dependencies and run the system.
Present your baseline audit results:
## Baseline Audit Results
### Protected Attribute: Sex
| Metric | Value | Fair? |
|--------|-------|-------|
| Statistical Parity Difference | -0.XX | Yes/No |
| Disparate Impact Ratio | 0.XX | Yes/No |
| Equal Opportunity Difference | -0.XX | Yes/No |
| Average Odds Difference | -0.XX | Yes/No |
| Predictive Parity Difference | 0.XX | Yes/No |
| Theil Index | 0.XX | Yes/No |
### Protected Attribute: Race
[Same table format]
Show before/after comparison:
## Mitigation Results
**Method used:** [Reweighing / Threshold Adjustment / Other]
| Metric | Before | After | Improved? |
|--------|--------|-------|-----------|
| Statistical Parity Difference | -0.XX | -0.XX | Yes/No |
| ...
Summarize your most important findings in 2-3 paragraphs.
Document all collaboration and AI assistance (required).
```
cis6930sp26-assignment3/
├── assignment3/
│   ├── __init__.py
│   ├── data.py               # Provided: Loads Adult dataset
│   ├── model.py              # Step 2: Train RandomForestClassifier
│   ├── fairness_metrics.py   # Step 1: Implement 6 fairness metrics
│   ├── audit.py              # Step 3: Build audit report
│   ├── mitigate.py           # Step 4: Apply mitigation
│   └── main.py               # Provided: Runs all steps
├── tests/
│   ├── test_fairness_metrics.py
│   ├── test_model.py
│   ├── test_audit.py
│   └── test_mitigate.py
├── data/                     # Auto-downloaded by starter code
├── REFLECTION.md             # Step 7: Written reflection
├── COLLABORATORS.md
├── README.md                 # Step 6: Fill in results
├── .gitignore
└── pyproject.toml
```
Push your final code to your cis6930sp26-assignment3 repository, add cegme as an Admin collaborator, then tag and push your submission:

git tag v1.0
git push origin v1.0
Tips:

- Start with fairness_metrics.py (Step 1) — it has no dependencies and the tests are straightforward
- Use boolean indexing such as y_pred[protected_attr == 'Male'] to select one group's predictions
- You may use aif360 to cross-check your metric implementations

This is an individual assignment. You may discuss concepts with classmates, but all code and the reflection must be your own. Document all collaboration in COLLABORATORS.md.