Data Engineering at the University of Florida
Due: Monday, March 30, 2026 at 11:59 PM
Points: 60 (50 implementation + 10 reflection)
Submission: GitHub repository + Canvas link
In this assignment, you will conduct a fairness audit of an AI system. You will train a RandomForestClassifier on a real dataset, compute fairness metrics, analyze where bias enters the pipeline, and write a reflection connecting your findings to the social implications of algorithmic bias.
This assignment combines the technical metrics from Day 23 with the social analysis from Day 24.
By completing this assignment, you will train a classifier on a real dataset, compute six fairness metrics from scratch, analyze where bias enters the pipeline, apply mitigation, and connect your findings to the social implications of algorithmic bias.
# Clone your cis6930sp26-assignment3 repository
git clone https://github.com/YOUR_USERNAME/cis6930sp26-assignment3.git
cd cis6930sp26-assignment3
# Install dependencies
uv sync
Run the tests to confirm everything is installed. All tests should fail with NotImplementedError:
uv run pytest
You will work with the Adult Income Dataset (UCI ML Repository), which predicts whether an individual earns more than $50K/year. This dataset has known biases related to gender and race.
The starter code (data.py) loads and preprocesses the dataset for you.
Features include: age, education, occupation, hours-per-week, marital-status, etc. Protected attributes: sex, race Target: income (>50K or <=50K)
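As a rough illustration of the binarization that preprocessing typically performs on this dataset (the column names sex and income come from the Adult dataset itself, but the exact encoding in the provided data.py may differ):

```python
import pandas as pd

# Toy rows standing in for the Adult dataset (illustrative only)
df = pd.DataFrame({
    "sex": ["Male", "Female", "Female", "Male"],
    "income": [">50K", "<=50K", ">50K", "<=50K"],
})

# Binarize the target and a protected attribute
y = (df["income"] == ">50K").astype(int).to_numpy()
# 1 = privileged group; which group is "privileged" is a convention
# set by the starter code, assumed here for illustration
sex = (df["sex"] == "Male").astype(int).to_numpy()
```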
The following files are already implemented for you:
- assignment3/data.py — Loads and preprocesses the Adult Income dataset
- assignment3/model.py — Trains a RandomForestClassifier (n_estimators=100, random_state=42) and evaluates it
- assignment3/mitigate.py — Reweighing and threshold adjustment mitigation techniques
- assignment3/main.py — Runs the full pipeline
- fairlearn — Installed as a dependency; you may use it to cross-check your implementations

You can also use fairlearn metrics in your code if you find them helpful, but you must implement the 6 metrics yourself in fairness_metrics.py.
Follow these steps in order. Each step builds on the previous one.
File: assignment3/fairness_metrics.py
Start here because these functions have no dependencies on other modules and the tests are the most straightforward.
Implement all 6 metrics:
- statistical_parity_difference — Difference in positive prediction rates between groups
- disparate_impact_ratio — Ratio of positive prediction rates between groups
- equal_opportunity_difference — Difference in true positive rates between groups
- average_odds_difference — Average of TPR and FPR differences between groups
- predictive_parity_difference — Difference in precision between groups
- theil_index — Measures inequality in correct predictions across individuals

Use the _split_by_group helper (provided) to separate arrays into privileged and unprivileged groups.
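As a minimal numpy sketch of the first three metrics, assuming protected is a 0/1 array with 1 marking the privileged group — the signatures and sign conventions here are guesses, so match whatever _split_by_group and the provided tests actually expect:

```python
import numpy as np

def statistical_parity_difference(y_pred, protected):
    # P(y_hat=1 | unprivileged) - P(y_hat=1 | privileged)
    return y_pred[protected == 0].mean() - y_pred[protected == 1].mean()

def disparate_impact_ratio(y_pred, protected):
    # Ratio of positive prediction rates; 1.0 means parity,
    # values below ~0.8 are the classic "four-fifths rule" red flag
    return y_pred[protected == 0].mean() / y_pred[protected == 1].mean()

def equal_opportunity_difference(y_true, y_pred, protected):
    # TPR(unprivileged) - TPR(privileged)
    def tpr(mask):
        positives = y_true[mask] == 1
        return y_pred[mask][positives].mean()
    return tpr(protected == 0) - tpr(protected == 1)
```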
Verify: uv run pytest tests/test_fairness_metrics.py
File: assignment3/audit.py
Implement run_audit(y_true, y_pred, protected_attributes) which:

- Computes all 6 metrics for each protected attribute
- Flags each metric value against FAIR_THRESHOLDS using the provided is_fair() helper

The return value must follow this structure:
```python
{
    "sex": {
        "statistical_parity_difference": {"value": -0.19, "fair": False},
        "disparate_impact_ratio": {"value": 0.27, "fair": False},
        # ... other metrics
    },
    "race": {
        # ... same metrics for race
    },
    "theil_index": {"value": 0.12, "fair": False},
}
```
Verify: uv run pytest tests/test_audit.py
Try it: uv run python -m assignment3.audit
The mitigation code is provided in mitigate.py. Run it and record the before/after metrics:
uv run python -m assignment3.mitigate
Compare the baseline audit results with the mitigated results. Fill in the mitigation table in your README.md.
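To build intuition for what the provided reweighing does, here is a hedged sketch of the classic Kamiran-Calders weighting scheme — each (group, label) cell is weighted by expected over observed frequency. This is an illustration of the idea, not necessarily the starter code's exact implementation:

```python
import numpy as np

def reweighing_weights(y, group):
    """Illustrative Kamiran-Calders weights: w = P(group) * P(label) / P(group, label).

    `y` and `group` are binary numpy arrays.
    """
    w = np.empty(len(y), dtype=float)
    for g in (0, 1):
        for lbl in (0, 1):
            mask = (group == g) & (y == lbl)
            if mask.any():
                expected = (group == g).mean() * (y == lbl).mean()
                observed = mask.mean()
                # Under-represented cells get weight > 1, over-represented < 1
                w[mask] = expected / observed
    return w
```

Weights like these can be passed as sample_weight to RandomForestClassifier.fit, which is how reweighing influences training without altering the data itself.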
Confirm all tests pass and the full pipeline runs:
uv run pytest # All tests pass
uv run python -m assignment3.main # Full pipeline runs
Update README.md with your audit results tables (baseline and mitigated).
File: REFLECTION.md (minimum 500 words total)
Metric Conflicts (3 points): Show a specific example from your audit where improving one fairness metric worsened another. Explain why this happens using the impossibility theorem.
Social Context (4 points): The Adult Income dataset was collected from the 1994 Census. Discuss how historical and structural factors (e.g., occupational segregation, educational access, wealth gaps) are encoded in this dataset. Reference at least one reading from class (Birhane, Mitchell, or Aspen Digital).
Metric Selection (3 points): If this model were deployed for real loan decisions, which fairness metric would you prioritize and why? What harms does your chosen metric fail to capture? Who benefits and who bears the cost of your choice?
# Run tests
uv run pytest
# Train baseline model and view accuracy
uv run python -m assignment3.model
# Run fairness audit
uv run python -m assignment3.audit
# Apply mitigation and compare
uv run python -m assignment3.mitigate
# Run all steps
uv run python -m assignment3.main
| Component | Points | Description |
|---|---|---|
| Fairness metrics (fairness_metrics.py) | 20 | All 6 metrics implemented correctly |
| Audit report (audit.py) | 10 | Returns correct dictionary structure with metrics and fair flags |
| Mitigation comparison | 5 | Before/after results recorded in README |
| Tests pass | 5 | All provided tests pass |
| README results filled in | 10 | Audit tables and key findings completed |
| Component | Points | Description |
|---|---|---|
| Metric conflicts | 3 | Concrete example with impossibility theorem explanation |
| Social context | 4 | Structural factors analysis with reading references |
| Metric selection | 3 | Justified choice with trade-off analysis |
| Total | 60 | |
Your README must include:
How to install dependencies and run the system.
Present your baseline audit results:
## Baseline Audit Results
### Protected Attribute: Sex
| Metric | Value | Fair? |
|--------|-------|-------|
| Statistical Parity Difference | -0.XX | Yes/No |
| Disparate Impact Ratio | 0.XX | Yes/No |
| Equal Opportunity Difference | -0.XX | Yes/No |
| Average Odds Difference | -0.XX | Yes/No |
| Predictive Parity Difference | 0.XX | Yes/No |
| Theil Index | 0.XX | Yes/No |
### Protected Attribute: Race
[Same table format]
Show before/after comparison:
## Mitigation Results
**Method used:** [Reweighing / Threshold Adjustment / Other]
| Metric | Before | After | Improved? |
|--------|--------|-------|-----------|
| Statistical Parity Difference | -0.XX | -0.XX | Yes/No |
| ...
Summarize your most important findings in 2-3 paragraphs.
Document all collaboration and AI assistance (required).
```
cis6930sp26-assignment3/
├── assignment3/
│   ├── __init__.py
│   ├── data.py               # Provided: Loads Adult dataset
│   ├── model.py              # Step 2: Train RandomForestClassifier
│   ├── fairness_metrics.py   # Step 1: Implement 6 fairness metrics
│   ├── audit.py              # Step 3: Build audit report
│   ├── mitigate.py           # Step 4: Apply mitigation
│   └── main.py               # Provided: Runs all steps
├── tests/
│   ├── test_fairness_metrics.py
│   ├── test_model.py
│   ├── test_audit.py
│   └── test_mitigate.py
├── data/                     # Auto-downloaded by starter code
├── REFLECTION.md             # Step 7: Written reflection
├── COLLABORATORS.md
├── README.md                 # Step 6: Fill in results
├── .gitignore
└── pyproject.toml
```
Push your final code to your cis6930sp26-assignment3 repository, add cegme as an Admin collaborator, then tag and push your submission:

git tag v1.0
git push origin v1.0
Tips:

- Start with fairness_metrics.py (Step 1) — it has no dependencies and the tests are straightforward
- Use boolean indexing such as y_pred[protected_attr == 'Male'] to select one group's predictions
- You may use aif360 to cross-check your metric implementations

This is an individual assignment. You may discuss concepts with classmates, but all code and the reflection must be your own. Document all collaboration in COLLABORATORS.md.