Workshop Trip to Chicago

By Chibuzor Okocha Joseph on Sep 8, 2025
Discussing the AfriVox poster during the TTIC 2025 workshop in Chicago.

I spent a few energizing days at the Toyota Technological Institute at Chicago (TTIC) for a workshop centered on speech foundation models (SFMs) and audio-language models (ALMs). I presented my AfriVox poster and soaked in a whirlwind of ideas about the future of speech AI.

Overview

  • Dates: September 2025 (2-day workshop)
  • Venue: TTIC, Hyde Park — Chicago, IL
  • Purpose: Share AfriVox—a practical path toward fairer evaluation for African-accented speech—and learn from cutting-edge sessions on SFMs/ALMs, evaluation, and reproducibility to take back to UF Data Studio.

My Poster: AfriVox

AfriVox: Toward Fairer Benchmarks for African-Accented Speech in ALMs

  • Problem: Many ALM evaluations underrepresent African-accented English; this masks real-world performance gaps.
  • Approach: Curate diverse accent coverage, align tasks (ASR, QA, instruction-following), and pair human-centric judgments with automated metrics that actually correlate with perceived quality.
  • Early Lessons from Feedback:
    • Prioritize task relevance over metric abundance: pick fewer, better-aligned metrics.
    • Report accent-wise error spreads and confidence intervals—not just overall means (see the sketch after this list).
    • Include minimal reproducibility kits (data cards, eval scripts, seeds, and versions) for easy reruns.
  • Next Steps: Add stronger reliability checks (inter-rater agreement), expand beyond English-only speech where feasible, and publish a light, reproducible eval harness.
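
Below is a minimal sketch of how those accent-wise spreads and confidence intervals could be computed. It is illustrative only: the per-utterance result format (dicts with `accent`, `reference`, and `hypothesis` fields), the hand-rolled WER, and the percentile-bootstrap routine are assumptions for this post, not the actual AfriVox pipeline; the fixed seed is there so reruns of the report match.

```python
import random
from collections import defaultdict


def word_errors(reference: str, hypothesis: str) -> tuple[int, int]:
    """Word-level edit distance and reference length for one utterance."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        prev = curr
    return prev[-1], len(ref)


def corpus_wer(rows) -> float:
    """Aggregate WER over a set of utterances: total errors / total reference words."""
    errors = sum(word_errors(r["reference"], r["hypothesis"])[0] for r in rows)
    words = sum(len(r["reference"].split()) for r in rows)
    return errors / max(words, 1)


def bootstrap_ci(rows, n_resamples: int = 1000, alpha: float = 0.05, seed: int = 0):
    """Percentile-bootstrap CI for corpus WER, resampling whole utterances."""
    rng = random.Random(seed)  # fixed seed so reruns match
    stats = sorted(
        corpus_wer([rng.choice(rows) for _ in rows]) for _ in range(n_resamples)
    )
    lower = stats[int(alpha / 2 * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def accent_report(rows) -> None:
    """Print WER and a 95% CI per accent group, not just the overall mean."""
    by_accent = defaultdict(list)
    for row in rows:
        by_accent[row["accent"]].append(row)
    for accent, group in sorted(by_accent.items()):
        wer = corpus_wer(group)
        lower, upper = bootstrap_ci(group)
        print(f"{accent}: WER={wer:.3f}, 95% CI=({lower:.3f}, {upper:.3f}), n={len(group)}")
```

In practice the same grouping would extend to other metrics and subgroups (environment, prompt style), and a maintained library such as jiwer could replace the hand-rolled WER.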

Highlights

  • Keynote Themes that Stuck

    • Scaling isn’t everything: smarter data selection and targeted instruction tuning can rival sheer parameter count.
    • Representation gaps: accent, code-switching, and noisy field recordings remain Achilles’ heels for many SFMs/ALMs.
    • Evaluation ≠ leaderboard: we need task-grounded, user-centered measures—especially for safety-critical use cases.
  • Evaluation & Reproducibility Panel

    • Advocated transparent data cards and deterministic eval scripts (version-pinned, seed-set).
    • Emphasized error analyses by accent, environment, and prompt style—not just aggregate WER or BLEU.
    • Encouraged shared baselines and small, “lift-and-run” starter repos to reduce setup friction.
  • Modeling Sessions

    • Multimodal alignment: tighter audio-text alignment boosts instruction-following without overfitting to synthetic prompts.
    • Safety & bias: strong case for accent-aware evaluations, red-teaming prompts, and reporting disparities by subgroups.
    • Latency vs. quality trade-offs: practical recipes (quantization, streaming chunks, caching) for real-time scenarios.
  • 1:1 Conversations & Collabs

    • Feedback on AfriVox’s sampling strategy (ensure balanced accent/task pairing).
    • Ideas for UF Data Studio: standardize eval scripts across models (e.g., Whisper, Qwen2-Audio, SALMONN, GAMA) with a common interface; log accent-wise metrics by default.
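
Picking up that last idea, here is a rough sketch of what a shared eval interface could look like. All names here (`Transcriber`, `WhisperTranscriber`, `run_eval`) are hypothetical, and the Hugging Face `pipeline` call is just one plausible backend; the point is that every model wrapper exposes the same `transcribe()` method and that accent-wise grouping happens in the harness, so no individual script can forget it.

```python
from collections import defaultdict
from typing import Protocol


class Transcriber(Protocol):
    """Minimal surface every model wrapper has to expose."""
    name: str

    def transcribe(self, audio_path: str) -> str: ...


class WhisperTranscriber:
    """Example wrapper; the Hugging Face ASR pipeline is an assumed backend."""
    name = "whisper-small"

    def __init__(self) -> None:
        from transformers import pipeline  # lazy import: optional dependency
        self._pipe = pipeline("automatic-speech-recognition",
                              model="openai/whisper-small")

    def transcribe(self, audio_path: str) -> str:
        return self._pipe(audio_path)["text"]


def run_eval(models: list[Transcriber], samples: list[dict]) -> dict:
    """Run every model on every sample; group results by (model, accent).

    Each sample is an illustrative dict: {"audio": ..., "reference": ..., "accent": ...}.
    Accent-wise grouping happens here by default, so no script can skip it.
    """
    results = defaultdict(list)
    for model in models:
        for sample in samples:
            hypothesis = model.transcribe(sample["audio"])
            results[(model.name, sample["accent"])].append(
                {"reference": sample["reference"], "hypothesis": hypothesis}
            )
    return dict(results)
```

The grouped rows can then feed the accent-wise reporting sketch above, and the reproducibility pieces the panel pushed for (pinned versions, fixed seeds, data cards) naturally live in the harness configuration rather than in each model wrapper.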

Takeaways I’m Bringing Back to UF Data Studio

  • Standardize eval scripts across models behind a common interface, and log accent-wise metrics by default.
  • Ship a minimal reproducibility kit with every eval: data cards, version-pinned scripts, and fixed seeds.
  • Report error spreads and confidence intervals by accent, environment, and prompt style, not just aggregate means.

Trip Notes

  • Chicago in September = perfect workshop weather ☀️
  • Hyde Park coffee chats turned into mini-brainstorms on data cards and accent coverage.
  • Poster hour was packed—lots of interest in how to make fairness measurements easy to adopt.

Photos

AfriVox poster close-up at TTIC

Explaining AfriVox during the poster session

Chibuzor at the poster in Hyde Park

Acknowledgments

Huge thanks to the TTIC organizers and all the speakers and panelists for an inspiring program. I’m grateful to UF Data Studio and my mentors for the guidance that shaped AfriVox, and to the travel support and sponsors that helped make this trip possible.


If you’re interested in collaborating on accent-aware evaluation or trying the AfriVox starter kit, reach out—I’m happy to share scripts and notes.
