Comic Book Interpretation

By Christopher William Driggers-Ellis on Jul 31, 2023
An example of a comic book page in our dataset and its interpretation.

Comics and manga are popular storytelling media, and research indicates that the popularity and sales of these types of books are poised to keep growing. However, no system currently exists to scan these media and read them aloud to visually impaired users, which poses an accessibility problem.

To address this need among blind and visually impaired users, we investigate the ability of Vision Language Models (VLMs) to interpret comic books gathered from public-domain sources. Our goal is to inform design decisions for screen-reading software with a feature that scans comic books on a user's screen and reads the user a VLM's interpretation of the book.

We have constructed a benchmarking dataset to evaluate how well VLMs can scan and interpret comic books for visually impaired users. The image at the top of this page is a cropped version of a sample image from the dataset, together with the human-written ground-truth interpretation against which machine interpretations are compared. The dataset and the results gathered … The full image is shown below.

An example of a comic book page in our dataset and its interpretation.

Demo

Forthcoming

Links and Resources

People

Repo(s) (contact the UF Data Studio admin for access)

Publications

  • Forthcoming
© Copyright 2025 by UF Data Studio. Built with ♥ by ceg.me (via CreativeDesignsGuru!).