The team spent several months investigating and studying Mixtec Codices. This work releases an initial data set of segmented and labeled figures from three Mixtec Codices.
Figure 1 - AmericasNLP research poster by Webber et al.
In addition to data set creation, our paper describes a comparison of Visual Transormer models and CNN models in identifying important properties of the figures. We find the Transformer models perform well and are able to identify important features that scholars identify as distingushing. This data set is perfect for anyone looking to study the intricacies of Mixtec codices.
We hope to meet native speakers, domain experts, and others interested in working with us to explore the wonder Mixtec language and stories. Check out the other fantastic papers that the AmericasNLP workshop.
In addition, feel free to investigate our model through the Streamlit demo of our classifiers.