CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation
Author
Villa-Cueva, Emilio
Bolatzhanova, Sholpan
Turmakhan, Diana
Elzeky, Kareem
Ademtew, Henok Biadglign
Aji, Alham Fikri
Araujo, Vladimir
Azime, Israel Abebe
Baek, Jinheon
Belcavello, Frederico
Cristobal, Fermin
Cruz, Jan Christian Blaise
Dabre, Mary
Dabre, Raj
Ehsan, Toqeer
Etori, Naome A
Farooqui, Fauzan
Geng, Jiahui
Ivetta, Guido
Jayakumar, Thanmay
Jeong, Soyeong
Lim, Zheng Wei
Mandal, Aishik
Martinelli, Sofía
Mihaylov, Mihail Minkov
Orel, Daniil
Pramanick, Aniket
Purkayastha, Sukannya
Salazar, Israfel
Song, Haiyue
Timponi Torrent, Tiago
Yadeta, Debela Desalegn
Hamed, Injy
Tonja, Atnafu Lambebo
Solorio, Thamar
Department
Natural Language Processing
Type
Conference proceeding
License
http://creativecommons.org/licenses/by/4.0/
Abstract
Translating cultural content poses challenges for machine translation systems due to the differences in conceptualizations between cultures, where language alone may fail to convey sufficient context to capture region-specific meanings. In this work, we investigate whether images can act as cultural context in multimodal translation. We introduce CaMMT, a human-curated benchmark of over 5,800 triples of images along with parallel captions in English and regional languages. Using this dataset, we evaluate five Vision Language Models (VLMs) in text-only and text+image settings. Through automatic and human evaluations, we find that visual context generally improves translation quality, especially in handling Culturally-Specific Items (CSIs), disambiguation, and correct gender marking. By releasing CaMMT, our objective is to support broader efforts to build and evaluate multimodal translation systems that are better aligned with cultural nuance and regional variations.
Citation
E. Villa-Cueva, S. Bolatzhanova, D. Turmakhan, K. Elzeky, H.B. Ademtew, A.F. Aji, V. Araujo, I.A. Azime, J. Baek, F. Belcavello, F. Cristobal, J.C.B. Cruz, M. Dabre, R. Dabre, T. Ehsan, N.A. Etori, F. Farooqui, J. Geng, G. Ivetta, T. Jayakumar, S. Jeong, Z.W. Lim, A. Mandal, S. Martinelli, M.M. Mihaylov, D. Orel, A. Pramanick, S. Purkayastha, I. Salazar, H. Song, T. Timponi Torrent, D.D. Yadeta, I. Hamed, A.L. Tonja, T. Solorio, "CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation," 2025, pp. 22423-22441.
Source
Conference
Findings of the Association for Computational Linguistics: EMNLP 2025
Publisher
Association for Computational Linguistics
