Colour in Translation: Data, Models, and Benchmarking for Cross-Linguistic Colour Naming

Mylonas, Dimitris, Ahmed, Rafique, Sinkeviciute, Akvile and Koliousis, Alexandros (2026) Colour in Translation: Data, Models, and Benchmarking for Cross-Linguistic Colour Naming. In: CHI '26: Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems, April 2026, Barcelona, Spain.

Abstract

Colour naming links vision and language. Yet effective cross-linguistic colour communication is limited by the lack of multilingual data and computational models for comprehensive colour name translation. We collected 6,408 unique colour naming responses in five languages using online experiments and fieldwork. For each language, we train a spin colour forest, a novel model built from partially rotated decision trees that accurately estimates colour naming distributions across the full gamut, consistently outperforming existing methods. Unlike prior work that assumed 11 universal colour categories, our results reveal cross-linguistic variation in naming granularity: American English uses 47 indispensable colour names, British English 32, French 27, Greek 32, and the Himba 7 to categorise the same perceptually uniform colour space. Building on these findings, we develop a colour translation benchmark, which we demonstrate by evaluating both the lexical and perceptual accuracy of a large language model. Our evaluation reveals a critical lexical-perceptual disconnect, demonstrating that language models lack perceptual grounding in colour translation. Our data, models, and benchmark provide an empirical foundation for inclusive design that reflects how people communicate colour across cultures.
