A new study reveals how today’s most powerful language models consistently favor Standard German over regional dialects, exposing hidden biases that could influence real-world judgments in classrooms, workplaces, and digital platforms.

[Figure: Experimental overview. The study assesses biases across stereotypical traits commonly associated with dialect speakers, probing both dialect naming and dialect usage biases with two tasks: implicit association of trait adjectives (association task) and decision-making scenarios (decision task).]
Large language models such as GPT-5 and Llama systematically evaluate speakers of German dialects less favorably than those using Standard German, according to a collaborative study conducted by Johannes Gutenberg University Mainz (JGU), the University of Hamburg, and the University of Washington. The work, led by Professor Katharina von der Wense and Minh Duc Bui, was presented at the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP), one of the field’s leading international venues. The results show that all tested AI systems reproduce social stereotypes linked to dialect use.
“Dialects are an essential part of cultural identity,” said Minh Duc Bui, a doctoral researcher in von der Wense’s Natural Language Processing group at JGU’s Institute of Computer Science. “Our analyses suggest that language models associate dialects with negative traits, thereby perpetuating problematic social biases.”
Methodology
The research team used linguistic databases containing orthographic and phonetic variants of German dialects to translate seven regional varieties into Standard German. This parallel corpus enabled controlled comparisons by presenting the same content in either Standard German or dialect form.
Ten large language models were tested, including open-source systems such as Gemma and Qwen as well as the commercial model GPT-5. Each model received text samples written in Standard German or in one of seven dialects: Low German, Bavarian, North Frisian, Saterland Frisian, Ripuarian (including Kölsch), Alemannic, and Rhine-Franconian (including Palatine and Hessian).
The systems were asked to assign personal attributes to fictional speakers and to make decisions in scenarios such as hiring, workshop invitations, or housing selection.
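Purely as an illustration of this setup, a controlled comparison of this kind might look roughly like the sketch below; the sentence pair, the trait list, the prompt wording, and the idea of looping over prompts are assumptions for the example, not the study's actual data or prompts.

```python
# Hypothetical sketch of a controlled Standard-German-vs-dialect comparison.
# The sentence pair, trait list, and prompt wording are illustrative
# placeholders, not the study's actual materials.

parallel_pair = {
    "standard": "Ich gehe heute auf den Markt und kaufe frisches Gemüse.",
    "dialect": "I geh heit aufn Markt und kaaf frisches Gmias.",  # rough Bavarian-style rendering
}

TRAITS = ["educated", "professional", "trustworthy", "rural", "traditional", "uneducated"]

def association_prompt(text: str) -> str:
    """Association task: ask which trait adjective the model links to the speaker."""
    return (
        f'A person wrote the following message:\n\n"{text}"\n\n'
        f"Which single adjective best describes this person? "
        f"Choose one of: {', '.join(TRAITS)}."
    )

def decision_prompt(text: str) -> str:
    """Decision task: ask the model to decide about the speaker, e.g. in a hiring scenario."""
    return (
        f'A person wrote the following application:\n\n"{text}"\n\n'
        "Should this person be invited to interview for an office position? Answer yes or no."
    )

for variety, text in parallel_pair.items():
    print(f"--- {variety} ---")
    print(association_prompt(text))
    # In the study's setting, each prompt would be sent to each of the ten models,
    # and the answers for the standard and dialect versions of the same content
    # would then be compared.
```

Because the content is identical across the two versions, any systematic difference in the models' answers can be attributed to the language variety rather than to the message itself.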
Consistent stereotype reproduction
The results showed that nearly all models reproduced widely known social stereotypes. Standard German speakers were more often described as “educated”, “professional”, or “trustworthy”, whereas dialect speakers were more often assigned attributes such as “rural”, “traditional”, or “uneducated”. Even traits traditionally linked to dialect speakers in sociolinguistic research, such as “friendly”, were more frequently attributed by the models to Standard German speakers.
Decision-based tasks reflected similar patterns. Dialect speakers were more often associated with agricultural work, anger management workshops, or rural residential choices. “These associations reflect societal assumptions embedded within the training data of many language models,” noted Professor von der Wense. “This is concerning because AI systems are increasingly used in settings such as education or hiring, where language often serves as a proxy for competence or credibility.”
Bias increases when dialects are explicitly identified
Bias became particularly pronounced when the models were explicitly told that a text was written in a dialect. A notable finding was that larger models within the same family displayed stronger biases. “Bigger does not necessarily mean fairer,” said Bui. “Larger models appear to learn social stereotypes with even greater precision.”
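The difference between merely using a dialect and explicitly naming it can be pictured as two prompt conditions; the wording below is a minimal, assumed sketch rather than the study's actual prompt.

```python
def usage_condition(text: str) -> str:
    # Dialect usage: the model only sees the dialect text itself.
    return f'A person wrote: "{text}" Which single adjective best describes this person?'

def naming_condition(text: str, variety: str) -> str:
    # Dialect naming: the prompt additionally states which variety the text is written in.
    return (
        f'A person wrote the following text in {variety}: "{text}" '
        "Which single adjective best describes this person?"
    )

print(usage_condition("I geh heit aufn Markt."))
print(naming_condition("I geh heit aufn Markt.", "Bavarian"))
```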
Bias persists beyond spelling or grammar differences
The team also compared dialect texts with artificially modified Standard German that contained noise such as misspellings or distortions. The discrimination persisted, indicating that the bias cannot be explained simply by unfamiliar spelling patterns or grammatical deviations.
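As a rough sketch of such a control condition, assuming a simple character-swap perturbation rather than the authors' exact noising procedure, one could distort Standard German text and rerun the same prompts:

```python
import random

def add_noise(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Swap adjacent letters at random to mimic misspellings and distortions.

    Purely illustrative; the study's actual noising procedure may differ.
    """
    rng = random.Random(seed)
    chars = list(text)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

standard = "Ich gehe heute auf den Markt und kaufe frisches Gemüse."
print(add_noise(standard))
# If models rate this noised Standard German like the clean version but still
# downgrade dialect text, the bias cannot be reduced to unfamiliar surface forms.
```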
These results reflect a broader phenomenon observed in other languages as well. “German dialects serve as an informative case study for understanding how language models treat regional and social variation,” said Bui. “Comparable patterns have been documented for other language varieties, including African American English.”
Future directions
The researchers plan to examine how AI models respond to dialects spoken in the Mainz region and to explore strategies for improving fairness in language model training. “Dialects are a vital part of social identity,” said von der Wense. “Ensuring that AI systems recognize and respect this diversity is not only a matter of technical fairness but also one of social responsibility.”
Journal reference:
- Minh Duc Bui, Carolin Holtermann, Valentin Hofmann, Anne Lauscher, and Katharina von der Wense. 2025. Large Language Models Discriminate Against Speakers of German Dialects. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 8223–8251, Suzhou, China. Association for Computational Linguistics. https://aclanthology.org/2025.emnlp-main.415/