AI World mapped and visualised the language coverage of Hugging Face's language model repository, which includes over 1.8 million models, confirming an already known reality in open-source AI development: English overwhelmingly dominates the landscape.
This serves as a crucial indicator of which languages receive the most attention in AI development and research. The imbalance is significant: English-language models lead by a wide margin, followed by Chinese, French, German and Spanish.
However, despite English's dominance, multilingual coverage is expanding to many more languages. Why is this important? Including a broader range of the world's languages can narrow the gap in AI accessibility. Millions of speakers of underrepresented languages may face obstacles when accessing AI tools. Broadening the linguistic scope of open-source AI models therefore not only increases AI knowledge, outreach and usage, but also makes AI as a whole more democratised. European initiatives like EuroLLM and the Nordic Lumi models are addressing this challenge by developing dedicated multilingual systems for EU languages. Projects such as OpenLLM France are also creating specialised datasets and models to support French language processing, demonstrating growing efforts to preserve linguistic diversity in AI development.
More information here: https://thenextweb.com/news/making-multilingual-ai-in-europe