
The growing number of non-English models



Francisco Ríos, Robert Praas, Pierre-Alexandre Balland
December 16, 2025 - 1 min read

The language landscape of Open Source models hosted on Hugging Face is expanding well beyond a single dominant language. AI development is increasingly distributed across multiple linguistic communities, reflecting broader participation in model creation and training. The data also shows where activity accelerated most rapidly between 2024 and 2025.

As the visualization shows, the highest year-over-year growth is concentrated in Ukrainian, Swedish, Arabic, Turkish, and Chinese. These languages show the largest percentage increases, signaling a rapid rise in newly released or updated models. The data suggests that development momentum is shifting toward languages that were previously less represented, contributing to a more diverse multilingual ecosystem within Open Source AI.

At the same time, English remains the most prevalent language by a wide margin, followed by Chinese, French, Spanish, and German. While these languages continue to account for the largest share of models, their relative growth rates are lower than those of several emerging languages.
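
The difference between overall share and year-over-year growth can be made concrete with a small calculation. The sketch below is illustrative only: the language labels `lang_a` and `lang_b` and their model counts are hypothetical placeholders, not figures from the Hugging Face data, and the helper names are assumptions introduced for this example.

```python
def yoy_growth_pct(previous: int, current: int) -> float:
    """Percentage change in model count from one year to the next."""
    if previous <= 0:
        raise ValueError("previous count must be positive")
    return (current - previous) * 100.0 / previous


def share_pct(count: int, total: int) -> float:
    """Share of all models, as a percentage of the total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count * 100.0 / total


# Hypothetical (2024, 2025) model counts; NOT real data.
counts = {
    "lang_a": (1000, 1200),  # large base, modest growth
    "lang_b": (50, 150),     # small base, fast growth
}

total_2025 = sum(current for _, current in counts.values())

summary = {
    lang: {
        "growth_pct": yoy_growth_pct(previous, current),
        "share_pct": share_pct(current, total_2025),
    }
    for lang, (previous, current) in counts.items()
}

# Rank languages by growth rate, highest first.
ranked = sorted(summary, key=lambda lang: summary[lang]["growth_pct"], reverse=True)

for lang in ranked:
    stats = summary[lang]
    print(f"{lang}: growth {stats['growth_pct']:.1f}%, share {stats['share_pct']:.1f}%")
```

With these placeholder numbers, the language with the smaller base ranks first by growth while still holding the smaller share, which is the same pattern the story describes for emerging languages relative to English.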




HuggingFace, Models, Open Source