
From the bottom to the top: Robotics datasets lead on Hugging Face



Robert Praas, Ramón Sánchez, Pierre-Alexandre Balland
December 22, 2025 - 2 min read

Robotics, vision and multimodal models have been on the rise over the past few years, and so have the datasets built for them. The graph above combines a race chart of the number of datasets per task category on Hugging Face with a bump chart that compares how these categories rank over time.

In just two years, Hugging Face grew from 11k to over 600k public datasets, and robotics is by far the fastest-growing segment. The number of robotics datasets went from 1,145 in 2024 to 26,991 in 2025, which moved the category from rank 44 to rank 1 in only 3 years (see image below). For comparison, text generation, the second-largest category, has only around 5,000 datasets in 2025. These counts include only datasets with at least 200 downloads, which suggests how much experimentation is currently happening in the robotics domain.

Figure: Bump chart focused on robotics, excerpt from headline visualization.
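To put the robotics counts quoted above in proportion, the short sketch below works out the growth factors they imply. It uses only the figures from the text; the helper name `growth_factor` is illustrative, and the text-generation count is the rounded "around 5,000" figure, so the second ratio is approximate.

```python
# Figures quoted in the story; nothing here is fetched from Hugging Face.
ROBOTICS_2024 = 1_145
ROBOTICS_2025 = 26_991
TEXT_GEN_2025_APPROX = 5_000  # "around 5,000" in the text, so a rough value


def growth_factor(before: int, after: int) -> float:
    """Return how many times larger `after` is than `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return after / before


# Year-over-year growth of the robotics category.
yoy = growth_factor(ROBOTICS_2024, ROBOTICS_2025)

# Size of robotics relative to the second-largest category in 2025.
lead = growth_factor(TEXT_GEN_2025_APPROX, ROBOTICS_2025)

print(f"Robotics 2024 -> 2025: {yoy:.1f}x")
print(f"Robotics vs text generation (2025): about {lead:.1f}x")
```

With these inputs, robotics grew roughly 23.6-fold in one year and is about 5.4 times the size of text generation.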

There is also significant activity beyond robotics. For instance, the number of datasets for multimodal systems, such as visual question-answering, text-to-speech, vision-language models and video-language models, has increased rapidly over the last few years. On the other hand, datasets for fill-mask, graph machine learning and document question-answering have dropped substantially in rank, showing that AI is very much an evolving field.





datasets, robotics, open source