Artificial intelligence augments and automates parts of our everyday lives, yet it remains surprisingly hard to grasp how much energy a single prompt or generated image actually uses. The Hugging Face AI Energy Score leaderboard addresses this by reporting GPU energy consumption, in kWh per 1,000 queries (as shown in the visualisation above), across different AI tasks, model classes, and GPU types. The tasks covered range from standard text-generation and reasoning models to image generation, text and image classification, captioning, and summarisation.
Across the leaderboard, reasoning models typically generate more tokens per answer, producing step-by-step explanations. So even when these models are technically efficient per token, they consume significantly more energy per query. By contrast, standard text-generation models often produce shorter, more direct responses, which keeps total token counts, and thus energy use, much lower for the same task. The variation in energy consumption between the two model types is striking. At the high end, Exaone-4.0-32B by LGAI-Exaone (reasoning model class B) uses 18.99 kWh per 1,000 queries, which is more than charging a smartphone 700 times (15 kWh) or keeping a 10 W LED light bulb on continuously for about 6 weeks (10 kWh). Gpt-oss-120b by OpenAI (reasoning model class C) consumes 8.50 kWh per 1,000 queries, enough to boil a full household kettle (1.5 L) approximately 70 times.
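Because reasoning models emit far more tokens per answer, total energy per query and energy per token can tell different stories. A minimal sketch of this normalisation, using the leaderboard's kWh-per-1,000-queries metric but with purely illustrative token counts (the leaderboard does not report them here):

```python
def energy_per_token_wh(kwh_per_1k_queries: float, avg_tokens_per_response: float) -> float:
    """Convert kWh per 1,000 queries into Wh per generated token.

    kWh per 1,000 queries * 1000 Wh/kWh / 1000 queries = Wh per query,
    which divided by the average response length gives Wh per token.
    """
    return kwh_per_1k_queries / avg_tokens_per_response

# Illustrative comparison (token counts are assumptions, not leaderboard data):
# a reasoning model producing ~3,000 tokens per answer vs. a standard model
# producing ~300 tokens per answer.
reasoning = energy_per_token_wh(18.99, 3000)  # ~0.0063 Wh/token
standard = energy_per_token_wh(0.50, 300)     # ~0.0017 Wh/token
```

Under these assumed lengths, the per-token gap is much smaller than the per-query gap, which is exactly why output length matters when comparing models.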
At the opposite end of the spectrum, some models demonstrate remarkable efficiency. The lowest-ranked model for image classification (resnet-18 by Microsoft) uses just 0.0026 kWh per 1,000 queries, and the lowest-ranked text-generation model in class A (distilgpt2 by distilbert) consumes only 0.00131 kWh per 1,000 queries, equivalent to charging a smartphone to about 6.5% or about one minute of a standard kettle boil (0.12 kWh). These efficient models show that not all AI tasks carry a heavy energy burden.
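The everyday comparisons above can be generated with a small converter. A sketch, using the approximate constants implied in the text (a phone charge of roughly 0.02 kWh, a 1.5 L kettle boil of roughly 0.12 kWh, a 10 W LED bulb); these are rough household averages, not measured values:

```python
# Approximate everyday energy costs (assumptions, not leaderboard data).
PHONE_CHARGE_KWH = 0.02   # one full smartphone charge
KETTLE_BOIL_KWH = 0.12    # boiling a full 1.5 L kettle once
LED_POWER_KW = 0.010      # a 10 W LED bulb

def everyday_equivalents(kwh_per_1k_queries: float) -> dict:
    """Translate kWh per 1,000 queries into everyday equivalents."""
    return {
        "phone_charges": kwh_per_1k_queries / PHONE_CHARGE_KWH,
        "kettle_boils": kwh_per_1k_queries / KETTLE_BOIL_KWH,
        "led_hours": kwh_per_1k_queries / LED_POWER_KW,
    }

# e.g. everyday_equivalents(8.50)["kettle_boils"] gives roughly 70 boils,
# matching the gpt-oss-120b comparison in the text.
```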
However, comparing models based solely on total energy per request is misleading without taking output length and token count into account. Any serious discussion of "efficient" or "green" AI needs to consider not just which model is used, but how it is used and for how long. Increasingly, it also matters how intelligently queries are routed. Emerging router architectures can allocate compute more selectively, directing simple tasks to lighter models and reserving heavy reasoning systems for genuinely difficult problems. This kind of routing tool could be a practical way forward in making AI more energy-efficient.
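The routing idea can be illustrated with a deliberately simple heuristic: send long or reasoning-style prompts to a heavy model and everything else to a light one. This is a toy sketch, not a description of any production router, and the model labels and keyword list are invented for illustration:

```python
# Hypothetical keywords that suggest a prompt needs multi-step reasoning.
REASONING_KEYWORDS = ("prove", "derive", "step by step", "explain why")

def route_query(prompt: str, max_light_words: int = 100) -> str:
    """Pick a model tier for a prompt using a crude length/keyword heuristic.

    Returns "reasoning-model" (high energy per query) only when the prompt
    looks genuinely hard; otherwise returns the cheaper "standard-model".
    """
    text = prompt.lower()
    looks_hard = (
        len(text.split()) > max_light_words
        or any(keyword in text for keyword in REASONING_KEYWORDS)
    )
    return "reasoning-model" if looks_hard else "standard-model"
```

A real router would use a learned classifier or model confidence signals rather than keywords, but even this crude version captures the energy argument: most routine queries never touch the expensive tier.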