1. The 2025 Foundation Model Transparency Index
Key points:
This paper introduces the 2025 Foundation Model Transparency Index, an evaluation framework that measures how openly major developers of foundation models share information about their practices. The Index assesses transparency across a range of companies whose models have global impact, and this year it includes updated and expanded indicators covering data sources, model usage tracking, and post-deployment monitoring. Comparing the latest scores with those from a year earlier, the authors find that overall transparency has declined, with average scores falling markedly. While some firms are notably open in certain areas, others disclose very little, particularly about the origins of their training data and the computational resources used to train their flagship models. These findings suggest that, despite growing calls for accountability, many organisations remain reluctant to share the information that would enable others to fully evaluate the risks and impacts of their AI models.
Authors: Alexander Wan, Kevin Klyman, Sayash Kapoor, Nestor Maslej, Shayne Longpre, Betty Xiong, Percy Liang, Rishi Bommasani
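Indices of this kind typically score each developer against a checklist of indicators and aggregate the results. The sketch below illustrates that general idea only; it is not the authors' code, and the indicator names are hypothetical stand-ins.

```python
# Illustrative sketch (not the Index's actual methodology): score a
# developer as the percentage of transparency indicators it satisfies.
from statistics import mean

def transparency_score(indicators: dict[str, bool]) -> float:
    """Fraction of indicators a developer satisfies, as a 0-100 score."""
    return 100 * mean(1.0 if met else 0.0 for met in indicators.values())

# Hypothetical indicator checklist for one developer:
example = {
    "training-data sources disclosed": True,
    "compute used for flagship model disclosed": False,
    "post-deployment monitoring described": True,
    "model usage tracking policy published": False,
}

print(round(transparency_score(example), 1))  # 50.0
```

Averaging such per-developer scores across all assessed companies gives the overall figure whose year-on-year decline the paper reports.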
2. Phythesis: physics-guided evolutionary scene synthesis for energy-efficient data centre design via LLMs
Key points:
Designing energy-efficient data centres is a complex challenge that requires balancing spatial layout, physical constraints, and operational objectives, and traditional design processes struggle to scale as systems grow in size and complexity. This paper introduces Phythesis, a hybrid framework combining LLMs with physics-based evolutionary optimisation. It automatically generates simulation-ready three-dimensional data centre layouts that satisfy both structural-plausibility and energy-use criteria. Unlike previous approaches that rely solely on generative AI and often ignore physical realism, Phythesis alternates between an LLM-driven search for coherent layout topologies and a physics-informed optimisation stage that fine-tunes equipment placement and parameters for energy performance. Experimental evaluations across multiple scales demonstrate that this synergy raises the rate at which valid designs are produced by over 50%, while also improving a key efficiency metric, PUE (power usage effectiveness), by a significant margin compared to baseline AI methods.
Authors: Minghao Li, Ruihang Wang, Rui Tan, Yonggang Wen
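The alternation described above, a generator proposing candidate layouts and a physics-scored evolutionary stage refining them, can be sketched in miniature. Everything here is a toy stand-in: `propose_layouts` substitutes random proposals for the LLM, and `physics_score` substitutes a spacing heuristic for a real energy model; none of these names come from the paper.

```python
# Minimal sketch of a generate-then-optimise loop in the spirit of
# Phythesis. All functions are hypothetical stand-ins, not the real system.
import random

random.seed(0)

def propose_layouts(n):
    # Stand-in for LLM topology proposals: each layout is a list of
    # rack positions along a 1-D aisle (a toy abstraction).
    return [[random.uniform(0, 10) for _ in range(4)] for _ in range(n)]

def physics_score(layout):
    # Stand-in for a physics/energy model: prefer evenly spaced racks
    # (lower is better, analogous to PUE).
    xs = sorted(layout)
    gaps = [b - a for a, b in zip(xs, xs[1:])]
    return max(gaps) - min(gaps)

def mutate(layout, step=0.5):
    # Small perturbation of equipment positions.
    return [x + random.uniform(-step, step) for x in layout]

def optimise(generations=30, pop=8):
    population = propose_layouts(pop)
    for _ in range(generations):
        population.sort(key=physics_score)       # physics-informed ranking
        survivors = population[: pop // 2]       # keep the best half
        population = survivors + [mutate(s) for s in survivors]
    return min(population, key=physics_score)

best = optimise()
print(physics_score(best))
```

In the actual framework the proposal stage would periodically re-invoke the LLM to explore new topologies rather than only mutating existing ones; the loop above shows just the physics-scored refinement half of that alternation.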
3. Human-in-the-loop and AI: crowdsourcing a metadata vocabulary for materials science
Key points:
This paper introduces MatSci-YAMZ, a platform that combines human insight, crowdsourcing and AI assistance to speed up the development of shared vocabularies in materials science. During a multi-week pilot study, researchers collaboratively proposed terms and definitions, while the system used AI to suggest refinements and highlight inconsistencies, supporting iterative improvement. The study shows that combining expert feedback with machine-generated suggestions produces clearer metadata entries more quickly than traditional manual efforts or standalone automated extraction. The process demonstrated how human-in-the-loop workflows can reduce semantic ambiguity and align vocabulary development with the FAIR data principles, making metadata standardisation more scalable and offering a template for other scientific fields that struggle with fragmented terminology.
Authors: Jane Greenberg, Scott McClellan, Addy Ireland, Robert Sammarco, Colton Gerber, Christopher B. Rauch, Matt Kelly, John Kunze, Yuan An, Eric Toberer
4. Reverse thinking enhances the detection of missing information in large language models
Key points:
Although large language models have demonstrated remarkable capabilities in many reasoning tasks, they frequently struggle when questions omit critical details, resulting in inaccurate or fabricated responses. This paper introduces a reverse thinking framework that guides models through such situations by working backwards, first identifying what information is needed to solve the problem and then using backward inference to flag and recover missing content. By reframing the detection of missing information as a backward reasoning challenge, the authors demonstrate that this strategy improves models' ability to identify gaps and suggest necessary conditions. Experimental results across multiple tasks confirm that reverse reasoning strengthens the logical completeness of responses, suggesting a promising approach to reducing hallucinations and enhancing robustness in large language models.
Authors: Yuxin Liu, Chaojie Gu, Yihang Zhang, Bin Qian, Shibo He
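The backward step described above, starting from the goal and asking which facts would be needed to reach it, can be illustrated with a toy check. The paper's actual prompts and pipeline are not reproduced here; the required-fact table below is a hand-written stand-in for what an LLM would infer.

```python
# Toy sketch of the "work backwards from the goal" check: enumerate the
# facts a task requires, then flag any that the question does not supply.
# The REQUIRED table is a hypothetical stand-in for LLM-inferred conditions.

REQUIRED = {
    "travel_time": {"distance", "speed"},      # time = distance / speed
    "rectangle_area": {"width", "height"},     # area = width * height
}

def missing_information(task: str, given: set[str]) -> set[str]:
    """Work backwards from the goal: which required facts are absent?"""
    return REQUIRED[task] - given

# A question that states the speed but omits the distance:
gaps = missing_information("travel_time", {"speed"})
print(sorted(gaps))  # ['distance']
```

In the framework itself both steps are performed by the model via backward inference rather than a fixed lookup table, but the control flow, derive requirements from the goal, then diff them against the question, is the same.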
5. How enterprises are building AI agents in 2026
Key points:
According to Anthropic's report, AI agents will evolve from experimental tools to become foundational components of business technology stacks by 2026. Organisations will transition from basic automation to orchestrating multi-stage processes. A survey of over 500 technical leaders reveals that over half of companies currently use agents to handle multi-step workflows, with almost a sixth coordinating agents across multiple teams. The vast majority also anticipate the adoption of AI agents in production systems, particularly for software development tasks. Enterprises report that AI assistance reduces the time taken across the development lifecycle and that agents can now bring value to different areas, such as data analysis, reporting and internal process automation. However, success will depend on aligning agents with enterprise infrastructure and workflows.
Authors: Anthropic