
Moonshot AI shows open source models keep improving



Robert Praas, Ramón Sánchez
November 7, 2025 - 1 min read

Moonshot AI's newest model, Kimi-K2-Thinking, achieves impressive results on various benchmarks. The model scores close to GPT-5 on Humanity's Last Exam and significantly higher than Moonshot's previous release, Kimi-K2-Instruct.

Humanity's Last Exam was created by Scale AI and the Center for AI Safety. It tests technical knowledge and reasoning through more than 2,500 difficult questions, and was designed as a harder alternative to similar benchmarks that leading models had already saturated.

Comparing models can be difficult, as they tend to perform differently across the wide variety of possible tasks. Humanity's Last Exam is only one test, Moonshot self-reported the score for K2-Thinking, and the results on other benchmarks differ. Nevertheless, the model's results make it an exciting new release.



