
AI and human cognition: Week 18 Papers



April 30, 2026 - 2 min read

The tools used to assess AI capability and the tools used to assess human cognitive decline under AI exposure share a common flaw: neither is measuring quite what it claims to. A February 2025 position paper from researchers across MIT, Harvard, Cambridge, and Princeton made this point directly about the AI side of the ledger. Ying, Collins et al. (2025) examined ten widely used cognitive benchmarks and found systematic problems across all of them: labels were not validated against real human responses, human response variability was flattened into single correct answers, and tasks were ecologically invalid in ways that make claims of human-level AI performance difficult to interpret.

On the human side, the empirical literature is growing fast. Michael Gerlich (2025) surveyed 666 participants across age groups and educational backgrounds and found that frequent AI tool use predicts lower critical thinking scores, with cognitive offloading as the mediating factor. Younger participants showed the highest AI dependence and the lowest critical thinking outcomes, though higher educational attainment served as a partial buffer. The directionality is consistent with a parallel CHI 2025 survey of 319 knowledge workers by Lee, Sarkar, Tankelevitch et al. (2025): higher confidence in GenAI predicted less perceived cognitive effort in four of six task categories, including analysis and evaluation. The shift they describe is structural, with cognitive labour moving from information gathering and problem-solving toward verification and oversight.

The neurophysiological dimension sharpens the picture. Kosmyna, Hauptmann et al. (2025) at MIT Media Lab divided 54 participants into LLM-assisted, search-engine-assisted, and unassisted writing groups. EEG connectivity data showed that LLM users exhibited the weakest neural engagement across three sessions; unassisted writers showed the strongest and most distributed networks.

What the literature still lacks is a validated psychometric instrument capable of tracking the change longitudinally. A preprint by Netanel Eliav (2026) formalises the structural hypothesis underlying much of this literature, proposing a Delegation Feedback Loop: as AI systems increase in presence and scope, the threshold for human delegation falls, reducing cognitive practice overall, and that reduced practice is what closes the loop. The instrument for measuring that loop remains the field's most pressing unanswered question.
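
The loop itself is simple enough to caricature. The sketch below is a toy simulation, not Eliav's model: every variable name, update rule, and parameter (ai_scope, threshold, decay, recovery) is an illustrative assumption, included only to show how expanding AI scope, a falling delegation threshold, and eroding skill can reinforce one another.

    # Toy sketch of a delegation feedback loop. This is NOT Eliav's (2026) model:
    # the variables, update rules, and parameters are illustrative assumptions.

    def simulate(steps=50, ai_scope=0.3, ai_growth=0.02,
                 skill=1.0, decay=0.05, recovery=0.03):
        """Each step: if AI scope exceeds the delegation threshold, the task is
        delegated (no practice) and skill decays; otherwise practice restores skill.
        The threshold tracks skill, so less practice lowers it further."""
        threshold = 0.5 * skill
        history = []
        for t in range(steps):
            delegated = ai_scope > threshold
            if delegated:
                skill = max(0.0, skill - decay)      # no practice -> skill erodes
            else:
                skill = min(1.0, skill + recovery)   # practice maintains skill
            threshold = 0.5 * skill                  # weaker skill -> readier delegation
            ai_scope = min(1.0, ai_scope + ai_growth)  # AI presence keeps expanding
            history.append((t, ai_scope, skill, delegated))
        return history

    if __name__ == "__main__":
        for t, scope, skill, delegated in simulate():
            print(f"step {t:2d}  ai_scope={scope:.2f}  skill={skill:.2f}  delegated={delegated}")

Run as written, this toy crosses the delegation threshold around step 11 and skill then erodes steadily, which is the qualitative pattern the instrument discussed above would need to detect in people rather than in parameters.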





Cognitive offloading · Critical thinking · Benchmarking · Cognitive debt · Delegation