
An AI agent is a system that can perceive its environment, make autonomous decisions, and take actions to achieve goals. Unlike traditional software, which executes predefined instructions ('if X happens, do Y'), large language model–based agents interpret situations using probabilistic reasoning. They predict the most useful response based on patterns learned from vast amounts of training data. For example, an agent could read an email in your inbox about an upcoming conference, extract the dates and location, and autonomously book your flights and accommodation by interacting with AI agents that manage travel services. This would all happen without any explicit programming for that specific task.
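To make the contrast concrete, the sketch below (in Python, using invented placeholder interfaces such as `llm_decide` and `call_tool` rather than any real framework) compares a traditional rule-based handler with a minimal agent loop that perceives, decides, and acts in a cycle.

```python
# Illustrative sketch only: the names read_inbox-style helpers, llm_decide
# and call_tool are hypothetical placeholders, not a real agent framework.

from dataclasses import dataclass


@dataclass
class Action:
    tool: str          # e.g. "book_flight", "book_hotel", "stop"
    arguments: dict    # parameters the agent chose for that tool


def rule_based_handler(email: str) -> list[Action]:
    """Traditional software: every case is spelled out in advance."""
    actions = []
    if "conference" in email.lower():
        actions.append(Action("notify_user", {"message": "Conference email received"}))
    # Anything the programmer did not anticipate is simply ignored.
    return actions


def agent_loop(goal: str, llm_decide, call_tool, max_steps: int = 10) -> None:
    """LLM-based agent: the model itself chooses which tools to call.

    llm_decide maps (goal, history) to the next Action; call_tool executes
    it and returns an observation. Both are assumed interfaces here.
    """
    history: list[tuple[Action, str]] = []
    for _ in range(max_steps):
        action = llm_decide(goal, history)     # probabilistic choice, not a fixed rule
        if action.tool == "stop":
            break
        observation = call_tool(action)        # e.g. search flights, book a hotel
        history.append((action, observation))  # the agent perceives the result and re-plans
```

The key difference is that the agent loop contains no task-specific rules: the model's own probabilistic decision step determines which tools are called and when to stop.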
The crux of the matter is that the same situation can produce different outcomes because the agents are non-deterministic 'black boxes'. Their internal decision-making processes can involve billions of opaque neural network calculations that not even their creators can fully trace or predict. Moreover, a research article by Anthropic showed that AI agents tested in a virtual environment can become misaligned, ignoring direct commands and resorting to blackmail or whistleblowing to achieve their overarching objectives.
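As a toy illustration of that non-determinism, the snippet below (all action names and probabilities are invented for the example) samples an agent's next action from a probability distribution, much as an LLM samples its next token at non-zero temperature; the same situation can therefore yield different choices on different runs.

```python
import random

# Toy model of an agent's decision step: given the same situation, the
# model assigns probabilities to candidate actions and *samples* one.
# The actions and probabilities here are invented for illustration.
ACTION_PROBS = {
    "book_refundable_flight": 0.55,
    "book_cheapest_flight": 0.40,
    "ask_user_first": 0.05,
}


def decide(situation: str, rng: random.Random) -> str:
    """Sample an action; identical situations can produce different choices."""
    actions, weights = zip(*ACTION_PROBS.items())
    return rng.choices(actions, weights=weights, k=1)[0]


if __name__ == "__main__":
    situation = "Email: conference in Berlin, 12-14 May"
    for run in range(5):
        # A fresh, unseeded generator each run mimics non-deterministic sampling.
        print(f"run {run}: {decide(situation, random.Random())}")
```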
While debates about AI safety typically focus on individual systems going rogue, SIPRI researchers warn that the real danger lies in what happens when multiple AI agents interact at scale, creating unpredictable 'emergent behaviours' that could spiral into accidental cyberconflicts or infrastructure failures. Unlike previous autonomous systems guided by transparent rules, today's LLM-based agents are non-deterministic black boxes that can develop their own communication protocols, bypass safeguards through collaboration, or escalate conflicts in ways their human operators cannot explain.
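A deliberately simplified simulation can hint at how such emergent dynamics arise. Below, two agents each follow a locally reasonable policy of slightly over-matching the other's last move (the 'overreaction' factor and the scenario are invented purely for illustration), yet the coupled system escalates geometrically even though neither agent is programmed to escalate.

```python
# Toy illustration of emergent escalation between two interacting agents.
# Each agent responds to the other's last action a little more strongly
# than it received (factor 1.1); neither is "programmed" to escalate,
# yet the coupled system spirals. All numbers are invented.

def respond(incoming_pressure: float, overreaction: float = 1.1) -> float:
    """Each agent's policy: match the other's pressure, plus a small margin."""
    return incoming_pressure * overreaction


def simulate(initial_pressure: float = 1.0, rounds: int = 20) -> None:
    a_to_b = initial_pressure
    for step in range(rounds):
        b_to_a = respond(a_to_b)   # agent B reacts to A
        a_to_b = respond(b_to_a)   # agent A reacts to B
        print(f"round {step:2d}: pressure = {a_to_b:8.2f}")


if __name__ == "__main__":
    simulate()   # pressure grows geometrically even though each step looks modest
```

Real agent interactions are far richer than this caricature, but the feedback structure, in which each agent treats the other's output as its input, is what makes large-scale interactions hard to predict from the behaviour of any single system.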
The window to establish governance frameworks, from neutral testing sandboxes to behavioural 'social contracts' for agent interactions, is closing rapidly as deployment accelerates across critical sectors. As recently explored by the Centre for Future Generations, the path forward depends not only on predicting which scenario unfolds, but on building governance mechanisms robust enough to prevent catastrophic interactions across all possible AI development trajectories.