1. Detecting perspective shifts in multi-agent systems
Key points:
This paper presents a behavioural analysis method that can identify when AI agents change their internal viewpoint or strategy during multi-agent interactions, even when the agents operate as opaque black-box systems. The authors develop a statistical framework, the “Temporal Data Kernel Perspective Space” (TDKPS), that infers these changes purely from an agent’s observable outputs, allowing subtle behavioural transitions to be detected. Experiments demonstrate that the technique reliably detects latent “perspective shifts” in generative agents and remains effective under minimal assumptions, making it applicable to closed-source, proprietary or fully opaque models. The method can reveal changes in strategy, alignment drift or coordination patterns as they emerge, providing a practical diagnostic tool for analysing, auditing and governing complex multi-agent systems.
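To make the output-only idea concrete, here is a minimal, generic sketch of black-box shift detection: embed each agent output as a vector, then flag the time step where adjacent windows of embeddings drift furthest apart. This is an illustrative stand-in, not the paper’s TDKPS construction; the embeddings are simulated and the `window_drift` statistic is an assumption chosen for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated "output embeddings": the agent's behaviour shifts at step 60.
before = rng.normal(loc=0.0, scale=1.0, size=(60, 16))
after = rng.normal(loc=1.5, scale=1.0, size=(40, 16))
embeddings = np.vstack([before, after])

def window_drift(emb: np.ndarray, w: int = 10) -> np.ndarray:
    """Euclidean distance between the mean embedding of the w steps
    before and the w steps after each time step."""
    scores = np.full(len(emb), np.nan)
    for t in range(w, len(emb) - w):
        left = emb[t - w:t].mean(axis=0)
        right = emb[t:t + w].mean(axis=0)
        scores[t] = np.linalg.norm(left - right)
    return scores

scores = window_drift(embeddings)
print("largest behavioural drift around step", int(np.nanargmax(scores)))  # ~60
```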
Authors: Eric Bridgeford, Hayden Helm
2. Chameleon: Adaptive adversarial agents for scaling-based visual prompt injection in multimodal AI systems
Key points:
This paper highlights a structural vulnerability in multimodal AI systems: routine image downscaling can inadvertently create an entry point for visual prompt injection attacks. To investigate this issue, the authors present Chameleon, an adaptive adversarial agent that learns to manipulate images so that hidden instructions only become apparent after the model’s preprocessing has been applied. By iteratively refining perturbations based on the model’s behaviour, Chameleon achieves far higher attack reliability than conventional static methods, with success rates above 80% across varied scaling setups, and dramatically degrades the performance of agentic pipelines that depend on visual reasoning. Because the attack operates effectively with minimal insight into the target model, preprocessing steps that are often treated as harmless infrastructure can significantly increase a system’s vulnerability to manipulation.
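The toy sketch below illustrates the underlying mechanism (though not Chameleon’s adaptive optimisation loop): a payload is planted only on the pixels that a naive stride-k subsampler keeps, so the full-resolution image reads as near-uniform texture while the downscaled view the model actually receives carries the hidden pattern. The stride-k subsampling is an assumed, simplified stand-in for a real image scaler.

```python
import numpy as np

k = 8                             # assumed downscaling factor
big = np.full((64 * k, 64 * k), 200, dtype=np.uint8)   # near-uniform light image

hidden = np.zeros((64, 64), dtype=np.uint8)            # hidden "message"
hidden[20:44, 20:44] = 255                             # e.g. a bright square

# Only 1 pixel in k**2 carries the payload, so at full resolution the
# image still looks like faint texture to a casual viewer.
big[::k, ::k] = hidden

small = big[::k, ::k]             # the preprocessing step the model sees
assert np.array_equal(small, hidden)
print("full-res mean:", round(big.mean(), 1), "| downscaled mean:", round(small.mean(), 1))
```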
Authors: M Zeeshan, Saud Satti
3. STELLA: guiding LLMs for time series forecasting with semantic abstractions
Key points:
This paper introduces STELLA, a new method that improves the ability of LLMs to forecast future values of time series by transforming raw numeric data into higher-level semantic components before feeding it to the model. STELLA first decomposes the series into interpretable elements, such as long-term trends, seasonal cycles and residual fluctuations, and then converts them into Hierarchical Semantic Anchors that provide both global context and instance-specific cues. Extensive testing on eight benchmark datasets shows that STELLA delivers superior forecast accuracy over both short- and long-term horizons compared to prior methods, and that it generalises well under zero- and few-shot conditions. These gains persist across alternative evaluation metrics and settings.
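As a rough illustration of the decompose-then-verbalise idea, the sketch below splits a synthetic series into trend, seasonal and residual components and renders them as a short textual cue for a prompt. Both the moving-average decomposition and the anchor wording are illustrative assumptions, not STELLA’s actual Hierarchical Semantic Anchors.

```python
import numpy as np

period = 12
t = np.arange(120)
series = 0.05 * t + np.sin(2 * np.pi * t / period) \
         + np.random.default_rng(1).normal(0, 0.1, 120)

# Trend: centred moving average over one seasonal period.
trend = np.convolve(series, np.ones(period) / period, mode="same")

# Seasonality: average deviation from the trend at each phase of the cycle.
detrended = series - trend
seasonal = np.array([detrended[p::period].mean() for p in range(period)])
residual = detrended - np.tile(seasonal, len(series) // period)

# Verbalise the components into a compact textual cue for the LLM prompt.
anchor = (
    f"Trend: {'rising' if trend[-1] > trend[0] else 'falling'} "
    f"({trend[0]:.2f} -> {trend[-1]:.2f}). "
    f"Seasonality: period {period}, peak at phase {int(seasonal.argmax())}. "
    f"Residual std: {residual.std():.2f}."
)
print(anchor)  # would be prepended to the numeric history in the prompt
```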
Authors: Junjie Fan, Hongye Zhao, Linduo Wei, Jiayu Rao, Guijia Li, Jiaxin Yuan, Wenqi Xu, Yong Qi
4. GovBench: benchmarking LLM agents for real-world data governance workflows
Key points:
This paper introduces GovBench, a benchmark of 150 tasks drawn from real operational governance workflows, for evaluating the ability of LLM agents to perform quality checks, ensure policy compliance and carry out multi-step data operations. The benchmark reveals limitations in existing general-purpose agents, which frequently struggle with complex sequences and demonstrate poor self-correction. To address these limitations, the authors design DataGovAgent, an architecture that separates planning, execution, retrieval support and verification, as sketched below. This tailored approach raises average performance from around 40% to almost 55% while also reducing the number of debugging cycles required, demonstrating that specialised, modular designs can meaningfully enhance the reliability and efficiency of governance-related tasks and offering a clearer path towards dependable, scalable automation in data governance.
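The sketch below shows the general shape of such a plan / execute / verify separation: a planner emits checkable steps, and an executor-verifier loop bounds the debugging cycles per step. The `Step` interface, the example step names and the retry policy are all invented for illustration; this is not DataGovAgent’s actual design.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    name: str
    run: Callable[[], bool]   # executes the step; True means verification passed

def plan(task: str) -> list[Step]:
    """Planner: decompose a governance task into ordered, checkable steps."""
    return [
        Step("profile_columns", lambda: True),
        Step("check_null_rates", lambda: True),
        Step("validate_policy_tags", lambda: True),
    ]

def execute_with_verification(steps: list[Step], max_retries: int = 2) -> bool:
    """Executor + verifier loop: each step must pass verification before
    the next one runs, bounding the debugging cycles per step."""
    for step in steps:
        for _ in range(1 + max_retries):
            if step.run():                 # verifier accepted the result
                break
            print(f"step {step.name} failed verification, retrying")
        else:
            return False                   # escalate after exhausting retries
    return True

print(execute_with_verification(plan("audit customer table")))
```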
Authors: Zhou Liu, Zhaoyang Han, Guochen Yan, Hao Liang, Bohan Zeng, Xing Chen, Yuanfeng Song, Wentao Zhang
5. Introducing Anthropic Interviewer: what 1,250 professionals told us about working with AI
Key points:
To better understand how AI tools are reshaping working life, Anthropic developed Interviewer, an AI-powered system that conducts large-scale, structured interviews. In a pilot study, Anthropic used Interviewer to interview 1,250 professionals from a variety of sectors (e.g. the general workforce, creative professionals and scientists) about their use of AI, its impact on their workflows and professional identity, and their views on its future role. The results suggest that, while many users embrace AI for routine support, they continue to value the tasks they see as central to their identity. Respondents generally reported time savings and envisioned a future in which AI handles routine tasks while humans focus on higher-level work. However, the interviews also revealed ambivalence: some fear a loss of human connection or a decline in skills. Among creatives, for instance, AI has boosted productivity despite social stigma, while many scientists remain cautious about entrusting core research tasks to AI.
Authors: Anthropic