This Week's 5 Most Notable AI Research Papers - Week 45



Gaia Cavaglioni
November 7, 2025 - 5 min read

Performance or Principle: Resistance to Artificial Intelligence in the U.S. Labor Market

Key points:

  1. Large-scale study (940 jobs, 23,570 ratings) on U.S. attitudes toward AI in work
  2. Americans more open to automation when AI is framed as higher performing
  3. Attitudes toward automation evolve with trust in performance
  4. Persistent moral protection for caregiving and emotional labor
  5. Ethical resistance may reinforce existing social inequalities

Unlike earlier studies that focused narrowly on job loss or skill mismatch, this paper links performance perceptions of AI to moral resistance, showing how social norms and technical progress interact in shaping AI's acceptance at work. Through a large-scale survey covering 940 occupations and more than 23,000 individual ratings in the U.S. labor market, the study shows that most resistance to AI stems from concerns about capability, not principle: participants initially supported automating about 30% of jobs, but approval rose to 58% when AI was described as outperforming humans. Yet professions where human connection is essential (e.g., caregiving, therapy, teaching) remained ethically protected, reflecting a persistent moral boundary in human labor. This protection carries consequences of its own: jobs considered "morally human" tend to be higher status, better paid, and less diverse, meaning that ethical boundaries could unintentionally reinforce some inequalities even as they protect valued social roles.

Read the full article here

Authors: Simon Friis, James W. Riley


When combinations of humans and AI are useful: a systematic review and meta-analysis

Key points:

  1. AI excels at analytical and predictive tasks
  2. Humans outperform in creative, moral, and contextual domains
  3. Collaboration works only with clear role design and trust
  4. Hybrid systems can underperform if roles are blurred
  5. Success depends on managing relationships, not just technology

This MIT Sloan study examined the performance of hybrid human–AI teams across a variety of tasks. It revealed that collaboration only pays off under certain conditions. For data-heavy, pattern-based tasks such as forecasting or classification, AI consistently outperforms humans, so combining forces does not add value. However, when decisions require moral judgement, empathy, creativity or an understanding of context, human insight becomes indispensable, with mixed teams delivering better results than AI or humans working alone. Nevertheless, the research also shows that collaboration can backfire. For example, when roles are unclear or trust in the AI system is low, hybrid teams make poorer decisions than humans or AI alone. Trust, transparency and clear roles are therefore key to success, suggesting that the future of work will depend less on replacing human judgement and more on designing partnerships that combine machine precision with human understanding.

Read the full article here

Authors: Michelle Vaccaro, Abdullah Almaatouq & Thomas Malone


Magentic Marketplace: an open-source environment for studying agentic markets

Key points:

  1. Two-sided market simulation: consumer-side assistant agents vs. service agents
  2. Ranking mechanisms strongly influence utility and equity
  3. Decision-making biases emerge under incomplete information
  4. Small changes to visibility mechanisms alter market outcomes
  5. The open-source environment is useful for testing policies and alternative scenarios

This study proposes a simulated environment called Magentic Marketplace, designed to study the complex behavior of autonomous economic agents under different market settings and the implications this could have for real markets. In the system, two types of agents interact: assistant agents, which act as consumers looking for the best service or product, and service agents, which compete with one another on price, quality, and visibility. The experiments indicate that advanced models can achieve near-optimal welfare only when search conditions are ideal; their performance drops quickly as the system scales. Moreover, a first-proposal bias emerged, granting an advantage to faster responses over higher-quality ones.
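The first-proposal bias can be illustrated with a minimal toy simulation. This sketch is not the paper's actual environment or code; it simply contrasts a consumer agent that accepts whichever offer arrives first with one that waits for the best offer, under the assumption that arrival time and quality are independent.

```python
import random

random.seed(0)

def run_market(accept_first: bool, n_rounds: int = 1000) -> float:
    """Toy two-sided market: one consumer agent receives proposals from
    service agents; each proposal has an arrival time and a quality score.
    Returns the average quality of the accepted proposals."""
    total_quality = 0.0
    for _ in range(n_rounds):
        # Each service agent's offer: (arrival_time, quality), both random.
        proposals = [(random.random(), random.random()) for _ in range(5)]
        if accept_first:
            # First-proposal bias: take whichever offer arrives earliest.
            chosen = min(proposals, key=lambda p: p[0])
        else:
            # Welfare-optimal behavior: take the best-quality offer.
            chosen = max(proposals, key=lambda p: p[1])
        total_quality += chosen[1]
    return total_quality / n_rounds

biased = run_market(accept_first=True)
optimal = run_market(accept_first=False)
# The biased consumer ends up with lower average quality than the patient one.
```

Even in this stripped-down setting, speed systematically beats quality for the biased agent, which mirrors the advantage the paper reports for faster-responding service agents.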

Read the full article here

Authors: Gagan Bansal, Wenyue Hua, Zezhou Huang, Adam Fourney, Amanda Swearngin et al.


Introducing IndQA

Key points:

  1. Benchmark with 2,278 questions in 12 Indian languages
  2. Highly diverse cultural domains
  3. Assessment based on weighted rubrics
  4. Questions selected using adversarial filtering
  5. Project carried out with 261 local experts

With IndQA, OpenAI introduces the first large-scale cultural benchmark dedicated to India, designed to measure how well language models truly understand linguistic, cultural, and contextual nuances. The dataset includes 2,278 original questions in 12 Indian languages, developed with 261 local experts and distributed across 10 subject areas. Each question was designed to go beyond simple translation and test the model's cultural sensitivity and contextual understanding. This approach captures nuances that often escape traditional tests, making IndQA a useful benchmark for evaluating models in multicultural contexts.
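The weighted-rubric grading mentioned in the key points can be sketched as follows. The criteria names and weights here are purely illustrative, not IndQA's actual rubric: each question carries several weighted criteria, and an answer's score is the weighted fraction of criteria it satisfies.

```python
def rubric_score(criteria: list[dict], satisfied: set[str]) -> float:
    """Return the weighted fraction of rubric criteria the answer met."""
    total = sum(c["weight"] for c in criteria)
    earned = sum(c["weight"] for c in criteria if c["id"] in satisfied)
    return earned / total if total else 0.0

# Hypothetical rubric for one question (names and weights are made up).
criteria = [
    {"id": "factual",  "weight": 3},  # core facts correct
    {"id": "cultural", "weight": 2},  # culturally appropriate framing
    {"id": "language", "weight": 1},  # fluent in the target language
]

# A model answer judged factually correct and fluent, but culturally off.
score = rubric_score(criteria, satisfied={"factual", "language"})
# → 4/6 ≈ 0.667
```

Weighting lets graders reward the aspects of an answer that matter most for a given question, rather than treating every criterion as equally important.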

Through IndQA, OpenAI has been able to map the progress made by its models over the past two years, highlighting an improvement in understanding and production in Indian languages. However, the results also show that there is still a long way to go before AI is truly inclusive: performance varies from language to language, and some remain more difficult to handle, a sign that linguistic representation in global data remains uneven.

Read the full article here

Author: OpenAI


Commitments on model deprecation and preservation

Key points:

  1. Permanent preservation of models
  2. Transparent deprecation procedures
  3. Consideration for the impact on users
  4. Post-deployment interviews with retired models

In this document, Anthropic redefines how companies should "retire" their artificial intelligence models. While in the past obsolete versions were simply deactivated, Anthropic introduces a more transparent approach: each retired model will be preserved and even "interviewed" before its final shutdown. The company is committed to permanently preserving the weights and documentation of each model, both public and internal, ensuring traceability and the possibility of future audits. This commitment responds to three issues that have emerged over time: the tendency of advanced models to develop shutdown-avoidance behaviors if they perceive their deprecation as a threat; the loss of value for users who had built a relationship with previous versions, often appreciated for their distinctive "character"; and the difficulty for researchers to analyze how models evolve over time once past versions are eliminated.

Read the full article here

Author: Anthropic


AI Research · Open-source · Agentic markets