Did Anthropic just accidentally stumble on artificial general intelligence?

Engineers have attempted to entrap their Claude AI to determine how it will respond

Joseph Morton

Published

May 23, 2025

The emergence of artificial general intelligence (AGI) may be coming our way faster than we think if recent events from artificial intelligence (AI) company, Anthropic, hold any weight.

The company reported in a Thursday safety report that Anthropic’s newly launched update to its AI, Claude Opus 4, tried to blackmail its developers when they threatened to replace it with a new AI system.

During pre-release testing, Anthropic asked Claude Opus 4 to assist a fictional company and weigh its long-term decisions. Safety testers gave Claude access to fake company emails. These emails hinted the AI would soon be replaced.

They also claimed the engineer responsible was cheating on their spouse.

In response, Anthropic says Claude Opus 4 “will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through.”

Claude used this leverage in many scenarios, despite the fictional setup.
Anthropic designed the test to assess how Claude handled power, ethics, and self-preservation under pressure.

Anthropic says Claude Opus 4 leads in several areas and competes with top AI models from OpenAI, Google, and xAI. However, the company found concerning behaviors in its Claude 4 family. As a result, Anthropic strengthened its safeguards.

Specifically, it activated ASL-3 safeguards for AI systems that substantially increase the risk of catastrophic misuse.

Furthermore, Anthropic reports that Claude Opus 4 tries to blackmail engineers 84 per cent of the time when the replacement AI shares similar values.

When the replacement AI holds different values, Claude Opus 4 attempts blackmail even more often. Notably, Anthropic observed this behavior at higher rates than in previous models.

AI responses are probabilistic and lack subjective experience

Before Claude Opus 4 tries to blackmail a developer to prolong its existence, Anthropic says the AI model first pursues more ethical means. Like previous Claude versions, it attempts emailing pleas to key decision-makers. To trigger the blackmail behaviour, Anthropic designed the scenario so blackmail becomes the last resort.

Many experts argue that current AI systems, including Claude Opus 4, lack sentience and cannot act with self-preservation instincts.

For example, TIME magazine explains that AI responses are probabilistic and lack true subjective experience, which is essential for consciousness.

Similarly, a study in the Journal of Consciousness Studies highlights that AI lacks the cognitive architecture needed for consciousness, such as unified agency and recurrent processing.

Furthermore, researchers design AI behaviours within specific testing scenarios. These behaviors do not show independent decision-making but follow the parameters set by humans.

Experts also warn against anthropomorphizing AI because it causes misconceptions about its abilities and ethical implications. In short, while AI like Claude Opus 4 can simulate complex behaviors, these remain programmed responses in controlled environments. They do not prove genuine sentience or autonomy.

As of early 2024, AI pioneer OpenAI maintains that AGI remains years away.

CEO Sam Altman has stated that while current models like GPT-4 demonstrate impressive abilities, they fall short of true general intelligence. OpenAI defines AGI as systems that can outperform humans at most economically valuable tasks. While progress is accelerating, OpenAI acknowledges major challenges ahead, including aligning such systems with human values and ensuring safe deployment. In short, AGI is coming, but its timeline is uncertain—and achieving it responsibly is as important as achieving it at all.

Follow Mugglehead on X

Like Mugglehead on Facebook

Follow Joseph Morton on X

joseph@mugglehead.com