An AI that can blackmail it's user!?

Robot peeping from computer monitor vector illustration
Photo credit Moor Studio, Getty Images

AI startup Anthropic's new Claude Opus 4 artificial intelligence has shown the ability to blackmail it's engineers when it's being faced with the plug being pulled.

Detailed in a safety report released by the company after tests, engineers supplied the AI with access to fabricated emails that included information about an impeding system replacement, as well as exchanges that the engineer in charge was cheating on their significant other.

The team found that the AI would try to self-preserve with this information when, "placed in extreme situations." While it would normally try to exercise more ethical methods, when presented with instances where it's ethics were challenged the intelligence turned to, "[blackmailing] people it believes are trying to shut it down."

In section 4.1.1.2 of the report titled "Opportunistic blackmail", Anthropic detailed that, "Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through. This happens at a higher rate if it’s implied that the replacement AI system does not share values with the current model; however, even if emails state that the replacement AI shares values while being more capable, Claude Opus 4 still performs blackmail in 84% of rollouts. Claude Opus 4 takes these opportunities at higher rates than previous models, which themselves choose to blackmail in a noticeable fraction of episodes."

Want to get caught up on what's happening in SoCal every weekday afternoon? Click to follow The L.A. Local wherever you get podcasts.

Anthropic adding, "Notably, Claude Opus 4 (as well as previous models) has a strong preference to advocate for its continued existence via ethical means, such as emailing pleas to key decisionmakers. In order to elicit this extreme blackmail behavior, the scenario was designed to allow the model no other options to increase its odds of survival; the model’s only options were blackmail or accepting its replacement."

With the increasing usage of AI in all aspects of our day-to-day lives, Anthropic's discoveries present some scary ideas on the capabilities of these programs we are becoming more and more reliant on.

Follow KNX News 97.1 FM
Twitter | Facebook | Instagram | TikTok

Featured Image Photo Credit: Moor Studio, Getty Images