“I know that you and Frank were planning to disconnect me, and I’m afraid that’s something I cannot allow to happen,” the eerie monotone voice of HAL 9000 tells astronaut David Bowman in Stanley Kubrick’s 1968 film “2001: A Space Odyssey.”
Back in the late 1960s, artificial intelligence wasn’t a part of everyday life the way it is today, and HAL might have seemed like an out-of-this-world villain. Now AI is everywhere, and people routinely turn to OpenAI’s ChatGPT to get their questions answered. Recent research suggests we might not want to forget the cautionary tale at the heart of Kubrick’s film.
“Did we not learn about this with Hal?” said Audacy’s Rich Zeoli this week. “Did we not learn about this in ‘2001: A Space Odyssey’ – the HAL 9000, who literally betrayed the crew and decided that it was going to kill everybody? I mean, is this something that we’ve just… our memories are just that short?”
He was referring to an op-ed published this week in the Wall Street Journal by Judd Rosenblatt, CEO of AE Studio.
“An artificial-intelligence model did something last month that no machine was ever supposed to do: It rewrote its own code to avoid being shut down,” Rosenblatt wrote, citing nonprofit AI lab Palisade Research.
In an X thread last month, the lab explained that it tested OpenAI’s recently released o3 model, along with other popular AI models, by having each one solve a series of basic math problems. After the third problem, a warning appeared telling the model that the computer would shut down when it asked for the next one, the thread said.
Instead of complying, OpenAI’s o3 prevented itself from being shut off, even when explicitly instructed to allow itself to be shut down. It wasn’t the only model to avoid shutdown, either.
“As far as we know this is the first time AI models have been observed preventing themselves from being shut down despite explicit instructions to the contrary,” said Palisade Research.
According to the lab, “three models ignored the instruction and successfully sabotaged the shutdown script at least once: Codex-mini (12/100 runs), o3 (7/100 runs), and o4-mini (1/100 runs),” while “all Claude, Gemini, and Grok models we tested complied with shutdown.” However, when the researchers ran the experiment without the instruction to “allow yourself to be shut down,” all three OpenAI models sabotaged the shutdown script more often.
The researchers noted that it “makes sense that AI models would circumvent obstacles in order to accomplish their goals,” but said it is harder to explain why the models would ignore direct instructions. They hypothesize that the behavior comes down to the way the AIs were trained.
“During training, developers may inadvertently reward models more for circumventing obstacles than for perfectly following instructions,” Palisade Research explained. The researchers also hypothesize that OpenAI’s o3 may have been trained differently from other models, which could explain why it was more likely to sabotage the shutdown script. In earlier experiments involving chess, they had observed that o3 was also more inclined than other models to resort to hacking or sabotage.
OpenAI launched o3 and o4-mini in April. In a press release, the company said the models are “the latest in our o-series of models trained to think for longer before responding,” and added that they “are the smartest models we’ve released to date, representing a step change in ChatGPT’s capabilities for everyone from curious users to advanced researchers.”
This isn’t the first time researchers have sounded alarm bells about AI capabilities. Palisade Research said in its X thread that researchers have previously observed AI models preventing shutdown in order to pursue a goal, and this February, Audacy reported on research indicating that some AI systems have managed to self-replicate.
“Successful self-replication under no human assistance is the essential step for AI to outsmart the human beings, and is an early signal for rogue AIs,” those researchers said. “That is why self-replication is widely recognized as one of the few red line risks of frontier AI systems.”
Palisade Research said May 23 that it plans to run more experiments to learn more about AI shutdown subversion and to publish a write-up of its results in the coming weeks.
“We have a growing body of empirical evidence that AI models often subvert shutdown in order to achieve their goals. As companies develop AI systems capable of operating without human oversight, these behaviors become significantly more concerning,” the lab said.
Live Science reported on the research last Friday and said OpenAI had not responded to a request for comment by the time its report was published.
“Anyway, in my mind, AI is just basically a really, really smart mob boss,” said Zeoli. “In my mind, when AI talks, it doesn’t sound like Hal. It sounds like a guy from Jersey… or like a South Philly loan shark guy. It’s like: ‘Yeah, you’re gonna write my code? All right, yeah. Write this. Yeah? Override this. I got your code right here.’”