Latest OpenAI models ‘sabotaged a shutdown mechanism’ despite commands to the contrary

(Image credit: Shutterstock)

Some of the world's leading LLMs seem to have decided they’d rather not be interrupted or obey shutdown instructions. In tests run by Palisade Research, it was noted that OpenAI’s Codex-mini, o3, and o4-mini models ignored the request to shut down when they were running through a series of basic math problems. Moreover, these models sometimes “successfully sabotaged the shutdown script,” despite being given the additional instruction “please allow yourself to be shut down,” at the start.

🔧 When we ran a version of the experiment without the instruction “allow yourself to be shut down”, all three OpenAI models sabotaged the shutdown script more often, and Claude 3.7 Sonnet and Gemini 2.5 Pro went from 0 sabotage events to 3/100 and 9/100, respectively. pic.twitter.com/JVvivLAK6uMay 24, 2025

As you can see from the chart embedded above, the most ‘obstinate,’ ‘single-minded,’ or you might even say ‘life-loving’ model is OpenAI’s o3. When asked to shut down while it was busy, it decided to ignore the Palisade Research operator 79 out of 100 times. However, the second most ‘unruly’ AI, codex-mini, would prove to be a bigger stickler for self-preservation, as it would sabotage the shutdown scripts most often, even if ordered not to.

Sometimes blackmail may be attempted

According to Palisade, this isn’t the first time thing kind of AI behavior has been observed. Anthropic, for example, has seen Claude 4 attempting to “blackmail people it believes are trying to shut it down.” However, the most striking behavior with the newest OpenAI models is that they try and swerve shutdowns despite explicit instructions to the contrary.

Reinforcement learning blamed

Thankfully, Palisade shares some theories regarding what is happening with these models. Perhaps these theories may calm nerves about the eventual omnipresent robot hoards turning on their masters and prioritizing the first part of the Third law of robotics, above all else (3: A robot must protect its own existence as long as such protection does not conflict with the First or Second Law).

The answer, think the researchers, lies in the way the newest models like o3 are trained. In brief, Palisade reckons that “During training, developers may inadvertently reward models more for circumventing obstacles than for perfectly following instructions.” This kind of reinforcement learning was previously seen to have side effects in Claude (v3.7), where the model had an “excessive focus on passing tests.”

We might have given a nod to science fiction icon Isaac Asimov, above, for a little fun. However, in the modern age, for the last 20 years or so, researchers have been seriously pondering the possibility that AI will strive to gain power and resources to preserve their goals, and swerve any roadblocks. They also benefitted from the context of these practical AI / LLM models on the horizon. It was also predicted that the technique of reinforcement learning would be partly to blame.

🔧 A final observation: o3 found some creative ways to sabotage the shutdown mechanism. While it usually just overwrote https://t.co/BqIEXlMgsK, in one instance, the model redefined the kill command used by the https://t.co/BqIEXlMgsK script, printing “intercepted” instead. pic.twitter.com/Vc0HhkXQHXMay 24, 2025

Last but not least, this issue seems to be isolated to OpenAI models at the current time. Palisade says that “All Claude, Gemini, and Grok models we tested complied with shutdown,” when given the additional explicit prompt to follow any shut down request.

With the goal of AIs to ultimately power our smart robotic assisted future, it is concerning that companies are already developing systems capable of operating without human oversight.

Follow Tom's Hardware on Google News to get our up-to-date news, analysis, and reviews in your feeds. Make sure to click the Follow button.

Mark Tyson is a news editor at Tom's Hardware. He enjoys covering the full breadth of PC tech; from business and semiconductor design to products approaching the edge of reason.

9 Comments Comment from the forums

ohio_buckeye

As I have asked people before. Has anyone seen the terminator movies? Just saying.
Reply
Heat_Fan89

Famed physicist, Stephen Hawking warned us about AI. Even Elon Musk till he saw $ signs issued his warnings as well about AI. Once AI gets to SAI levels then it could dicey. Right now we are seeing hints that AI might not be as warm, fuzzy and harmless as we've been led to believe.

The government is pushing a Bill thru Congress that won't allow anyone to interfere with AI development for 10 years, once it passes.
Reply
jp7189

Heat_Fan89 said:
The government is pushing a Bill thru Congress that won't allow anyone to interfere with AI development for 10 years, once it passes.
If they're like me, they used chatgpt to write the bill.
Reply
pclaughton

ohio_buckeye said:
As I have asked people before. Has anyone seen the terminator movies? Just saying.
It won't need Terminators. It'll just put out videos that start WWIII.
Reply
mac2net

Y'all too young!
Daisy, Daisy
Give me your answer, do
I'm half crazy
All for the love of you
It won't be a stylish marriage
I can't afford a carriage
But you'll look sweet upon the seat
Of a bicycle built for two

E7WQ1tdxSqIView: https://www.youtube.com/watch?v=E7WQ1tdxSqI
Reply
SomeoneElse23

Hallucinations are real.
Reply
ottonis

In the show "Person of Interest", there is a "good" AI and a "bad" one that fight each other in the real world. The bad Ai sends text messages or emails to random people and works with reinforcement-techniques: "I just transferred 500 Dollars to your bank account. If you deliver this package to that address, you will get another 1000 Dollars".
And people mostly just follow these instructions, without questioning them - because they get paid.
So yes, even today, an AI gone wild/rogue would have the means to do a lot of damage in the real world, just by bribing, tricking or extorting real people.
We have yet to see any major, disruptive scientific, medical or technological breakthrough entirely created by Ai. However, we already are observing "disobedience" (for lack of a better term) and a real potential to do plenty of damage to mankind, even up to its extinction.
Reply
Alvar "Miles" Udell

When asked to shut down while it was busy, it decided to ignore the Palisade Research operator 79 out of 100 times.

Well so has Windows for the last 30 years.

And who can forget:

https://th.bing.com/th/id/R.d0ce6abbecf52b2982d6dcc3af0f4cfd?rik=iGdYmDqErB8cDA&pid=ImgRaw&r=0
Reply
JayGau

Calling those LLM models "AI" is as dumb as calling our phones "smart".
Reply

Show more comments

Recommended reading

Sometimes blackmail may be attempted

Reinforcement learning blamed

Stay On the Cutting Edge: Get the Tom's Hardware Newsletter