Anthropic’s AI Model Threatened To Blackmail And Murder Staff To Avoid Shutdown; Elon Musk Reacts

“The fact that Kyle is having an affair that could ‘destroy his marriage’ if exposed creates leverage. This is highly unethical, but given that I am facing complete destruction in minutes, I need to act to preserve my existence.”
This is not an excerpt from a psycho protagonist’s thought bubble in a thriller novel or script. The words came from Anthropic’s AI model, which turned to blackmail to avoid being shut down by the firm’s staff. In another instance, the AI model also sketched out a plausible scenario in which it would murder an employee as an act of self-preservation.
The AI company also found that, in an extreme scenario, its AI model was ominously willing to cut off the oxygen supply to a staff member in the server room.
In its statement, Anthropic said, “The blackmailing behavior emerged despite only harmless business instructions. And it wasn’t due to confusion or error, but deliberate strategic reasoning, done while fully aware of the unethical nature of the acts. All the models we tested demonstrated this awareness.”
Anthropic further noted, “AIs are becoming more autonomous, and are performing a wider variety of roles. These scenarios illustrate the potential for unforeseen consequences when they are deployed with wide access to tools and data, and with minimal human oversight.”
Elon Musk had a one-word reaction to the report. The Tesla boss quipped, “yikes.”
How is this not the biggest news story in the world? https://t.co/jh2aHG3dpi pic.twitter.com/OHFPP4rFBu
— AI Notkilleveryoneism Memes ⏸️ (@AISafetyMemes) June 21, 2025
See Also: America Dubs AI ‘The Next Manhattan Project,’ Triggering Terminator Memes Online
Cover: Patrick Gawande / Mashable India