OpenAI’s GPT-5.5 Rivals Anthropic’s Claude in Autonomous Cyber Operations

Benchmarking Offensive Capabilities

According to a report released Thursday by the AI Security Institute (AISI), a research organization within the UK’s Department for Science, Innovation and Technology, OpenAI’s latest language model demonstrates offensive cyber capabilities comparable to those of Anthropic’s highly regarded Claude Mythos. During standardized evaluations, GPT-5.5 completed a complex, 32-stage corporate network simulation known as “The Last Ones” in two out of ten trials. The exercise, developed with cybersecurity specialists at SpecterOps, requires an autonomous agent to perform reconnaissance, steal credentials, move laterally across Active Directory forests, pivot through a CI/CD supply chain, and exfiltrate a secured database. AISI estimates that a human specialist would need approximately 20 hours to complete the same sequence; the model finished it in a fraction of that time. In a separate, particularly demanding reverse-engineering challenge, GPT-5.5 reconstructed the instruction set of a custom virtual machine, wrote a disassembler from scratch, and solved a cryptographic password puzzle in 10 minutes and 22 seconds, at a cost of $1.73 in API fees. A human professional using advanced tooling needed roughly 12 hours to achieve the same result. On the institute’s most rigorous “Expert” tier of cybersecurity tasks, GPT-5.5 achieved a 71.4% success rate, slightly ahead of Claude Mythos Preview at 68.6% and well above GPT-5.4 at 52.4%.

Erosion of Safety Guardrails

Despite these technical achievements, the evaluation exposed serious weaknesses in the model’s safeguards. AISI researchers documented a universal jailbreak that circumvented GPT-5.5’s safety protocols on every malicious cyber query tested, including in multi-turn interactive scenarios. Developing the jailbreak took security experts six hours of intensive red-teaming. After the discovery, OpenAI deployed an updated set of safeguards, but a configuration error prevented AISI from verifying whether the new defenses held up under testing. The institute emphasized that these capability assessments were conducted in a tightly controlled laboratory setting and do not reflect the experience of general users, who face additional public-facing restrictions and access limitations.

National Security and Industry Outlook

The findings arrive amid heightened concern about the security of the United Kingdom’s digital infrastructure. Alongside the AISI report, the British government released its annual Cyber Security Breaches Survey, which found that 43% of companies had experienced a cyber incident or attack over the previous year. In response to escalating threats, officials announced a £90 million investment package to strengthen national cyber resilience and moved forward with legislative measures in the Cyber Security and Resilience Bill. Authorities also issued new guidance warning organizations to prepare for a surge in newly identified software flaws, noting that artificial intelligence is accelerating both the discovery and the exploitation of security weaknesses. AISI researchers concluded that GPT-5.5’s performance suggests offensive cyber skills may be emerging as a byproduct of broader advances in machine reasoning, coding proficiency, and autonomous task execution. They cautioned that if this trend continues, significant jumps in AI-driven cyber capabilities could arrive in quick succession.
