The system, dubbed “MDASH,” was introduced this week along with the revelation of 16 new vulnerabilities it uncovered in various versions of Windows, tech news website GeekWire reported Wednesday (May 13).
According to the report, MDASH outperformed Anthropic’s high-profile Mythos model on a “leading cybersecurity benchmark” by employing more than 100 specialized artificial intelligence (AI) agents that work in tandem across multiple models to uncover real-world software vulnerabilities.
The benchmark in question is CyberGym, created by UC Berkeley researchers to measure how well AI systems can reproduce real-world vulnerabilities across 1,507 tasks drawn from 188 open-source software projects. MDASH scored 88.45% on the test, compared with 83.1% for Mythos and 81.8% for OpenAI’s GPT-5.5, the report said.
MDASH (“multi-model agentic scanning harness”) works by dividing the job among groups of specialized agents, the report added. Some scan code for potential vulnerabilities, while another group debates whether each discovery is real and exploitable. A final group builds proof-of-concept attacks to confirm that the bugs can actually be triggered.
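To make that division of labor concrete, here is a minimal Python sketch of a three-stage scan, debate and confirm pipeline. It is an illustration under stated assumptions, not MDASH’s actual implementation: the names `Finding` and `run_pipeline`, the toy pattern-matching scanners and the majority vote standing in for the agents’ debate are all hypothetical, and in a real harness each function would wrap a call to an AI model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    """A candidate vulnerability (hypothetical structure, for illustration)."""
    location: str
    description: str


# Hypothetical agent roles; a real system would back each with a model call.
Scanner = Callable[[str], list[Finding]]   # proposes candidate bugs
Reviewer = Callable[[Finding], bool]       # votes: real and exploitable?
Prover = Callable[[Finding], bool]         # True if a proof of concept works


def run_pipeline(code: str, scanners: list[Scanner], reviewers: list[Reviewer],
                 prover: Prover, quorum: float = 0.5) -> list[Finding]:
    if not reviewers:
        raise ValueError("at least one reviewer is required")
    # Stage 1: every scanner proposes candidates; drop duplicates, keep order.
    candidates = list(dict.fromkeys(f for s in scanners for f in s(code)))
    # Stage 2: the "debate" is simplified here to a majority vote.
    debated = [f for f in candidates
               if sum(r(f) for r in reviewers) / len(reviewers) > quorum]
    # Stage 3: only findings with a working proof of concept are reported.
    return [f for f in debated if prover(f)]


# Toy scanners that flag risky C calls by simple text matching.
def strcpy_scanner(code: str) -> list[Finding]:
    return [Finding(f"line {i}", "unbounded strcpy")
            for i, line in enumerate(code.splitlines(), 1) if "strcpy(" in line]


def gets_scanner(code: str) -> list[Finding]:
    return [Finding(f"line {i}", "use of gets")
            for i, line in enumerate(code.splitlines(), 1) if "gets(" in line]


sample = "char buf[8];\nstrcpy(buf, input);\ngets(buf);\n"
reviewers = [lambda f: True,
             lambda f: "gets" in f.description,
             lambda f: "gets" in f.description]
confirmed = run_pipeline(sample, [strcpy_scanner, gets_scanner], reviewers,
                         prover=lambda f: True)
print(confirmed)  # [Finding(location='line 3', description='use of gets')]
```

In this toy run, both scanners raise a candidate, the reviewers vote the `strcpy` finding down 1 to 2, and only the `gets` finding survives to confirmation. Separating proposal, review and confirmation this way lets a noisy first stage cast a wide net while later stages filter out false positives.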
Mythos, on the other hand, is a single AI model operating inside an agent framework, GeekWire said. Anthropic has limited its release to a small group of companies, Microsoft included, known collectively as “Project Glasswing.”
In the wake of Mythos’ release, OpenAI has introduced Daybreak, its own agentic security offering that works with the company’s Codex coding tool.
“AI is already good and about to get super good at cybersecurity; we’d like to start working with as many companies as possible now to help them continuously secure themselves,” OpenAI CEO Sam Altman wrote on social media platform X earlier this week.
This week also saw reports that French AI startup Mistral was working with banks in Europe—which lack access to Mythos—on its own cybersecurity offering.
In related news, PYMNTS wrote earlier this week about “the industrialization of hacking” after researchers at Google reported they had uncovered what they believe is the first observed case of an AI-created zero-day exploit tied to a planned mass exploitation campaign.
The chief takeaway for businesses is that the tasks in cyberscammers’ “tool kit of hacking tasks,” including reconnaissance, exploit adaptation, vulnerability discovery and social engineering, no longer require the same level of human expertise.
“On top of that, they are all becoming increasingly automatable,” PYMNTS added. “This first-principles shift matters because cybersecurity is ultimately an economic system. And economic systems change rapidly when the cost of production collapses.”