The AI That Found A Bug In The World’s Most Audited Code

Key Takeaways

AI agents, like OpenAI's Aardvark, are now capable of finding zero-day vulnerabilities in heavily audited software.
Early AI models struggled with security tasks, but GPT-4 demonstrated significant breakthroughs in analyzing complex data like cybercriminal chat logs.
Aardvark mimics human security researchers by reading code, writing tests, and proposing patches for vulnerabilities.
AI is augmenting the cybersecurity workforce, helping to address a significant talent shortage rather than replacing human analysts.
AI tools offer a defensive advantage, particularly for under-resourced open-source projects against sophisticated nation-state attackers.

Aardvark, an OpenAI security agent whose development is led by Matt Knight, discovered a memory corruption bug in OpenSSH.
OpenSSH is a critical piece of infrastructure software that has undergone extensive auditing for decades, making this discovery significant.
This advancement is viewed as a critical step in providing a defensive advantage and scaling security intelligence to address the estimated 3.5 million unfilled cybersecurity jobs.

In 2020, models such as GPT-3 could not effectively analyze security logs or code for vulnerabilities, and they often fabricated results.
During GPT-4's training in summer 2022, a security team tested a mid-training snapshot that successfully analyzed security logs, identifying actions like opening a reverse shell.
In a second breakthrough, GPT-4 analyzed 60,000 slang-heavy messages from a dissolved Russian cybercriminal chat group and identified targets that included civilian infrastructure.

Aardvark is an AI agent designed for security research that operates like a human researcher by reading and analyzing code, writing tests, and exploring the codebase.
Its process involves assessing security objectives, modeling the codebase and its security properties, searching for vulnerabilities, and validating them in a secure sandbox.
Aardvark then uses tools like Codex to generate patches for identified vulnerabilities, which are subsequently rescanned by the agent.
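
The stages described above (search, validate, patch, rescan) can be sketched as a small pipeline. This is only an illustration of the control flow, not Aardvark's implementation: the Finding class, run_pipeline, and the toy_* stand-ins are all hypothetical names, and the toy search is a plain string match rather than any model-driven analysis.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    """A candidate vulnerability moving through the pipeline (hypothetical shape)."""
    location: str
    description: str
    validated: bool = False
    patch: str = ""
    patch_verified: bool = False


def run_pipeline(
    source: str,
    search: Callable[[str], List[Finding]],
    validate: Callable[[Finding], bool],
    propose_patch: Callable[[str, Finding], str],
) -> List[Finding]:
    """Search for candidates, validate each, patch it, then rescan the patched code."""
    confirmed = []
    for finding in search(source):
        finding.validated = validate(finding)
        if not finding.validated:
            continue  # drop candidates that could not be reproduced
        finding.patch = propose_patch(source, finding)
        # Rescan: the patch counts as verified only if the same location
        # is no longer flagged in the patched source.
        still_flagged = {f.location for f in search(finding.patch)}
        finding.patch_verified = finding.location not in still_flagged
        confirmed.append(finding)
    return confirmed


# Toy stand-ins (assumptions for illustration only).
def toy_search(source: str) -> List[Finding]:
    """Flag every line that calls strcpy."""
    return [
        Finding(location=f"line {number}", description="unbounded strcpy")
        for number, line in enumerate(source.splitlines(), start=1)
        if "strcpy(" in line
    ]


def toy_validate(finding: Finding) -> bool:
    """Pretend every candidate reproduces; a real validator would run it in a sandbox."""
    return True


def toy_patch(source: str, finding: Finding) -> str:
    """Swap one known-bad call for a bounded copy."""
    return source.replace(
        "strcpy(buf, input);", 'snprintf(buf, sizeof buf, "%s", input);'
    )


SAMPLE = "char buf[16];\nstrcpy(buf, input);\n"
results = run_pipeline(SAMPLE, toy_search, toy_validate, toy_patch)
```

Passing the stages in as functions keeps the loop independent of how each stage is implemented, so the string-matching stand-ins could be swapped for anything else without touching the rescan logic.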

Aardvark was later expanded to scan open-source projects, where it proved effective across various languages and stacks and found memory corruption bugs in highly audited C code.
The tool successfully identified novel zero-day vulnerabilities and generated corresponding patches using generative AI, a key innovation transforming software security practices.
This capability suggests AI models are moving beyond pattern matching to actual novel discovery of bugs in core infrastructure code, a crucial feature for AI code generation products.
Developers found Aardvark's contextual explanations of vulnerabilities and suggested fixes valuable, indicating the project was on the right track.

The discussion addresses the impact of AI on cybersecurity jobs, concluding that, given a significant talent shortage, AI is currently augmenting human analysts rather than replacing them.
AI tools are seen as enhancing the efficiency of existing cybersecurity professionals and potentially lowering the barrier to entry for new talent.
Aardvark is presented as a proactive, continuous auditing tool that monitors code changes in near real-time, akin to a senior AppSec engineer, scaling security expertise for developers.
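
One way to picture continuous auditing is a scanner that revisits only the files whose contents changed since the last pass. The summary does not say how Aardvark detects changes, so the sketch below is an assumption: fingerprint and files_to_rescan are hypothetical helpers that compare SHA-256 hashes of file contents.

```python
import hashlib
from typing import Dict, List


def fingerprint(content: str) -> str:
    """Return a stable hash of a file's contents."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def files_to_rescan(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """List paths that are new or whose contents changed since the last scan.

    previous maps path -> fingerprint recorded at the last scan.
    current maps path -> file contents as they are now.
    """
    changed = [
        path
        for path, content in current.items()
        if previous.get(path) != fingerprint(content)
    ]
    return sorted(changed)


last_scan = {"auth.c": fingerprint("int check(void);")}
now = {"auth.c": "int check(void);", "net.c": "int connect_peer(void);"}
pending = files_to_rescan(last_scan, now)
```

Here only net.c is queued, because auth.c still matches the fingerprint recorded at the last scan.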
While acknowledging the 'cat and mouse' evolution of offensive cyber operations, the conversation emphasizes AI's potential to empower defenders in identifying vulnerabilities.

The XZ Utils backdoor incident is highlighted as an example of the vulnerability of under-resourced open-source communities to sophisticated state-sponsored or criminal actors.
AI tools like Aardvark can help address this security gap in open source by scaling security intelligence, providing developers with resources to combat advanced threats.
OpenAI is launching a private beta for Aardvark for open-source maintainers to solicit feedback and ensure the tool meets their specific needs.
The guest expresses a desire for Aardvark to democratize access to advanced security tools for smaller organizations and individuals, reflecting on his own career enabled by open source.