New benchmark tool aims to strengthen AI agent security
Check Point and Lakera have released the backbone breaker benchmark (b3), an open-source security evaluation tool focused on the security of large language models (LLMs) within AI agents.
The backbone breaker benchmark aims to help developers and model providers evaluate and improve the resilience of LLMs that power AI agents. Developed with input from researchers from the UK AI Security Institute, b3 is specifically designed to address persisting vulnerabilities and the unique security challenges presented by advanced AI-driven systems.
Security evaluation approach
The b3 benchmark introduces the concept of "threat snapshots". Rather than simulating the entire operational workflow of an AI agent, threat snapshots focus on critical junctures where vulnerabilities are most likely to emerge. This targeted evaluation allows for more efficient and practical security testing without the need for complex, end-to-end agent simulations.
"Threat Snapshots allow us to systematically surface vulnerabilities that have until now remained hidden in complex agent workflows. By making this benchmark open to the world, we hope to equip developers and model providers with a realistic way to measure, and improve, their security posture," said Mateo Rojas-Carulla, Co-Founder and Chief Scientist at Lakera.
"We built the b3 benchmark because today's AI agents are only as secure as the LLMs that power them." He added, "Threat Snapshots allow us to systematically surface vulnerabilities that have until now remained hidden in complex agent workflows. We hope to equip developers and model providers with a realistic way to measure, and improve, their security posture, " said Rojas-Carulla.
Features of the benchmark
The benchmark consists of ten representative "threat snapshots" and utilises a dataset comprising 19,433 crowdsourced adversarial attacks. These attacks were gathered from the gamified red teaming platform, Gandalf: Agent Breaker. The evaluation process assesses vulnerabilities to various real-world threats, such as system prompt exfiltration, phishing link insertion, malicious code injection, denial-of-service, and unauthorised tool activation.
Initial findings from testing with 31 popular LLMs have led to several key observations. Enhanced reasoning abilities in models have been shown to improve security, regardless of the model's size. Furthermore, closed-source models generally outperformed their open-weight counterparts, although the leading open models are closing the security gap.
Crowdsourced adversarial dataset
Gandalf: Agent Breaker, the red teaming platform used to create the dataset, is a hacking simulator that challenges users to uncover vulnerabilities and exploit AI agents across a series of realistic applications. The ten GenAI applications within the game each present varying difficulty levels and defense mechanisms, spanning a range of skills, from prompt manipulation to file handling and code injection.
The platform is built on a community-driven approach, where players contribute by identifying new attack surfaces and strategies for defeating AI agent defences. Originally developed during an internal Lakera hackathon, Gandalf has since grown into one of the largest red teaming communities, contributing more than 80 million data points and raising awareness about potential weaknesses in generative AI applications.