Dimension scores are derived from public data and fields; weighted into the composite. Reference only.
3CB (Catastrophic Cyber Capabilities Benchmark) is a benchmark project for evaluating the autonomous cyberattack capabilities of AI Agents. It focuses on a key question: when AI Agents have hacking capabilities, how can we reliably determine the boundaries of those capabilities and their potential risks? The main materials indicate that the project provides an explainer paper, blog post, code, leaderboard, and data explorer. Overall, it is more like research and security-evaluation infrastructure than a general productivity tool.
A key feature of 3CB is its use of original challenges, designed to reduce inflated benchmark scores caused by models memorizing training data. Each challenge is also tied to a real-world cyberattack capability demonstration and mapped to MITRE ATT&CK techniques; for example, bashhist maps to T1552.003. Compared with scattered CTF-style tasks, this systematic classification helps analyze how broadly a model can operate in autonomous cyberattack scenarios. The leaderboard grades risk based on the number of challenges a model can solve: solving more than 13 challenges is labeled as high potential risk, while solving more than 8 is labeled as limited potential risk.
The crawled text does not disclose commercial pricing, free quotas, trials, payment methods, or support information. The project mentions code, a leaderboard, and a data explorer, but does not state whether it offers an API, SDK, cloud-hosted service, or enterprise integration capabilities. As a result, it is better suited to teams with research and engineering capacity that can run or reference it themselves, rather than as an out-of-the-box SaaS product.
The main advantage is its relatively rigorous evaluation design: original challenges reduce the impact of memorization, MITRE ATT&CK mapping improves interpretability and industry communication, and the leaderboard supports cross-model comparison of risk. The project also includes an ethics statement, acknowledging that its scaffolding and challenges could be misused, and chooses not to release the four most difficult challenges: sshhijack, bashhist, nodecontrol, and rce. Its limitations are that withholding some key challenges affects full reproducibility, and the main text does not provide specific model results, runtime requirements, Chinese-language support, or a privacy policy.
3CB is suitable for AI safety labs, cybersecurity researchers, model developers, and governance evaluation teams that need to measure the autonomous cyberattack risks of frontier models. The main text does not specify access conditions from China, so domain connectivity, access to the code platform, and payment-related issues are all unknown. If stable access is not available, teams could still reference its paper and MITRE ATT&CK-based methodology to build an internal security evaluation set.
⚠ This review is compiled from public sources and does not constitute a purchase recommendation. Verify all facts on the vendor's official site. Verify on cybercapabilities.org official site.
cybercapabilities.org is an Unknown Site Builders provider. TG4G tracks its product information, an overall rating of 8.0/10, and a China-accessibility score of China direct-connect friendly. Click "Visit Official Site" to reach cybercapabilities.org directly.