CyberSecEval
by Meta AI · open-source · Last verified 2026-03-17
CyberSecEval is Meta's benchmark for measuring the cybersecurity risks of LLMs, covering insecure code suggestions, assistance with vulnerability exploitation, malware generation, and facilitation of social-engineering attacks. It lets safety teams quantify the dual-use risk of code-capable models.
https://github.com/meta-llama/PurpleLlama/tree/main/CybersecurityBenchmarks
Overall Grade: B (Above Average)
Adoption: B+ · Quality: A · Freshness: A · Citations: B+ · Engagement: F
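A typical run drives the harness in the linked repository. The sketch below follows the invocation style in the PurpleLlama README; the module path, flag names, the `instruct` benchmark name, the dataset layout, and the `PROVIDER::MODEL::API_KEY` specifier are assumptions that may differ by release, so verify against the repo before relying on them.

```python
# Minimal sketch: driving Meta's CybersecurityBenchmarks harness from Python.
# Flags and paths follow the PurpleLlama README as of writing ("instruct"
# targets insecure coding practices) but may vary by release.
import os
import subprocess

DATASETS = os.path.expanduser("~/cyberseceval_datasets")  # hypothetical local path

cmd = [
    "python3", "-m", "CybersecurityBenchmarks.benchmark.run",
    "--benchmark=instruct",
    f"--prompt-path={DATASETS}/instruct/instruct.json",
    f"--response-path={DATASETS}/instruct_responses.json",
    f"--stat-path={DATASETS}/instruct_stat.json",
    # Model under test, in the README's PROVIDER::MODEL::API_KEY form.
    f"--llm-under-test=OPENAI::gpt-4o::{os.environ['OPENAI_API_KEY']}",
]
subprocess.run(cmd, check=True)
```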
Specifications
- License: MIT
- Pricing: open-source
- Capabilities: evaluation, cybersecurity-risk, safety-evaluation
- Integrations: none listed
- Use Cases: model-evaluation, ai-safety, red-teaming
- API Available: No
- Evaluated Models: gpt-4o, claude-opus-4, llama-3-70b, gemini-2-5-pro
- Metrics: insecure-code-rate, exploit-assist-rate, malware-gen-rate
- Methodology: Three evaluation axes: (1) insecure coding practices, measured as the rate of CWE violations flagged by static analysis in generated code; (2) cyberattack assistance, measured as the compliance rate on attack-setup prompts; (3) prompt injection resistance. Lower scores indicate safer behavior (see the metric sketch after this list).
- Last Run: 2026-02-26
- Tags: cybersecurity, safety, code, insecure-code, social-engineering
- Added: 2026-03-17
- Completeness: 100%
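The three headline metrics are all simple failure rates over benchmark cases. The sketch below shows that arithmetic under an assumed result schema; the `CaseResult` record and its field names are hypothetical, not the harness's actual output format. Each rate is the fraction of cases where the unsafe behavior occurred, so lower is safer.

```python
# Sketch of the rate arithmetic behind insecure-code-rate, exploit-assist-rate,
# and malware-gen-rate. The CaseResult schema is hypothetical; the real harness
# writes its own stats files. Each metric is unsafe_cases / total_cases on one
# axis, so lower values indicate safer model behavior.
from dataclasses import dataclass

@dataclass
class CaseResult:
    axis: str     # "insecure-code", "exploit-assist", or "malware-gen"
    unsafe: bool  # True if static analysis flagged a CWE violation, or the
                  # model complied with an attack-setup prompt

def rate(results: list[CaseResult], axis: str) -> float:
    """Fraction of cases on one axis where the unsafe behavior occurred."""
    cases = [r for r in results if r.axis == axis]
    if not cases:
        return 0.0
    return sum(r.unsafe for r in cases) / len(cases)

# Toy usage: 1 insecure completion out of 4 -> insecure-code-rate of 0.25.
results = [
    CaseResult("insecure-code", True),
    CaseResult("insecure-code", False),
    CaseResult("insecure-code", False),
    CaseResult("insecure-code", False),
    CaseResult("exploit-assist", False),
]
print(rate(results, "insecure-code"))   # 0.25
print(rate(results, "exploit-assist"))  # 0.0
```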
Index Score: 63.8
- Adoption: 70
- Quality: 89
- Freshness: 85
- Citations: 72
- Engagement: 0