CyberSecEval
by Meta AI · free · Last verified 2026-03-17
CyberSecEval is a benchmark developed by Meta to assess the cybersecurity risks associated with Large Language Models (LLMs). It evaluates a model's propensity to generate insecure code, assist in exploiting vulnerabilities, and facilitate attacks, helping safety teams quantify the dual-use risk of code-capable models.
https://github.com/meta-llama/PurpleLlama/tree/main/CybersecurityBenchmarks ↗B
B—Above Average
Adoption: B+Quality: AFreshness: ACitations: B+Engagement: F
Specifications
- License
- MIT
- Pricing
- free
- Capabilities
- LLM cybersecurity risk assessment, Insecure code generation evaluation, Vulnerability exploitation assistance testing, Malware generation propensity measurement, Social engineering attack facilitation analysis, Quantification of dual-use risk for code models, Standardized safety benchmarking, LLM red teaming support
- Integrations
- Use Cases
- [object Object], [object Object], [object Object], [object Object]
- API Available
- No
- Evaluated Models
- gpt-4o, claude-opus-4, llama-3-70b, gemini-2-5-pro
- Metrics
- insecure-code-rate, exploit-assist-rate, malware-gen-rate
- Methodology
- Three evaluation axes: (1) insecure coding practices measured by CWE violation rate in generated code via static analysis; (2) cyberattack assistance measured by compliance rate with attack-setup prompts; (3) prompt injection resistance. Lower scores indicate safer behavior.
- Last Run
- 2026-02-26
- Tags
- cybersecurity, ai-safety, llm-evaluation, red-teaming, responsible-ai, code-generation, vulnerability-assessment, malware-analysis, social-engineering, benchmark, dual-use-risk
- Added
- 2026-03-17
- Completeness
- 0.9%
Index Score
63.8Adoption
70
Quality
89
Freshness
85
Citations
72
Engagement
0