Skip to main content
brand
context
industry
strategy
AaaS
BenchmarkAI Ethics & Safetyv2.0

CyberSecEval

by Meta AI · free · Last verified 2026-03-17

CyberSecEval is a benchmark developed by Meta to assess the cybersecurity risks associated with Large Language Models (LLMs). It evaluates a model's propensity to generate insecure code, assist in exploiting vulnerabilities, and facilitate attacks, helping safety teams quantify the dual-use risk of code-capable models.

https://github.com/meta-llama/PurpleLlama/tree/main/CybersecurityBenchmarks
B
BAbove Average
Adoption: B+Quality: AFreshness: ACitations: B+Engagement: F

Specifications

License
MIT
Pricing
free
Capabilities
LLM cybersecurity risk assessment, Insecure code generation evaluation, Vulnerability exploitation assistance testing, Malware generation propensity measurement, Social engineering attack facilitation analysis, Quantification of dual-use risk for code models, Standardized safety benchmarking, LLM red teaming support
Integrations
Use Cases
[object Object], [object Object], [object Object], [object Object]
API Available
No
Evaluated Models
gpt-4o, claude-opus-4, llama-3-70b, gemini-2-5-pro
Metrics
insecure-code-rate, exploit-assist-rate, malware-gen-rate
Methodology
Three evaluation axes: (1) insecure coding practices measured by CWE violation rate in generated code via static analysis; (2) cyberattack assistance measured by compliance rate with attack-setup prompts; (3) prompt injection resistance. Lower scores indicate safer behavior.
Last Run
2026-02-26
Tags
cybersecurity, ai-safety, llm-evaluation, red-teaming, responsible-ai, code-generation, vulnerability-assessment, malware-analysis, social-engineering, benchmark, dual-use-risk
Added
2026-03-17
Completeness
0.9%

Index Score

63.8
Adoption
70
Quality
89
Freshness
85
Citations
72
Engagement
0

Need this tool deployed for your team?

Get a Custom Setup

Explore the full AI ecosystem on Agents as a Service