
IFEval

by Google Research · open-source · Last verified 2026-03-01

Instruction-Following Evaluation benchmark testing models' ability to precisely follow verifiable formatting instructions. Includes constraints like word count limits, specific formatting requirements, keyword inclusion/exclusion, and structural rules that can be programmatically verified.

https://github.com/google-research/google-research/tree/master/instruction_following_eval
Overall Grade: B (Above Average)
Adoption: B+ · Quality: A · Freshness: A · Citations: B+ · Engagement: F

Specifications

License
Apache-2.0
Pricing
open-source
Capabilities
model-evaluation, instruction-following-testing, constraint-verification
Integrations
lm-eval-harness
Use Cases
instruction-compliance-testing, formatting-evaluation, constraint-following-assessment
API Available
No
Evaluated Models
claude-4, gpt-5, gemini-2.5-pro, deepseek-v3, llama-4-405b
Metrics
prompt-level-accuracy, instruction-level-accuracy
Methodology
541 prompts containing verifiable instructions such as word-count limits, formatting requirements, and keyword constraints. Responses are checked programmatically for exact compliance with each instruction.
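The two reported metrics can be sketched as follows. This is an illustrative example, not code from the official IFEval repository: the checker functions, constraint set, and sample responses are all hypothetical, but the scoring logic matches the stated definitions (instruction-level accuracy counts each constraint separately; prompt-level accuracy requires every constraint on a prompt to pass).

```python
import re

# Hypothetical checkers in the style of IFEval's verifiable constraints.
def check_word_count(response: str, max_words: int) -> bool:
    """Pass if the response stays within a word limit."""
    return len(response.split()) <= max_words

def check_keyword_included(response: str, keyword: str) -> bool:
    """Pass if a required keyword appears (case-insensitive)."""
    return keyword.lower() in response.lower()

def check_all_caps(response: str) -> bool:
    """Pass if the response contains no lowercase letters."""
    return not re.search(r"[a-z]", response)

def score(prompts):
    """Compute (instruction-level, prompt-level) accuracy.

    `prompts` is a list of (response, [check callables]) pairs.
    Instruction-level: fraction of individual checks passed.
    Prompt-level: fraction of prompts where *all* checks passed.
    """
    total_checks = passed_checks = passed_prompts = 0
    for response, checks in prompts:
        results = [chk(response) for chk in checks]
        total_checks += len(results)
        passed_checks += sum(results)
        passed_prompts += all(results)
    return passed_checks / total_checks, passed_prompts / len(prompts)

# Two illustrative prompt/response pairs with their constraints.
prompts = [
    ("THE ANSWER IS PARIS",
     [check_all_caps, lambda r: check_keyword_included(r, "paris")]),
    ("A SHORT REPLY",  # 3 words, violates the 2-word limit below
     [lambda r: check_word_count(r, 2), check_all_caps]),
]
instr_acc, prompt_acc = score(prompts)
print(instr_acc, prompt_acc)  # 0.75 0.5
```

Because every constraint is checked by code rather than by a judge model, scores are exactly reproducible; the trade-off is that only mechanically verifiable instructions can be included.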
Last Run
2026-02-25
Tags
benchmark, evaluation, instruction-following, constraints, formatting
Added
2026-03-17
Completeness
100%

Index Score: 64.3
Adoption: 74
Quality: 86
Freshness: 84
Citations: 70
Engagement: 0
