IFEval
by Google Research · open-source · Last verified 2026-03-01
Instruction-Following Evaluation benchmark that tests how precisely models follow verifiable instructions. The constraints include word-count limits, required formatting, keyword inclusion or exclusion, and structural rules, and every one of them can be checked programmatically.
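As a rough illustration of what "programmatically verified" means here, the sketch below implements rule-based checks for a few of the constraint kinds described: word count, keyword inclusion and exclusion, bullet formatting, and casing. It is a minimal sketch under stated assumptions. The function names, the regex-based word and bullet counting, and the whole-word, case-insensitive keyword matching are hypothetical choices for illustration. They are not the checkers shipped in the `instruction_following_eval` repository.

```python
import re

# Hypothetical rule-based checkers in the spirit of IFEval's verifiable
# instructions. Illustrative only: these are NOT the official
# instruction_following_eval checkers, whose tokenization may differ.


def check_word_count(response: str, relation: str, n: int) -> bool:
    """Word-count constraint, e.g. 'at least 5 words' or 'less than 100 words'."""
    words = re.findall(r"\w+", response)  # assumption: a word is a run of \w chars
    if relation == "at least":
        return len(words) >= n
    if relation == "less than":
        return len(words) < n
    raise ValueError(f"unknown relation: {relation!r}")


def _contains_word(text: str, word: str) -> bool:
    # Whole-word, case-insensitive match (an assumption for this sketch).
    return re.search(rf"\b{re.escape(word)}\b", text, flags=re.IGNORECASE) is not None


def check_keywords_included(response: str, keywords: list[str]) -> bool:
    """Every required keyword must appear."""
    return all(_contains_word(response, k) for k in keywords)


def check_keywords_excluded(response: str, forbidden: list[str]) -> bool:
    """None of the forbidden words may appear."""
    return not any(_contains_word(response, w) for w in forbidden)


def check_bullet_count(response: str, n: int) -> bool:
    """Exactly n markdown bullet lines starting with '* ' or '- '."""
    bullets = re.findall(r"^[ \t]*[*-][ \t]", response, flags=re.MULTILINE)
    return len(bullets) == n


def check_all_lowercase(response: str) -> bool:
    """No uppercase letters anywhere in the response."""
    return response == response.lower()


# Usage: one response scored against several instructions.
response = "- apples are red\n- bananas are yellow"
results = {
    "at least 5 words": check_word_count(response, "at least", 5),
    "mentions 'bananas'": check_keywords_included(response, ["bananas"]),
    "never says 'cherries'": check_keywords_excluded(response, ["cherries"]),
    "exactly 2 bullets": check_bullet_count(response, 2),
    "all lowercase": check_all_lowercase(response),
}
print(results)
```

Each check yields one pass/fail flag per instruction. Because the checks are deterministic rules rather than model judgments, the same response always receives the same score.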
https://github.com/google-research/google-research/tree/master/instruction_following_eval
Grade: B (Above Average)
Adoption: B+ · Quality: A · Freshness: A · Citations: B+ · Engagement: F
Specifications
- License: Apache-2.0
- Pricing: open-source
- Capabilities: model-evaluation, instruction-following-testing, constraint-verification
- Integrations: lm-eval-harness
- Use Cases: instruction-compliance-testing, formatting-evaluation, constraint-following-assessment
- API Available: No
- Evaluated Models: claude-4, gpt-5, gemini-2.5-pro, deepseek-v3, llama-4-405b
- Metrics: prompt-level-accuracy, instruction-level-accuracy
- Methodology: 541 prompts carrying verifiable instructions such as word counts, formatting rules, and keyword constraints. Each response is checked programmatically for exact compliance with every instruction attached to its prompt.
- Last Run: 2026-02-25
- Tags: benchmark, evaluation, instruction-following, constraints, formatting
- Added: 2026-03-17
- Completeness: 100%
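The two metrics listed under Metrics aggregate the same per-instruction pass/fail results in different ways. Prompt-level accuracy credits a prompt only if every one of its instructions is followed. Instruction-level accuracy pools all instructions across all prompts and reports the fraction followed. The sketch below is minimal and assumes the pass/fail flags have already been computed. `PromptResult` and both function names are hypothetical; they are not the lm-eval-harness or official IFEval API.

```python
from dataclasses import dataclass


@dataclass
class PromptResult:
    # One boolean per verifiable instruction attached to the prompt:
    # True if the model's response satisfied that instruction.
    followed: list[bool]


def prompt_level_accuracy(results: list[PromptResult]) -> float:
    """Fraction of prompts for which every instruction was followed."""
    if not results:
        raise ValueError("no prompts to score")
    return sum(all(r.followed) for r in results) / len(results)


def instruction_level_accuracy(results: list[PromptResult]) -> float:
    """Fraction of all instructions, pooled across prompts, that were followed."""
    flags = [f for r in results for f in r.followed]
    if not flags:
        raise ValueError("no instructions to score")
    return sum(flags) / len(flags)


results = [
    PromptResult([True, True]),         # both instructions followed
    PromptResult([True, False, True]),  # one of three missed
    PromptResult([False]),              # the only instruction missed
]
print(prompt_level_accuracy(results))       # 0.3333333333333333 (1 of 3 prompts)
print(instruction_level_accuracy(results))  # 0.6666666666666666 (4 of 6 instructions)
```

Because instruction-level accuracy uses a pooled denominator, prompts that carry more instructions weigh more heavily in it, while prompt-level accuracy weights every prompt equally.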
Index Score: 64.3
- Adoption: 74
- Quality: 86
- Freshness: 84
- Citations: 70
- Engagement: 0