Dataset · AI for Code · v1.0

HumanEval Dataset

by OpenAI · open-source · Last verified 2026-03-17

A curated set of 164 handwritten Python programming problems released by OpenAI, each consisting of a function signature, docstring, reference solution, and unit tests. The paper that introduced HumanEval popularized the pass@k metric for evaluating the functional correctness of generated code, and the benchmark is now among the most widely reported in code generation papers.
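Because pass@k is central to how HumanEval results are reported, a small worked sketch may help. It assumes the commonly used unbiased estimator: for each problem, draw n samples, count the c that pass the unit tests, and compute 1 - C(n-c, k) / C(n, k), then average over problems. The function names `pass_at_k` and `mean_pass_at_k` are illustrative and not part of any official API.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which passed.

    Uses 1 - C(n - c, k) / C(n, k): one minus the probability that a random
    size-k subset of the n samples contains no passing sample.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures: every size-k subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; each entry is an (n, c) pair."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical counts for two problems, 10 samples each.
example = [(10, 3), (10, 10)]
print(mean_pass_at_k(example, k=1))
```

With k = 1 the estimate reduces to the fraction of passing samples, c / n, which is a quick sanity check on any implementation.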

https://huggingface.co/datasets/openai/openai_humaneval
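The following is a hedged sketch of how a record from this dataset might be turned into a runnable check. It assumes the Hugging Face copy exposes a single `test` split with the fields `task_id`, `prompt`, `canonical_solution`, `test`, and `entry_point`, and that each `test` string defines a `check(candidate)` function. The helper names and the toy record are hypothetical and exist only so the example runs without network access.

```python
def load_problems():
    """Load the dataset from the Hugging Face Hub.

    Assumes `pip install datasets` and network access; not called here.
    """
    from datasets import load_dataset

    return load_dataset("openai/openai_humaneval", split="test")


def build_check_program(problem: dict, completion: str) -> str:
    """Concatenate prompt, model completion, tests, and the final check call."""
    return (
        problem["prompt"]
        + completion
        + "\n\n"
        + problem["test"]
        + "\n\n"
        + f"check({problem['entry_point']})\n"
    )


# Hypothetical toy record that mimics the assumed field layout.
toy = {
    "task_id": "Toy/0",
    "prompt": 'def add(a, b):\n    """Return a + b."""\n',
    "canonical_solution": "    return a + b\n",
    "test": "def check(candidate):\n    assert candidate(1, 2) == 3\n",
    "entry_point": "add",
}

program = build_check_program(toy, toy["canonical_solution"])
print(program)
```

Running the assembled program raises `AssertionError` when the completion is wrong. Model-generated code is untrusted, so in practice it should be executed in a sandboxed subprocess with a timeout rather than in the evaluating process.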
Overall grade: C+ (Average)
Adoption: A+ · Quality: A+ · Freshness: B · Citations: F · Engagement: F

Specifications

License: MIT
Pricing: open-source
Capabilities: evaluation, code-generation, unit-testing
Integrations: hugging-face
Use Cases: code-model-evaluation, research, benchmarking
API Available: Yes
Tags: code, evaluation, python, unit-tests, benchmark
Added: 2026-03-17
Completeness: 100%

Index Score

Overall: 55
Adoption: 91
Quality: 94
Freshness: 60
Citations: 0
Engagement: 0
