
HumanEval Dataset

by OpenAI · open-source · Last verified 2026-03-17

A curated set of 164 handwritten Python programming problems released by OpenAI, each consisting of a function signature, docstring, reference solution, and unit tests. HumanEval introduced the pass@k metric for evaluating the functional correctness of generated code and has become the de facto standard benchmark reported in virtually every code generation paper.
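The pass@k metric mentioned above is typically computed with the unbiased estimator from the HumanEval paper: generate n samples per problem, count the c that pass the unit tests, and estimate the chance that at least one of k drawn samples is correct. A minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator.

    n: total samples generated for a problem
    c: number of samples that passed the unit tests
    k: evaluation budget (samples the user would draw)
    """
    if n - c < k:
        # Every size-k draw must contain at least one correct sample.
        return 1.0
    # 1 - P(all k drawn samples are incorrect)
    return 1.0 - comb(n - c, k) / comb(n, k)
```

The reported score is this value averaged over all 164 problems.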

https://huggingface.co/datasets/openai/openai_humaneval
Overall grade: B+ (Good)
Adoption: A+ · Quality: A+ · Freshness: B · Citations: A+ · Engagement: F

Specifications

License
MIT
Pricing
open-source
Capabilities
evaluation, code-generation, unit-testing
Integrations
hugging-face
Use Cases
code-model-evaluation, research, benchmarking
API Available
Yes
Tags
code, evaluation, python, unit-tests, benchmark
Added
2026-03-17
Completeness
100%
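Each record in the Hugging Face dataset carries `prompt`, `canonical_solution`, `test` (a `check` function with assertions), and `entry_point` fields. A minimal sketch of how a harness could score one completion against those fields — omitting the sandboxing and timeouts a production evaluator needs, and using a toy problem dict rather than the real dataset:

```python
def check_candidate(problem: dict, completion: str) -> bool:
    """Run a model completion against a HumanEval-style test suite.

    problem: dict with 'prompt' (signature + docstring), 'test'
             (defines `check(candidate)`), and 'entry_point'.
    completion: the model-generated function body.
    """
    program = problem["prompt"] + completion + "\n" + problem["test"]
    env: dict = {}
    try:
        # WARNING: exec of untrusted code; a real harness sandboxes this.
        exec(program, env)
        env["check"](env[problem["entry_point"]])  # run the unit tests
        return True
    except Exception:
        return False
```

For real runs, records can be loaded with `datasets.load_dataset("openai/openai_humaneval")` and each completion is scored this way, with the per-problem pass counts feeding into pass@k.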

Index Score

79
Adoption
91
Quality
94
Freshness
60
Citations
95
Engagement
0
