HumanEval Dataset
by OpenAI · open-source · Last verified 2026-03-17
A curated set of 164 handwritten Python programming problems released by OpenAI, each consisting of a function signature, docstring, reference solution, and unit tests. HumanEval introduced the pass@k metric for evaluating the functional correctness of generated code and has become a de facto standard benchmark, reported in virtually every code-generation model paper.
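The pass@k metric mentioned above is computed with the unbiased estimator from the HumanEval paper: generate n ≥ k samples per problem, count the c samples that pass the unit tests, and estimate the probability that at least one of k randomly drawn samples passes. A minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the HumanEval paper.

    Probability that at least one of k samples drawn without
    replacement from n generations passes, given that c of the
    n generations passed: 1 - C(n-c, k) / C(n, k).
    """
    if n - c < k:
        # Fewer than k failing samples exist, so every size-k
        # subset must contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 10 generations of which 3 pass, `pass_at_k(10, 3, 1)` is 0.3, while `pass_at_k(10, 3, 10)` is 1.0 since all 10 samples include a passing one.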
https://huggingface.co/datasets/openai/openai_humaneval
Overall grade: B+ (Good)
Adoption: A+ · Quality: A+ · Freshness: B · Citations: A+ · Engagement: F
Specifications
- License: MIT
- Pricing: open-source
- Capabilities: evaluation, code-generation, unit-testing
- Integrations: hugging-face
- Use Cases: code-model-evaluation, research, benchmarking
- API Available: Yes
- Tags: code, evaluation, python, unit-tests, benchmark
- Added: 2026-03-17
- Completeness: 100%
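Each HumanEval record combines a `prompt` (signature plus docstring), a `test` block defining a `check()` function of assertions, and an `entry_point` naming the function under test, so evaluation reduces to executing a candidate completion against the tests. A simplified sketch using those dataset field names (the official harness at github.com/openai/human-eval adds sandboxing and timeouts, which this toy version omits; the `toy` problem below is illustrative, not from the dataset):

```python
def passes_unit_tests(problem: dict, completion: str) -> bool:
    """Run a model completion against one problem's unit tests.

    Concatenates prompt + completion + test code, then calls the
    test's check() on the entry-point function.
    """
    program = (
        problem["prompt"]
        + completion
        + "\n"
        + problem["test"]
        + f"\ncheck({problem['entry_point']})\n"
    )
    try:
        # WARNING: real evaluation must sandbox untrusted model code.
        exec(program, {})
        return True
    except Exception:
        return False

# Illustrative toy problem in the same shape as a dataset record.
toy = {
    "prompt": 'def add(a, b):\n    """Return a + b."""\n',
    "test": "def check(candidate):\n    assert candidate(2, 3) == 5\n",
    "entry_point": "add",
}
```

A correct completion such as `"    return a + b\n"` passes, while `"    return a - b\n"` fails its assertion and returns False.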
Index Score: 79 (Adoption 91 · Quality 94 · Freshness 60 · Citations 95 · Engagement 0)