Dataset · code · v1.0

Evol-CodeAlpaca

by Microsoft Research · open-source · Last verified 2026-03-17

Evol-CodeAlpaca is a code instruction-tuning dataset of 110,000 programming problem-solution pairs. It was generated by applying the Evol-Instruct methodology to Code Alpaca seed instructions, using GPT-4 to rewrite each instruction over successive rounds so that the tasks grow in complexity and diversity. It is the primary training dataset for WizardCoder, and models fine-tuned on it score markedly higher on coding benchmarks such as HumanEval and MBPP than models trained on its predecessor datasets.
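The evolution step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the pipeline that produced the dataset: the directive texts, the prompt wording, the `evolve` helper, and the `complete` callable (a stand-in for a GPT-4 call) are all hypothetical.

```python
import random
from typing import Callable, List, Optional

# Illustrative directives only; NOT the prompts used to build the dataset.
EVOLUTION_DIRECTIVES = [
    "Add one new constraint or requirement to the problem.",
    "Replace a common requirement with a less common, more specific one.",
    "Require a stricter time or space complexity.",
    "Ask for handling of additional edge cases.",
]


def build_evolution_prompt(instruction: str, directive: str) -> str:
    """Wrap the current instruction in a rewrite request (hypothetical wording)."""
    return (
        "Rewrite the following programming task to make it slightly harder.\n"
        f"Method: {directive}\n\n"
        f"Task:\n{instruction}\n\n"
        "Rewritten task:"
    )


def evolve(
    seed: str,
    complete: Callable[[str], str],
    rounds: int = 3,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return the lineage [seed, round 1, round 2, ...] of evolved instructions.

    `complete` is any function mapping a prompt to model text; it is an
    assumption of this sketch and stands in for a real LLM client.
    """
    rng = rng or random.Random(0)
    lineage = [seed]
    for _ in range(rounds):
        directive = rng.choice(EVOLUTION_DIRECTIVES)
        evolved = complete(build_evolution_prompt(lineage[-1], directive)).strip()
        # Stop early if the model returned nothing or failed to change the task.
        if not evolved or evolved == lineage[-1]:
            break
        lineage.append(evolved)
    return lineage


def fake_llm(prompt: str) -> str:
    """Offline stand-in for an LLM: echoes the task with one extra requirement."""
    task = prompt.split("Task:\n", 1)[1].split("\n\nRewritten task:", 1)[0]
    return task + " Also handle empty input."


if __name__ == "__main__":
    for step, text in enumerate(evolve("Write a function that reverses a string.", fake_llm, rounds=2)):
        print(step, text)
```

In a real pipeline each evolved instruction would then be sent to the model again to obtain a solution, and the resulting instruction-solution pair added to the training set; that step is omitted here.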

https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k
Grade: B (Above Average)
Adoption: B+ · Quality: A · Freshness: B+ · Citations: B+ · Engagement: F

Specifications

License: CC BY 4.0
Pricing: open-source
Capabilities: code-generation, instruction-tuning, complexity-evolution
Integrations: huggingface-datasets
Use Cases: code-model-finetuning, programming-instruction-tuning, humaneval-optimization
API Available: Yes
Tags: code, instruction-tuning, evol-instruct, python, programming
Added: 2026-03-17
Completeness: 100%
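For the huggingface-datasets integration and the fine-tuning use case listed here, a minimal sketch of loading the linked repository and turning one record into training text might look like this. The field names `instruction`, `input`, and `output`, the `### Instruction:` template, and both helper names are assumptions of this example; check the dataset card before relying on them.

```python
from typing import Dict, Optional


def load_seed_split():
    """Load the repository linked on this page (requires network access).

    The import is done lazily so the formatting helper below can be used
    without the `datasets` library installed.
    """
    from datasets import load_dataset

    return load_dataset("sahil2801/CodeAlpaca-20k", split="train")


def to_training_text(record: Dict[str, Optional[str]]) -> str:
    """Render one record as a single prompt/response string.

    Assumes Alpaca-style keys: `instruction`, optional `input`, and `output`.
    The section markers are an illustrative template, not a required format.
    """
    instruction = (record["instruction"] or "").strip()
    context = (record.get("input") or "").strip()
    response = (record["output"] or "").strip()

    parts = [f"### Instruction:\n{instruction}"]
    if context:  # Many records carry no extra input; skip the section then.
        parts.append(f"### Input:\n{context}")
    parts.append(f"### Response:\n{response}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    sample = {
        "instruction": "Write a function that adds two numbers.",
        "input": "",
        "output": "def add(a, b):\n    return a + b",
    }
    print(to_training_text(sample))
```

Applying `to_training_text` to every row, for example with the library's `map` method, yields plain strings that can be passed to a tokenizer for supervised fine-tuning.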

Index Score

65.3
Adoption: 74
Quality: 85
Freshness: 74
Citations: 75
Engagement: 0
