Dataset · code · v1.0

Evol-CodeAlpaca

by Microsoft Research · free · Last verified 2026-03-17

Evol-CodeAlpaca is a dataset of 110,000 instruction-solution pairs for code generation, created by applying the Evol-Instruct method to Code Alpaca seed instructions. The method uses GPT-4 to rewrite each programming problem over successive rounds, increasing its complexity and diversity. The resulting dataset serves as the primary training data for the WizardCoder models.
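The evolution loop described above can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline: the function names, the rewrite templates, and the stub model functions are all hypothetical, and a real run would replace the stubs with calls to a language model.

```python
import random
from typing import Callable, Dict, List

# Hypothetical rewrite templates; the real Evol-Instruct prompts differ.
EVOLUTION_TEMPLATES = [
    "Add one more constraint or requirement to this programming task:",
    "Rewrite this programming task so that it needs a less common approach:",
    "Raise the time or space complexity requirements of this task:",
]


def evolve_dataset(
    seeds: List[str],
    rewrite: Callable[[str, str], str],
    solve: Callable[[str], str],
    rounds: int = 2,
    rng_seed: int = 0,
) -> List[Dict[str, str]]:
    """Grow a pool of instructions by repeatedly rewriting the latest generation.

    `rewrite(template, instruction)` and `solve(instruction)` stand in for
    model calls (assumption: any callables with these shapes will do).
    """
    rng = random.Random(rng_seed)
    pool = list(seeds)      # every instruction produced so far
    current = list(seeds)   # the generation to evolve next
    for _ in range(rounds):
        evolved = [rewrite(rng.choice(EVOLUTION_TEMPLATES), ins) for ins in current]
        pool.extend(evolved)
        current = evolved
    # Pair each instruction with a generated solution.
    return [{"instruction": ins, "output": solve(ins)} for ins in pool]


def stub_rewrite(template: str, instruction: str) -> str:
    """Placeholder for a model call that makes the task harder."""
    return instruction + " (harder)"


def stub_solve(instruction: str) -> str:
    """Placeholder for a model call that writes a solution."""
    return "# solution for: " + instruction


if __name__ == "__main__":
    for pair in evolve_dataset(["Reverse a string."], stub_rewrite, stub_solve):
        print(pair)
```

With one seed and two rounds, the pool holds the seed plus one rewritten instruction per round. Keeping every generation in the pool, rather than only the final one, is a design choice in this sketch that preserves a mix of difficulty levels.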

https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k
B (Above Average)
Adoption: B+ · Quality: A · Freshness: B+ · Citations: B+ · Engagement: F

Specifications

License: CC BY 4.0
Pricing: free
Capabilities: instruction-tuning-for-code, synthetic-data-generation, complexity-evolution-of-instructions, large-language-model-finetuning, python-code-generation-training, problem-solving-dataset, code-benchmark-improvement
Integrations: none listed
API Available: Yes
Tags: code-generation, instruction-tuning, evol-instruct, python, dataset, wizardcoder, llm-finetuning, synthetic-data, gpt-4, problem-solving
Added: 2026-03-17
Completeness: 0.9%

Index Score

65.3
Adoption: 74
Quality: 85
Freshness: 74
Citations: 75
Engagement: 0