Evol-CodeAlpaca
by Microsoft Research · free · Last verified 2026-03-17
Evol-CodeAlpaca is a dataset of 110,000 instruction-solution pairs for code generation, created by applying the Evol-Instruct method to Code Alpaca seed instructions. In this method, GPT-4 rewrites each seed over several rounds so that the programming problems become progressively more complex and more diverse. The resulting pairs serve as the primary training data for the WizardCoder models.
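The evolution loop described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline actually used to build the dataset: the prompt wording, the `EVOLUTION_DIRECTIONS` list, the `evolve` function, and the `instruction`/`output` field names are all assumptions made for the example, and `complete` is a hypothetical stand-in for whatever LLM client is used.

```python
import random
from typing import Callable, Dict, List, Optional

# Illustrative evolution directions. These are assumptions for this sketch,
# not the exact prompts used to build Evol-CodeAlpaca.
EVOLUTION_DIRECTIONS = [
    "Add one new constraint or requirement to the problem.",
    "Replace a common requirement with a less common, more specific one.",
    "Require a stricter time or space complexity.",
    "Ask for handling of additional edge cases.",
]


def build_evolution_prompt(instruction: str, direction: str) -> str:
    """Wrap an instruction in a rewrite request for the LLM."""
    return (
        "Rewrite the following programming task to make it slightly harder.\n"
        f"Method: {direction}\n\n"
        f"Task:\n{instruction}\n\n"
        "Rewritten task:"
    )


def evolve(
    seeds: List[str],
    complete: Callable[[str], str],  # hypothetical LLM call: prompt -> text
    rounds: int = 3,
    rng: Optional[random.Random] = None,
) -> List[Dict[str, str]]:
    """Evolve seed instructions for several rounds and collect pairs."""
    rng = rng or random.Random(0)
    dataset: List[Dict[str, str]] = []
    current = list(seeds)
    for _ in range(rounds):
        evolved = []
        for instruction in current:
            direction = rng.choice(EVOLUTION_DIRECTIONS)
            prompt = build_evolution_prompt(instruction, direction)
            new_instruction = complete(prompt).strip()
            # Skip degenerate rewrites (empty or unchanged).
            if not new_instruction or new_instruction == instruction:
                continue
            # Ask the same model to solve the evolved task.
            solution = complete(new_instruction).strip()
            dataset.append({"instruction": new_instruction, "output": solution})
            evolved.append(new_instruction)
        # The next round evolves this round's outputs further.
        current = evolved
    return dataset
```

To try it, pass any function that maps a prompt string to a completion string as `complete`; each round feeds the previous round's evolved instructions back in, which is what makes the complexity grow step by step.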
https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k
B—Above Average
Adoption: B+ · Quality: A · Freshness: B+ · Citations: B+ · Engagement: F
Specifications
- License
- CC BY 4.0
- Pricing
- free
- Capabilities
- instruction-tuning-for-code, synthetic-data-generation, complexity-evolution-of-instructions, large-language-model-finetuning, python-code-generation-training, problem-solving-dataset, code-benchmark-improvement
- Integrations
- API Available
- Yes
- Tags
- code-generation, instruction-tuning, evol-instruct, python, dataset, wizardcoder, llm-finetuning, synthetic-data, gpt-4, problem-solving
- Added
- 2026-03-17
- Completeness
- 0.9%
Index Score: 65.3
Adoption: 74
Quality: 85
Freshness: 74
Citations: 75
Engagement: 0