Dataset · synthetic · v1.0

Phi-1 TextBooks

by Microsoft · open-source · Last verified 2026-03-17

The Phi-1 TextBooks dataset consists of synthetically generated Python coding textbooks and exercises, produced with GPT-3.5 to train Microsoft's Phi-1 small language model. Phi-1 demonstrated that high-quality, curriculum-style synthetic data can let a small model outperform models trained on much larger web-scraped corpora on coding benchmarks such as HumanEval.

https://huggingface.co/datasets/nampdn-ai/tiny-textbooks
Overall grade: B (Above Average)
Adoption: B+ · Quality: A · Freshness: B+ · Citations: A · Engagement: F

Specifications

License: MIT
Pricing: open-source
Capabilities: pretraining, code-generation, instruction-following
Integrations: huggingface-datasets
Use Cases: code-model-training, curriculum-learning, python-education
API Available: Yes
Tags: synthetic, textbooks, coding, python, pretraining
Added: 2026-03-17
Completeness: 100%
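
The huggingface-datasets integration listed above can be sketched roughly as follows. This is a minimal illustration, not an official loader: the helper names `repo_id_from_url` and `load_textbooks` are hypothetical, the `train` split name is an assumption, and the repository may require accepting access terms or authenticating on Hugging Face before it can be downloaded.

```python
from urllib.parse import urlparse

# URL taken from this listing.
DATASET_URL = "https://huggingface.co/datasets/nampdn-ai/tiny-textbooks"


def repo_id_from_url(url: str) -> str:
    """Extract the '<owner>/<name>' repo id from a Hugging Face dataset URL."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hugging Face dataset URL: {url}")
    return "/".join(parts[1:])


def load_textbooks(split: str = "train", streaming: bool = True):
    """Open the dataset via the `datasets` library (hypothetical helper).

    The split name is an assumption; streaming avoids downloading the
    whole corpus before the first record can be inspected.
    """
    # Imported lazily so the URL helper works without the library installed.
    from datasets import load_dataset

    return load_dataset(repo_id_from_url(DATASET_URL), split=split, streaming=streaming)
```

Calling `next(iter(load_textbooks()))` should yield the first record as a dictionary. The field names are not stated in this listing, so inspect the record's keys before building a training pipeline on them.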

Index Score

Overall: 67.7
Adoption: 72
Quality: 88
Freshness: 75
Citations: 85
Engagement: 0
