Phi-1 TextBooks
by Microsoft · free · Last verified 2026-03-17
Phi-1 TextBooks is a synthetic dataset of Python coding textbooks and exercises generated by GPT-3.5 and GPT-4. It was created to pretrain Microsoft's Phi-1 small language model, demonstrating that high-quality, curriculum-style data can significantly boost the coding abilities of smaller models compared to training on general web data.
https://huggingface.co/datasets/nampdn-ai/tiny-textbooks
Grade: B (Above Average)
Adoption: B+ · Quality: A · Freshness: B+ · Citations: A · Engagement: F
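A minimal sketch of how a dataset like this is typically consumed for pretraining. The Hub ID below comes from the listing's link; the `text` field name and the packing helper are assumptions for illustration, and the real pipeline would chunk by tokens rather than characters.

```python
"""Sketch: turning textbook-style passages into fixed-length training chunks."""
from textwrap import dedent

# With network access, the dataset would normally be fetched via the
# `datasets` library (Hub ID taken from this listing):
#   from datasets import load_dataset
#   ds = load_dataset("nampdn-ai/tiny-textbooks", split="train")

# Offline stand-in: records shaped like synthetic Python textbook passages.
records = [
    {"text": dedent("""\
        ## Lists in Python
        A list stores an ordered sequence of items.
        Exercise: append 4 to [1, 2, 3].""")},
    {"text": "## Functions\nDefine reusable code with `def`."},
]

def pack_for_pretraining(records, seq_len=32):
    """Concatenate passages and split into fixed-length character chunks.
    (A real pipeline would tokenize first and pack token IDs.)"""
    corpus = "\n\n".join(r["text"] for r in records)
    return [corpus[i:i + seq_len] for i in range(0, len(corpus), seq_len)]

chunks = pack_for_pretraining(records)
print(len(chunks), "chunks of <=32 chars")
```

Packing many short passages into uniform sequences is the standard way curriculum-style corpora are fed to a small language model's pretraining loop.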
Specifications
- License
- MIT
- Pricing
- free
- Capabilities
- Pretraining small language models (SLMs), Fine-tuning models for Python code generation, Improving instruction-following for programming tasks, Benchmarking model performance on coding benchmarks, Researching the impact of synthetic data quality, Generating educational coding content, Training models for code completion and explanation, Developing AI-powered coding tutors
- Integrations
- Use Cases
- API Available
- Yes
- Tags
- synthetic-data, textbooks, coding, python, pretraining, language-model-training, code-generation, dataset, nlp, ai-research, small-language-model
- Added
- 2026-03-17
- Completeness
- 0.9%
Index Score
Index Score: 67.7
- Adoption: 72
- Quality: 88
- Freshness: 75
- Citations: 85
- Engagement: 0