Dataset · synthetic · v1.0

Phi-1 TextBooks

by Microsoft · free · Last verified 2026-03-17

Phi-1 TextBooks is a synthetic dataset of Python coding textbooks and exercises generated by GPT-3.5 and GPT-4. It was created to train Microsoft's Phi-1 small language model, and it demonstrated that high-quality, curriculum-style data can improve a small model's coding ability more than training on general web data does.

https://huggingface.co/datasets/nampdn-ai/tiny-textbooks
B · Above Average
Adoption: B+ · Quality: A · Freshness: B+ · Citations: A · Engagement: F

Specifications

License
MIT
Pricing
free
Capabilities
Pretraining small language models (SLMs), Fine-tuning models for Python code generation, Improving instruction-following for programming tasks, Benchmarking model performance on coding benchmarks, Researching the impact of synthetic data quality, Generating educational coding content, Training models for code completion and explanation, Developing AI-powered coding tutors
Integrations
Use Cases
API Available
Yes
Tags
synthetic-data, textbooks, coding, python, pretraining, language-model-training, code-generation, dataset, nlp, ai-research, small-language-model
Added
2026-03-17
Completeness
0.9%

Index Score

67.7
Adoption
72
Quality
88
Freshness
75
Citations
85
Engagement
0
