Dataset · synthetic · v1.0

Phi-1 TextBooks

by Microsoft · free · Last verified 2026-03-17

Phi-1 TextBooks is a synthetic dataset of Python coding textbooks and exercises generated with GPT-3.5 and GPT-4. It was created to pretrain Phi-1, Microsoft's 1.3B-parameter small language model, and to show that high-quality, curriculum-style data can improve the coding ability of small models more than training on general web data.

https://huggingface.co/datasets/nampdn-ai/tiny-textbooks
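To inspect the data before using it for pretraining, the repository at the URL above can be streamed with the Hugging Face `datasets` library. The following is a minimal sketch, not an official loader. The helper names (`longest_text_field`, `preview`, `stream_dataset`) are hypothetical, the split name `"train"` is an assumption, and because this listing does not give the column names, the sketch assumes the textbook passage is the longest string value in each record.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List, Mapping

# Repository id taken from the dataset URL in this listing.
DATASET_ID = "nampdn-ai/tiny-textbooks"


def longest_text_field(record: Mapping[str, Any]) -> str:
    """Return the longest string value in a record, or "" if there is none.

    Assumption: the main textbook passage is the longest string column.
    This avoids hard-coding a column name that the listing does not state.
    """
    strings = [value for value in record.values() if isinstance(value, str)]
    return max(strings, key=len, default="")


def preview(records: Iterable[Mapping[str, Any]], n: int = 3, width: int = 80) -> List[str]:
    """Collect a single-line preview (at most `width` chars) of the first n records."""
    previews = []
    for record in islice(records, n):
        # Collapse newlines and repeated spaces so each preview fits on one line.
        text = " ".join(longest_text_field(record).split())
        previews.append(text[:width])
    return previews


def stream_dataset(split: str = "train") -> Iterator[Mapping[str, Any]]:
    """Stream records from the Hub without downloading the whole dataset.

    Requires `pip install datasets` and network access. The split name
    "train" is an assumption; check the dataset card before relying on it.
    """
    from datasets import load_dataset  # lazy import: optional dependency

    return iter(load_dataset(DATASET_ID, split=split, streaming=True))
```

With network access, `for line in preview(stream_dataset()): print(line)` prints the first few passages. Streaming keeps the check cheap, since nothing beyond the first few records is fetched.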

B (Above Average)

Adoption: B+ · Quality: A · Freshness: B+ · Citations: A · Engagement: F

Specifications

License: MIT
Pricing: free
Capabilities: Pretraining small language models (SLMs), Fine-tuning models for Python code generation, Improving instruction-following for programming tasks, Benchmarking model performance on coding benchmarks, Researching the impact of synthetic data quality, Generating educational coding content, Training models for code completion and explanation, Developing AI-powered coding tutors
Integrations
API Available: Yes
Tags: synthetic-data, textbooks, coding, python, pretraining, language-model-training, code-generation, dataset, nlp, ai-research, small-language-model
Added: 2026-03-17
Completeness: 0.9%

Index Score

67.7
