Dataset · alignment · v1.0

Self-Instruct

by University of Washington · open-source · Last verified 2026-03-17

Self-Instruct is the foundational instruction-tuning dataset and methodology introduced by Wang et al. (2022), in which 175 human-written seed tasks are iteratively expanded into roughly 52,000 machine-generated instructions with paired input-output instances, using GPT-3 as the generator. It established the paradigm of bootstrapping instruction data from an existing LLM and directly inspired Alpaca, WizardLM, and most subsequent synthetic alignment datasets.

https://github.com/yizhongw/self-instruct
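The bootstrapping loop described above can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: `generate_instructions` stands in for the GPT-3 call (here it just recombines words from the in-context examples so the sketch runs offline), and `token_overlap` is a crude stand-in for the ROUGE-L novelty filter the paper applies with a 0.7 threshold.

```python
import random

def token_overlap(a: str, b: str) -> float:
    """Crude stand-in for the ROUGE-L similarity filter in the paper."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / min(len(ta), len(tb))

def generate_instructions(examples, n=4):
    """Placeholder for the GPT-3 generation step: it only recombines
    words from the in-context examples to keep the sketch offline."""
    words = [w for t in examples for w in t.split()]
    return [" ".join(random.sample(words, min(6, len(words)))) for _ in range(n)]

def self_instruct(seed_tasks, target_size=10, sim_threshold=0.7, max_rounds=200):
    """Iteratively grow the task pool, keeping only candidates that are
    sufficiently dissimilar from everything already in the pool."""
    pool = list(seed_tasks)
    for _ in range(max_rounds):
        if len(pool) >= target_size:
            break
        prompt = random.sample(pool, min(8, len(pool)))  # in-context examples
        for cand in generate_instructions(prompt):
            if all(token_overlap(cand, t) < sim_threshold for t in pool):
                pool.append(cand)
    return pool

random.seed(0)
seeds = [
    "Translate the following sentence into French",
    "Summarize the paragraph in one line",
    "Classify the sentiment of this movie review",
]
expanded = self_instruct(seeds)
print(f"pool grew from {len(seeds)} to {len(expanded)} tasks")
```

The key design point the sketch preserves is that accepted generations are fed back into the prompt pool, so later rounds condition on earlier synthetic tasks rather than only on the human seeds.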
Overall grade: B (Above Average)
Adoption: B+ · Quality: B+ · Freshness: B · Citations: A+ · Engagement: F

Specifications

License
Apache 2.0
Pricing
open-source
Capabilities
instruction-tuning, data-generation, self-play
Integrations
huggingface-datasets
Use Cases
sft-training, instruction-data-generation, alignment-research
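For the sft-training use case, each instruction-input-output triplet is typically flattened into a single prompt/completion pair before training. A minimal formatter is sketched below; the field names follow the instruction/input/output schema described above, but the prompt template itself is an illustrative assumption, not the dataset's canonical format.

```python
def format_example(ex: dict) -> dict:
    """Flatten a Self-Instruct triplet into a prompt/completion pair.
    The template is an assumption for illustration; tasks without an
    input field get a shorter prompt."""
    if ex.get("input"):
        prompt = (f"Instruction: {ex['instruction']}\n"
                  f"Input: {ex['input']}\n"
                  f"Output: ")
    else:
        prompt = f"Instruction: {ex['instruction']}\nOutput: "
    return {"prompt": prompt, "completion": ex["output"]}

pair = format_example({
    "instruction": "Classify the sentiment of the review.",
    "input": "The movie was fantastic.",
    "output": "positive",
})
print(pair["prompt"] + pair["completion"])
```

Handling the empty-input case separately matters in practice: roughly half of Self-Instruct tasks are instruction-only, and training on a dangling "Input:" header teaches the model a spurious pattern.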
API Available
No
Tags
instruction-tuning, self-play, seed-tasks, gpt-3, alignment
Added
2026-03-17
Completeness
100%

Index Score

69.8
Adoption
78
Quality
78
Freshness
60
Citations
92
Engagement
0
