Dolly-15K
by Databricks · free · Last verified 2026-03-17
Dolly-15K is an open-source dataset of about 15,000 human-written instruction-following records. Databricks employees wrote the prompts and responses. It is designed for fine-tuning large language models to follow instructions in the style of ChatGPT, using a relatively small, targeted dataset.
https://huggingface.co/datasets/databricks/databricks-dolly-15k
B (Above Average)
Adoption: A · Quality: B+ · Freshness: B · Citations: A · Engagement: F
Specifications
- License
- CC-BY-SA-3.0
- Pricing
- free
- Capabilities
- Supervised Fine-Tuning (SFT), Instruction-Following Model Training, Natural Language Generation (NLG), Question Answering, Text Summarization, Creative Writing and Brainstorming, Information Extraction, Dialogue System Development
- Integrations
- Hugging Face Datasets, PyTorch, TensorFlow, Databricks Platform, JAX
- API Available
- No
- Tags
- instruction-tuning, supervised-fine-tuning, human-generated-data, databricks, llm-training, open-source-dataset, natural-language-processing, question-answering, dialogue-generation, model-alignment
- Added
- 2026-03-17
- Completeness
- 85%
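As a rough illustration of the Supervised Fine-Tuning capability listed above, the sketch below turns one Dolly-style record into a prompt/completion pair. It assumes the field names `instruction`, `context`, `response`, and `category` used on the dataset's Hugging Face page. The `### Instruction:` template, the helper name `format_dolly_record`, and the sample record are all hypothetical choices made for this example, not something the dataset prescribes.

```python
from typing import Dict


def format_dolly_record(record: Dict[str, str]) -> Dict[str, str]:
    """Turn one Dolly-style record into a prompt/completion pair.

    The section headers below are an assumed template; any consistent
    layout would work as long as training and inference use the same one.
    """
    instruction = record["instruction"].strip()
    context = (record.get("context") or "").strip()
    response = record["response"].strip()

    parts = ["### Instruction:\n" + instruction]
    # Many records have an empty context; include the section only when present.
    if context:
        parts.append("### Context:\n" + context)
    parts.append("### Response:\n")

    return {"prompt": "\n\n".join(parts), "completion": response}


# Hypothetical record, written for this example (not taken from the dataset).
example = {
    "instruction": "Summarize the passage in one sentence.",
    "context": "Dolly-15K was written by Databricks employees.",
    "response": "Databricks employees wrote the Dolly-15K dataset.",
    "category": "summarization",
}

pair = format_dolly_record(example)
print(pair["prompt"] + pair["completion"])
```

With the Hugging Face `datasets` library installed, the real records could be loaded with `load_dataset("databricks/databricks-dolly-15k", split="train")` and passed through the same helper via `.map(format_dolly_record)`. That step needs network access and is left out of the sketch.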
Index Score: 68.3
Adoption: 80
Quality: 79
Freshness: 68
Citations: 82
Engagement: 0