Skip to main content
Datasetalignmentv1.0

Capybara

by Argilla / LDJnr · open-source · Last verified 2026-03-17

Capybara is a high-quality instruction-tuning dataset of 15,000 diverse, long-form single- and multi-turn conversations synthesized to cover a wide range of topics and response styles, designed to improve model coherence and verbosity on open-ended tasks. It emphasizes narrative quality and conceptual depth over simple factual responses, making it particularly effective for improving chat model fluency and reasoning.

https://huggingface.co/datasets/LDJnr/Capybara
C+
C+Average
Adoption: BQuality: AFreshness: ACitations: BEngagement: F

Specifications

License
CC BY 4.0
Pricing
open-source
Capabilities
instruction-tuning, long-form-generation, chat-finetuning
Integrations
huggingface-datasets
Use Cases
sft-training, chat-model-finetuning, response-quality-improvement
API Available
Yes
Tags
instruction-tuning, long-form, diverse, synthetic, sft
Added
2026-03-17
Completeness
100%

Index Score

57.4
Adoption
65
Quality
82
Freshness
80
Citations
60
Engagement
0

Put AI to work for your business

Deploy this dataset alongside autonomous AaaS agents that handle tasks end-to-end — no babysitting required.

Explore the full AI ecosystem on Agents as a Service