Scaling Data-Constrained Language Models
by Hugging Face / Harvard University / University of Turku · free · Last verified 2026-03-17
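Investigates how language-model scaling behaves when training data is limited and must be repeated. Finds that, for a fixed compute budget, training on up to about 4 epochs of repeated data changes loss negligibly compared with unique data, while the value of further repetition eventually decays toward zero. Under a data constraint, additional compute is better spent on more epochs than on more parameters. Provides practical guidance for real-world data-constrained training.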
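The diminishing value of repeated tokens can be illustrated with a small sketch. It assumes an exponential-saturation form in which each extra pass over the data contributes less than the one before. The function name `effective_tokens` and the constant `r_star = 15.0` are illustrative assumptions, not values taken from this listing.

```python
import math


def effective_tokens(unique_tokens: float, epochs: float, r_star: float = 15.0) -> float:
    """Unique-token equivalent of `epochs` passes over `unique_tokens`.

    Assumed form: D' = U + U * r_star * (1 - exp(-R / r_star)),
    where R = epochs - 1 is the number of repetitions.
    `r_star` is a hypothetical constant controlling how fast repeats lose value.
    """
    repeats = max(epochs - 1.0, 0.0)
    return unique_tokens * (1.0 + r_star * (1.0 - math.exp(-repeats / r_star)))


if __name__ == "__main__":
    unique = 100e9  # hypothetical budget of 100B unique tokens
    for epochs in (1, 4, 16, 64):
        seen = unique * epochs
        worth = effective_tokens(unique, epochs)
        print(f"{epochs:>3} epochs: {worth / seen:.2f} of tokens seen count as new data")
```

Under these assumptions, a few epochs are worth almost as much as fresh data, while the effective total saturates at `unique_tokens * (1 + r_star)` no matter how many more passes are added.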
https://arxiv.org/abs/2305.16264
Grade: C (Below Average)
Adoption: B · Quality: A · Freshness: B+ · Citations: F · Engagement: F
Specifications
- License: Open Access
- Pricing: free
- Capabilities: scaling-analysis, data-efficient-training, compute-budgeting
- Integrations: none listed
- Use Cases: data-limited-training, compute-allocation, research-planning
- API Available: No
- Tags: scaling-laws, data-constrained, repeated-data, epochs, compute-optimal
- Added: 2026-03-17
- Completeness: 100%
Index Score: 42
- Adoption: 62
- Quality: 88
- Freshness: 72
- Citations: 0
- Engagement: 0