Strong Teacher Not Needed? On Distillation in LLM Pretraining
by [unverified] · free · Last verified 2026-06-21T03:07:56.505Z
Knowledge distillation generally assumes a strong-to-weak relationship where stronger teachers yield better students. In this work, we examine this assumption about distillation in large language model pretraining. By varying architecture sizes and training token budgets, we create strong-to-weak...
http://arxiv.org/abs/2605.23857v1
Overall grade: F (Critical)
Adoption: F · Quality: F · Freshness: A+ · Citations: F · Engagement: F
Specifications
- Pricing: free
- Capabilities: unverified
- Integrations: (none listed)
- Use Cases: (none listed)
- API Available: No
- Tags: auto-discovered
- Added: 2026-06-21T03:07:56.505Z
- Completeness: 60%
Index Score: 0
- Adoption: 0
- Quality: 0
- Freshness: 100
- Citations: 0
- Engagement: 0