Dataset · Computer Vision · v2.0

WebVid-10M

by University of Oxford · free · Last verified 2026-03-17

WebVid-10M is a large-scale dataset of more than 10 million short video clips, each paired with a descriptive text caption. The clips were scraped from stock video websites, and the dataset is widely used as a pretraining corpus for video-language models, supporting research in video understanding, retrieval, and generation.
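As a rough illustration of what "video clips paired with text captions" looks like in practice, the sketch below reads clip/caption pairs from a metadata CSV. The column names (`video_id`, `caption`, `url`), the `ClipRecord` type, and the sample rows are all hypothetical and chosen for illustration; they are not the dataset's official schema, so check the project page for the real file layout.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class ClipRecord:
    """One video clip and its caption (hypothetical schema)."""
    video_id: str
    caption: str
    url: str


def read_clip_records(handle):
    """Yield one ClipRecord per CSV row, skipping rows with an empty caption."""
    for row in csv.DictReader(handle):
        caption = row["caption"].strip()
        if not caption:
            continue  # a clip without a caption is unusable for video-text training
        yield ClipRecord(row["video_id"], caption, row["url"])


# Tiny in-memory sample standing in for a real metadata file (made-up rows).
SAMPLE_CSV = """video_id,caption,url
001,Aerial shot of a coastline at sunset,https://example.com/001.mp4
002,,https://example.com/002.mp4
003,Barista pouring milk into a latte,https://example.com/003.mp4
"""

records = list(read_clip_records(io.StringIO(SAMPLE_CSV)))
for record in records:
    print(record.video_id, "->", record.caption)
```

Reading rows lazily with a generator keeps memory use flat, which matters for a metadata file with millions of rows; the same function works on a real file by passing `open(path, newline="")` instead of the in-memory sample.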

https://m-bain.github.io/webvid-dataset/
Grade: B (Above Average)
Adoption: B · Quality: A · Freshness: C+ · Citations: B+ · Engagement: F

Specifications

License
Custom
Pricing
free
Capabilities
video-language model pretraining, text-to-video retrieval, video-to-text generation (captioning), zero-shot video classification, temporal reasoning, action recognition, video question answering (VQA), multimodal representation learning
API Available
No
Tags
multimodal, video-text, video-captioning, large-scale, pretraining, video-understanding, computer-vision, nlp, text-to-video-retrieval, temporal-reasoning
Added
2026-03-17
Completeness
85%

Index Score

62.7
Adoption
68
Quality
80
Freshness
50
Citations
78
Engagement
0
