Video-LLaVA
by PKU-YuanLab · free · Last verified 2026-03-17
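Video-LLaVA is an open video-language model that extends the LLaVA architecture to video, adding temporal understanding so it can answer questions and reason about both images and video clips. Instead of projecting image and video features separately, it first aligns them into one shared visual representation (using LanguageBind image and video encoders) and then maps that representation into the language model through a single shared projection layer, an approach its authors describe as "alignment before projection." The model reports strong results on standard video QA benchmarks.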
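The description above says image and video features are aligned into one shared space before they reach the language model. The numpy sketch below illustrates that data flow under loud assumptions: the LanguageBind encoders are skipped and replaced by random, already-aligned features; all sizes are toy values; and the projector is assumed to be a LLaVA-1.5-style two-layer MLP with GELU. What it shows is that images and videos pass through the same projector, so a one-frame video and an image with identical features end up with identical LLM embeddings.

```python
import numpy as np

# Toy sizes (hypothetical; the real model is far larger).
VIS_DIM = 16          # width of the shared, already-aligned visual space
LLM_DIM = 32          # width of the language model's token embeddings
TOKENS_PER_FRAME = 4  # visual tokens per image or per video frame

rng = np.random.default_rng(0)

# ONE projector shared by images and videos. Assumed to be a two-layer MLP
# with GELU (LLaVA-1.5 style); the weights here are random placeholders.
W1 = rng.standard_normal((VIS_DIM, LLM_DIM)) * 0.02
W2 = rng.standard_normal((LLM_DIM, LLM_DIM)) * 0.02


def gelu(x):
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def project(visual_tokens):
    """Map (n, VIS_DIM) visual tokens to (n, LLM_DIM) LLM embeddings."""
    return gelu(visual_tokens @ W1) @ W2


def encode_image(image_feats):
    """image_feats: (TOKENS_PER_FRAME, VIS_DIM), already in the shared space."""
    return project(image_feats)


def encode_video(frame_feats):
    """frame_feats: (num_frames, TOKENS_PER_FRAME, VIS_DIM).

    Frames are flattened into one token sequence and sent through the
    SAME projector that images use.
    """
    num_frames = frame_feats.shape[0]
    return project(frame_feats.reshape(num_frames * TOKENS_PER_FRAME, VIS_DIM))


def build_llm_input(visual_embeds, text_embeds):
    """Place visual tokens before the text tokens (a simplification)."""
    return np.concatenate([visual_embeds, text_embeds], axis=0)


# Usage with random stand-in features: one image, an 8-frame video, 5 text tokens.
image = rng.standard_normal((TOKENS_PER_FRAME, VIS_DIM))
video = rng.standard_normal((8, TOKENS_PER_FRAME, VIS_DIM))
text = rng.standard_normal((5, LLM_DIM))
img_seq = build_llm_input(encode_image(image), text)
vid_seq = build_llm_input(encode_video(video), text)
print(img_seq.shape, vid_seq.shape)  # (9, 32) (37, 32)
```

In the real model the image and video encoders are separate networks; this sketch only covers what happens after their outputs already share one space.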
https://huggingface.co/LanguageBind/Video-LLaVA-7B-hf
Overall grade: C (Below Average)
Adoption: C · Quality: B+ · Freshness: B+ · Citations: C+ · Engagement: F
Specifications
- License: Apache 2.0
- Pricing: Free
- Capabilities: video-understanding, visual-question-answering, temporal-reasoning, image-understanding
- Integrations: Hugging Face, Transformers
- Use Cases: video-qa, video-analysis, temporal-reasoning, multimodal-research
- API Available: No
- Parameters: 7B
- Context Window: 4K tokens
- Modalities: text, image, video
- Training Cutoff: 2023
- Tags: video-understanding, vision-language, open-source, temporal-reasoning
- Added: 2026-03-17
- Completeness: 100%
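The checkpoint linked above can be loaded with the `VideoLlavaProcessor` and `VideoLlavaForConditionalGeneration` classes in Hugging Face Transformers. The sketch below is a starting point under stated assumptions, not an official recipe: the uniform 8-frame sampling and the USER/ASSISTANT prompt with a `<video>` placeholder follow the Transformers documentation example for this checkpoint, while the helper names (`uniform_frame_indices`, `build_prompt`, `visual_token_estimate`, `answer_about_video`) are hypothetical. The token estimate assumes 224×224 frames split into 14×14 patches with one visual token per patch.

```python
# Sketch: querying Video-LLaVA via Hugging Face Transformers.
# Helper names are hypothetical; the prompt template and 8-frame sampling
# follow the Transformers documentation example for this checkpoint.

MODEL_ID = "LanguageBind/Video-LLaVA-7B-hf"
CONTEXT_WINDOW = 4096  # the "4K" context window from the specifications


def uniform_frame_indices(total_frames: int, num_frames: int = 8) -> list:
    """Return num_frames evenly spaced frame indices for a clip."""
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    return [i * total_frames // num_frames for i in range(num_frames)]


def build_prompt(question: str, media: str = "video") -> str:
    """Wrap a question in the USER/ASSISTANT template with a media placeholder."""
    if media not in ("video", "image"):
        raise ValueError("media must be 'video' or 'image'")
    return f"USER: <{media}>\n{question} ASSISTANT:"


def visual_token_estimate(num_frames: int = 8, image_size: int = 224, patch_size: int = 14) -> int:
    """Approximate visual tokens: one per ViT patch per frame (assumption)."""
    patches_per_side = image_size // patch_size
    return num_frames * patches_per_side ** 2


def answer_about_video(frames, question: str, max_new_tokens: int = 80) -> str:
    """Run the model on RGB frames shaped (num_frames, height, width, 3).

    Imports are local so the helpers above work without Transformers
    installed; the first call downloads the 7B checkpoint. The decoded
    string contains the prompt followed by the model's answer.
    """
    from transformers import VideoLlavaForConditionalGeneration, VideoLlavaProcessor

    processor = VideoLlavaProcessor.from_pretrained(MODEL_ID)
    model = VideoLlavaForConditionalGeneration.from_pretrained(MODEL_ID)
    inputs = processor(text=build_prompt(question), videos=frames, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0]


# Usage (these helpers run without downloading anything):
print(uniform_frame_indices(100))  # [0, 12, 25, 37, 50, 62, 75, 87]
budget = visual_token_estimate()
print(budget, CONTEXT_WINDOW - budget)  # 2048 2048
```

Under these assumptions, 8 frames cost about 2,048 visual tokens (slightly more if a class token is kept per frame), leaving roughly half of the 4K context window for the question and the answer. Decoding frames from a video file (for example with PyAV) is left out; `answer_about_video` expects the sampled frames already stacked into an array.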
Index Score
- Overall: 47
- Adoption: 45
- Quality: 76
- Freshness: 76
- Citations: 55
- Engagement: 0