May 18, 2026
AI & Machine Learning
deep learning
video analysis
visual-language-model
About
Marlin-2B is a small visual-language model (VLM) designed to extract structured information from videos. It aligns visual and textual features to support tasks such as video question answering and video captioning. The model is available on Hugging Face for integration into applications.
Related Products
OpenBrief – Local-first video downloader/summarizer
Nerve – self hosted runtime for AI agents
skills-for-humanity – 171 structured reasoning skills for Claude Code
Bae – AI companion built around persistent memory architecture