Marlin-2B: a tiny VLM to extract structured information from videos

May 18, 2026 AI & Machine Learning
deep learning video analysis visual-language-model

About

Marlin-2B is a tiny visual-language model (VLM) designed to extract structured information from videos. It aligns visual and textual features to support tasks such as video question answering and video captioning. The model is available on Hugging Face for integration into applications.
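
To make "structured information" concrete, here is a minimal sketch of how an application might turn a VLM's free-text reply into a typed record. Everything in it is an assumption for illustration: the `VideoRecord` and `VideoEvent` schemas, the field names, and the `parse_model_output` helper are hypothetical and do not describe Marlin-2B's actual output format or API. The sample reply is hand-written, not produced by the model.

```python
import json
from dataclasses import dataclass, field


@dataclass
class VideoEvent:
    """Hypothetical schema for one event in a clip (an assumption, not Marlin-2B's format)."""
    label: str
    start_s: float
    end_s: float


@dataclass
class VideoRecord:
    """Hypothetical structured result for one video."""
    caption: str
    events: list = field(default_factory=list)


def parse_model_output(text: str) -> VideoRecord:
    """Extract the span from the first '{' to the last '}' and parse it as JSON."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(text[start:end + 1])

    events = []
    for raw in data.get("events", []):
        event = VideoEvent(str(raw["label"]), float(raw["start_s"]), float(raw["end_s"]))
        # Reject events whose time span is inverted.
        if event.end_s < event.start_s:
            raise ValueError(f"event {event.label!r} ends before it starts")
        events.append(event)
    return VideoRecord(caption=str(data["caption"]), events=events)


# Hand-written example reply; a real model may wrap the JSON in extra text like this.
sample_reply = (
    'Here is the result: {"caption": "A dog runs across a field", '
    '"events": [{"label": "run", "start_s": 0, "end_s": 2.5}]}'
)
record = parse_model_output(sample_reply)
print(record.caption, len(record.events))
```

Validating the reply against a fixed schema keeps downstream code independent of how the model phrases its answer, which matters when a small model occasionally adds text around the JSON.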
