Long-VITA  by VITA-MLLM

Long-context visual language model for million-token processing

created 7 months ago
291 stars

Top 91.6% on sourcepulse

Project Summary

Long-VITA is a large multimodal model built to process contexts of more than one million tokens for image and video understanding. It targets researchers and developers who work with long visual inputs, and it reports state-of-the-art Video-MME results among models under 20B parameters.

How It Works

Long-VITA reaches its long-context capability, processing more than 1 million visual tokens, through architectural changes that are not specified here. The model is trained on 17 million samples drawn entirely from publicly available, open-source data. It uses a Logits-Masked LM Head, which the project highlights as a key component of its effectiveness.
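The summary does not describe how the Logits-Masked LM Head works. One plausible reading, offered here only as an assumption, is that the output projection is applied only at the positions whose logits are actually needed (for example, the last token while prefilling a long prompt), so that a full sequence-by-vocabulary logits matrix is never materialised. The pure-Python sketch below illustrates that idea; the names `masked_lm_head` and `logits_bytes`, and the vocabulary size of 150,000, are hypothetical and are not taken from the Long-VITA code.

```python
from typing import List, Sequence


def masked_lm_head(
    hidden: Sequence[Sequence[float]],   # [seq_len][hidden_dim] hidden states
    weight: Sequence[Sequence[float]],   # [vocab_size][hidden_dim] projection
    keep: Sequence[bool],                # True where logits are needed
) -> List[List[float]]:
    """Project only the kept positions onto the vocabulary.

    Illustrative assumption, not the Long-VITA implementation.
    """
    if len(hidden) != len(keep):
        raise ValueError("hidden and keep must have the same length")
    logits = []
    for state, wanted in zip(hidden, keep):
        if not wanted:
            continue  # masked position: no logits row is computed or stored
        logits.append([sum(h * w for h, w in zip(state, row)) for row in weight])
    return logits


def logits_bytes(num_positions: int, vocab_size: int, bytes_per_value: int = 2) -> int:
    """Memory needed to hold logits for num_positions tokens (2 bytes = fp16/bf16)."""
    return num_positions * vocab_size * bytes_per_value


# Toy example: 3 tokens, hidden size 2, vocabulary of 3 entries.
hidden = [[1, 0], [0, 1], [1, 1]]
weight = [[1, 2], [3, 4], [5, 6]]
last_only = [False, False, True]
print(masked_lm_head(hidden, weight, last_only))   # [[3, 7, 11]]

# Hypothetical vocabulary size, used only to show the scale of the saving.
VOCAB = 150_000
print(logits_bytes(1_000_000, VOCAB))  # 300000000000 bytes for every position
print(logits_bytes(1, VOCAB))          # 300000 bytes for the last position only
```

Under these assumed numbers, logits for every position of a million-token prompt would take about 300 GB in 16-bit precision, while logits for the final position alone take about 300 KB, which is why masking the head matters at this context length.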

Quick Start & Requirements

  • Models: Available on Hugging Face, with weights for MindSpeed (Ascend NPU) and Megatron (Nvidia GPU) also provided.
  • Training/Inference: Supports Ascend NPU with MindSpeed, Nvidia GPU with Megatron, and Nvidia GPU with DeepSpeed.
  • Resources: Training and inference are accelerator-intensive; the supported hardware is Nvidia GPUs and Ascend NPUs.

Highlighted Details

  • Processes over 1 million visual tokens, corresponding to more than 4K (about four thousand) video frames.
  • Achieves state-of-the-art performance on Video-MME for models under 20B parameters.
  • Trained exclusively on 17 million open-source data samples.
  • Competitive results on image and video understanding benchmarks.
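The two headline figures above (over 1 million visual tokens, 4K frames) imply a per-frame token budget, which the summary does not state. The sketch below only does that arithmetic; the helper names are hypothetical, and the results are rough estimates derived from the two figures, not documented properties of the model.

```python
def tokens_per_frame(total_tokens: int, frames: int) -> int:
    """Average visual tokens available per frame, rounded down."""
    if frames <= 0:
        raise ValueError("frames must be positive")
    return total_tokens // frames


def frames_that_fit(token_budget: int, per_frame_tokens: int) -> int:
    """How many frames fit in a token budget at a fixed per-frame cost."""
    if per_frame_tokens <= 0:
        raise ValueError("per_frame_tokens must be positive")
    return token_budget // per_frame_tokens


# Reading "1 million" and "4K" as decimal round numbers:
print(tokens_per_frame(1_000_000, 4_000))   # 250

# Reading them as powers of two (2**20 tokens, 2**12 frames):
print(tokens_per_frame(2**20, 2**12))       # 256

# Inverse direction: frames that fit in a 1M-token budget at 256 tokens each.
print(frames_that_fit(1_000_000, 256))      # 3906
```

Either reading puts the budget at roughly 250 tokens per frame, so doubling the per-frame token count would roughly halve the number of frames that fit in the same context.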

Maintenance & Community

The project has recently added an online demo and support in VLMEvalKit (OpenCompass). Training and inference code, logs, deployment code, and model weights are released.

Licensing & Compatibility

The README does not explicitly state the license type or any compatibility notes for commercial use.

Limitations & Caveats

The project's primary focus is on Ascend NPU and Nvidia GPU architectures, with specific framework support (MindSpeed, Megatron, DeepSpeed). Compatibility with other hardware or frameworks is not detailed.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 18 stars in the last 90 days
