VisionZip by dvlab-research

Vision-language model research paper for efficient VLMs

created 8 months ago
325 stars

Top 85.0% on sourcepulse

Project Summary

VisionZip addresses the computational inefficiency of Vision Language Models (VLMs) by drastically reducing the number of visual tokens processed without significant performance loss. Targeting researchers and developers working with VLMs, it offers substantial speedups and memory savings during inference and training.

How It Works

VisionZip employs a text-agnostic method to select a small subset of dominant and contextual visual tokens from the input sequence. This approach aims to retain the most salient information while discarding redundant or less informative tokens, leading to faster processing and reduced memory footprint. Its text-agnostic nature allows it to be integrated with any VLM architecture and existing LLM acceleration techniques.
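The selection step can be sketched as follows. The 54-dominant / 10-contextual split matches the token budget reported for LLaVA-1.5 (576 visual tokens reduced to 64), but the function name is made up for illustration and the chunk-averaging merge is a simplification of the paper's similarity-based merging, not the official implementation:

```python
import numpy as np

def select_visual_tokens(features, attn, num_dominant=54, num_contextual=10):
    """Toy sketch of VisionZip-style token reduction (illustrative only).

    features: (N, D) visual token embeddings from the vision encoder
    attn:     (N,)   attention each token receives (e.g. from the [CLS] token)
    """
    # Dominant tokens: keep the tokens that attract the most attention.
    dominant_idx = np.argsort(attn)[-num_dominant:]
    dominant = features[dominant_idx]

    # Merge the remaining tokens into a few "contextual" tokens.
    # (Uniform chunk averaging stands in for the paper's similarity-based merge.)
    mask = np.ones(len(features), dtype=bool)
    mask[dominant_idx] = False
    remaining = features[mask]
    chunks = np.array_split(remaining, num_contextual)
    contextual = np.stack([chunk.mean(axis=0) for chunk in chunks])

    # Output is text-agnostic: no text prompt was consulted above.
    return np.concatenate([dominant, contextual])

# 576 CLIP patch tokens in, 64 tokens out
tokens = select_visual_tokens(np.random.rand(576, 8), np.random.rand(576))
print(tokens.shape)
```

Because the selection depends only on the vision encoder's own attention, it can run once per image before the LLM ever sees the sequence, which is what lets it compose with existing LLM acceleration techniques.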

Quick Start & Requirements

  • Install via pip: pip install visionzip
  • For development: Clone the repo and run pip install -e .
  • Requires a LLaVA environment.
  • An official demo is hosted on Hugging Face Spaces.
  • A usage video is linked from the repository README.

Highlighted Details

  • Achieves state-of-the-art performance among efficient VLM methods.
  • Retains ~10% of visual tokens, achieving ~95% of performance in training-free mode.
  • Applicable during inference, efficient tuning, and training stages, saving memory and time.
  • Significantly reduces prefilling time and, when a KV cache is used, total inference time.

Maintenance & Community

  • Accepted at CVPR 2025.
  • Built upon LLaVA, mini-Gemini, Lmms-Eval, and Video-LLaVA.
  • Demo-Chat code available in a 'demo-chat' branch for interactive analysis.

Licensing & Compatibility

  • Licensed under the Apache License 2.0.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

VisionZip is a recent research artifact (accepted at CVPR 2025). While the authors claim minimal performance degradation, the exact impact on specific downstream tasks or edge cases is not detailed.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 55 stars in the last 90 days
