VisionZip: vision-language model research for efficient VLMs
VisionZip addresses the computational inefficiency of Vision Language Models (VLMs) by drastically reducing the number of visual tokens processed without significant performance loss. Targeting researchers and developers working with VLMs, it offers substantial speedups and memory savings during inference and training.
How It Works
VisionZip employs a text-agnostic method to select a small subset of dominant and contextual visual tokens from the input sequence. This approach aims to retain the most salient information while discarding redundant or less informative tokens, leading to faster processing and reduced memory footprint. Its text-agnostic nature allows it to be integrated with any VLM architecture and existing LLM acceleration techniques.
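To make the idea concrete, here is a minimal NumPy sketch of the kind of selection described above: keep the top-scoring "dominant" tokens (e.g. ranked by an attention score) and compress the remainder into a few averaged "contextual" tokens. The function name, the chunk-averaging merge, and the score source are illustrative assumptions, not VisionZip's actual implementation.

```python
import numpy as np

def visionzip_like_select(tokens, attn_scores, n_dominant=8, n_contextual=2):
    """Illustrative sketch (not the official algorithm): keep the n_dominant
    highest-scoring visual tokens and merge the rest into n_contextual
    averaged tokens.

    tokens:      (N, D) array of visual token embeddings
    attn_scores: (N,)   per-token importance scores (e.g. attention to [CLS])
    returns:     (n_dominant + n_contextual, D) reduced token sequence
    """
    order = np.argsort(attn_scores)[::-1]          # indices, highest score first
    dom_idx = np.sort(order[:n_dominant])          # keep dominant tokens in original order
    rest_idx = np.sort(order[n_dominant:])         # the remaining, less salient tokens
    dominant = tokens[dom_idx]
    # Simplified merge: split the leftover tokens into contiguous chunks and
    # average each chunk into one contextual token.
    chunks = np.array_split(tokens[rest_idx], n_contextual)
    contextual = np.stack([c.mean(axis=0) for c in chunks])
    return np.concatenate([dominant, contextual], axis=0)
```

With 576 input tokens and, say, 54 dominant plus 10 contextual tokens, a scheme like this shrinks the visual sequence roughly 9x, which is where the inference speedup and memory savings come from.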
Quick Start & Requirements
pip install visionzip      # install the package
pip install -e .           # or, an editable install from a source checkout
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project is presented as a CVPR 2025 submission, so it is a recent research artifact. While it claims minimal performance degradation, the exact impact on specific downstream tasks or edge cases is not detailed.