PLLaVA by magic-research

Official repository for PLLaVA, a parameter-free extension of LLaVA from images to videos for video dense captioning

Created 1 year ago · 663 stars · Top 51.6% on sourcepulse

Project Summary

PLLaVA extends existing image-language models to video data for tasks such as video dense captioning, targeting researchers and developers. It offers a parameter-free way to adapt image models to video and achieves state-of-the-art results on benchmarks such as Video ChatGPT and MVBench by employing a temporal pooling strategy that mitigates feature saturation.

How It Works

PLLaVA addresses the computational and data demands of video-language pre-training by adapting image-language models. It introduces a simple pooling strategy that smooths feature distributions across the temporal dimension, reducing the impact of dominant "extreme tokens" in video frames. This parameter-free extension allows existing image models to be fine-tuned for video tasks more efficiently and effectively, particularly for captioning.
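
A minimal sketch of the idea in PyTorch, assuming per-frame patch features of shape (T, H, W, D) from an image encoder; the pooled output sizes below are illustrative, not the repository's exact configuration:

    import torch
    import torch.nn as nn

    def pool_video_features(frame_feats: torch.Tensor,
                            t_out: int = 4, h_out: int = 12, w_out: int = 12) -> torch.Tensor:
        """Average-pool per-frame patch features over time and space.

        frame_feats: (T, H, W, D) features, one (H, W, D) slice per frame.
        Returns a (t_out * h_out * w_out, D) token sequence for the language model.
        """
        t, h, w, d = frame_feats.shape
        x = frame_feats.permute(3, 0, 1, 2).unsqueeze(0)     # (1, D, T, H, W)
        x = nn.AdaptiveAvgPool3d((t_out, h_out, w_out))(x)   # average over time and space
        x = x.squeeze(0).permute(1, 2, 3, 0)                 # (t_out, h_out, w_out, D)
        return x.reshape(-1, d)

    # Example: 16 frames of 24x24 patch features with hidden size 1024 -> 576 tokens
    pooled = pool_video_features(torch.randn(16, 24, 24, 1024))
    print(pooled.shape)  # torch.Size([576, 1024])

Averaging across the temporal axis dilutes any single frame's dominant tokens, which is the smoothing effect the pooling strategy relies on.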

Quick Start & Requirements

  • Install: pip install -r requirements.txt (after installing PyTorch with CUDA support).
  • Prerequisites: Python 3.10, PyTorch 2.2.1+cu118 or 2.2.1+cu122.
  • Model Download: Requires downloading base model weights from Hugging Face (e.g., llava-hf/llava-v1.6-vicuna-7b-hf); see the download sketch after this list.
  • Demo: bash scripts/demo.sh <model_dir> <weights_dir>
  • Docs: Usage, Data
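
A hypothetical sketch of the model-download step using huggingface_hub; the local directory layout is an assumption, and the repository's own scripts may fetch weights differently:

    # Download the base weights named in the README; the target directory below
    # is illustrative, not a path the repository requires.
    from huggingface_hub import snapshot_download

    weights_dir = snapshot_download(
        repo_id="llava-hf/llava-v1.6-vicuna-7b-hf",
        local_dir="MODELS/llava-v1.6-vicuna-7b-hf",
    )
    print("Base model weights downloaded to", weights_dir)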

Highlighted Details

  • Achieves SOTA on Video ChatGPT (3.48/5) and MVBench (58.1% accuracy), outperforming GPT-4V (IG-VLM).
  • Employs a temporal pooling strategy to improve video feature representation.
  • Supports 7B, 13B, and 34B parameter models.
  • Codebase built upon VideoChat2, leveraging the transformers and accelerate libraries.

Maintenance & Community

  • Project is under active development and reconstruction.
  • Contributions and suggestions are welcome.
  • Acknowledgements include LLaVA, VideoChatGPT, and VideoLLaVA.

Licensing & Compatibility

  • The repository itself does not explicitly state a license in the README.
  • It is built upon other open-source projects, whose licenses would apply.

Limitations & Caveats

  • The repository is noted as undergoing development and reconstruction, with unoptimized response speed and frontend logic.
  • Specific data preparation instructions are linked but not detailed within the README.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 15 stars in the last 90 days
