EAGLE by NVlabs

Vision-language model for long-context multimodal learning

created 1 year ago
841 stars

Top 43.2% on sourcepulse

Project Summary

The Eagle family of models tackles long-context multimodal learning: comprehending extended video sequences and high-resolution images. Targeting researchers and developers in computer vision and natural language processing, Eagle 2.5 provides a generalist framework that substantially improves performance on long-context benchmarks, rivaling larger commercial models while using fewer parameters.

How It Works

Eagle 2.5 employs Automatic Degrade Sampling (ADS) and Image Area Preservation (IAP) to maintain contextual integrity and visual detail during long-context training. ADS dynamically balances visual and textual inputs, while IAP optimizes image tiling to retain original aspect ratios and fine-grained details. The training pipeline also utilizes progressive mixed post-training to gradually increase context length, improving information density. A key component is the Eagle-Video-110K dataset, curated for long video understanding with story-level and clip-level annotations.
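To make the tiling idea concrete, here is a minimal, hypothetical sketch of aspect-ratio-preserving tile selection in the spirit of IAP: among candidate grids within a tile budget, pick the one whose aspect ratio is closest to the input image's, so resizing introduces minimal distortion. The function name, tile size, and budget are illustrative assumptions, not the actual Eagle 2.5 implementation.

```python
from math import inf

def best_tile_grid(width, height, tile=448, max_tiles=12):
    """Pick a (cols, rows) grid whose aspect ratio is closest to the
    image's, within a tile budget. Illustrative only: the exact logic
    used by Eagle 2.5's Image Area Preservation may differ."""
    target = width / height
    best, best_diff = (1, 1), inf
    for rows in range(1, max_tiles + 1):
        for cols in range(1, max_tiles // rows + 1):
            diff = abs(cols / rows - target)
            if diff < best_diff:
                best, best_diff = (cols, rows), diff
    return best  # image would be resized to (cols*tile) x (rows*tile)

# A 1920x1080 (16:9) image maps to a wide grid
print(best_tile_grid(1920, 1080))  # → (2, 1)
```

The key design point this illustrates: choosing the grid by aspect-ratio distance, rather than a fixed square grid, keeps fine-grained detail from being stretched away, which is the stated motivation for IAP.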

Quick Start & Requirements

  • Install: pip install transformers==4.37.2 flash-attn
  • Prerequisites: CUDA, Python. Specific GPU memory requirements are not detailed but implied by model sizes (1B to 9B parameters).
  • Demo: A Streamlit-based local chat demo is available.
  • Models: Available on Hugging Face (e.g., nvidia/Eagle2-1B).
  • Documentation: Links to papers, Hugging Face models, and demos are provided.

Highlighted Details

  • Eagle 2.5-8B achieves 72.4% on Video-MME with 512 input frames, competitive with GPT-4o and Qwen2.5-VL-72B.
  • Supports long-context multimodal learning for both video and high-resolution images.
  • Introduces novel training techniques: Automatic Degrade Sampling (ADS) and Image Area Preservation (IAP).
  • Features the Eagle-Video-110K dataset for long video understanding.

Maintenance & Community

  • The project is actively developed by NVlabs, with releases including Eagle-1, Eagle-2, and Eagle-2.5.
  • Eagle-1 was accepted to ICLR 2025.
  • No specific community links (Discord/Slack) are provided in the README.

Licensing & Compatibility

  • Code: Apache 2.0 license.
  • Model Weights: Creative Commons Attribution-NonCommercial 4.0 International.
  • Restrictions: The model weights are for non-commercial use only; the README also references the licenses of the underlying base models (Qwen2.5, Llama, PaliGemma).

Limitations & Caveats

The model weights are restricted to non-commercial use. The README mentions TODO items for vLLM inference support and AWQ quantization weights, indicating these features are not yet available.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 2
  • Star history: 85 stars in the last 90 days

