EVE by baaivision

Vision-language model research paper exploring encoder-free architectures

Created 1 year ago
350 stars

Top 79.5% on SourcePulse

Project Summary

The EVE series comprises encoder-free Vision-Language Models (VLMs) that remove the need for a separate vision encoder, with the goal of transferring Large Language Models (LLMs) to multimodal tasks efficiently and stably. Aimed at researchers and practitioners in multimodal AI, EVE seeks to close the performance gap between encoder-free and encoder-based VLM architectures.

How It Works

EVE uses a single decoder-only architecture across modalities rather than attaching a separate vision encoder to the LLM. This design accepts images of arbitrary aspect ratio and emphasizes efficient, transparent, and practical training strategies. EVE is trained on fewer than 100 million publicly available samples, filtered and recaptioned from sources such as OpenImages, SAM, and LAION, and the project reports performance competitive with modular encoder-based VLMs.
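To make the arbitrary-aspect-ratio idea concrete, here is a minimal sketch of how a decoder-only model might lay out an image as a variable-length token sequence ahead of the text. It is an illustration under stated assumptions, not EVE's actual implementation: the function names, the patch size of 14, and the per-row `row_end` separator are all hypothetical choices made for this example.

```python
import math

def patch_grid(height, width, patch_size=14):
    """Return (rows, cols) of patches needed to cover an image.

    Assumption: partial patches at the border are padded, so we round up.
    The default patch size of 14 is illustrative only.
    """
    if height <= 0 or width <= 0 or patch_size <= 0:
        raise ValueError("height, width, and patch_size must be positive")
    return math.ceil(height / patch_size), math.ceil(width / patch_size)

def build_sequence(height, width, text_tokens, patch_size=14):
    """Lay out image patches followed by text tokens as one flat sequence.

    The image is not resized to a square, so the number of patch
    positions depends on the image's own shape. A hypothetical
    "row_end" marker after each patch row preserves the 2D layout
    once the grid is flattened.
    """
    rows, cols = patch_grid(height, width, patch_size)
    sequence = []
    for r in range(rows):
        for c in range(cols):
            sequence.append(("patch", r, c))
        sequence.append(("row_end", r))
    sequence.extend(("text", token) for token in text_tokens)
    return sequence

# A wide image yields more columns than rows; no square crop is needed.
wide = build_sequence(224, 448, ["Describe", "the", "image"])
tall = build_sequence(448, 224, ["Describe", "the", "image"])
```

In this sketch a wide image and a tall image of the same area produce the same number of patch positions but a different number of row markers, which is one simple way a single decoder could tell the two layouts apart.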

Highlighted Details

  • Handles arbitrary image aspect ratios and is reported to outperform other encoder-free models.
  • Data efficiency from training on a filtered subset of publicly available data.
  • A transparent and practical training strategy for decoder-only multimodal architectures.
  • EVEv1 accepted to NeurIPS 2024 (Spotlight).

Maintenance & Community

The project is associated with BAAI (Beijing Academy of Artificial Intelligence). Further community engagement details are not provided in the README.

Licensing & Compatibility

The project is covered by its own LICENSE file, and no common open-source license such as MIT or Apache is explicitly named. Users should verify the terms before commercial use or closed-source linking.

Limitations & Caveats

The README does not detail specific limitations, unsupported platforms, or known bugs. The project appears to be actively developed, with EVEv2 recently released.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

5 stars in the last 30 days
