FlagEmbedding by FlagOpen

Toolkit for retrieval and RAG applications

Created 2 years ago

11,124 stars

Top 4.6% on SourcePulse

13 Experts Love This Project

Tobi Lutke

Cofounder of Shopify

Elie Bursztein

Cybersecurity Lead at Google DeepMind

Jesse Clark

Cofounder of Marqo

Pawel Garbacki

Cofounder of Fireworks AI

and 9 more!

Project Summary

FlagOpen/FlagEmbedding provides a comprehensive toolkit for retrieval-augmented LLMs, offering a suite of embedding and reranking models. It targets researchers and developers building search and RAG systems, enabling state-of-the-art performance across various languages and retrieval tasks.

How It Works

The project uses transformer encoders, including LLM-based models, to generate embeddings. It supports three retrieval modes: dense (one vector per text), lexical (sparse per-token weights), and multi-vector (ColBERT-style per-token vectors). BGE-M3 unifies all three in a single model, so its scores can be used separately or combined for hybrid retrieval, which improves retrieval accuracy and flexibility.
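The three retrieval modes differ mainly in how a query-passage score is computed. The sketch below follows the scoring described for BGE-M3 (dense inner product, lexical matching over shared tokens, ColBERT-style "MaxSim" late interaction) on hand-written toy vectors. It does not call FlagEmbedding; the function names are hypothetical, and the 0.4/0.2/0.4 mode weights are an illustrative assumption, not a recommended setting.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def dense_score(q: Vector, p: Vector) -> float:
    # Dense mode: one vector per text, scored by inner product
    # (equal to cosine similarity when both vectors are L2-normalized).
    return dot(q, p)


def lexical_score(q_w: Dict[str, float], p_w: Dict[str, float]) -> float:
    # Lexical mode: per-token weights; only tokens present in both the
    # query and the passage contribute, each as a product of weights.
    return sum(w * p_w[t] for t, w in q_w.items() if t in p_w)


def multivector_score(q_vecs: List[Vector], p_vecs: List[Vector]) -> float:
    # Multi-vector (ColBERT-style) mode: match every query token vector to
    # its most similar passage token vector, then average those maxima.
    return sum(max(dot(q, p) for p in p_vecs) for q in q_vecs) / len(q_vecs)


def hybrid_score(dense: float, lexical: float, multi: float,
                 weights: Tuple[float, float, float] = (0.4, 0.2, 0.4)) -> float:
    # Hybrid retrieval: weighted sum of the three mode scores.
    # The default weights are illustrative only.
    w_d, w_l, w_m = weights
    return w_d * dense + w_l * lexical + w_m * multi


if __name__ == "__main__":
    d = dense_score([0.6, 0.8], [0.8, 0.6])
    l = lexical_score({"retrieval": 0.5, "model": 0.2},
                      {"retrieval": 0.4, "rag": 0.3})
    m = multivector_score([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.2, 0.8]])
    print(d, l, m, hybrid_score(d, l, m))
```

In a real system the vectors and token weights come from the model, and FlagEmbedding ships its own BGE-M3 scoring helpers; check the BGE-M3 documentation for those rather than reimplementing them.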

Quick Start & Requirements

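Installation: pip install -U FlagEmbedding (for inference) or pip install -U "FlagEmbedding[finetune]" (for fine-tuning; the quotes keep shells such as zsh from expanding the brackets).
Dependencies: Python and PyTorch (installed with the package). A CUDA-capable GPU is recommended for faster encoding and fine-tuning.
Usage: Load an embedder via FlagAutoModel.from_finetuned and call its .encode() method; rerankers score query-passage pairs via .compute_score(). See the embedder inference and reranker inference guides for details.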
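A rough sketch of a two-stage pipeline: first rank passages by the inner product of query and passage embeddings, then rescore the shortlist pairwise with a reranker. The helper names `top_k`, `retrieve`, and `rerank` are hypothetical. The model names, the query instruction string, `use_fp16=True` (which assumes a GPU), and the `encode_queries`/`encode_corpus`/`FlagReranker.compute_score` calls are based on the project's README examples and should be checked against the inference docs for your installed version. The FlagEmbedding imports sit inside the functions so the ranking helper runs without the package.

```python
from typing import List, Sequence


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Return indices of the k highest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def retrieve(query: str, passages: List[str], k: int = 3) -> List[int]:
    """Stage 1 (hypothetical helper): rank passages by embedding similarity."""
    # Imported here so top_k works even without FlagEmbedding installed.
    from FlagEmbedding import FlagAutoModel

    # Example model and query instruction; check the model card for yours.
    model = FlagAutoModel.from_finetuned(
        "BAAI/bge-base-en-v1.5",
        query_instruction_for_retrieval="Represent this sentence for searching relevant passages: ",
        use_fp16=True,  # assumes a CUDA GPU
    )
    q_emb = model.encode_queries([query])  # 2-D array, one row per query
    p_emb = model.encode_corpus(passages)  # 2-D array, one row per passage
    scores = (q_emb @ p_emb.T)[0].tolist()  # one inner product per passage
    return top_k(scores, k)


def rerank(query: str, passages: List[str]) -> List[int]:
    """Stage 2 (hypothetical helper): rescore (query, passage) pairs jointly."""
    from FlagEmbedding import FlagReranker

    reranker = FlagReranker("BAAI/bge-reranker-v2-m3", use_fp16=True)
    scores = reranker.compute_score([[query, p] for p in passages])
    # Guard in case a single pair comes back as a bare number.
    if not isinstance(scores, (list, tuple)):
        scores = [scores]
    return top_k(scores, len(passages))
```

In a service you would load each model once and reuse it across queries; reloading inside every call, as this sketch does, keeps the example short but is slow.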

Highlighted Details

BGE-M3 offers multi-linguality (100+ languages), multi-granularity (inputs of up to 8192 tokens), and multi-functionality (dense, lexical, and multi-vector retrieval).
BGE-VL provides state-of-the-art multimodal embeddings for a range of visual search applications.
The toolkit includes lightweight rerankers and LM-Cocktail, a model-merging technique that helps fine-tuned models retain their general capabilities.
Recent work also includes benchmarks: MLVU for long-video understanding and AIR-Bench for fair out-of-distribution (OOD) evaluation of retrieval.
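LM-Cocktail merges models by taking a weighted average of their parameters, for example mixing a fine-tuned model with its base model so that it keeps more of its general ability. The sketch below shows only that averaging step on toy parameter dictionaries, with plain lists standing in for tensors. The function name and the weights are hypothetical, and it does not reproduce how LM-Cocktail selects the merge weights; see the LM-Cocktail documentation in the repository for that.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flat values.
Params = Dict[str, List[float]]


def merge_models(models: List[Params], weights: List[float]) -> Params:
    """Weighted parameter average of models that share one architecture."""
    if not models or len(models) != len(weights):
        raise ValueError("need at least one model and exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    norm = [w / total for w in weights]  # normalize so the weights sum to 1

    merged: Params = {}
    for name in models[0]:
        tensors = [m[name] for m in models]  # same parameter from every model
        merged[name] = [
            sum(w * t[i] for w, t in zip(norm, tensors))
            for i in range(len(tensors[0]))
        ]
    return merged


if __name__ == "__main__":
    fine_tuned = {"layer.weight": [1.0, 2.0]}
    base = {"layer.weight": [3.0, 4.0]}
    # Favor the fine-tuned model 3:1 over the base model.
    print(merge_models([fine_tuned, base], [3.0, 1.0]))
```

Normalizing the weights keeps the merged parameters on the same scale as the inputs, so the result is an interpolation between the models rather than a rescaled copy.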

Maintenance & Community

The project is actively maintained with frequent updates and new model releases. Community engagement is encouraged via WeChat groups. Tutorials are continuously updated.

Licensing & Compatibility

FlagEmbedding is licensed under the MIT License, which permits academic and commercial use provided the copyright and license notice are included with copies of the software.

Limitations & Caveats

Although multilingual support is extensive, performance may vary across languages. Newer models such as BGE-VL are released under MIT, but the README also references related projects that may carry different licenses, so verify the license of each component you use.

Health Check

Last Commit

3 weeks ago

Responsiveness

1 day

Star History

148 stars in the last 30 days