texttron/tevatron: Unified toolkit for document retrieval across modalities, languages, and scale
Top 48.1% on SourcePulse
Tevatron is a unified toolkit for building and deploying neural document retrieval systems over large-scale, multilingual, and multimodal data. It lets researchers and practitioners train and fine-tune dense retrievers efficiently with parameter-efficient methods such as LoRA, and integrates with libraries like DeepSpeed, vLLM, and FlashAttention for optimized performance.
How It Works
Tevatron builds on libraries such as DeepSpeed, vLLM, and FlashAttention for efficient large-scale model training on GPUs and TPUs. It supports parameter-efficient fine-tuning (PEFT) via LoRA, allowing adaptation of large language models to retrieval tasks at reduced computational cost. The toolkit handles data preparation, model encoding, and similarity search, offering flexibility for both textual and multimodal retrieval scenarios.
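As a concrete illustration of the LoRA workflow, here is a minimal sketch using the generic transformers and peft APIs rather than Tevatron's own training entry point; the backbone, target modules, and hyperparameters are placeholder assumptions, not Tevatron defaults.

# Minimal sketch: adapting a dense-retriever encoder with LoRA via peft.
import torch
from transformers import AutoModel, AutoTokenizer
from peft import LoraConfig, get_peft_model

backbone = "bert-base-uncased"  # placeholder; Tevatron also targets LLM backbones
tokenizer = AutoTokenizer.from_pretrained(backbone)
encoder = AutoModel.from_pretrained(backbone)

# LoRA trains small low-rank updates to selected projections instead of
# all model weights, cutting fine-tuning cost substantially.
config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.1,
                    target_modules=["query", "value"])
encoder = get_peft_model(encoder, config)
encoder.print_trainable_parameters()  # typically well under 1% trainable

def embed(text: str) -> torch.Tensor:
    batch = tokenizer(text, return_tensors="pt", truncation=True)
    return encoder(**batch).last_hidden_state[:, 0]  # CLS pooling

# Dense retrieval scores a query-document pair by vector similarity.
score = (embed("what is dense retrieval?") @ embed("Dense retrieval maps text to vectors.").T).item()

In an actual training run, such embeddings would feed a contrastive loss over the positive and negative documents from the JSONL files described under Quick Start below.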
Quick Start & Requirements
- PyTorch installation: pip install transformers datasets peft deepspeed accelerate faiss-cpu && pip install -e .
- JAX installation (additionally requires magix and GradCache): pip install transformers datasets flax optax, followed by cloning and installing magix and GradCache, then pip install -e . Alternatively, use the jax-toolbox Docker image.
- Dependencies: transformers, datasets, peft, deepspeed, accelerate, faiss-cpu (for PyTorch); flax, optax, magix, GradCache (for JAX).
- Data format: JSONL for training (query, positive/negative docs) and corpus (docid, text); image fields are optional. A sketch of this format follows the list.
- Example dataset: Tevatron/msmarco-passage-aug.
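The exact field names below are illustrative assumptions based on the description above (a query paired with positive and negative documents, plus a corpus of docid/text entries), not verbatim from the Tevatron docs:

train.jsonl (one training example per line):
{"query_id": "q1", "query": "what is dense retrieval?", "positive_passages": [{"docid": "d1", "text": "Dense retrieval encodes queries and documents as vectors."}], "negative_passages": [{"docid": "d7", "text": "An unrelated passage."}]}

corpus.jsonl (one document per line; the image field is optional):
{"docid": "d1", "text": "Dense retrieval encodes queries and documents as vectors.", "image": null}

Highlighted Details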
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The previous version of the toolkit is preserved on the v1 branch.
ContextualAI
activeloopai
FlagOpen
EleutherAI