LAVIS by salesforce

Library for language-vision AI research

Created 3 years ago

11,169 stars

Top 4.6% on SourcePulse

Project Summary

LAVIS is a Python library for language-vision intelligence research and applications. It offers a unified interface to more than 10 tasks, 20 datasets, and 30 state-of-the-art models, so researchers and engineers can develop, benchmark, and deploy multimodal AI solutions quickly, from image captioning and visual question answering to multimodal feature extraction.

How It Works

LAVIS employs a modular design, providing a unified interface to easily access, repurpose, and extend existing modules like datasets, models, and preprocessors. It supports off-the-shelf inference with readily available pre-trained models and includes automatic download tools for numerous language-vision datasets, simplifying data preparation and model training/evaluation.
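As an illustration of the off-the-shelf inference path described above, here is a minimal sketch of image captioning. It assumes the `load_model_and_preprocess` entry point and the `blip_caption` / `base_coco` model names as they appear in the LAVIS README; the wrapper `caption_image` and its parameters are hypothetical names introduced for this example, and the first call downloads the pre-trained weights.

```python
from typing import List, Optional


def caption_image(image_path: str, device_name: Optional[str] = None) -> List[str]:
    """Caption one image with a pre-trained BLIP model loaded through LAVIS.

    `caption_image` is a hypothetical helper for illustration, not part of LAVIS.
    """
    # Heavy dependencies are imported lazily: defining this function needs only
    # the standard library, while calling it needs torch, Pillow and
    # salesforce-lavis to be installed.
    import torch
    from PIL import Image
    from lavis.models import load_model_and_preprocess

    if device_name is None:
        device_name = "cuda" if torch.cuda.is_available() else "cpu"
    device = torch.device(device_name)

    # A single call returns the model together with its matching image and
    # text preprocessors (the text preprocessors are unused for captioning).
    model, vis_processors, _ = load_model_and_preprocess(
        name="blip_caption", model_type="base_coco", is_eval=True, device=device
    )

    # Preprocess the image and add a batch dimension before generation.
    raw_image = Image.open(image_path).convert("RGB")
    image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

    # generate() returns a list of caption strings, one per image in the batch.
    return model.generate({"image": image})
```

Calling `caption_image("photo.jpg")` (with a path to any local image) should return a one-element list containing the generated caption. Swapping the `name` and `model_type` arguments is, in this design, how a different model would be selected without changing the surrounding code.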

Quick Start & Requirements

Install via pip: pip install salesforce-lavis
For development: git clone https://github.com/salesforce/LAVIS.git && cd LAVIS && pip install -e .
Requires Python 3.8+ and PyTorch. GPU recommended for performance.
Documentation: https://opensource.salesforce.com/LAVIS/latest/index.html
Examples: https://github.com/salesforce/LAVIS/tree/main/examples
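After installing, a quick check like the following can confirm that the stated requirements are met. This is a sketch using only the standard library; it assumes the distribution names `salesforce-lavis` (from the pip command above) and `torch`, and the helper name `check_lavis_environment` is hypothetical.

```python
import sys
from importlib import metadata
from typing import Dict, Optional, Tuple, Union


def check_lavis_environment(
    min_python: Tuple[int, int] = (3, 8),
) -> Dict[str, Union[bool, Optional[str]]]:
    """Report whether Python is recent enough and which versions are installed."""
    report: Dict[str, Union[bool, Optional[str]]] = {
        "python_ok": sys.version_info[:2] >= min_python,
    }
    # Look up each distribution by its pip name; None means "not installed".
    for dist in ("salesforce-lavis", "torch"):
        try:
            report[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            report[dist] = None
    return report
```

Printing `check_lavis_environment()` shows a version string for each installed package and `None` for any that is missing.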

Highlighted Details

Supports recent models like BLIP-2, InstructBLIP, BLIP-Diffusion, and X-InstructBLIP.
Includes models such as BLIP-2 that reported state-of-the-art zero-shot performance on various vision-language tasks at the time of their release.
Offers unified feature extraction for multimodal classification and cross-modal similarity.
Includes automatic dataset downloading and organization tools.
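To make the cross-modal similarity use case concrete, here is a small library-independent sketch of how extracted features are typically compared: cosine similarity between an image embedding and several text embeddings, followed by a softmax to obtain zero-shot class probabilities. The function names, the toy vectors, and the temperature value are assumptions for illustration; in practice the embeddings would come from a feature-extraction model such as the ones LAVIS provides.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def zero_shot_classify(
    image_embed: Sequence[float],
    label_embeds: Dict[str, Sequence[float]],
    temperature: float = 0.07,  # assumed value; smaller means sharper probabilities
) -> Dict[str, float]:
    """Turn image-to-text similarities into a probability per label."""
    if not label_embeds:
        raise ValueError("at least one label embedding is required")
    logits = {
        label: cosine_similarity(image_embed, embed) / temperature
        for label, embed in label_embeds.items()
    }
    # Subtract the maximum logit before exponentiating for numerical stability.
    top = max(logits.values())
    exps = {label: math.exp(value - top) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


# Toy 2-D embeddings standing in for real model features.
probs = zero_shot_classify(
    [1.0, 0.0],
    {"a photo of a cat": [0.9, 0.1], "a photo of a dog": [0.1, 0.9]},
)
best_label = max(probs, key=probs.get)
```

Here `best_label` is the text prompt whose embedding points in the direction closest to the image embedding, which is the basic mechanism behind similarity-based zero-shot classification.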

Maintenance & Community

Developed by Salesforce.
Contact: lavis@salesforce.com

Licensing & Compatibility

BSD 3-Clause License.
Permissive license suitable for commercial use and integration into closed-source applications.

Limitations & Caveats

Models in the library may reflect biases present in their training data, including socioeconomic biases, which can lead to misclassifications or offensive outputs. Users are advised to review the models before adoption to ensure responsible use.

Health Check

Last Commit

1 year ago

Responsiveness

1 day

Star History

49 stars in the last 30 days