LAVIS by salesforce

Library for language-vision AI research

created 2 years ago
10,795 stars

Top 4.8% on sourcepulse

Project Summary

LAVIS is a Python library for language-vision research and applications. It offers a unified interface to more than 10 tasks, 20 datasets, and 30 state-of-the-art models, so researchers and engineers can develop, benchmark, and deploy multimodal systems for tasks such as image captioning, visual question answering, and multimodal feature extraction.

How It Works

LAVIS uses a modular design: datasets, models, and preprocessors sit behind a unified interface, so existing modules are easy to access, repurpose, and extend. It supports off-the-shelf inference with pre-trained models and includes tools that automatically download many language-vision datasets, which simplifies data preparation, training, and evaluation.
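
A modular design with a unified interface is commonly built on a name-based registry, where components are registered once and then constructed by name. The sketch below illustrates that general pattern in plain Python. The names `Registry`, `MODELS`, and `ToyCaptioner` are hypothetical and chosen for illustration; they are not LAVIS's actual classes or API.

```python
from typing import Callable, Dict, List


class Registry:
    """Maps string names to factories so callers can build components by name."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[..., object]] = {}

    def register(self, name: str) -> Callable:
        """Return a decorator that stores a class or function under `name`."""
        def decorator(factory):
            if name in self._factories:
                raise KeyError(f"'{name}' is already registered")
            self._factories[name] = factory
            return factory
        return decorator

    def build(self, name: str, **kwargs) -> object:
        """Instantiate the component registered under `name`."""
        if name not in self._factories:
            raise KeyError(f"unknown component '{name}'")
        return self._factories[name](**kwargs)

    def names(self) -> List[str]:
        """List registered component names in sorted order."""
        return sorted(self._factories)


# Hypothetical registry for models; datasets and preprocessors could use their own.
MODELS = Registry()


@MODELS.register("toy_captioner")
class ToyCaptioner:
    """Stand-in model: formats a string instead of running a neural network."""

    def __init__(self, prefix: str = "a photo of") -> None:
        self.prefix = prefix

    def generate(self, subject: str) -> str:
        return f"{self.prefix} {subject}"


# Callers only need the registered name, not the concrete class.
model = MODELS.build("toy_captioner", prefix="a picture of")
caption = model.generate("a dog")
print(caption)
```

Because callers depend only on the registered name, a new component can be added by registering it, without changing the code that builds and uses it.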

Quick Start & Requirements

Highlighted Details

  • Supports recent models like BLIP-2, InstructBLIP, BLIP-Diffusion, and X-InstructBLIP.
  • Achieves state-of-the-art zero-shot performance on various vision-language tasks.
  • Offers unified feature extraction for multimodal classification and cross-modal similarity.
  • Includes automatic dataset downloading and organization tools.
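
Cross-modal similarity, mentioned in the list above, is typically scored as the cosine similarity between an image embedding and a text embedding that live in the same vector space. The sketch below shows only that scoring step. The function names and the three-dimensional embedding values are made up for illustration; real embeddings would come from a model's feature extractor.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def rank_captions(
    image_emb: Sequence[float], text_embs: Dict[str, Sequence[float]]
) -> List[str]:
    """Order captions from most to least similar to the image embedding."""
    return sorted(
        text_embs,
        key=lambda caption: cosine_similarity(image_emb, text_embs[caption]),
        reverse=True,
    )


# Made-up embeddings, for illustration only.
image_embedding = [0.9, 0.1, 0.0]
caption_embeddings = {
    "a dog on grass": [0.8, 0.2, 0.1],
    "a city skyline": [0.0, 0.3, 0.9],
}

ranked = rank_captions(image_embedding, caption_embeddings)
print(ranked[0])
```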

Maintenance & Community

Licensing & Compatibility

  • BSD 3-Clause License.
  • Permissive license suitable for commercial use and integration into closed-source applications.

Limitations & Caveats

Models in the library may reflect socioeconomic biases present in their training data, which can lead to misclassifications or offensive outputs. Users are advised to review each model's behavior before relying on it.

Health Check

  • Last commit: 8 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 2

Star History

  • 318 stars in the last 90 days
