Oscar by microsoft

Vision-language pre-training research paper

created 5 years ago
1,050 stars

Top 36.5% on sourcepulse

Project Summary

This repository provides code and models for Oscar and VinVL, advanced pre-training methods for vision-language tasks. It targets researchers and practitioners in NLP and computer vision, enabling state-of-the-art performance on tasks like image captioning and visual question answering.

How It Works

Oscar uses object tags detected in images as anchor points to ease image-text alignment during pre-training. VinVL, its successor, revisits the visual representation: an improved object-attribute detection model supplies richer region features, which improves performance on vision-language tasks. Because detected tags often also appear as words in the paired text, this object-centric design gives the model explicit cues for linking image regions to words.
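The idea of object tags acting as anchors can be illustrated with a small sketch. This is not the repository's code: `OscarStyleInput`, `build_input`, and `anchor_tags` are hypothetical names, the segment-ID convention is an assumption, and the feature vectors are toy values. It only shows how a caption, detected tags, and region features might be packed into one input, and which tags overlap with the caption.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class OscarStyleInput:
    """Hypothetical container for a (words, tags, regions) triple."""
    tokens: List[str]                    # caption words followed by object tags
    segment_ids: List[int]               # assumed: 0 = caption part, 1 = tag part
    region_features: List[List[float]]   # one toy feature vector per region


def build_input(caption_words: Sequence[str],
                object_tags: Sequence[str],
                region_features: Sequence[Sequence[float]]) -> OscarStyleInput:
    """Pack caption words, object tags, and region features into one input."""
    tokens = ["[CLS]"] + list(caption_words) + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    tag_part = list(object_tags) + ["[SEP]"]
    tokens += tag_part
    segment_ids += [1] * len(tag_part)
    return OscarStyleInput(tokens, segment_ids,
                           [list(f) for f in region_features])


def anchor_tags(caption_words: Sequence[str],
                object_tags: Sequence[str]) -> List[str]:
    """Return the tags that also occur in the caption (the shared anchors)."""
    words = {w.lower() for w in caption_words}
    return [t for t in object_tags if t.lower() in words]


caption = ["a", "dog", "sits", "on", "a", "couch"]
tags = ["dog", "couch", "pillow"]          # assumed detector output
features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # toy values, one per tag

example = build_input(caption, tags, features)
anchors = anchor_tags(caption, tags)
print(example.tokens)
print(anchors)
```

In this toy case the tags "dog" and "couch" occur in both the caption and the detector output, so they are the words a model could use to tie image regions to text; "pillow" is detected but not mentioned.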

Quick Start & Requirements

Installation instructions are available in INSTALL.md. Pre-trained models, datasets, and VinVL image features can be found in VinVL_DOWNLOAD.md and DOWNLOAD.md. Scripts for downstream finetuning are in MODEL_ZOO.md and VinVL_MODEL_ZOO.md.

Highlighted Details

  • VinVL reported state-of-the-art results on seven vision-language tasks at the time of its release.
  • Oscar was pre-trained on a public corpus of 6.5 million text-image pairs.
  • Released finetuned models and pre-trained checkpoints.
  • Code for Oscar+ pretraining and VinVL feature extraction is available.

Maintenance & Community

The project is associated with Microsoft Research. The README's update notes also announce LLaVA, a separate project on visual instruction tuning with GPT-4.

Licensing & Compatibility

Oscar is released under the MIT license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The README announces the release of Oscar+ pre-training code and VinVL features, but it does not spell out dependencies or setup complexity beyond pointing to the general installation instructions.

Health Check

Last commit: 1 year ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0

Star History

3 stars in the last 90 days
