Code for a research paper on vision-language models
Vary-toy provides an open-source implementation for a Small Language Model (SLM) enhanced with a reinforced vision vocabulary, targeting researchers and developers in the multimodal AI space. It aims to scale vision capabilities within SLMs, enabling advanced visual understanding tasks.
How It Works
Vary-toy integrates a vision encoder with a language model, creating a "reinforced vision vocabulary" that expands the SLM's ability to process and understand visual information. This approach allows for efficient scaling of visual features within smaller language models, offering a competitive alternative to larger, more resource-intensive models.
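The sketch below illustrates the general pattern: patch features from a vision encoder are projected into the language model's embedding space so that image patches behave like extra vocabulary tokens. This is an illustrative assumption about the architecture, not the project's actual classes; all module names and dimensions are placeholders.

# Minimal sketch (illustrative, not the project's actual code): vision-encoder
# patch features are projected to the SLM's hidden size so they can be
# concatenated with text embeddings as extra "vision vocabulary" tokens.
import torch
import torch.nn as nn

class VisionVocabularyAdapter(nn.Module):
    """Projects vision-encoder patch features to the SLM's hidden size."""
    def __init__(self, vision_dim: int, lm_dim: int):
        super().__init__()
        self.proj = nn.Linear(vision_dim, lm_dim)

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (batch, num_patches, vision_dim)
        return self.proj(patch_feats)  # (batch, num_patches, lm_dim)

# Toy shapes: a ViT-style encoder with 1024-d patches feeding a 2048-d SLM.
adapter = VisionVocabularyAdapter(vision_dim=1024, lm_dim=2048)
patch_feats = torch.randn(1, 256, 1024)   # dummy encoder output
vision_tokens = adapter(patch_feats)      # pseudo-token embeddings
text_embeds = torch.randn(1, 32, 2048)    # dummy text token embeddings
# The combined sequence is what the language model attends over.
inputs_embeds = torch.cat([vision_tokens, text_embeds], dim=1)
print(inputs_embeds.shape)  # torch.Size([1, 288, 2048])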
Quick Start & Requirements
Install the package and the ninja build tool:
pip install -e .
pip install ninja
Run the demo:
python vary/demo/run_qwen_vary.py
Launch training with DeepSpeed:
deepspeed Vary/train/train_qwen_vary.py
Both the demo and training scripts take additional arguments as specified in the repository README.
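For programmatic use, a checkpoint in this family would typically load through the Hugging Face transformers trust_remote_code path, as in the minimal sketch below; the model path, prompt, and generation settings are placeholders (not a real repository id), and vary/demo/run_qwen_vary.py remains the supported entry point.

# Minimal sketch, assuming the checkpoint exposes a standard Hugging Face
# interface via trust_remote_code; path and prompt are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/path/to/vary-toy-checkpoint"  # placeholder, not a real repo id
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True, device_map="auto"
)

inputs = tokenizer("Describe the attached document image.", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))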
Maintenance & Community
The project is actively updated; the associated papers were recently accepted to ECCV 2024 and ACM MM 2024. An email address is provided for questions.
Licensing & Compatibility
The data, code, and checkpoints are licensed for research use only. Usage is restricted by the license agreements of LLaMA, Vicuna, GPT-4, Qwen, and LLaVA.
Limitations & Caveats
The project is explicitly licensed for research purposes only, which may preclude commercial use. The README notes that users who previously built the original Vary repository should rebuild it for Vary-toy.