Multimodal models for vision-language tasks in both Chinese and English
Top 36.1% on sourcepulse
VisCPM is an open-source family of Chinese-English bilingual multimodal large models, offering both multimodal conversation (VisCPM-Chat) and text-to-image generation (VisCPM-Paint). Built on the 10B-parameter CPM-Bee LLM, it targets researchers and developers who need state-of-the-art bilingual multimodal performance.
How It Works
VisCPM couples a visual encoder (Muffin) and a visual decoder (a diffusion UNet) with the CPM-Bee LLM. VisCPM-Chat is pretrained on English multimodal data and fine-tuned on English plus translated Chinese instruction data, which yields strong cross-lingual generalization. VisCPM-Paint uses CPM-Bee as the text encoder and a UNet image decoder initialized from Stable Diffusion 2.1 parameters; although trained on English data, it also performs well on Chinese prompts.
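For illustration, multimodal conversation runs through a single chat-style call. The sketch below follows the interface shown in the repository's examples; the checkpoint path is a placeholder, and exact argument names may differ between releases:

```python
from PIL import Image
from VisCPM import VisCPMChat  # conversational interface from the repo's examples

# Placeholder path to a downloaded VisCPM-Chat checkpoint.
viscpm_chat = VisCPMChat('/path/to/viscpm_chat_checkpoint', image_safety_checker=True)

image = Image.open('figures/example.png').convert('RGB')

# Questions may be asked in Chinese or English thanks to cross-lingual transfer.
answer, context, vision_hidden_states = viscpm_chat.chat(image, 'What is unusual about this image?')
print(answer)

# The returned dialogue context and vision states support multi-turn conversation.
follow_up, context, _ = viscpm_chat.chat(image, 'Why might that be?', context,
                                         vision_hidden_states=vision_hidden_states)
print(follow_up)
```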
Quick Start & Requirements
Clone the repository, create a Python 3.10 conda environment (conda create -n viscpm python=3.10), and install dependencies (pip install torch>=1.10, then pip install -r requirements.txt).
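A minimal text-to-image sketch, following the usage shown in the project's README; the checkpoint path is a placeholder and constructor arguments may vary by release:

```python
from VisCPM import VisCPMPaint  # text-to-image interface from the repo's examples

# Placeholder path to a downloaded VisCPM-Paint checkpoint.
painter = VisCPMPaint('/path/to/viscpm_paint_checkpoint',
                      image_safety_checker=True,   # filter unsafe generated images
                      prompt_safety_checker=True,  # filter unsafe input prompts
                      add_ranker=True)             # rerank candidates for quality

# Prompts can be Chinese or English; CPM-Bee encodes the text for the UNet decoder.
image = painter.generate('A boat drifting on a misty mountain lake at dawn')
image.save('output.png')
```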
Highlighted Details
Maintenance & Community
The project is actively maintained; recent releases include MiniCPM-V 2.0 and OmniLMM, and the VisCPM paper was accepted as a spotlight at ICLR 2024. No community support channels are explicitly listed, but Hugging Face integration is provided.
Licensing & Compatibility
VisCPM models are released under a "General Model License Agreement - Source Attribution - Publicity Restriction - Non-Commercial" license that permits personal and research use. Commercial use requires contacting cpm@modelbest.cn for a license; the CPM-Bee base model supports commercial use under a similar contact-based arrangement.
Limitations & Caveats
The safety modules are imperfect and can produce false positives or false negatives. The fine-tuning code is currently tested only on Linux, and model quantization is on the roadmap but not yet available.
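If the safety filter's false positives block legitimate research prompts, the checkers can be turned off at construction time. This assumes the constructor flags shown in the sketch above, so verify against your installed version:

```python
# Research use only: disabling both checkers returns unfiltered outputs.
painter = VisCPMPaint('/path/to/viscpm_paint_checkpoint',
                      image_safety_checker=False,
                      prompt_safety_checker=False,
                      add_ranker=True)
```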