Open-source, commercially usable multimodal model for bilingual visual-text dialogue
This repository provides Chinese-LLaVA, an open-source, commercially usable multimodal model that supports bilingual (Chinese and English) visual-text dialogue. It also includes the Chinese-LLaVA-Vision-Instructions dataset for visual instruction tuning, targeting researchers and developers working with multimodal AI in Chinese and English.
How It Works
Chinese-LLaVA builds on the LLaVA architecture, pairing a visual encoder with Chinese language models such as Chinese-Llama-2-7B and Baichuan-7B. Following the LLaVA design, image features are projected into the language model's token embedding space, so the model can take an image together with a Chinese or English prompt and generate a response conditioned on both.
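To make the pipeline concrete, here is a minimal inference sketch using the Hugging Face transformers LLaVA classes. It is a sketch under assumptions: the checkpoint path and prompt template are placeholders, and the repository ships its own loading and chat scripts, which may differ.

```python
# Minimal LLaVA-style bilingual image-text inference sketch.
# Assumes a Hugging Face-format checkpoint; the path below is hypothetical.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "path/to/chinese-llava-checkpoint"  # hypothetical local path
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("example.jpg")
# Prompts may be Chinese or English; the <image> token marks where the
# projected visual features are spliced into the token sequence.
prompt = "USER: <image>\n请描述这张图片。 ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```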
Quick Start & Requirements
Create a conda environment (`conda create -n Cllava python=3.10`), activate it (`conda activate Cllava`), and install the package in editable mode (`pip install -e .`).
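After installation, a quick import can confirm the package is on the path. The module name `llava` follows upstream LLaVA and is an assumption for this fork:

```python
# Smoke test: verify the editable install is importable.
# The package name `llava` follows upstream LLaVA and may differ in this fork.
import llava
print(llava.__file__)
```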
Highlighted Details
- Bilingual (Chinese and English) visual-text dialogue over images.
- Chinese base language models: Chinese-Llama-2-7B and Baichuan-7B.
- Ships the Chinese-LLaVA-Vision-Instructions dataset for visual instruction tuning.
Maintenance & Community
The repository was last updated about a year ago and is currently marked inactive.
Licensing & Compatibility
The project is described as open-source and commercially usable; consult the repository for the exact license terms.
Limitations & Caveats
The README mentions "TODO" for training details, int4 quantization, and Docker deployment, indicating these features may be incomplete or under development.
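Until native int4 support lands, one stopgap for a Hugging Face-format checkpoint is 4-bit loading through bitsandbytes. This is a hedged sketch, not the repository's own quantization path, and the checkpoint path is hypothetical:

```python
# Hypothetical 4-bit loading via bitsandbytes through transformers;
# the repository's own int4 support is still marked TODO.
import torch
from transformers import BitsAndBytesConfig, LlavaForConditionalGeneration

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NF4 quantization for the linear layers
    bnb_4bit_compute_dtype=torch.float16,  # run matmuls in fp16
)
model = LlavaForConditionalGeneration.from_pretrained(
    "path/to/chinese-llava-checkpoint",    # hypothetical path
    quantization_config=bnb_config,
    device_map="auto",
)
```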