NExT-Chat: Multimodal LLM for integrated vision tasks
Top 99.6% on SourcePulse
NExT-Chat is a Large Multimodal Model (LMM) designed to integrate conversational AI with visual understanding, specifically object detection and segmentation. It targets researchers and developers seeking to build multimodal applications capable of not only chatting but also precisely locating and segmenting objects within images, offering enhanced visual grounding for AI interactions.
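To make "chatting plus locating and segmenting" concrete, here is a hypothetical shape such a grounded exchange could take. The field names, the `[objN]` reference syntax, and the whole schema are invented for illustration; this is not NExT-Chat's actual output format.

```python
# Hypothetical grounded chat turn: the reply text carries object
# references that a client resolves to boxes (and masks), illustrating
# the chat + detection + segmentation combination described above.
# All field names here are invented, not NExT-Chat's real schema.

turn = {
    "question": "What is the animal on the left doing?",
    "answer": "The dog [obj0] is chasing a ball [obj1].",
    "objects": {
        # Boxes as normalized (x1, y1, x2, y2); masks omitted in this sketch.
        "obj0": {"box": (0.05, 0.30, 0.45, 0.90), "mask": None},
        "obj1": {"box": (0.50, 0.60, 0.62, 0.72), "mask": None},
    },
}

# A client can render the answer text and overlay each referenced region.
refs = [k for k in turn["objects"] if f"[{k}]" in turn["answer"]]
print(refs)  # ['obj0', 'obj1']
```

The point is only that every noun phrase the model grounds comes with a machine-readable region, which is what distinguishes this class of model from a chat-only LMM.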
How It Works
NExT-Chat functions as an LMM by combining a language model with visual encoders. It leverages OpenAI's CLIP ViT for visual feature extraction and the Segment Anything Model (SAM) for segmentation tasks. The framework employs a multi-stage training approach, encompassing VL+Detection Pre-training, VL+Detection Instruction Following, and VL+Detection+Segmentation, allowing for progressive integration of capabilities.
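As a rough illustration of the wiring described above, the sketch below shows how patch features from a frozen visual encoder can be projected into a language model's embedding space and prepended to the text tokens. Every function name, dimension, and the fixed projection is a toy stand-in, not NExT-Chat's actual code. For scale, a CLIP ViT-L/14 encoder at 336x336 input yields 24x24 = 576 patch features, versus 16x16 = 256 at 224x224.

```python
# Toy sketch of how an LMM wires a vision encoder into a language model.
# All names and dimensions are illustrative, not the project's API.

def visual_encoder(image, patches=16, dim=4):
    """Stand-in for CLIP ViT: one feature vector per image patch."""
    return [[float(i + j) for j in range(dim)] for i in range(patches)]

def project(features, out_dim=6):
    """Stand-in linear projector mapping visual features into the LM's
    embedding space (weights are a fixed dummy here)."""
    return [[sum(f) * 0.1] * out_dim for f in features]

def embed_text(tokens, dim=6):
    """Stand-in text embedding lookup."""
    return [[float(len(t))] * dim for t in tokens]

def build_lm_input(image, tokens):
    # Projected visual tokens are prepended to the text embeddings,
    # so the LM attends over both modalities in a single sequence.
    visual = project(visual_encoder(image))
    text = embed_text(tokens)
    return visual + text

seq = build_lm_input(image=None, tokens=["where", "is", "the", "cat", "?"])
print(len(seq))  # 16 visual tokens + 5 text tokens = 21
```

In the real system SAM then consumes localization signals from the LM to produce masks, which is why segmentation can be added in the final training stage without retraining the whole stack.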
Quick Start & Requirements
Clone the repository (git clone https://github.com/NExT-ChatV/NExT-Chat.git), navigate into the directory, and install dependencies (pip install -r requirements.txt). You will also need to download the CLIP vision encoder (openai-clip-vit-large-patch14-336) and the SAM model, then point the code at their local paths. GPU acceleration is essential; GPU memory requirements rise from nextchat-7b-224 at the low end to 32GB for nextchat-7b-336 and 35GB for nextchat-13b-224.
Highlighted Details
The release includes 7B and 13B parameter models at two ViT input resolutions (224x224 and 336x336), with nextchat-7b-336-v1 recommended for the best performance.
Maintenance & Community
The initial code was released in December 2023. No specific community channels (like Discord or Slack) or details on maintainers/sponsorships are provided in the README.
Licensing & Compatibility
The provided README does not specify a software license. Compatibility for commercial use or integration with closed-source projects is undetermined without a license.
Limitations & Caveats
The project notes that its current implementation struggles to outperform top-tier pixel2seq models on Referring Expression Comprehension (REC) tasks in the pre-training setting, with ongoing research into this area. Older v0 model versions are explicitly marked as "not recommended" compared to newer iterations. The setup requires careful configuration of external model paths (CLIP, SAM).
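For context on the pix2seq family mentioned above: those models emit object locations as discrete coordinate tokens in the output text rather than decoding them from embeddings. The sketch below shows the basic quantization idea; the bin count and function names are illustrative choices, not taken from any particular implementation.

```python
# Illustrative pix2seq-style localization: a bounding box is quantized
# into discrete coordinate tokens that the LM emits as plain text.
# NUM_BINS is an arbitrary choice for this sketch.

NUM_BINS = 1000  # size of the coordinate vocabulary

def box_to_tokens(box, img_w, img_h):
    """Quantize pixel coords (x1, y1, x2, y2) into [0, NUM_BINS) tokens."""
    x1, y1, x2, y2 = box
    norm = [x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h]
    return [min(int(v * NUM_BINS), NUM_BINS - 1) for v in norm]

def tokens_to_box(tokens, img_w, img_h):
    """Invert the quantization (exact only up to bin resolution)."""
    x1, y1, x2, y2 = [(t + 0.5) / NUM_BINS for t in tokens]
    return (x1 * img_w, y1 * img_h, x2 * img_w, y2 * img_h)

toks = box_to_tokens((48, 96, 320, 240), img_w=640, img_h=480)
print(toks)  # [75, 200, 500, 500]
```

The quantization step is lossy, which is part of the trade-off space in which embedding-based localization approaches like NExT-Chat's position themselves.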
Last updated 2 years ago; the repository appears inactive. Related projects: LLaVA-VL, baaivision, mlfoundations.