Skywork-R1V by SkyworkAI

Multimodal model for advanced visual/text reasoning, using chain-of-thought

created 4 months ago
2,919 stars

Top 16.7% on sourcepulse

Project Summary

Skywork-R1V is an open-source multimodal reasoning model for problems that combine images and text. It targets researchers and developers building AI systems that must understand and reason across both modalities, and it reports state-of-the-art results on text and multimodal reasoning benchmarks such as AIME24, LiveCodeBench, MMMU, and MathVista.

How It Works

The model combines a hybrid reinforcement learning approach with Chain-of-Thought (CoT) prompting. It breaks a complex problem into intermediate reasoning steps before committing to an answer, which improves accuracy on tasks that involve visual comprehension and logical inference.

Quick Start & Requirements

  • Install: Clone the repository and create a Python 3.10 environment. For Transformers inference, run setup.sh; for vLLM inference, run pip install -U vllm.
  • Prerequisites: CUDA-enabled GPUs. The AWQ-quantized versions run on a single GPU with more than 30 GB of VRAM. vLLM integration is reported to offer significant speedups.
  • Resources: Inference code is provided for both Transformers and vLLM. With vLLM on 4x L20Y GPUs, the reported time is ~12.3 s per 1k tokens.
  • Links: R1V2 ModelScope, R1V2 Report, R1V Report.
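
As a rough aid for reading the Resources figure above, the reported ~12.3 s per 1k tokens can be converted to a throughput. This is a minimal sketch: it assumes "1k" means 1,000 generated tokens and that latency scales linearly with output length, which the summary does not state. The function names are hypothetical and not part of the repository.

```python
def tokens_per_second(num_tokens: int, seconds: float) -> float:
    """Average generation throughput for a single request."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_tokens / seconds


def estimated_latency(num_tokens: int,
                      reference_tokens: int = 1000,
                      reference_seconds: float = 12.3) -> float:
    """Extrapolate latency from the reported figure.

    Assumes linear scaling with output length (an assumption; real
    latency also depends on prompt length, image inputs, and batching).
    """
    return num_tokens / tokens_per_second(reference_tokens, reference_seconds)


rate = tokens_per_second(1000, 12.3)
print(f"{rate:.1f} tokens/s")  # 81.3 tokens/s
```

Under these assumptions the 4x L20Y setup generates roughly 81 tokens per second for a single request.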

Highlighted Details

  • Reports state-of-the-art performance on text and multimodal reasoning benchmarks (AIME24, LiveCodeBench, MMMU, MathVista).
  • Achieves competitive results against proprietary models like GPT-4o and Claude 3.5 Sonnet.
  • Supports inference via vLLM for enhanced speed and efficiency.
  • Offers AWQ quantized versions for reduced VRAM requirements.
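
To illustrate why quantized weights lower the VRAM requirement, the sketch below estimates the memory taken by model weights alone at different precisions. The 38-billion parameter count is an assumption for illustration and is not stated in this summary. The 4-bit width reflects how AWQ is typically configured. The estimate ignores quantization scales, the KV cache, activations, and any components kept at higher precision, so real usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights only, in gigabytes (1 GB = 1e9 bytes)."""
    if num_params <= 0 or bits_per_weight <= 0:
        raise ValueError("num_params and bits_per_weight must be positive")
    return num_params * bits_per_weight / 8 / 1e9


ASSUMED_PARAMS = 38e9  # assumption for illustration, not taken from this summary

fp16_gb = weight_memory_gb(ASSUMED_PARAMS, 16)  # half-precision weights
awq4_gb = weight_memory_gb(ASSUMED_PARAMS, 4)   # typical 4-bit AWQ weights

print(f"16-bit: {fp16_gb:.1f} GB, 4-bit: {awq4_gb:.1f} GB")
# 16-bit: 76.0 GB, 4-bit: 19.0 GB
```

Under these assumptions the 4-bit weights fit on one card with room left for the cache and activations, which is consistent with the single-card figure quoted under Prerequisites.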

Maintenance & Community

The project is actively developed by SkyworkAI. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The MIT License permits commercial use, modification, and distribution.

Limitations & Caveats

The README focuses on recent releases (April 2025) and does not describe long-term maintenance plans or potential deprecations. It also does not specify hardware requirements for the non-quantized versions.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 6
  • Issues (30d): 12

Star History

640 stars in the last 90 days
