Skywork-R1V by SkyworkAI

Multimodal model for advanced visual/text reasoning, using chain-of-thought

Created 1 year ago

3,160 stars

Top 14.6% on SourcePulse

View on GitHub

2 Experts Love This Project

Jeff Hammerbacher

Cofounder of Cloudera

Yaowei Zheng

Author of LLaMA-Factory

Project Summary

Skywork-R1V is an open-source multimodal reasoning model designed for advanced visual and text-based thinking. It targets researchers and developers working on AI systems that require understanding and reasoning across both images and text, offering state-of-the-art performance on various benchmarks.

How It Works

The model leverages a hybrid reinforcement learning approach with Chain-of-Thought (CoT) prompting to achieve sophisticated multimodal reasoning. This methodology allows the model to break down complex problems, generate intermediate reasoning steps, and arrive at more accurate conclusions, particularly in tasks involving visual comprehension and logical inference.

Quick Start & Requirements

Install: Clone the repository and set up a Python 3.10 environment using setup.sh for Transformers or pip install -U vllm for vLLM.
Prerequisites: Requires CUDA-enabled GPUs. Quantized versions (AWQ) support single-card inference with >30GB VRAM. vLLM integration offers significant speedups.
Resources: Inference code is provided for both Transformers and vLLM. vLLM integration on 4x L20Y GPUs achieves ~12.3s for 1k tokens.
Links: R1V2 ModelScope, R1V2 Report, R1V Report.

Highlighted Details

State-of-the-art performance on text and multimodal reasoning benchmarks (AIME24, LiveCodebench, MMMU, MathVista).
Achieves competitive results against proprietary models like GPT-4o and Claude 3.5 Sonnet.
Supports inference via vLLM for enhanced speed and efficiency.
Offers AWQ quantized versions for reduced VRAM requirements.

Maintenance & Community

The project is actively developed by SkyworkAI. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

License: MIT License.
Compatibility: Permissive for commercial use, modification, and distribution.

Limitations & Caveats

The README focuses on recent releases (April 2025) and does not detail long-term maintenance plans or potential deprecations. Specific hardware requirements for non-quantized versions are not detailed.

Health Check

Last Commit

6 months ago

Responsiveness

1 day

Pull Requests (30d)

Issues (30d)

Star History

6 stars in the last 30 days