R1-Onevision by Fancy-MLLM

Multimodal LLM for visual reasoning tasks

Created 7 months ago · 566 stars · Top 56.8% on SourcePulse

Project Summary

R1-Onevision is a multimodal reasoning large language model designed to tackle complex visual reasoning tasks by integrating visual and textual inputs. It aims to produce precise, step-by-step solutions in domains such as mathematics, science, and logical reasoning, serving as an AI assistant for problem-solving.

How It Works

The model employs a cross-modal reasoning pipeline that transforms images into formal textual representations, enabling language-based reasoning. This approach is facilitated by the R1-Onevision dataset, which contains detailed, step-by-step multimodal reasoning annotations. The model is further developed through supervised fine-tuning and reinforcement learning to enhance reasoning and generalization abilities.
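
The project's actual pipeline is not reproduced in this summary; the sketch below only illustrates the two stages described above, with `vision_model` and `language_model` as hypothetical stand-ins for the underlying models:

```python
# Illustrative two-stage sketch of the cross-modal reasoning pipeline
# (not the project's actual implementation).

def formalize_image(image_path: str, vision_model) -> str:
    """Stage 1: transcribe the image into a formal textual representation
    (dense caption, OCR text, formulas, chart values)."""
    return vision_model(
        image_path,
        prompt="Transcribe this image into precise text, including any "
               "written text, formulas, and chart or table values.",
    )

def answer_question(question: str, image_path: str,
                    vision_model, language_model) -> str:
    """Stage 2: language-based, step-by-step reasoning over the formal text."""
    description = formalize_image(image_path, vision_model)
    prompt = (
        f"Image (formal description):\n{description}\n\n"
        f"Question: {question}\n"
        "Reason step by step, then state the final answer."
    )
    return language_model(prompt)
```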

Quick Start & Requirements

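This summary does not reproduce the project's own setup instructions, so treat the following as a minimal inference sketch: it assumes the checkpoint is published on the Hugging Face Hub (the repo ID below is a guess; verify it on the project page) and, since the model is fine-tuned from Qwen2.5-VL, that it loads with the standard Qwen2.5-VL classes in a recent transformers release.

```python
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils

MODEL_ID = "Fancy-MLLM/R1-Onevision-7B"  # assumed repo ID; verify on the Hub

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "path/to/problem.png"},
        {"type": "text", "text": "Solve the problem in the image step by step."},
    ],
}]

# Standard Qwen2.5-VL preprocessing: render the chat template, then batch
# the text together with the extracted image tensors.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
images, videos = process_vision_info(messages)
inputs = processor(text=[text], images=images, videos=videos,
                   padding=True, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=2048)
response = processor.batch_decode(
    [ids[inputs.input_ids.shape[1]:] for ids in output_ids],
    skip_special_tokens=True,
)[0]
print(response)
```
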
Highlighted Details

  • Fine-tuned from Qwen2.5-VL on the R1-Onevision dataset.
  • The R1-Onevision-Bench benchmark is aligned with human educational stages.
  • The dataset spans diverse domains: natural scenes, science, math, OCR, and charts.
  • Supports deep chain-of-thought (CoT) reasoning; see the parsing sketch after this list.
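
R1-style reasoning models typically wrap the chain of thought in explicit delimiters; assuming R1-Onevision follows the common `<think>...</think>` convention (an assumption; check the model card for its actual output format), a small helper can separate the trace from the final answer:

```python
import re

def split_cot(response: str) -> tuple[str, str]:
    """Split a response into (reasoning, answer), assuming the chain of
    thought is wrapped in <think>...</think> tags (an assumption; verify
    against R1-Onevision's actual output format)."""
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if match is None:
        return "", response.strip()  # no explicit trace found
    return match.group(1).strip(), response[match.end():].strip()
```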

Maintenance & Community

  • The project has been updated over time with new versions of the dataset, models, and benchmark.
  • Developed by Zhejiang University.
  • Open to ideas and contributions.

Licensing & Compatibility

  • The README does not explicitly state the license.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is presented as a research artifact and appears to still be in an experimental, evolving stage. The README does not detail specific limitations or unsupported features.

Health Check

  • Last Commit: 5 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 9 stars in the last 30 days

Explore Similar Projects

gill by kohjingyu

Multimodal LLM for generating/retrieving images and generating text
463 stars · Created 2 years ago · Updated 1 year ago
Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Omar Sanseviero (DevRel at Google DeepMind).

DeepSeek-VL2 by deepseek-ai

MoE vision-language model for multimodal understanding
5k stars · Created 9 months ago · Updated 6 months ago
Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Elvis Saravia (founder of DAIR.AI).