tiny-qwen by Emericen

Minimal multimodal AI model re-implementation

Created 1 year ago
300 stars

Top 88.9% on SourcePulse

Project Summary

A minimal, easy-to-read PyTorch re-implementation of the Qwen3 VL model, targeting engineers and researchers seeking a clear, foundational understanding or a flexible base for multimodal AI projects. It simplifies access to Qwen3 VL's text and vision capabilities, providing a streamlined alternative to official implementations.

How It Works

The project reconstructs Qwen3 VL's architecture in PyTorch with an emphasis on code readability and minimal dependencies. It supports both text and vision inputs, processing them through a model that can utilize dense or Mixture of Experts (MoE) configurations. This design choice facilitates easier comprehension of the underlying mechanisms and allows for straightforward experimentation with multimodal transformer models.
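As a rough illustration of the dense vs. MoE distinction, the sketch below sends one token either through a single dense feed-forward block or through the top-k of several experts. It is a generic, framework-free toy that assumes softmax gating with the selected weights renormalized (a common MoE design). The function names and toy experts are hypothetical and are not taken from tiny-qwen's code.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dense_forward(x: Vector, ffn: Expert) -> Vector:
    # Dense layer: every token goes through the same feed-forward block.
    return ffn(x)


def moe_forward(
    x: Vector,
    gate_logits: Sequence[float],
    experts: Sequence[Expert],
    top_k: int = 2,
) -> Vector:
    # Router probabilities over all experts for this token.
    probs = softmax(gate_logits)
    # Keep only the top_k experts (assumed design: renormalize their weights).
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)
    for i in chosen:
        weight = probs[i] / norm
        y = experts[i](x)
        # Accumulate the weighted output of each selected expert.
        out = [o + weight * v for o, v in zip(out, y)]
    return out


# Toy experts: expert i scales its input by a constant.
experts = [lambda v, s=s: [s * t for t in v] for s in (1.0, 2.0, 3.0, 4.0)]
print(moe_forward([1.0, -1.0], gate_logits=[0.0, 0.0, 5.0, 5.0], experts=experts))
```

In an MoE layer only `top_k` experts run per token, so the total parameter count can grow with the number of experts while per-token compute stays close to that of `top_k` feed-forward blocks.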

Quick Start & Requirements

  • Installation: Set up a virtual environment using uv (pip install uv, uv venv, source .venv/bin/activate), then install project dependencies with uv pip install -r requirements.txt.
  • Prerequisites: Python, PyTorch. The code examples utilize huggingface_hub for model weights and PIL for image handling. CUDA is recommended for optimal performance.
  • Running: Initiate an interactive chat session via the command python run.py. Images can be referenced within prompts using the @relative/path/to/image.jpg syntax.
  • Documentation: Code examples demonstrating model loading, processing, and generation are included directly in the README.
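The `@relative/path/to/image.jpg` convention can be handled by scanning the prompt for image references before building the model input. The sketch below is a hypothetical helper: `split_prompt` and its regex are assumptions, not the code in run.py. It strips the references from the text and returns the resolved paths.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Assumed pattern: "@" followed by a path ending in a common image extension.
IMAGE_REF = re.compile(r"@(\S+\.(?:jpg|jpeg|png|webp|gif|bmp))\b", re.IGNORECASE)


def split_prompt(prompt: str, base_dir: str = ".") -> Tuple[str, List[Path]]:
    """Return the prompt with @image references removed, plus the resolved image paths."""
    paths = [Path(base_dir) / m.group(1) for m in IMAGE_REF.finditer(prompt)]
    text = IMAGE_REF.sub("", prompt)
    # Collapse the whitespace left behind by the removed references.
    text = " ".join(text.split())
    return text, paths


text, images = split_prompt("What is in @photos/cat.jpg?")
print(text, images)
```

Each returned path could then be opened with PIL's `Image.open` and passed to the processor alongside the cleaned text; how run.py actually does this is not described here.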

Highlighted Details

  • Provides a minimal, highly readable PyTorch re-implementation of Qwen3 VL.
  • Fully supports multimodal input, processing both text and vision data.
  • Accommodates both dense and Mixture of Experts (MoE) model variants.
  • Features a user-friendly "fancy CLI" for direct interaction and chat.
  • Code examples showcase programmatic use with model.generate and model.generate_stream.
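Judging by their names, `model.generate` presumably returns the full completion at once, while `model.generate_stream` yields output incrementally. The toy class below illustrates that general pattern with a hypothetical `next_token` callback standing in for the model. It is not tiny-qwen's implementation, and its names and parameters (`ToyGenerator`, `max_new_tokens`, `eos`) are assumptions.

```python
from typing import Callable, Iterator, List

# Hypothetical model stand-in: maps the current context to the next token.
NextToken = Callable[[List[str]], str]


class ToyGenerator:
    """Illustrates blocking vs. streaming generation (not tiny-qwen's class)."""

    def __init__(self, next_token: NextToken, eos: str = "<eos>") -> None:
        self.next_token = next_token
        self.eos = eos

    def generate_stream(self, prompt: List[str], max_new_tokens: int = 16) -> Iterator[str]:
        # Yield each new token as soon as it is produced; stop at EOS or the limit.
        context = list(prompt)
        for _ in range(max_new_tokens):
            tok = self.next_token(context)
            if tok == self.eos:
                break
            context.append(tok)
            yield tok

    def generate(self, prompt: List[str], max_new_tokens: int = 16) -> List[str]:
        # Blocking variant: drain the stream and return all tokens at once.
        return list(self.generate_stream(prompt, max_new_tokens))


script = ["Hello", ",", " world", "<eos>"]
gen = ToyGenerator(lambda ctx: script[len(ctx) - 1])
for token in gen.generate_stream(["hi"]):
    print(token, end="", flush=True)
```

Writing the blocking call as a drain of the streaming one keeps a single decoding loop, which is one straightforward way to offer both interfaces.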

Maintenance & Community

  • A dedicated Discord channel is available for community discussions and support.
  • The project is maintained by its developer, Emericen.

Licensing & Compatibility

  • The specific open-source license for this repository is not detailed in the provided README content.

Limitations & Caveats

  • As a re-implementation, it may not perfectly mirror the performance characteristics or specific optimizations of the official Qwen3 VL.
  • The README directs users to a separate branch for Qwen3 (text-only) and Qwen2.5 VL support.
  • Other models, such as DeepSeek R1, are covered by separate repositories rather than this one.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 3
  • Star history: 23 stars in the last 30 days
