Namo-R1 by lucasjinreal

Compact, CPU-first multimodal AI for diverse applications

Created 1 year ago
252 stars

Top 99.6% on SourcePulse

Project Summary

Namo R1 is an open-source, compact (500M parameter) Visual Language Model (VLM) designed for efficient CPU execution, addressing the accessibility gap for users without high-end GPUs. It offers researchers and developers a powerful yet lightweight MLLM solution with a focus on training transparency and future extensibility, aiming to democratize VLM research and deployment.

How It Works

This project introduces Namo R1, a 500M parameter MLLM engineered for exceptional CPU performance. Its core innovations include an architecture optimized for CPU-friendly inference, native support for omni-modal scalability (encompassing future audio capabilities), and complete training transparency. By fully disclosing data curation processes and dynamic curriculum scheduling, Namo R1 facilitates reproducible AI research and development, differentiating itself from many closed-source or less transparent MLLM projects.

Quick Start & Requirements

  • Installation: pip install -U namo
  • Prerequisites: Primarily designed for CPU execution; a GPU is used when detected via torch.cuda.is_available(). No specific OS or hardware constraints beyond a standard Python environment are noted for basic operation.
  • Links:
    • Community Discord: https://discord.gg/5ftPBVspXj
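
The CPU-first design means no GPU is required to get started. A minimal device-selection sketch is shown below; the `pick_device` helper is an illustrative assumption, not part of the namo package's API.

```python
def pick_device() -> str:
    """Illustrative helper (not part of the namo package): return "cuda"
    when a CUDA GPU is detected, otherwise fall back to "cpu", matching
    the project's CPU-first design."""
    try:
        import torch  # optional: only needed to detect a GPU
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

On a machine without PyTorch or without a CUDA device, this resolves to "cpu", which is the intended default execution mode for Namo R1.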

Highlighted Details

  • Surpasses SmolVLM and Moondream2 on selected benchmarks among models of comparable size.
  • Features multilingual OCR capabilities (English, Chinese, Japanese, etc.) within its 500M parameter footprint.
  • Supports native dynamic resolution, enhancing robustness with images of varying aspect ratios.
  • Provides full open-source access to all model code, training scripts, and data curation methodologies.
  • Offers SigLIP2 as an optional vision encoder for improved training.
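
Native dynamic resolution generally means scaling an input image so its patch grid fits a token budget while preserving the original aspect ratio, rather than forcing a fixed square crop. The sketch below illustrates that general idea only; the patch size, token budget, and `dynamic_grid` helper are assumptions, not Namo R1's actual preprocessing.

```python
import math

def dynamic_grid(width: int, height: int, patch: int = 16, max_patches: int = 1024):
    """Illustrative sketch (hypothetical helper, not the namo package's
    preprocessing): pick a patch grid for an image of arbitrary aspect
    ratio, downscaling only when the grid would exceed the token budget."""
    cols = max(1, math.ceil(width / patch))
    rows = max(1, math.ceil(height / patch))
    if cols * rows > max_patches:
        # Shrink both axes by the same factor so aspect ratio is preserved.
        scale = math.sqrt(max_patches / (cols * rows))
        cols = max(1, math.floor(cols * scale))
        rows = max(1, math.floor(rows * scale))
    return cols, rows
```

A 224x224 image maps directly to a 14x14 grid, while a very wide or very tall image is scaled down uniformly instead of being distorted into a square, which is the robustness property the bullet above refers to.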

Maintenance & Community

The project is actively under development, with recent updates including SigLIP2 integration. A community Discord server is available for support and discussion.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The MIT license permits broad use, including commercial applications and integration into closed-source projects.

Limitations & Caveats

Current benchmark results are based on a limited set of metrics, with more comprehensive evaluations planned. Some larger model variants (e.g., 700M) are still undergoing training. Users encountering issues with deepspeed should ensure their transformers library is updated to version 4.48 or later.
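
To verify the transformers requirement mentioned above before launching a deepspeed run, a quick version check can be used; the `transformers_meets_min` helper below is an illustrative sketch, not a utility shipped with the project.

```python
from importlib.metadata import PackageNotFoundError, version

def transformers_meets_min(minimum=(4, 48)) -> bool:
    """Illustrative helper: return True when the installed transformers
    version is at least `minimum` (the >=4.48 requirement noted for
    deepspeed compatibility), False when it is older or not installed."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return False
    parts = installed.split(".") + ["0"]  # pad in case of a bare "4"-style version
    def to_int(part: str) -> int:
        digits = "".join(ch for ch in part if ch.isdigit())
        return int(digits) if digits else 0
    return (to_int(parts[0]), to_int(parts[1])) >= minimum
```

If this returns False, `pip install -U transformers` brings the library up to a compatible version.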

Health Check

  • Last Commit: 10 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 30 days
