C++ project for Qwen-LM inference (now integrated into llama.cpp)
Top 55.1% on sourcepulse
This project provides a C++ implementation of the Qwen-LM large language model, designed for efficient, real-time inference on local machines, particularly MacBooks. It targets developers and power users seeking to run Qwen models without relying on Python environments or cloud services, offering a performant, self-contained solution.
How It Works
Leveraging the ggml library, qwen.cpp mirrors the architecture of llama.cpp, enabling it to run quantized LLMs on various hardware. It features a pure C++ tiktoken implementation for tokenization and supports streaming generation with a typewriter effect. The project also includes Python bindings for easier integration.
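For a sense of the bindings, here is a minimal sketch; the Pipeline class name, constructor arguments, and chat() signature are assumptions modeled on the closely related chatglm.cpp bindings, and the file paths refer to a quantized model produced as in the Quick Start below:

# Hypothetical qwen-cpp usage; Pipeline and chat() are assumed names.
import qwen_cpp

# Load a quantized GGML model together with its tiktoken vocabulary file.
pipeline = qwen_cpp.Pipeline("qwen7b-ggml.bin", "Qwen-7B-Chat/qwen.tiktoken")
# Single-turn chat: the conversation history is passed as a list of messages.
print(pipeline.chat(["Hello! What can you do?"]))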
Quick Start & Requirements
Install the Python bindings with pip install -U qwen-cpp (this triggers a local compilation), or install the latest source with pip install git+https://github.com/QwenLM/qwen.cpp.git@master.
Download the original Qwen model weights, then use convert.py to quantize them into GGML format (e.g., q4_0).
Run inference from the command line:
./build/bin/main -m <quantized_model.bin> --tiktoken <tiktoken_file> -p <prompt>
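The same bindings can also drive generation from Python with the typewriter-style streaming mentioned above. A sketch, assuming a stream keyword that yields text chunks incrementally (not confirmed by this summary):

# Hypothetical streaming call; the stream=True flag is an assumption.
import qwen_cpp

pipeline = qwen_cpp.Pipeline("qwen7b-ggml.bin", "Qwen-7B-Chat/qwen.tiktoken")
for chunk in pipeline.chat(["Tell me about the Qwen models."], stream=True):
    print(chunk, end="", flush=True)  # print each chunk as it arrives
print()  # final newline once the stream ends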
Highlighted Details
Maintenance & Community
The project is no longer actively maintained, with core features merged into llama.cpp as of December 2023. No further issues, pull requests, or updates are expected for qwen.cpp.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
Because development has stopped, there will be no bug fixes, feature updates, or support for newer Qwen models or formats beyond what was merged into llama.cpp, and the project's functionality, efficiency, and device support may lag behind llama.cpp itself.