qwen.cpp by QwenLM

C++ project for Qwen-LM inference (now integrated into llama.cpp)

Created 2 years ago
606 stars

Top 54.1% on SourcePulse

Project Summary

This project provides a C++ implementation of the Qwen-LM large language model, designed for efficient, real-time inference on local machines, particularly MacBooks. It targets developers and power users seeking to run Qwen models without relying on Python environments or cloud services, offering a performant, self-contained solution.

How It Works

Leveraging the ggml library, qwen.cpp mirrors the architecture of llama.cpp, enabling it to run quantized LLMs on various hardware. It features a pure C++ tiktoken implementation for tokenization and supports streaming generation with a typewriter effect. The project also includes Python bindings for easier integration.

Quick Start & Requirements

  • Install: pip install -U qwen-cpp (builds from source during installation) or pip install git+https://github.com/QwenLM/qwen.cpp.git@master for the latest revision.
  • Prerequisites: CMake, C++ compiler, Python. Optional acceleration via OpenBLAS, cuBLAS (NVIDIA GPU), or Metal (Apple Silicon GPU).
  • Model Conversion: Download a Qwen checkpoint and quantize it with convert.py into GGML format (e.g., q4_0); see the end-to-end sketch after this list.
  • Running: ./build/bin/main -m <quantized_model.bin> --tiktoken <tiktoken_file> -p <prompt>
  • Docs: https://github.com/QwenLM/qwen.cpp
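
Putting those steps together, a minimal build-convert-run pass might look like the sketch below. It follows the commands documented in the README; the model ID (Qwen/Qwen-7B-Chat), the output filename qwen7b-ggml.bin, and the tiktoken path are illustrative, and the optional OpenBLAS/cuBLAS/Metal CMake flags are listed in the README.

    # Sketch of the documented workflow; model name and paths are illustrative.
    git clone --recursive https://github.com/QwenLM/qwen.cpp && cd qwen.cpp

    # Build the native binaries (see the README for optional OpenBLAS,
    # cuBLAS, or Metal acceleration flags).
    cmake -B build
    cmake --build build -j --config Release

    # Quantize a downloaded Qwen checkpoint into 4-bit GGML format.
    python3 qwen_cpp/convert.py -i Qwen/Qwen-7B-Chat -t q4_0 -o qwen7b-ggml.bin

    # Run inference with the quantized model and the tiktoken vocabulary file.
    ./build/bin/main -m qwen7b-ggml.bin --tiktoken Qwen-7B-Chat/qwen.tiktoken -p "Hello"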

Highlighted Details

  • Pure C++ implementation based on ggml.
  • Pure C++ tiktoken implementation, benchmarked to match OpenAI's tiktoken speed.
  • Supports x86/arm CPU, NVIDIA GPU, and Apple Silicon GPU (Metal).
  • Python binding available for high-level chat and stream_chat interfaces.
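
As a sketch of the Python binding noted above: the snippet below assumes a Pipeline class exposing the chat and stream_chat interfaces; the class name, argument order, and message format are assumptions to verify against the project docs.

    # Hypothetical usage of the qwen-cpp Python binding; Pipeline's name,
    # argument order, and message format are assumptions, not a verified API.
    import qwen_cpp

    # Load a quantized GGML model together with its tiktoken vocabulary file.
    pipeline = qwen_cpp.Pipeline("./qwen7b-ggml.bin", "./qwen.tiktoken")

    # One-shot chat: pass the conversation history as a list of messages.
    print(pipeline.chat(["Hello! What can you do?"]))

    # Streaming chat: pieces arrive incrementally (the typewriter effect).
    for piece in pipeline.stream_chat(["Tell me a short story."]):
        print(piece, end="", flush=True)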

Maintenance & Community

The project is no longer actively maintained, with core features merged into llama.cpp as of December 2023. No further issues, pull requests, or updates are expected for qwen.cpp.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Because the project is no longer actively maintained, it will not receive bug fixes, feature updates, or support for newer Qwen models and file formats beyond what was merged into llama.cpp. Functionality, efficiency, and device support may therefore lag behind llama.cpp.

Health Check

  • Last Commit: 9 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 30 days
