C++ project for Qwen-LM inference (now integrated into llama.cpp)
Top 55.1% on sourcepulse
This project provides a C++ implementation of the Qwen-LM large language model, designed for efficient, real-time inference on local machines, particularly MacBooks. It targets developers and power users seeking to run Qwen models without relying on Python environments or cloud services, offering a performant, self-contained solution.
How It Works
Leveraging the ggml library, qwen.cpp mirrors the architecture of llama.cpp, enabling it to run quantized LLMs on various hardware. It features a pure C++ tiktoken implementation for tokenization and supports streaming generation with a typewriter effect. The project also includes Python bindings for easier integration.
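For a sense of the bindings, here is a minimal sketch; the Pipeline class name, constructor arguments, and chat() signature are assumptions modeled on the closely related chatglm.cpp bindings, and the file paths refer to a quantized model produced as in the Quick Start below:

# Hypothetical qwen-cpp usage; Pipeline and chat() are assumed names.
import qwen_cpp

# Load a quantized GGML model together with its tiktoken vocabulary file.
pipeline = qwen_cpp.Pipeline("qwen7b-ggml.bin", "Qwen-7B-Chat/qwen.tiktoken")
# Single-turn chat: the conversation history is passed as a list of messages.
print(pipeline.chat(["Hello! What can you do?"]))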
Quick Start & Requirements
Install the Python bindings with pip install -U qwen-cpp (this triggers a local compilation), or install the latest source with pip install git+https://github.com/QwenLM/qwen.cpp.git@master.
Download the original Qwen model weights, then use convert.py to quantize them into GGML format (e.g., q4_0).
Run inference from the command line:
./build/bin/main -m <quantized_model.bin> --tiktoken <tiktoken_file> -p <prompt>
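The same bindings can also drive generation from Python with the typewriter-style streaming mentioned above. A sketch, assuming a stream keyword that yields text chunks incrementally (not confirmed by this summary):

# Hypothetical streaming call; the stream=True flag is an assumption.
import qwen_cpp

pipeline = qwen_cpp.Pipeline("qwen7b-ggml.bin", "Qwen-7B-Chat/qwen.tiktoken")
for chunk in pipeline.chat(["Tell me about the Qwen models."], stream=True):
    print(chunk, end="", flush=True)  # print each chunk as it arrives
print()  # final newline once the stream ends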
Highlighted Details
Maintenance & Community
The project is no longer actively maintained, with core features merged into llama.cpp as of December 2023. No further issues, pull requests, or updates are expected for qwen.cpp.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
Because development has stopped, there will be no bug fixes, feature updates, or support for newer Qwen models or formats beyond what was merged into llama.cpp, and the project's functionality, efficiency, and device support may lag behind llama.cpp itself.