qwen.cpp by QwenLM

C++ project for Qwen-LM inference (now integrated into llama.cpp)

created 1 year ago
602 stars

Top 55.1% on sourcepulse

Project Summary

This project provides a C++ implementation of the Qwen-LM large language model, designed for efficient, real-time inference on local machines, particularly MacBooks. It targets developers and power users seeking to run Qwen models without relying on Python environments or cloud services, offering a performant, self-contained solution.

How It Works

Leveraging the ggml library, qwen.cpp mirrors the architecture of llama.cpp, enabling it to run quantized LLMs on various hardware. It features a pure C++ tiktoken implementation for tokenization and supports streaming generation with a typewriter effect. The project also includes Python bindings for easier integration.
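The streaming "typewriter" output described above can be sketched in a few lines of Python: a generator yields tokens as they are produced, and the caller flushes each one to the terminal immediately. Note that `fake_generate` is a hypothetical stand-in for the model, not part of qwen.cpp's API.

```python
import sys
import time

def fake_generate(prompt: str):
    """Hypothetical stand-in for token-by-token model output."""
    for token in ["Hello", ",", " world", "!"]:
        yield token

def stream_print(prompt: str, delay: float = 0.0) -> str:
    """Print tokens as they arrive (typewriter effect); return the full text."""
    pieces = []
    for token in fake_generate(prompt):
        sys.stdout.write(token)   # emit the token immediately...
        sys.stdout.flush()        # ...without waiting for a newline
        pieces.append(token)
        time.sleep(delay)         # optional pacing for the visual effect
    sys.stdout.write("\n")
    return "".join(pieces)

text = stream_print("Hi")
```

The key point is flushing stdout per token rather than buffering the whole completion, which is what makes partial output visible while generation is still running.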

Quick Start & Requirements

  • Install: pip install -U qwen-cpp (triggers compilation) or pip install git+https://github.com/QwenLM/qwen.cpp.git@master.
  • Prerequisites: CMake, C++ compiler, Python. Optional acceleration via OpenBLAS, cuBLAS (NVIDIA GPU), or Metal (Apple Silicon GPU).
  • Model Conversion: Requires downloading Qwen models and using convert.py to quantize them into GGML format (e.g., q4_0).
  • Running: ./build/bin/main -m <quantized_model.bin> --tiktoken <tiktoken_file> -p <prompt>
  • Docs: https://github.com/QwenLM/qwen.cpp
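The q4_0 conversion step above can be illustrated with a minimal sketch of block quantization. This is simplified for clarity: the real GGML q4_0 format packs two 4-bit values per byte and has its own scale convention, but the block size of 32 and the signed [-8, 7] range match the format's spirit.

```python
def quantize_q4_block(values):
    """Quantize one block of 32 floats to a shared scale plus 4-bit ints."""
    amax = max(abs(v) for v in values) or 1.0
    scale = amax / 7.0                            # largest value maps near +/-7
    qs = [max(-8, min(7, round(v / scale))) for v in values]
    return scale, qs

def dequantize_q4_block(scale, qs):
    """Recover approximate floats from the scale and quantized ints."""
    return [q * scale for q in qs]

block = [0.1 * i for i in range(-16, 16)]         # one 32-value block
scale, qs = quantize_q4_block(block)
recon = dequantize_q4_block(scale, qs)
err = max(abs(a - b) for a, b in zip(block, recon))
print(f"max abs round-trip error: {err:.4f}")
```

Storing one float scale plus 32 four-bit integers per block is what shrinks a model to roughly a quarter of its fp16 size, at the cost of the small per-value rounding error measured above.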

Highlighted Details

  • Pure C++ implementation based on ggml.
  • Pure C++ tiktoken implementation, benchmarked to match OpenAI's tiktoken speed.
  • Supports x86/arm CPU, NVIDIA GPU, and Apple Silicon GPU (Metal).
  • Python binding available for high-level chat and stream_chat interfaces.
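The tiktoken tokenizer that qwen.cpp reimplements in C++ is at heart byte-pair encoding over a ranked merge table. A toy Python sketch of the core merge loop follows; the merge table below is invented for illustration and is not the real Qwen vocabulary.

```python
def bpe_encode(token: bytes, ranks: dict) -> list:
    """Greedily merge the lowest-ranked adjacent pair until none applies."""
    parts = [bytes([b]) for b in token]           # start from single bytes
    while len(parts) > 1:
        best, best_rank = None, None
        for i in range(len(parts) - 1):           # scan adjacent pairs
            rank = ranks.get(parts[i] + parts[i + 1])
            if rank is not None and (best_rank is None or rank < best_rank):
                best, best_rank = i, rank
        if best is None:
            break                                 # no mergeable pair left
        parts = parts[:best] + [parts[best] + parts[best + 1]] + parts[best + 2:]
    return parts

# Toy merge table: lower rank means higher merge priority.
ranks = {b"he": 0, b"ll": 1, b"hell": 2, b"hello": 3}
print(bpe_encode(b"hello", ranks))
```

A fast implementation avoids the repeated linear scan (e.g., with a priority queue over pairs), which is where a tuned C++ version can match OpenAI's reference tiktoken in speed.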

Maintenance & Community

The project is no longer actively maintained, with core features merged into llama.cpp as of December 2023. No further issues, pull requests, or updates are expected for qwen.cpp.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Because development has stopped, the project receives no bug fixes, feature updates, or support for Qwen models and formats newer than what was merged into llama.cpp. Its functionality, efficiency, and device support will increasingly lag behind llama.cpp.

Health Check

  • Last commit: 7 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 18 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Shawn Wang (Editor of Latent Space), and 8 more.

llm by rustformers (6k stars)

Rust ecosystem for LLM inference (unmaintained)
created 2 years ago, updated 1 year ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Nat Friedman (Former CEO of GitHub), and 32 more.

llama.cpp by ggml-org (84k stars)

C/C++ library for local LLM inference
created 2 years ago, updated 14 hours ago