CPU inference for transformer models via a C++ backend
Cformers provides fast inference for state-of-the-art transformer models on CPUs by leveraging a C/C++ backend. It targets developers and researchers seeking efficient, pre-compressed AI model execution with minimal setup, enabling rapid experimentation and deployment of large language models.
How It Works
The project builds upon llama.cpp and GGML, using optimized C/C++ inference kernels for CPU execution. The focus is on inference speed, pre-compressed (quantized) models, and an easy-to-use Python API that abstracts away the complexities of the underlying C++ implementation.
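To make the compression idea concrete, below is a minimal sketch of block-wise 4-bit weight quantization in the spirit of GGML's Q4 formats. The block size, rounding, and lack of packing are simplifying assumptions and do not match the actual on-disk format.

```python
import numpy as np

def quantize_q4_blocks(weights: np.ndarray, block_size: int = 32):
    """Quantize a flat float32 weight array to 4-bit codes with one scale per block."""
    weights = weights.reshape(-1, block_size)
    # Pick one scale per block so the largest magnitude maps onto the int4 range.
    scales = np.abs(weights).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero blocks
    codes = np.clip(np.round(weights / scales), -8, 7).astype(np.int8)
    return codes, scales.astype(np.float32)

def dequantize_q4_blocks(codes: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Reconstruct approximate float32 weights from codes and per-block scales."""
    return (codes.astype(np.float32) * scales).reshape(-1)

w = np.random.randn(1024).astype(np.float32)
codes, scales = quantize_q4_blocks(w)
print("max abs error:", float(np.abs(w - dequantize_q4_blocks(codes, scales)).max()))
```

With a 32-weight block, this stores about 5 bits per weight (4-bit codes plus a shared float32 scale) versus 32 bits for the original float32 tensor, which is what lets large models fit in CPU RAM and keeps memory bandwidth low during inference.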
Quick Start & Requirements
Install the Python dependencies, clone the repository, and build the C++ backend:

```bash
pip install transformers wget
git clone https://github.com/nolanoOrg/cformers.git && cd cformers/cformers/cpp && make && cd ../..
```

Then import the Python API:

```python
from interface import AutoInference as AI
```
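That single import is essentially the whole Python surface. A hedged usage sketch follows; the model identifier and keyword argument are illustrative assumptions, not taken from this page:

```python
# Run from inside the cloned cformers/cformers directory so `interface` resolves.
from interface import AutoInference as AI

# The model id and generate() keyword below are assumptions for illustration;
# see the repository README for the supported pre-quantized checkpoints and
# the exact signature.
ai = AI('EleutherAI/gpt-j-6B')
output = ai.generate('def fibonacci(n):', num_tokens_to_generate=100)
print(output)
```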
Requires the `transformers` and `wget` Python packages, plus a C++ compiler for building the backend.

Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project is still under active development, with features like interactive chat mode and advanced quantization techniques (e.g., GPTQ) marked as upcoming. The current interface is limited to generation, with plans to add support for embeddings, logits, and mid-generation stopping.