cformers by NolanoOrg

CPU inference for transformer models via a C++ backend

Created 2 years ago
309 stars

Top 86.9% on SourcePulse

View on GitHub
Project Summary

Cformers provides fast inference for state-of-the-art transformer models on CPUs by leveraging a C/C++ backend. It targets developers and researchers seeking efficient, pre-compressed AI model execution with minimal setup, enabling rapid experimentation and deployment of large language models.

How It Works

The project builds upon llama.cpp and GGML, utilizing optimized C/C++ inference kernels for CPU execution. This approach focuses on inference speed, pre-compressed models (quantization), and an easy-to-use Python API, abstracting away the complexities of the underlying C++ implementation.

Quick Start & Requirements

  • Install: pip install transformers wget
  • Build: git clone https://github.com/nolanoOrg/cformers.git && cd cformers/cformers/cpp && make && cd ../..
  • Usage: from interface import AutoInference as AI
  • Prerequisites: Python with the transformers and wget packages; a C++ compiler and make to build the backend.
  • Docs: https://github.com/nolanoOrg/cformers
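Once the backend is built, generation goes through the Python interface imported above. A minimal sketch, run from the cformers/cformers directory; the model name and keyword arguments below follow the repository's examples but are illustrative, so the exact parameters may differ:

```python
# Run from the cformers/cformers directory after building the C++ backend.
# Model name and keyword arguments are illustrative assumptions; check the
# repository README for the currently supported checkpoints and parameters.
from interface import AutoInference as AI

ai = AI('EleutherAI/gpt-j-6B')  # fetches a pre-quantized checkpoint
result = ai.generate('def parse_html(html_doc):', num_tokens_to_generate=100)
print(result['token_str'])
```

The class wraps the compiled C++ binary, so no CUDA or PyTorch runtime is needed at inference time.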

Highlighted Details

  • Supports GPT-J, BLOOM, GPT-NeoX/Pythia, CodeGen, and GPT-2 architectures.
  • Offers Int4 quantization with fixed zero-offset.
  • Active community contributions are encouraged for model quantization and feature development.
  • Planned features include chat mode, prompt engineering tools, and Pybind11 integration for improved performance.
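The Int4 scheme mentioned above stores a scale per group of weights but fixes the zero-offset at 0 (symmetric quantization), so only the scale needs to be kept. A minimal pure-Python sketch of the idea; the real C/C++ kernels quantize in blocks and pack two 4-bit values per byte:

```python
# Sketch of Int4 quantization with a fixed zero-offset (symmetric scheme).
# Illustrative only: per-tensor scale here, whereas production kernels use
# per-block scales and bit-pack the results.

def quantize_int4(weights):
    """Map floats to signed 4-bit integers in [-7, 7] using only a scale.

    With the zero-offset fixed at 0, dequantization is just q * scale."""
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate float weights from 4-bit codes."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.7, -0.07]
q, scale = quantize_int4(weights)
approx = dequantize_int4(q, scale)
```

Dropping the zero-offset halves the per-block metadata and simplifies the inner dot-product loop, at the cost of slightly coarser rounding for skewed weight distributions.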

Maintenance & Community

  • Community-driven development with active encouragement for contributions.
  • Discord server available for communication and support: https://discord.gg/HGujTPQtR6

Licensing & Compatibility

  • MIT License.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The project is still under active development, with features like interactive chat mode and advanced quantization techniques (e.g., GPTQ) marked as upcoming. The current interface is limited to generation, with plans to add support for embeddings, logits, and mid-generation stopping.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days

Explore Similar Projects

Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

neural-compressor by intel
Top 0.2% · 2k stars
Python library for model compression (quantization, pruning, distillation, NAS)
Created 5 years ago · Updated 17 hours ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Jeff Hammerbacher (Cofounder of Cloudera), and 4 more.

gemma_pytorch by google
Top 0.2% · 6k stars
PyTorch implementation for Google's Gemma models
Created 1 year ago · Updated 3 months ago
Starred by Nat Friedman (Former CEO of GitHub), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 15 more.

FasterTransformer by NVIDIA
Top 0.1% · 6k stars
Optimized transformer library for inference
Created 4 years ago · Updated 1 year ago