cformers by NolanoOrg

CPU inference for transformer models via a C++ backend

created 2 years ago
309 stars

Top 88.0% on sourcepulse

View on GitHub
Project Summary

Cformers provides fast inference for state-of-the-art transformer models on CPUs by leveraging a C/C++ backend. It targets developers and researchers seeking efficient, pre-compressed AI model execution with minimal setup, enabling rapid experimentation and deployment of large language models.

How It Works

The project builds upon llama.cpp and GGML, utilizing optimized C/C++ inference kernels for CPU execution. This approach focuses on inference speed, pre-compressed models (quantization), and an easy-to-use Python API, abstracting away the complexities of the underlying C++ implementation.

Quick Start & Requirements

  • Install: pip install transformers wget
  • Build: git clone https://github.com/nolanoOrg/cformers.git && cd cformers/cformers/cpp && make && cd ../..
  • Usage: from interface import AutoInference as AI
  • Prerequisites: Python with the transformers and wget packages; a C++ compiler and make to build the backend.
  • Docs: https://github.com/nolanoOrg/cformers
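Putting the steps together, a minimal generation call might look like the following sketch, run from the cformers/cformers directory after building the C++ backend. The model name, generate arguments, and the token_str result key follow the project's README but should be treated as illustrative rather than a guaranteed API:

```python
# Run from cformers/cformers after `make` has built the C++ backend.
from interface import AutoInference as AI

# Load a pre-quantized checkpoint by Hugging Face model name (illustrative).
ai = AI('EleutherAI/gpt-j-6B')

# Generate a completion for a code prompt.
x = ai.generate('def parse_html(html_doc):', num_tokens_to_generate=100)
print(x['token_str'])
```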

Highlighted Details

  • Supports GPT-J, BLOOM, GPT-NeoX/Pythia, CodeGen, and GPT-2 architectures.
  • Offers Int4 quantization with fixed zero-offset.
  • Active community contributions are encouraged for model quantization and feature development.
  • Planned features include chat mode, prompt engineering tools, and Pybind11 integration for improved performance.
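To make the Int4 point concrete: quantization with a fixed zero-offset maps each block of weights onto 16 signed integer levels using only a per-block scale, with the zero-point pinned at zero. The sketch below is a simplified pure-Python illustration of that scheme, not the project's actual GGML kernel (block size and rounding details are assumptions):

```python
def quantize_int4(weights, block_size=32):
    """Symmetric Int4 quantization: fixed zero-offset, one scale per block."""
    blocks = []
    for i in range(0, len(weights), block_size):
        block = weights[i:i + block_size]
        amax = max(abs(w) for w in block) or 1.0
        scale = amax / 7.0  # signed int4 range is [-8, 7]; map amax to 7
        qs = [max(-8, min(7, round(w / scale))) for w in block]
        blocks.append((scale, qs))
    return blocks

def dequantize_int4(blocks):
    """Recover approximate weights: w ~ q * scale (zero-offset fixed at 0)."""
    out = []
    for scale, qs in blocks:
        out.extend(q * scale for q in qs)
    return out
```

Because the zero-offset is fixed, each block stores only one float (the scale) plus 4 bits per weight, and the round-trip error per weight is at most half a scale step.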

Maintenance & Community

  • Community-driven development with active encouragement for contributions.
  • Discord server available for communication and support: https://discord.gg/HGujTPQtR6

Licensing & Compatibility

  • MIT License.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The project is still under active development, with features like interactive chat mode and advanced quantization techniques (e.g., GPTQ) marked as upcoming. The current interface is limited to generation, with plans to add support for embeddings, logits, and mid-generation stopping.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 90 days

Explore Similar Projects

Starred by Jared Palmer (Ex-VP of AI at Vercel; Founder of Turborepo; Author of Formik, TSDX), Eugene Yan (AI Scientist at AWS), and 2 more.

starcoder.cpp by bigcode-project

C++ example for StarCoder inference

  • 0.2% · 456 stars · created 2 years ago · updated 1 year ago
  • Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Tim J. Baek (Founder of Open WebUI), and 5 more.

gemma.cpp by google

C++ inference engine for Google's Gemma models

  • 0.1% · 7k stars · created 1 year ago · updated 1 day ago