cformers by NolanoOrg

CPU inference for transformer models via a C++ backend

Created 2 years ago
309 stars

Top 86.9% on SourcePulse

View on GitHub
Project Summary

Cformers provides fast inference for state-of-the-art transformer models on CPUs by leveraging a C/C++ backend. It targets developers and researchers seeking efficient, pre-compressed AI model execution with minimal setup, enabling rapid experimentation and deployment of large language models.

How It Works

The project builds upon llama.cpp and GGML, utilizing optimized C/C++ inference kernels for CPU execution. This approach focuses on inference speed, pre-compressed models (quantization), and an easy-to-use Python API, abstracting away the complexities of the underlying C++ implementation.

Quick Start & Requirements

  • Install: pip install transformers wget
  • Build: git clone https://github.com/nolanoOrg/cformers.git && cd cformers/cformers/cpp && make && cd ../..
  • Usage: from interface import AutoInference as AI
  • Prerequisites: Python with the transformers and wget packages; a C++ compiler and make to build the backend.
  • Docs: https://github.com/nolanoOrg/cformers
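Once the backend is built, generation goes through the Python interface imported above. A minimal sketch, run from the cformers/cformers directory; the model name and keyword arguments below follow the repository's examples but are illustrative, so the exact parameters may differ:

```python
# Run from the cformers/cformers directory after building the C++ backend.
# Model name and keyword arguments are illustrative assumptions; check the
# repository README for the currently supported checkpoints and parameters.
from interface import AutoInference as AI

ai = AI('EleutherAI/gpt-j-6B')  # fetches a pre-quantized checkpoint
result = ai.generate('def parse_html(html_doc):', num_tokens_to_generate=100)
print(result['token_str'])
```

The class wraps the compiled C++ binary, so no CUDA or PyTorch runtime is needed at inference time.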

Highlighted Details

  • Supports GPT-J, BLOOM, GPT-NeoX/Pythia, CodeGen, and GPT-2 architectures.
  • Offers Int4 quantization with fixed zero-offset.
  • Active community contributions are encouraged for model quantization and feature development.
  • Planned features include chat mode, prompt engineering tools, and Pybind11 integration for improved performance.
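The Int4 scheme mentioned above stores a scale per group of weights but fixes the zero-offset at 0 (symmetric quantization), so only the scale needs to be kept. A minimal pure-Python sketch of the idea; the real C/C++ kernels quantize in blocks and pack two 4-bit values per byte:

```python
# Sketch of Int4 quantization with a fixed zero-offset (symmetric scheme).
# Illustrative only: per-tensor scale here, whereas production kernels use
# per-block scales and bit-pack the results.

def quantize_int4(weights):
    """Map floats to signed 4-bit integers in [-7, 7] using only a scale.

    With the zero-offset fixed at 0, dequantization is just q * scale."""
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate float weights from 4-bit codes."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.7, -0.07]
q, scale = quantize_int4(weights)
approx = dequantize_int4(q, scale)
```

Dropping the zero-offset halves the per-block metadata and simplifies the inner dot-product loop, at the cost of slightly coarser rounding for skewed weight distributions.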

Maintenance & Community

  • Community-driven development with active encouragement for contributions.
  • Discord server available for communication and support: https://discord.gg/HGujTPQtR6

Licensing & Compatibility

  • MIT License.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The project is still under active development, with features like interactive chat mode and advanced quantization techniques (e.g., GPTQ) marked as upcoming. The current interface is limited to generation, with plans to add support for embeddings, logits, and mid-generation stopping.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days

Explore Similar Projects

Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

neural-compressor by intel
Top 0.2% · 2k stars
Python library for model compression (quantization, pruning, distillation, NAS)
Created 5 years ago · Updated 17 hours ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Jeff Hammerbacher (Cofounder of Cloudera), and 4 more.

gemma_pytorch by google
Top 0.2% · 6k stars
PyTorch implementation for Google's Gemma models
Created 1 year ago · Updated 3 months ago
Starred by Nat Friedman (Former CEO of GitHub), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 15 more.

FasterTransformer by NVIDIA
Top 0.1% · 6k stars
Optimized transformer library for inference
Created 4 years ago · Updated 1 year ago