Python bindings for fast Transformer model inference
This library provides Python bindings for Transformer models implemented in C/C++ using the GGML library, targeting developers and researchers working with large language models who need efficient inference. It offers a unified interface for various model architectures and supports GPU acceleration via CUDA, ROCm, and Metal, as well as GPTQ quantization for reduced memory footprint.
How It Works
The core of ctransformers is its C/C++ implementation leveraging the GGML library, which is optimized for efficient tensor operations on CPUs and GPUs. This approach allows for faster inference and lower memory usage compared to pure Python implementations. It supports loading models directly from the Hugging Face Hub or from local files, with options for specifying model types and files, and provides fine-grained control over generation parameters.
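As a minimal sketch of loading, a model can be fetched by Hub repo ID (optionally pinning a specific quantized file) or read from a path on disk. The repo ID, file name, and local path below are placeholders, and the keyword arguments reflect typical ctransformers usage rather than a complete reference.

```python
from ctransformers import AutoModelForCausalLM

# Load from a Hugging Face Hub repo (repo ID and file name are hypothetical).
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/some-model-GGML",        # placeholder repo ID
    model_file="some-model.q4_0.bin",  # pick one quantized file from the repo
    model_type="llama",
)

# Or load a GGML model file from local disk.
llm_local = AutoModelForCausalLM.from_pretrained(
    "/path/to/model.bin",  # placeholder local path
    model_type="llama",
)
```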
Quick Start & Requirements
- CPU (default): `pip install ctransformers`
- CUDA: `pip install ctransformers[cuda]`
- ROCm: `CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers`
- Metal: `CT_METAL=1 pip install ctransformers --no-binary ctransformers`
- GPTQ (experimental): `pip install ctransformers[gptq]`
```python
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("model_path", model_type="gpt2")
print(llm("AI is going to"))
```
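Generation can also be tuned per call. The parameter names below (`max_new_tokens`, `temperature`, `top_p`, `stream`) follow common ctransformers usage; treat this as a sketch rather than an exhaustive list of options.

```python
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("model_path", model_type="gpt2")

# One-shot generation with explicit sampling parameters.
text = llm("AI is going to", max_new_tokens=128, temperature=0.8, top_p=0.95)
print(text)

# Streaming: tokens are yielded as they are generated.
for token in llm("AI is going to", stream=True):
    print(token, end="", flush=True)
```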
Highlighted Details
- GPU offloading via the `gpu_layers` parameter for accelerated inference (see the sketch below).
- Experimental integration with the Hugging Face transformers pipeline and tokenizers.
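A sketch of both features, assuming a GPU-enabled build (CUDA, ROCm, or Metal) is installed and the experimental `hf=True` loading path is available; the repo ID is a placeholder.

```python
from ctransformers import AutoModelForCausalLM, AutoTokenizer
from transformers import pipeline

# Offload up to 50 layers to the GPU; requires a GPU-enabled install.
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/some-model-GGML",  # placeholder repo ID
    model_type="llama",
    gpu_layers=50,
    hf=True,  # experimental: expose a transformers-compatible model
)
tokenizer = AutoTokenizer.from_pretrained(model)

# Use the standard transformers text-generation pipeline on top.
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("AI is going to", max_new_tokens=64)[0]["generated_text"])
```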
Maintenance & Community
The project is actively maintained by marella and has contributions from a community of developers.
Licensing & Compatibility
Licensed under MIT, allowing for commercial use and integration into closed-source projects.
Limitations & Caveats
Experimental features like Hugging Face integration and GPTQ support may have limitations or change. Embedding and context length parameters are not universally supported across all model types.
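For instance, a larger context window and embeddings can be requested at load and call time, but only some model types (e.g., LLaMA-family) honor them; the path and values below are illustrative, not a guaranteed API for every model type.

```python
from ctransformers import AutoModelForCausalLM

# context_length is only honored by model types that support it.
llm = AutoModelForCausalLM.from_pretrained(
    "/path/to/llama-model.bin",  # placeholder local path
    model_type="llama",
    context_length=2048,
)

# embed() is likewise limited to a subset of model types.
vector = llm.embed("Hello, world!")
print(len(vector))
```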