Python bindings for fast Transformer model inference
This library provides Python bindings for Transformer models implemented in C/C++ using the GGML library, targeting developers and researchers working with large language models who need efficient inference. It offers a unified interface for various model architectures and supports GPU acceleration via CUDA, ROCm, and Metal, as well as GPTQ quantization for reduced memory footprint.
How It Works
The core of ctransformers is its C/C++ implementation leveraging the GGML library, which is optimized for efficient tensor operations on CPUs and GPUs. This approach allows for faster inference and lower memory usage compared to pure Python implementations. It supports loading models directly from the Hugging Face Hub or from local files, with options for specifying model types and files, and provides fine-grained control over generation parameters.
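As a minimal sketch of loading, a model can be fetched by Hub repo ID (optionally pinning a specific quantized file) or read from a path on disk. The repo ID, file name, and local path below are placeholders, and the keyword arguments reflect typical ctransformers usage rather than a complete reference.

```python
from ctransformers import AutoModelForCausalLM

# Load from a Hugging Face Hub repo (repo ID and file name are hypothetical).
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/some-model-GGML",        # placeholder repo ID
    model_file="some-model.q4_0.bin",  # pick one quantized file from the repo
    model_type="llama",
)

# Or load a GGML model file from local disk.
llm_local = AutoModelForCausalLM.from_pretrained(
    "/path/to/model.bin",  # placeholder local path
    model_type="llama",
)
```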
Quick Start & Requirements
- CPU (default): `pip install ctransformers`
- CUDA: `pip install ctransformers[cuda]`
- ROCm: `CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers`
- Metal: `CT_METAL=1 pip install ctransformers --no-binary ctransformers`
- GPTQ (experimental): `pip install ctransformers[gptq]`
```python
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("model_path", model_type="gpt2")
print(llm("AI is going to"))
```
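Generation can also be tuned per call. The parameter names below (`max_new_tokens`, `temperature`, `top_p`, `stream`) follow common ctransformers usage; treat this as a sketch rather than an exhaustive list of options.

```python
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("model_path", model_type="gpt2")

# One-shot generation with explicit sampling parameters.
text = llm("AI is going to", max_new_tokens=128, temperature=0.8, top_p=0.95)
print(text)

# Streaming: tokens are yielded as they are generated.
for token in llm("AI is going to", stream=True):
    print(token, end="", flush=True)
```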
Highlighted Details
- GPU offloading via the `gpu_layers` parameter for accelerated inference (see the sketch below).
- Experimental integration with the Hugging Face transformers pipeline and tokenizers.
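A sketch of both features, assuming a GPU-enabled build (CUDA, ROCm, or Metal) is installed and the experimental `hf=True` loading path is available; the repo ID is a placeholder.

```python
from ctransformers import AutoModelForCausalLM, AutoTokenizer
from transformers import pipeline

# Offload up to 50 layers to the GPU; requires a GPU-enabled install.
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/some-model-GGML",  # placeholder repo ID
    model_type="llama",
    gpu_layers=50,
    hf=True,  # experimental: expose a transformers-compatible model
)
tokenizer = AutoTokenizer.from_pretrained(model)

# Use the standard transformers text-generation pipeline on top.
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("AI is going to", max_new_tokens=64)[0]["generated_text"])
```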
Maintenance & Community
The project is actively maintained by marella and has contributions from a community of developers.
Licensing & Compatibility
Licensed under MIT, allowing for commercial use and integration into closed-source projects.
Limitations & Caveats
Experimental features like Hugging Face integration and GPTQ support may have limitations or change. Embedding and context length parameters are not universally supported across all model types.
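For instance, a larger context window and embeddings can be requested at load and call time, but only some model types (e.g., LLaMA-family) honor them; the path and values below are illustrative, not a guaranteed API for every model type.

```python
from ctransformers import AutoModelForCausalLM

# context_length is only honored by model types that support it.
llm = AutoModelForCausalLM.from_pretrained(
    "/path/to/llama-model.bin",  # placeholder local path
    model_type="llama",
    context_length=2048,
)

# embed() is likewise limited to a subset of model types.
vector = llm.embed("Hello, world!")
print(len(vector))
```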