ctransformers by marella

Python bindings for fast Transformer model inference

created 2 years ago
1,872 stars

Top 23.7% on sourcepulse

Project Summary

This library provides Python bindings for Transformer models implemented in C/C++ using the GGML library, targeting developers and researchers working with large language models who need efficient inference. It offers a unified interface for various model architectures and supports GPU acceleration via CUDA, ROCm, and Metal, as well as GPTQ quantization for reduced memory footprint.

How It Works

The core of ctransformers is its C/C++ implementation leveraging the GGML library, which is optimized for efficient tensor operations on CPUs and GPUs. This approach allows for faster inference and lower memory usage compared to pure Python implementations. It supports loading models directly from Hugging Face Hub or local files, with options for specifying model types and files, and provides fine-grained control over generation parameters.
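
For illustration, a minimal sketch of that loading path, assuming the GGML model repo marella/gpt-2-ggml used in the project's own examples; the sampling values are illustrative, not recommendations:

    from ctransformers import AutoModelForCausalLM

    # Load a GGML model from the Hugging Face Hub (a local file path also works).
    llm = AutoModelForCausalLM.from_pretrained(
        "marella/gpt-2-ggml",  # Hub repo id; swap in your own model
        model_type="gpt2",     # architecture hint when it cannot be inferred
    )

    # Generation parameters are passed per call.
    print(llm(
        "AI is going to",
        max_new_tokens=64,
        temperature=0.8,
        top_k=40,
        top_p=0.95,
        repetition_penalty=1.1,
    ))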

Quick Start & Requirements

  • Install: pip install ctransformers
  • GPU Support: pip install ctransformers[cuda] (CUDA), CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers (ROCm), CT_METAL=1 pip install ctransformers --no-binary ctransformers (Metal); see the GPU offloading sketch after this list
  • GPTQ Support: pip install ctransformers[gptq]
  • Usage: from ctransformers import AutoModelForCausalLM; llm = AutoModelForCausalLM.from_pretrained("model_path", model_type="gpt2"); print(llm("AI is going to"))
  • Documentation: https://github.com/marella/ctransformers#documentation
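
A hedged sketch of GPU offloading with streaming output, assuming the CUDA extra is installed; the repo id and model file name below are illustrative placeholders:

    from ctransformers import AutoModelForCausalLM

    # Offload the first 50 layers to the GPU; remaining layers stay on the CPU.
    llm = AutoModelForCausalLM.from_pretrained(
        "TheBloke/Llama-2-7B-GGML",               # assumed Hub repo id
        model_file="llama-2-7b.ggmlv3.q4_0.bin",  # assumed file inside the repo
        model_type="llama",
        gpu_layers=50,
    )

    # Stream tokens as they are generated.
    for token in llm("AI is going to", stream=True):
        print(token, end="", flush=True)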

Highlighted Details

  • Supports a wide range of models including LLaMA, Falcon, GPT-NeoX, and more.
  • Offers GPU offloading via gpu_layers parameter for accelerated inference.
  • Integrates with LangChain for seamless use in LLM applications (see the LangChain sketch after this list).
  • Experimental support for Hugging Face transformers pipeline and tokenizers.
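
A minimal sketch of the LangChain integration, following the pattern shown in the project's documentation; note that in newer LangChain releases the wrapper may live under langchain_community.llms instead:

    from langchain.llms import CTransformers  # newer LangChain: langchain_community.llms

    # Wrap a GGML model as a LangChain-compatible LLM.
    llm = CTransformers(model="marella/gpt-2-ggml")
    print(llm("AI is going to"))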

Maintenance & Community

The project was created and is maintained by marella, with contributions from a community of developers; recent activity has slowed, with the last commit about a year ago and no pull requests or issues opened in the past 30 days (see the health check below).

Licensing & Compatibility

Licensed under MIT, allowing for commercial use and integration into closed-source projects.

Limitations & Caveats

Experimental features such as the Hugging Face transformers/tokenizers integration and GPTQ support may have limitations or change without notice. The embedding and context-length options are not supported by every model type.
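
A hedged sketch of those options, assuming a LLaMA-family GGML model (one of the types that supports both); the repo id and file name are illustrative:

    from ctransformers import AutoModelForCausalLM

    llm = AutoModelForCausalLM.from_pretrained(
        "TheBloke/Llama-2-7B-GGML",               # assumed Hub repo id
        model_file="llama-2-7b.ggmlv3.q4_0.bin",  # assumed file inside the repo
        model_type="llama",
        context_length=2048,  # only honored by model types that support it
    )

    # Embeddings are only available for some model types (e.g. LLaMA, Falcon).
    embedding = llm.embed("Hello, world!")
    print(len(embedding))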

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 16 stars in the last 90 days

Explore Similar Projects

Starred by Jeremy Howard (Cofounder of fast.ai) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

SwissArmyTransformer by THUDM

  • Transformer library for flexible model development
  • 1k stars; created 3 years ago; updated 7 months ago

Starred by Tobi Lutke (Cofounder of Shopify), Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), and 2 more.

matmulfreellm by ridgerchu

  • MatMul-free language models
  • 3k stars; created 1 year ago; updated 1 week ago