ctransformers by marella

Python bindings for fast Transformer model inference

Created 2 years ago
1,878 stars

Top 23.2% on SourcePulse

Project Summary

This library provides Python bindings for Transformer models implemented in C/C++ using the GGML library, targeting developers and researchers working with large language models who need efficient inference. It offers a unified interface for various model architectures and supports GPU acceleration via CUDA, ROCm, and Metal, as well as GPTQ quantization for reduced memory footprint.

How It Works

The core of ctransformers is its C/C++ implementation leveraging the GGML library, which is optimized for efficient tensor operations on CPUs and GPUs. This approach allows for faster inference and lower memory usage compared to pure Python implementations. It supports loading models directly from Hugging Face Hub or local files, with options for specifying model types and files, and provides fine-grained control over generation parameters.
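The loading flow described above can be sketched as follows. This is a minimal example, not an excerpt from the project; the Hub repo id and file name are placeholders, and the calls mirror the `AutoModelForCausalLM.from_pretrained` interface and generation keyword arguments documented in the ctransformers README:

```python
from ctransformers import AutoModelForCausalLM

# Load a GGML model directly from the Hugging Face Hub.
# model_file selects a specific quantized file inside the repo (placeholder
# names below); model_type tells the loader which architecture to use.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GGML",               # placeholder repo id
    model_file="llama-2-7b.ggmlv3.q4_0.bin",  # placeholder file name
    model_type="llama",
)

# Generation parameters are passed per call for fine-grained control.
print(llm(
    "AI is going to",
    max_new_tokens=64,
    temperature=0.8,
    top_k=40,
    top_p=0.95,
    repetition_penalty=1.1,
))
```

Passing a local file path instead of a repo id works the same way, with `model_type` identifying the architecture when it cannot be inferred.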

Quick Start & Requirements

  • Install: pip install ctransformers
  • GPU Support (CUDA): pip install ctransformers[cuda]
  • GPU Support (ROCm): CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers
  • GPU Support (Metal): CT_METAL=1 pip install ctransformers --no-binary ctransformers
  • GPTQ Support: pip install ctransformers[gptq]
  • Usage: from ctransformers import AutoModelForCausalLM; llm = AutoModelForCausalLM.from_pretrained("model_path", model_type="gpt2"); print(llm("AI is going to"))
  • Documentation: https://github.com/marella/ctransformers#documentation

Highlighted Details

  • Supports a wide range of models including LLaMA, Falcon, GPT-NeoX, and more.
  • Offers GPU offloading via gpu_layers parameter for accelerated inference.
  • Integrates with LangChain for seamless use in LLM applications.
  • Experimental support for Hugging Face transformers pipeline and tokenizers.
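The GPU-offload and streaming options above can be sketched as below. This is an illustrative snippet, assuming a CUDA/ROCm/Metal-enabled install; the repo id is a placeholder, while `gpu_layers` and `stream` are parameters documented by the library:

```python
from ctransformers import AutoModelForCausalLM

# gpu_layers controls how many transformer layers are offloaded to the
# GPU; a large value offloads as many layers as fit in GPU memory.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GGML",  # placeholder repo id
    model_type="llama",
    gpu_layers=50,
)

# stream=True yields tokens as they are generated instead of
# returning the full completion at once.
for token in llm("AI is going to", stream=True):
    print(token, end="", flush=True)
```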

Maintenance & Community

The project was created and maintained by marella with contributions from a community of developers, though activity has since slowed: per the health check below, the last commit was about a year ago.

Licensing & Compatibility

Licensed under MIT, allowing for commercial use and integration into closed-source projects.

Limitations & Caveats

Experimental features such as the Hugging Face transformers pipeline/tokenizer integration and GPTQ support may have limitations or change without notice. The embedding and context-length parameters are not supported by all model types.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days

Explore Similar Projects

Starred by Luca Soldaini (Research Scientist at Ai2), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 4 more.

parallelformers by tunib-ai

0%
790
Toolkit for easy model parallelization
Created 4 years ago
Updated 2 years ago
Starred by Tobi Lutke (Cofounder of Shopify), Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), and 5 more.

matmulfreellm by ridgerchu

0.0%
3k
MatMul-free language models
Created 1 year ago
Updated 1 month ago
Starred by Alex Yu (Research Scientist at OpenAI; former Cofounder of Luma AI) and Cody Yu (coauthor of vLLM; MTS at OpenAI).

xDiT by xdit-project

0.7%
2k
Inference engine for parallel Diffusion Transformer (DiT) deployment
Created 1 year ago
Updated 1 day ago