mpt-30B-inference  by abacaj

CPU inference code for MPT-30B

created 2 years ago
575 stars

Top 56.9% on sourcepulse

Project Summary

This repository provides Python code for running inference on the MPT-30B model using only a CPU, aimed at users who want to run large language models without expensive GPUs. It uses a ggml-quantized model and the ctransformers Python library for efficient CPU execution.

How It Works

The project leverages ggml, a C library for machine learning that enables efficient tensor operations on CPUs. By using a ggml quantized version of the MPT-30B model, the memory footprint and computational requirements are significantly reduced, making it feasible to run on consumer-grade hardware. The ctransformers library provides Python bindings to ggml, simplifying the integration and inference process.
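Back-of-envelope arithmetic illustrates why quantization is what makes this feasible (this ignores ggml's per-block scale overhead, which is why the actual download is somewhat larger than the 4-bit estimate):

```python
# Rough memory estimate for ~30B weights at different bit widths.
# Real ggml formats (q4_0, q5_1, ...) add small per-block scales,
# so actual files are a bit larger than the idealized figure.

PARAMS = 30_000_000_000  # ~30B weights

def gib(num_bytes: float) -> float:
    """Convert a byte count to GiB."""
    return num_bytes / 2**30

fp16_bytes = PARAMS * 2      # 16 bits per weight
q4_bytes = PARAMS * 4 / 8    # 4 bits per weight, no overhead

print(f"fp16: {gib(fp16_bytes):.1f} GiB")  # well beyond typical desktop RAM
print(f"q4:   {gib(q4_bytes):.1f} GiB")    # fits on a 32GB machine
```

The 4-bit estimate lands near 14 GiB, which is consistent with the ~19GB quantized download mentioned below once block scales and a higher-precision variant are accounted for.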

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Download model weights: python download_model.py
  • Run inference: python inference.py
  • Prerequisites: minimum 32GB RAM; Python 3.10 recommended. Docker is also an option for easier setup.
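After the steps above, inference with ctransformers looks roughly like the following. This is a hedged sketch, not the repository's actual inference.py; the model filename is an assumption, so point `MODEL_PATH` at whatever file download_model.py actually saves.

```python
# Minimal CPU-inference sketch using ctransformers with ggml MPT weights.
# Assumes download_model.py has already fetched the quantized model; the
# file name below is a placeholder, not necessarily the script's output.
from ctransformers import AutoModelForCausalLM

MODEL_PATH = "models/mpt-30b-chat.ggmlv0.q4_1.bin"  # assumed path

llm = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    model_type="mpt",  # selects the MPT ggml architecture
)

# The loaded model is callable; generation runs entirely on the CPU.
print(llm("Explain quantization in one sentence.", max_new_tokens=64))
```

Loading the ~19GB model is the slow part; expect it to consume most of the 32GB RAM minimum.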

Highlighted Details

  • Enables MPT-30B inference on CPU.
  • Uses ggml quantized model weights (approx. 19GB download).
  • Relies on the ctransformers Python library.

Maintenance & Community

No specific information on contributors, sponsorships, or community channels is provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project requires a substantial amount of RAM (32GB minimum). Performance benchmarks or comparisons to GPU inference are not yet available.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 90 days

Explore Similar Projects

Starred by Jared Palmer (Ex-VP of AI at Vercel; Founder of Turborepo; Author of Formik, TSDX), Eugene Yan (AI Scientist at AWS), and 2 more.

starcoder.cpp by bigcode-project

0.2% · 456 stars
C++ example for StarCoder inference
created 2 years ago · updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Ying Sheng (Author of SGLang).

fastllm by ztxz16

0.4% · 4k stars
High-performance C++ LLM inference library
created 2 years ago · updated 2 weeks ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 6 more.

AutoGPTQ by AutoGPTQ

0.1% · 5k stars
LLM quantization package using GPTQ algorithm
created 2 years ago · updated 3 months ago
Starred by Bojan Tunguz (AI Scientist; Formerly at NVIDIA), Mckay Wrigley (Founder of Takeoff AI), and 8 more.

ggml by ggml-org

0.3% · 13k stars
Tensor library for machine learning
created 2 years ago · updated 3 days ago