CPU inference code for MPT-30B
This repository provides Python code for running inference on the MPT-30B model using only a CPU, targeting users who want to leverage large language models without requiring expensive GPUs. It uses a ggml-quantized model and the ctransformers Python library for efficient CPU execution.
How It Works
The project leverages ggml, a C library for machine learning that enables efficient tensor operations on CPUs. By using a ggml-quantized version of the MPT-30B model, the memory footprint and computational requirements are significantly reduced, making it feasible to run on consumer-grade hardware. The ctransformers library provides Python bindings to ggml, simplifying integration and inference.
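As an illustration, here is a minimal inference sketch using the ctransformers API. The model path and generation parameters below are assumptions for demonstration, not values taken from this repository:

```python
from ctransformers import AutoModelForCausalLM

# Load a ggml-quantized MPT model from a local file; model_type tells
# ctransformers which architecture to use. The path is a placeholder.
llm = AutoModelForCausalLM.from_pretrained(
    "models/mpt-30b.ggmlv0.bin",  # hypothetical path to the quantized weights
    model_type="mpt",
)

# Generate a completion entirely on the CPU.
print(llm("Explain what ggml quantization does:", max_new_tokens=128))
```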
Quick Start & Requirements
pip install -r requirements.txt
python download_model.py
python inference.py
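For context, a download script along these lines typically fetches the quantized weights from the Hugging Face Hub. This is a hedged sketch, not the repository's actual download_model.py; the repo id and filename are placeholders:

```python
from huggingface_hub import hf_hub_download

# Fetch the quantized weights file; repo_id and filename are hypothetical,
# not confirmed from this repository.
path = hf_hub_download(
    repo_id="TheBloke/mpt-30B-GGML",      # assumed hosting location
    filename="mpt-30b.ggmlv0.q4_0.bin",   # assumed weights filename
    local_dir="models",
)
print(f"Model downloaded to {path}")
```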
Highlighted Details
- ggml-quantized model weights (approx. 19 GB download).
- ctransformers Python library.
Maintenance & Community
No specific information on contributors, sponsorships, or community channels is provided in the README.
Licensing & Compatibility
The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project requires a substantial amount of RAM (32 GB minimum). Performance benchmarks and comparisons to GPU inference are not yet available.
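Given the memory requirement, a quick pre-flight check of available RAM can avoid a failed model load. A minimal sketch using psutil, which is an assumed dependency rather than part of this repository:

```python
import psutil

# The ~19 GB quantized model plus working buffers needs roughly 32 GB of RAM
# (assumption based on the stated minimum above).
REQUIRED_GB = 32

available_gb = psutil.virtual_memory().available / (1024 ** 3)
if available_gb < REQUIRED_GB:
    raise SystemExit(
        f"Only {available_gb:.1f} GB RAM available; "
        f"at least {REQUIRED_GB} GB is recommended."
    )
print(f"{available_gb:.1f} GB available - OK to load the model.")
```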