mpt-30B-inference by abacaj

CPU inference code for MPT-30B

Created 2 years ago
575 stars

Top 56.2% on SourcePulse

Project Summary

This repository provides Python code for running inference on the MPT-30B model using only a CPU, aimed at users who want to run large language models without expensive GPUs. It uses a ggml quantized model and the ctransformers Python library for efficient CPU execution.

How It Works

The project leverages ggml, a C library for machine learning that enables efficient tensor operations on CPUs. By using a ggml quantized version of the MPT-30B model, the memory footprint and computational requirements are significantly reduced, making it feasible to run on consumer-grade hardware. The ctransformers library provides Python bindings to ggml, simplifying the integration and inference process.
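As a concrete illustration, loading a ggml checkpoint through ctransformers is typically only a few lines. The file name, thread count, and prompt below are assumptions for the sketch, not values taken from this repository:

```python
from ctransformers import AutoModelForCausalLM

# Load the quantized ggml weights for CPU inference. The path and
# quantization variant are illustrative; use whichever file
# download_model.py actually fetches.
llm = AutoModelForCausalLM.from_pretrained(
    "models/mpt-30b-chat.ggmlv0.q4_1.bin",
    model_type="mpt",  # selects the MPT architecture in ctransformers
    threads=8,         # CPU threads to use for generation
)

print(llm("What does quantization do to a model?", max_new_tokens=128))
```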

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Download model weights: python download_model.py
  • Run inference: python inference.py (a rough sketch of this step appears after this list)
  • Prerequisites: at least 32 GB of RAM; Python 3.10 is recommended, and Docker is suggested for easier setup.
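For the inference step, a minimal streaming loop with ctransformers might look like the sketch below; the model path is an assumption, and the actual inference.py in this repository may structure things differently:

```python
from ctransformers import AutoModelForCausalLM

# Illustrative sketch only -- not copied from this repo's inference.py.
llm = AutoModelForCausalLM.from_pretrained(
    "models/mpt-30b-chat.ggmlv0.q4_1.bin",  # assumed local path
    model_type="mpt",
)

prompt = "Explain ggml quantization in one paragraph."

# stream=True yields tokens as they are generated, so output appears
# incrementally instead of after the full completion finishes.
for token in llm(prompt, stream=True, max_new_tokens=200, temperature=0.8):
    print(token, end="", flush=True)
print()
```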

Highlighted Details

  • Enables MPT-30B inference on CPU.
  • Uses ggml quantized model weights (approx. 19 GB download).
  • Relies on the ctransformers Python library.

Maintenance & Community

No specific information on contributors, sponsorships, or community channels is provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project requires a substantial amount of RAM (32 GB minimum). No performance benchmarks or comparisons to GPU inference are provided.
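Given the 32 GB floor, a quick pre-flight memory check can fail fast before the roughly 19 GB model load begins. This sketch assumes psutil, which is not a stated dependency of the project:

```python
import psutil

REQUIRED_GB = 32  # minimum RAM stated by the project

# Compare total system memory against the documented minimum before
# attempting to load the ~19 GB quantized weights.
total_gb = psutil.virtual_memory().total / (1024 ** 3)
if total_gb < REQUIRED_GB:
    raise SystemExit(
        f"Only {total_gb:.1f} GB RAM detected; MPT-30B CPU inference "
        f"needs at least {REQUIRED_GB} GB."
    )
print(f"{total_gb:.1f} GB RAM available -- proceeding.")
```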

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days

Explore Similar Projects

Starred by Jared Palmer (Ex-VP AI at Vercel; Founder of Turborepo; Author of Formik, TSDX), Eugene Yan (AI Scientist at AWS), and 2 more.

starcoder.cpp by bigcode-project

0% · 456 stars
C++ example for StarCoder inference
Created 2 years ago · Updated 2 years ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Jeremy Howard (Cofounder of fast.ai).

GPTFast by MDK8888

0% · 687 stars
HF Transformers accelerator for faster inference
Created 1 year ago · Updated 1 year ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Jeff Hammerbacher (Cofounder of Cloudera), and 4 more.

gemma_pytorch by google

0.2% · 6k stars
PyTorch implementation for Google's Gemma models
Created 1 year ago · Updated 3 months ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Tim J. Baek (Founder of Open WebUI), and 7 more.

gemma.cpp by google

0.1% · 7k stars
C++ inference engine for Google's Gemma models
Created 1 year ago · Updated 1 day ago