bloomz.cpp by NouamaneTazi

C++ project for BLOOM model inference

created 2 years ago
810 stars

Top 44.5% on sourcepulse

Project Summary

This repository provides a C++ implementation for running BLOOM-family language models from the Hugging Face Hub, enabling efficient inference on various platforms. It targets developers and researchers who want to deploy large language models locally without relying on Python or heavy frameworks. The primary benefits are reduced resource consumption and faster inference compared to Python-based solutions.

How It Works

Built upon the llama.cpp project, bloomz.cpp leverages the GGML tensor library for efficient computation. It supports any BLOOM model that can be loaded via BloomForCausalLM.from_pretrained(). The core approach converts Hugging Face model weights into the GGML format for optimized, CPU-centric inference, with optional quantization to further reduce memory footprint and improve speed.
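The quantization idea can be sketched in a few lines. The following is an illustrative 4-bit block quantizer, not GGML's exact Q4 wire format: the block size of 32 and the symmetric scale-per-block scheme are assumptions chosen to show why memory drops roughly 4x versus fp16 (each weight becomes a 4-bit integer plus a shared per-block scale).

```python
# Illustrative 4-bit block quantization (NOT GGML's exact Q4 format).
# Weights are split into fixed-size blocks; each block stores one float
# scale plus one signed 4-bit integer per weight.

def quantize_block(block):
    """Quantize one block of floats to (scale, 4-bit ints in [-8, 7])."""
    amax = max(abs(x) for x in block) or 1.0
    scale = amax / 7.0                      # map largest magnitude to +/-7
    qs = [max(-8, min(7, round(x / scale))) for x in block]
    return scale, qs

def dequantize_block(scale, qs):
    """Recover approximate float weights from a quantized block."""
    return [q * scale for q in qs]

def quantize(weights, block_size=32):
    """Quantize a flat weight list block by block (block size assumed)."""
    blocks = [weights[i:i + block_size]
              for i in range(0, len(weights), block_size)]
    return [quantize_block(b) for b in blocks]
```

The roundtrip error per weight is bounded by half the block scale, which is why quantized models trade a small accuracy loss for a large memory saving.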

Quick Start & Requirements

  • Install/Run: Clone the repo, run make to build, then run ./main -m <model_path>.
  • Prerequisites: C++ compiler, Python 3, torch, numpy, transformers, accelerate for weight conversion.
  • Resources: Model weights must be converted before use. A 7B1 FP16 model quantized to 4-bit uses approximately 4.7 GB of disk space and 5.3 GB of RAM.
  • Links: Hugging Face Hub for converted weights.

Highlighted Details

  • Supports all BLOOM models loadable via BloomForCausalLM.from_pretrained().
  • Includes a converter tool (Hugging Face Space or local script) for model weights.
  • Offers optional 4-bit quantization for reduced memory usage.
  • Demonstrates inference with detailed output logs and sampling parameters.
  • Includes a proof-of-concept iOS app.
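The sampling parameters mentioned above can be illustrated with a minimal temperature + top-k sampler, the combination llama.cpp-style CLIs commonly expose. This is a sketch, not bloomz.cpp's actual code; the function name and the default values (temperature 0.8, top_k 40) are assumptions for illustration.

```python
import math
import random

def sample_token(logits, temperature=0.8, top_k=40, rng=random):
    """Pick a token id from raw logits via temperature + top-k sampling.

    Defaults are illustrative, not bloomz.cpp's documented values.
    """
    # Keep only the top_k highest-scoring candidate token ids.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Temperature-scaled, numerically stable softmax over the survivors.
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one candidate proportionally to its probability.
    r = rng.random()
    acc = 0.0
    for idx, p in zip(top, probs):
        acc += p
        if r <= acc:
            return idx
    return top[-1]
```

Lower temperatures sharpen the distribution toward the top-scoring token, while top_k caps how many candidates can be drawn at all; top_k=1 degenerates to greedy decoding.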

Maintenance & Community

The project is a fork of llama.cpp, inheriting its active development community. Specific community links for bloomz.cpp are not detailed in the README.

Licensing & Compatibility

The project inherits the license of llama.cpp, which is the MIT License. This permits commercial use and linking with closed-source applications.

Limitations & Caveats

The README focuses on inference and conversion; training or fine-tuning capabilities are not mentioned. The iOS app is presented as a proof-of-concept, suggesting potential limitations for production use.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 90 days

Explore Similar Projects

Starred by Jared Palmer (Ex-VP of AI at Vercel; Founder of Turborepo; Author of Formik, TSDX), Eugene Yan (AI Scientist at AWS), and 2 more.

starcoder.cpp by bigcode-project

0.2% · 456 stars
C++ example for StarCoder inference
created 2 years ago, updated 1 year ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; Author of CS 231n), Tim J. Baek (Founder of Open WebUI), and 5 more.

gemma.cpp by google

0.1% · 7k stars
C++ inference engine for Google's Gemma models
created 1 year ago, updated 1 day ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley).

DeepSeek-Coder-V2 by deepseek-ai

0.4% · 6k stars
Open-source code language model comparable to GPT-4-Turbo
created 1 year ago, updated 10 months ago