bloomz.cpp: C++ project for BLOOM model inference
This repository provides a C++ implementation for running BLOOM-family language models from Hugging Face, enabling efficient inference on a variety of platforms. It targets developers and researchers who want to deploy large language models locally without relying on Python or heavyweight frameworks. The primary benefits are lower resource consumption and faster inference than comparable Python-based solutions.
How It Works
Built upon the `llama.cpp` project, `bloomz.cpp` leverages the GGML tensor library for efficient computation. It supports BLOOM models loaded via `BloomForCausalLM.from_pretrained()`. The core approach is to convert Hugging Face model weights into the GGML format, which allows optimized, CPU-centric inference, with optional quantization to further reduce the memory footprint and improve speed.
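To make the conversion step concrete, here is a minimal sketch of its first half, assuming only the standard `transformers` API named in this summary. The checkpoint name is illustrative, and the tensor walk stands in for whatever serialization the repo's actual converter performs:

```python
# Minimal sketch: load a BLOOM checkpoint with the API the README names
# (BloomForCausalLM.from_pretrained) and enumerate its weight tensors.
# The checkpoint name is illustrative; any BLOOM model should work.
import torch
from transformers import BloomForCausalLM

model = BloomForCausalLM.from_pretrained(
    "bigscience/bloomz-560m",   # smallest BLOOMZ checkpoint, easy to test with
    torch_dtype=torch.float16,  # fp16 halves memory use while loading
)

# A GGML converter then serializes each named tensor (name, shape, dtype,
# raw data) into one binary file the C++ runtime can read; optional
# quantization shrinks that file further.
for name, tensor in model.state_dict().items():
    print(name, tuple(tensor.shape), tensor.dtype)
```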
Quick Start & Requirements
- `make` to build, then `./main -m <model_path>`.
- `torch`, `numpy`, `transformers`, `accelerate` for weight conversion (see the sketch below).
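For orientation, a hedged end-to-end sketch of the quick-start flow follows. Only `make` and `./main -m <model_path>` come from this summary; the conversion script name, output filename, and `-p` prompt flag are assumptions modeled on `llama.cpp` conventions, so check the repository for the exact names and flags:

```python
# Hedged end-to-end driver: convert a checkpoint, then run the C++ binary.
import subprocess

checkpoint = "bigscience/bloomz-560m"  # any BLOOM checkpoint (assumption)

# 1. Convert Hugging Face weights to GGML (hypothetical script name,
#    following llama.cpp-style conventions).
subprocess.run(
    ["python", "convert-hf-to-ggml.py", checkpoint, "./models"],
    check=True,
)

# 2. Run the compiled binary on the converted weights. "./main -m
#    <model_path>" is the invocation given above; the output filename
#    and the -p prompt flag are assumptions.
subprocess.run(
    ["./main", "-m", "./models/ggml-model-bloomz-560m-f16.bin",
     "-p", "Translate \"Hello\" to French:"],
    check=True,
)
```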
Highlighted Details

- Works with BLOOM models loadable via `BloomForCausalLM.from_pretrained()`.

Maintenance & Community
The project is a fork of `llama.cpp`, inheriting its active development community. Specific community links for `bloomz.cpp` are not detailed in the README.
Licensing & Compatibility
The project inherits the license of `llama.cpp`, which is the MIT License. This permits commercial use and linking with closed-source applications.
Limitations & Caveats
The README focuses on inference and conversion; training or fine-tuning capabilities are not mentioned. The iOS app is presented as a proof-of-concept, suggesting potential limitations for production use.