NouamaneTazi/bloomz.cpp: C++ project for BLOOM model inference
Top 43.8% on SourcePulse
This repository provides a C++ implementation for running BLOOM-family language models from the Hugging Face Hub, enabling efficient inference across platforms. It targets developers and researchers who want to deploy large language models locally without relying on Python or heavy frameworks. The primary benefits are lower resource consumption and faster inference than Python-based stacks.
How It Works
Built upon the llama.cpp project, bloomz.cpp leverages the GGML tensor library for efficient computation. It supports BLOOM models loaded via BloomForCausalLM.from_pretrained(). The core approach involves converting Hugging Face model weights into the GGML format, which allows for optimized, CPU-centric inference, with optional quantization to further reduce memory footprint and improve speed.
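As a concrete sketch of that convert-then-quantize pipeline, the commands below turn a small BLOOM checkpoint into a GGML file and then quantize it to 4-bit. The script name convert-hf-to-ggml.py, the quantize binary, the bigscience/bloomz-560m checkpoint, and the output file names follow the upstream README's conventions but are assumptions here; check the repository for the current entry points.

    # Assumed workflow (verify script and binary names in the repo):
    # 1. Download the HF weights and convert them to GGML (float16).
    python convert-hf-to-ggml.py bigscience/bloomz-560m ./models
    # 2. Optionally quantize the f16 model to 4-bit (type 2 = q4_0)
    #    to cut the memory footprint.
    ./quantize ./models/ggml-model-bloomz-560m-f16.bin \
               ./models/ggml-model-bloomz-560m-f16-q4_0.bin 2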
Quick Start & Requirements
Build with make, then run ./main -m <model_path>. Converting weights requires torch, numpy, transformers, and accelerate.
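A minimal end-to-end run might look like the following. Only the -m flag is documented in this summary; the -p (prompt), -n (tokens to generate), and -t (threads) flags mirror llama.cpp's CLI and are assumptions here.

    # Build the inference binary.
    make
    # Run inference on a converted, quantized model
    # (flags other than -m are assumed from llama.cpp).
    ./main -m ./models/ggml-model-bloomz-560m-f16-q4_0.bin \
           -p "Translate to French: Hello, world." -n 64 -t 8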
Highlighted Details
Compatible with any BLOOM model loadable via BloomForCausalLM.from_pretrained().
Maintenance & Community
The project is a fork of llama.cpp and benefits from that project's active development community, but the bloomz.cpp repository itself appears inactive, with its last update roughly two years ago. Specific community links for bloomz.cpp are not detailed in the README.
Licensing & Compatibility
The project inherits the license of llama.cpp, which is the MIT License. This permits commercial use and linking with closed-source applications.
Limitations & Caveats
The README focuses on inference and conversion; training or fine-tuning capabilities are not mentioned. The iOS app is presented as a proof-of-concept, suggesting potential limitations for production use.