starcoder.cpp  by bigcode-project

C++ example for StarCoder inference

Created 2 years ago
456 stars

Top 66.4% on SourcePulse

Project Summary

This C++ project provides a CPU-based inference engine for the StarCoder family of code-generation models, built on the ggml library. It targets developers and researchers who want to run these models on commodity hardware without a GPU, offering an accessible path for experimentation and deployment.

How It Works

The project implements StarCoder inference in C++ on top of ggml, a tensor library optimized for CPU execution. Models are first converted from their Hugging Face checkpoints into ggml's binary format, then loaded and run entirely on the CPU, with no specialized hardware required. ggml also supports weight quantization, which significantly reduces the memory footprint and improves inference speed on consumer-grade machines.

Quick Start & Requirements

  • Install: Clone the repository and build using make.
  • Prerequisites: Python and the transformers library are needed for model conversion.
  • Model Conversion: Use python convert-hf-to-ggml.py <hf_model_name> to convert Hugging Face models.
  • Quantization: Use ./quantize <ggml_model_path> <output_path> <quantization_type> to quantize models (e.g., 4-bit integer).
  • Inference: Run ./bin/starcoder -m <quantized_model_path> -p "<prompt>" for inference.
  • Demo: See example usage in the README for running inference with specific parameters.

Highlighted Details

  • Supports multiple StarCoder variants: bigcode/starcoder, bigcode/gpt_bigcode-santacoder, and HuggingFaceH4/starchat-beta.
  • Enables CPU-only inference, making large models accessible without GPUs.
  • Includes model quantization (e.g., 4-bit integer) to reduce size and improve performance.
  • Offers a proof-of-concept iOS app for on-device inference.

Maintenance & Community

The project is part of the BigCode community initiative. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README snippet. Compatibility for commercial use or closed-source linking would depend on the underlying ggml library license and the StarCoder model licenses.

Limitations & Caveats

Performance benchmarks are marked as "TODO" and are not yet available. The project is presented as a C++ example, suggesting it may be experimental or under active development.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Jared Palmer (Ex-VP AI at Vercel; Founder of Turborepo; Author of Formik, TSDX), and 1 more.

mpt-30B-inference by abacaj

575 stars · CPU inference code for MPT-30B
Created 2 years ago · Updated 2 years ago
Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

neural-compressor by intel

2k stars · Python library for model compression (quantization, pruning, distillation, NAS)
Created 5 years ago · Updated 18 hours ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Jeff Hammerbacher (Cofounder of Cloudera), and 4 more.

gemma_pytorch by google

6k stars · PyTorch implementation for Google's Gemma models
Created 1 year ago · Updated 3 months ago