chatllm.cpp by foldl

C++ inference for real-time chatting with RAG

created 1 year ago
666 stars

Top 51.5% on sourcepulse

Project Summary

This project provides a pure C++ implementation for running various large language models (LLMs) for real-time chatting, supporting both CPU and GPU inference with advanced quantization techniques. It targets developers and researchers looking for efficient, local LLM deployment with features like RAG and continuous chatting, offering a performant alternative to Python-heavy frameworks.

How It Works

The core of ChatLLM.cpp is built on ggerganov/ggml, leveraging that C++ tensor library for accelerated, memory-efficient inference. Object-oriented design factors out the structure shared by Transformer-based models, which is what lets a single codebase support a wide range of architectures. Key optimizations include int4/int8 quantization (sketched below), an optimized KV cache, and parallel computation.
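
To make the quantization point concrete, below is a minimal sketch of block-wise int4 quantization in the style of ggml's Q4_0 format (32 values per block sharing one scale). This is an illustrative reimplementation under that assumption, not code from the project; ggml itself stores the scale as fp16 and uses SIMD-optimized kernels.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Q4_0-style block: 32 floats stored as one scale plus 32 packed 4-bit
// values (two per byte). Sketch only; ggml stores the scale as fp16.
constexpr int QK = 32;
struct BlockQ4 {
    float scale;        // d: dequantized value = (nibble - 8) * d
    uint8_t qs[QK / 2]; // 32 x 4-bit quants, two per byte
};

static BlockQ4 quantize_block(const float *x) {
    // Find the value with the largest magnitude and map it to -8,
    // the extreme of the signed 4-bit range, as Q4_0 does.
    float amax = 0.0f, max = 0.0f;
    for (int i = 0; i < QK; i++)
        if (std::fabs(x[i]) > amax) { amax = std::fabs(x[i]); max = x[i]; }

    BlockQ4 b{};
    b.scale = max / -8.0f;
    const float id = (b.scale != 0.0f) ? 1.0f / b.scale : 0.0f;
    for (int i = 0; i < QK / 2; i++) {
        // Quantize two neighboring values and pack them into one byte.
        uint8_t q0 = (uint8_t)std::min(15.0f, x[2 * i + 0] * id + 8.5f);
        uint8_t q1 = (uint8_t)std::min(15.0f, x[2 * i + 1] * id + 8.5f);
        b.qs[i] = q0 | (q1 << 4);
    }
    return b;
}

static void dequantize_block(const BlockQ4 &b, float *out) {
    for (int i = 0; i < QK / 2; i++) {
        out[2 * i + 0] = float((b.qs[i] & 0x0F) - 8) * b.scale;
        out[2 * i + 1] = float((b.qs[i] >> 4) - 8) * b.scale;
    }
}

int main() {
    std::vector<float> x(QK), y(QK);
    for (int i = 0; i < QK; i++) x[i] = std::sin(0.3f * i); // demo data
    BlockQ4 b = quantize_block(x.data());
    dequantize_block(b, y.data());
    for (int i = 0; i < 4; i++)
        std::printf("%8.4f -> %8.4f\n", x[i], y[i]); // original vs. round-trip
}
```

Dividing by -8 rather than +8 uses the full [-8, 7] range of a signed nibble; the rounding error is the price paid for weights roughly 4x smaller than fp16.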

Quick Start & Requirements

  • Install: git clone --recursive https://github.com/foldl/chatllm.cpp.git && cd chatllm.cpp. If the clone was not made with --recursive, run git submodule update --init --recursive to fetch the ggml submodule.
  • Dependencies: Python 3.x for model conversion (pip install -r requirements.txt). CMake for building.
  • Model Conversion: Use convert.py to transform Hugging Face models to the project's GGML format.
  • Build: cmake -B build && cmake --build build -j --config Release.
  • Run: ./build/bin/main -m model.bin (interactive mode: ./build/bin/main -m model.bin -i).
  • Docs: docs/models.md

Highlighted Details

  • Supports LLMs from <1B to >300B parameters.
  • Features RAG, LoRA integration, and streaming generation with a typewriter effect.
  • Offers continuous chatting with virtually unlimited context length.
  • Provides Python/JavaScript/C/Nim bindings and a web demo; a sketch against the C binding follows this list.
  • Exposes an OpenAI-compatible API and supports models such as CodeGemma.
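
As referenced above, here is a rough sketch of a streaming "typewriter" chat loop over the C binding, which the other language bindings appear to wrap. The declarations, callback signatures, and the model.bin path below are assumptions recalled from bindings/libchatllm.h, not the authoritative API; verify against the actual header.

```cpp
// Sketch of a streaming chat loop over the C binding. NOTE: the
// declarations below are assumptions recalled from bindings/libchatllm.h;
// verify names and signatures against the actual header.
#include <cstdio>
#include <iostream>
#include <string>

extern "C" {
struct chatllm_obj;
typedef void (*f_chatllm_print)(void *user_data, int print_type, const char *utf8_str);
typedef void (*f_chatllm_end)(void *user_data);

chatllm_obj *chatllm_create(void);
void chatllm_append_param(chatllm_obj *obj, const char *utf8_str);
int  chatllm_start(chatllm_obj *obj, f_chatllm_print f_print, f_chatllm_end f_end, void *user_data);
int  chatllm_user_input(chatllm_obj *obj, const char *utf8_str);
}

// Print each generated chunk as it arrives -- the "typewriter" effect.
static void on_print(void *, int /*print_type*/, const char *utf8_str) {
    std::fputs(utf8_str, stdout);
    std::fflush(stdout);
}

static void on_end(void *) { std::fputs("\n", stdout); }

int main() {
    chatllm_obj *chat = chatllm_create();
    // Parameters mirror the CLI flags of ./build/bin/main.
    chatllm_append_param(chat, "-m");
    chatllm_append_param(chat, "model.bin"); // placeholder path
    if (chatllm_start(chat, on_print, on_end, nullptr) != 0)
        return 1;

    std::string line;
    while (std::fputs("You> ", stdout), std::fflush(stdout), std::getline(std::cin, line))
        chatllm_user_input(chat, line.c_str()); // blocks while streaming the reply
    return 0;
}
```

The same flow (create, append CLI-style params, start with callbacks, feed user input) presumably carries over to the Python, JavaScript, and Nim bindings.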

Maintenance & Community

This is a hobby project under active development. Feature PRs are not accepted, but bug-fix PRs are welcome.

Licensing & Compatibility

The project's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The generated .bin files use the project's own format, which is not compatible with the GGUF format used by llama.cpp, so converted models cannot be shared between the two. As a hobby project that does not accept feature PRs, its future direction rests largely with the maintainer.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 2
  • Star History: 90 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Shawn Wang (Editor of Latent Space), and 8 more.

llm by rustformers

0% · 6k stars
Rust ecosystem for LLM inference (unmaintained)
created 2 years ago
updated 1 year ago

Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Nat Friedman (Former CEO of GitHub), and 32 more.

llama.cpp by ggml-org

0.4% · 84k stars
C/C++ library for local LLM inference
created 2 years ago
updated 14 hours ago