C++ inference for real-time chatting with RAG
This project provides a pure C++ implementation for running various large language models (LLMs) for real-time chatting, supporting both CPU and GPU inference with advanced quantization techniques. It targets developers and researchers looking for efficient, local LLM deployment with features like RAG and continuous chatting, offering a performant alternative to Python-heavy frameworks.
How It Works
The core of ChatLLM.cpp is built upon ggerganov/ggml, leveraging its C++ tensor library for accelerated, memory-efficient inference. It employs object-oriented programming to manage similarities across different Transformer-based models, enabling support for a wide range of architectures. Key optimizations include int4/int8 quantization, an optimized KV cache, and parallel computing for enhanced performance.
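To make the quantization point concrete, here is a minimal sketch of block-wise int4 quantization with one shared scale per block, which is the basic idea behind ggml-style Q4 formats. The names (BlockQ4, quantize_block), the block size, and the exact encoding are simplified assumptions for illustration, not the on-disk layout used by ggml or chatllm.cpp.

```cpp
// Simplified illustration of block-wise int4 quantization (not ggml's exact Q4 layout).
#include <algorithm>
#include <cmath>
#include <cstdint>

constexpr int kBlockSize = 32;  // weights are quantized in blocks of 32

struct BlockQ4 {
    float scale;                      // per-block dequantization scale
    uint8_t nibbles[kBlockSize / 2];  // two 4-bit weights packed per byte
};

// Quantize one block of 32 floats to 4-bit codes plus a shared scale.
BlockQ4 quantize_block(const float *x) {
    float amax = 0.0f;
    for (int i = 0; i < kBlockSize; ++i)
        amax = std::max(amax, std::fabs(x[i]));

    BlockQ4 b{};
    b.scale = amax / 7.0f;  // map [-amax, amax] roughly onto [-7, 7]
    const float inv = b.scale != 0.0f ? 1.0f / b.scale : 0.0f;

    for (int i = 0; i < kBlockSize; i += 2) {
        // offset by 8 so the signed code fits an unsigned nibble in [0, 15]
        int lo = std::clamp(int(std::lround(x[i]     * inv)) + 8, 0, 15);
        int hi = std::clamp(int(std::lround(x[i + 1] * inv)) + 8, 0, 15);
        b.nibbles[i / 2] = uint8_t(lo | (hi << 4));
    }
    return b;
}

// Dequantize back to floats, as a matmul kernel would do on the fly.
void dequantize_block(const BlockQ4 &b, float *out) {
    for (int i = 0; i < kBlockSize; i += 2) {
        out[i]     = (int(b.nibbles[i / 2] & 0x0F) - 8) * b.scale;
        out[i + 1] = (int(b.nibbles[i / 2] >> 4)   - 8) * b.scale;
    }
}
```

The trade-off this illustrates: each weight shrinks from 32 bits to roughly 4 bits plus a small per-block overhead, at the cost of bounded rounding error within each block.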
Quick Start & Requirements
- Clone: git clone --recursive https://github.com/foldl/chatllm.cpp.git && cd chatllm.cpp, followed by git submodule update --init --recursive if submodules were not fetched.
- Requirements: Python dependencies (pip install -r requirements.txt) and CMake for building.
- Convert: use convert.py to transform Hugging Face models to the project's GGML format.
- Build: cmake -B build && cmake --build build -j --config Release
- Run: ./build/bin/main -m model.bin (interactive mode: ./build/bin/main -m model.bin -i).

Highlighted Details
Maintenance & Community
This is a hobby project under active development. While feature PRs are not accepted, bug fix PRs are welcome.
Licensing & Compatibility
The project's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The generated .bin file format is different from the GGUF format used by llama.cpp, so models converted for one runtime cannot be loaded by the other (a quick way to tell the two apart is sketched below). The project is a hobbyist endeavor, and PRs for new features are not accepted, which may limit its future development direction.
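If you are unsure which runtime a model file targets, a magic-byte check is enough to distinguish the formats. The sketch below only tests for the "GGUF" magic defined by the GGUF specification; it makes no assumption about the layout of chatllm.cpp's own .bin header and is illustrative rather than part of either project.

```cpp
// Minimal check for the GGUF magic bytes at the start of a model file.
#include <cstdio>
#include <cstring>

bool is_gguf_file(const char *path) {
    std::FILE *f = std::fopen(path, "rb");
    if (!f) return false;
    char magic[4] = {};
    size_t n = std::fread(magic, 1, sizeof(magic), f);
    std::fclose(f);
    return n == sizeof(magic) && std::memcmp(magic, "GGUF", 4) == 0;
}

int main(int argc, char **argv) {
    if (argc < 2) { std::fprintf(stderr, "usage: %s model.bin\n", argv[0]); return 1; }
    std::printf("%s: %s\n", argv[1],
                is_gguf_file(argv[1]) ? "GGUF (llama.cpp)" : "not GGUF (e.g. a chatllm.cpp .bin)");
    return 0;
}
```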