Bonsai-demo by PrismML-Eng

Run large language models locally with optimized backends

Created 2 weeks ago


456 stars

Top 66.1% on SourcePulse

Project Summary

This project provides a streamlined demo and setup process for running Bonsai language models locally across diverse hardware. It targets engineers and researchers seeking an accessible way to deploy LLMs on Mac (Metal, Apple Silicon), Linux, and Windows (CUDA/CPU), offering a unified interface for model inference.

How It Works

The project integrates two inference backends: llama.cpp for broad cross-platform compatibility (GGUF format) and MLX for optimized performance on Apple Silicon (MLX format). Crucially, it uses custom forks of both projects (PrismML-Eng/llama.cpp, PrismML-Eng/mlx) that add inference kernels not yet available upstream, enabling immediate functionality.
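The backend-to-hardware mapping above can be sketched as a small dispatch script. This is an assumption about how one might pick a backend, not the demo's actual logic, which the README does not show:

```shell
#!/bin/sh
# Pick an inference backend by platform (sketch).
# MLX requires Apple Silicon; llama.cpp (GGUF) covers everything else.
case "$(uname -s)-$(uname -m)" in
  Darwin-arm64) BACKEND=mlx ;;        # Apple Silicon Mac: Metal-optimized MLX
  *)            BACKEND=llama.cpp ;;  # Linux, Windows, Intel Mac: GGUF via llama.cpp
esac
echo "selected backend: $BACKEND"
```

Keeping both backends behind one interface is what lets the same demo scripts run on every supported platform.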

Quick Start & Requirements

  • Primary install / run command: ./setup.sh (macOS/Linux) or .\setup.ps1 (Windows).
  • Non-default prerequisites and dependencies:
    • macOS: Xcode CLT.
    • Linux: build-essential.
    • Windows: Visual Studio Build Tools (for building from source).
    • Python: Managed via uv and a virtual environment.
    • CUDA toolkit: Required for Linux/Windows GPU acceleration.
    • PRISM_HF_TOKEN: Required for downloading models from private HuggingFace repositories.
  • Estimated setup time or resource footprint: the setup.sh/setup.ps1 script automates dependency installation, environment setup, model downloads, and binary acquisition/compilation, so expect a lengthy first run with substantial disk and network usage.
  • Links: Bonsai Demo Website, HuggingFace Collection, Whitepaper, GitHub, Discord (no direct URLs provided in README).
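Putting the requirements above together, a first run might look like the following. The `PRISM_HF_TOKEN` variable name and the script names come from the README; any `setup.sh` flags are undocumented, so none are assumed here:

```shell
# One-time setup on macOS/Linux (use .\setup.ps1 on Windows).
# The token must be exported first so downloads from private
# HuggingFace repositories succeed; the script then installs
# dependencies, creates the uv-managed virtual environment,
# fetches models, and builds or downloads backend binaries.
export PRISM_HF_TOKEN="<your HuggingFace token>"
./setup.sh
```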

Highlighted Details

  • Offers Bonsai models in three sizes: 8B, 4B, and 1.7B.
  • Supports both GGUF (llama.cpp) and MLX (Apple Silicon) model formats.
  • The 8B model supports context lengths up to 65,536 tokens, with dynamic KV cache sizing.
  • Includes scripts for direct inference, running a local chat server, and integration with Open WebUI.
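The 65,536-token context figure implies a sizable KV cache at full length. A back-of-envelope sketch, using hypothetical 8B-class dimensions (32 layers, 8 KV heads of dimension 128, fp16 cache) since the README does not state the model architecture:

```shell
#!/bin/sh
# KV cache size = 2 (K and V) x layers x kv_heads x head_dim
#                 x context_length x bytes_per_element.
# All model dimensions below are illustrative assumptions.
N_LAYERS=32 N_KV_HEADS=8 HEAD_DIM=128 CTX_LEN=65536 BYTES=2
KV_BYTES=$((2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * CTX_LEN * BYTES))
echo "KV cache at full context: $((KV_BYTES / 1024 / 1024 / 1024)) GiB"
```

Under these assumptions the cache alone reaches 8 GiB at full context, which is presumably why the demo sizes the KV cache dynamically rather than allocating the maximum up front.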

Maintenance & Community

The project maintains custom forks of llama.cpp and MLX, suggesting active development on these core components. Community support is available via Discord.

Licensing & Compatibility

The specific open-source license is not explicitly stated in the provided README, which is a critical omission for due diligence. The project is designed for local execution on macOS (Apple Silicon, Metal), Linux (CUDA, CPU), and Windows (CUDA, CPU).

Limitations & Caveats

The project depends on custom forks of llama.cpp and MLX because the required kernels are missing upstream. Model downloads require a PRISM_HF_TOKEN, indicating reliance on private HuggingFace repositories. The all-in-one setup script may need manual intervention on less common system configurations.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 11
  • Issues (30d): 21
  • Star history: 459 stars in the last 17 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Gabriel Almeida (Cofounder of Langflow), and 2 more.

torchchat by pytorch (4k stars, top 0.1%)

PyTorch-native SDK for local LLM inference across diverse platforms. Created 2 years ago; updated 7 months ago. Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Anil Dash (former CEO of Glitch), and 23 more.

llamafile by mozilla-ai (24k stars, top 0.5%)

Single-file LLM distribution and runtime via `llama.cpp` and Cosmopolitan Libc. Created 2 years ago; updated 1 day ago.