OneClickLLAMA by neavo

Easy local LLM deployment and inference

Created 1 year ago
259 stars

Top 97.9% on SourcePulse

View on GitHub
Project Summary

This project provides a simplified way to run local Large Language Models (LLMs) such as Qwen2.5 and SakuraLLM. It targets users who want to connect these models to applications that speak OpenAI-compatible APIs, such as translation or analysis tools, and it claims a 3-5x performance boost over default settings when paired with its companion applications, LinguaGacha and KeywordGacha.

How It Works

The project distributes pre-quantized LLM models in GGUF format, designed to run locally. It offers launch scripts (.bat files) tailored for NVIDIA GPUs, plus a general Vulkan version for broader hardware compatibility. Users select a model and script according to their GPU's VRAM capacity so the model fits in memory and loads efficiently. The advantage lies in streamlined deployment and configuration, abstracting away an otherwise complex LLM setup process.
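The launch scripts themselves are not reproduced in this summary. As a rough illustration, the sketch below shows what such a script typically amounts to, assuming the project wraps a llama.cpp-style llama-server binary; the binary name, model filename, and flag values are all assumptions, not documented facts about this project.

```python
import subprocess

# Hypothetical reconstruction of what a launch script might run.
# Assumes a llama.cpp-style `llama-server` binary and a GGUF model file
# placed in the OneClickLLAMA folder; all names and values are assumptions.
cmd = [
    "llama-server",
    "-m", "sakura-14b-qwen2.5-q4km.gguf",  # hypothetical model filename
    "-ngl", "99",      # offload all layers to the GPU (reduce if VRAM is tight)
    "-c", "8192",      # context window size
    "--port", "8080",  # serve an OpenAI-compatible HTTP API
]
subprocess.run(cmd, check=True)
```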

Quick Start & Requirements

  • Download the latest release from the project's release page.
  • Requires a dedicated GPU with at least 8GB of VRAM; NVIDIA GPUs are recommended for optimal performance.
  • Ensure the latest graphics card drivers are installed.
  • Models should be downloaded and placed within the OneClickLLAMA folder.
  • Launch the appropriate .bat script based on your VRAM and chosen model; once running, the server can be queried through its OpenAI-compatible API, as in the sketch after this list.
  • See Wiki - LinguaGacha_Sakura for setup guides covering the companion apps.
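For illustration, here is a minimal client sketch, assuming the launched server exposes an OpenAI-compatible endpoint at http://localhost:8080/v1 (the port, path, and model identifier are assumptions; check the launch script's output for the actual values):

```python
import requests

# Minimal sketch: query a locally launched OneClickLLAMA server through
# an OpenAI-compatible chat completions endpoint. The port (8080) and
# model identifier are assumptions, not documented in this summary.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "qwen2.5",  # hypothetical model identifier
        "messages": [
            {"role": "user", "content": "Say hello in one sentence."},
        ],
        "temperature": 0.1,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```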

Highlighted Details

  • Supports Qwen2.5 and SakuraLLM models.
  • Offers NVIDIA-specific and Vulkan versions for broader hardware support.
  • Provides model selection guidance based on VRAM (8GB to 24GB).
  • Claims 3-5x performance improvement when used with LinguaGacha or KeywordGacha.

Maintenance & Community

The README provides no information on contributors, sponsorships, or community channels (such as Discord or Slack).

Licensing & Compatibility

The README does not explicitly state a license, so suitability for commercial use or closed-source linking is unspecified.

Limitations & Caveats

The README notes that 8B models deliver poor results, and even 14B models remain significantly inferior to online APIs. Users must also manage VRAM carefully: "out of memory" errors can cause abnormal performance or crashes.
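Because VRAM exhaustion is the main failure mode, a small pre-flight check can help. Below is a minimal sketch using nvidia-smi (NVIDIA GPUs only; the 8 GB threshold mirrors the stated minimum requirement and is otherwise an assumption):

```python
import subprocess

# Pre-flight VRAM check before launching a model (NVIDIA GPUs only).
# Queries free memory via nvidia-smi; the 8 GB threshold mirrors the
# README's stated minimum requirement.
out = subprocess.check_output(
    ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
    text=True,
)
free_mib = int(out.splitlines()[0])  # first GPU only
if free_mib < 8 * 1024:
    print(f"Only {free_mib} MiB free: risk of out-of-memory errors.")
else:
    print(f"{free_mib} MiB free: OK to launch.")
```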

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 5 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Ying Sheng (Coauthor of SGLang).

fastllm by ztxz16

  • Top 0.4% on SourcePulse · 4k stars
  • High-performance C++ LLM inference library
  • Created 2 years ago · Updated 1 week ago

Starred by Tobi Lutke (Cofounder of Shopify), Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), and 36 more.

unsloth by unslothai

  • Top 0.6% on SourcePulse · 46k stars
  • Finetuning tool for LLMs, targeting speed and memory efficiency
  • Created 1 year ago · Updated 16 hours ago