OneClickLLAMA by neavo

Easy local LLM deployment and inference

Created 1 year ago
259 stars

Top 97.9% on SourcePulse

View on GitHub
Project Summary

This project provides a simplified way to run local Large Language Models (LLMs) such as Qwen2.5 and SakuraLLM. It targets users who want to connect these models to applications that speak OpenAI-compatible APIs, such as translation or analysis tools, and it claims a 3-5x performance boost over default settings when paired with its companion applications, LinguaGacha and KeywordGacha.

How It Works

The project distributes pre-quantized LLM models in GGUF format, designed to run locally. It offers launch scripts (.bat files) tailored for NVIDIA GPUs, plus a general Vulkan version for broader hardware compatibility. Users select a model and script according to their GPU's VRAM capacity so the model fits in memory and loads efficiently. The advantage lies in streamlined deployment and configuration, abstracting away an otherwise complex LLM setup process.
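The launch scripts themselves are not reproduced in this summary. As a rough illustration, the sketch below shows what such a script typically amounts to, assuming the project wraps a llama.cpp-style llama-server binary; the binary name, model filename, and flag values are all assumptions, not documented facts about this project.

```python
import subprocess

# Hypothetical reconstruction of what a launch script might run.
# Assumes a llama.cpp-style `llama-server` binary and a GGUF model file
# placed in the OneClickLLAMA folder; all names and values are assumptions.
cmd = [
    "llama-server",
    "-m", "sakura-14b-qwen2.5-q4km.gguf",  # hypothetical model filename
    "-ngl", "99",      # offload all layers to the GPU (reduce if VRAM is tight)
    "-c", "8192",      # context window size
    "--port", "8080",  # serve an OpenAI-compatible HTTP API
]
subprocess.run(cmd, check=True)
```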

Quick Start & Requirements

  • Download the latest release from the project's release page.
  • Requires a dedicated GPU with at least 8GB of VRAM; NVIDIA GPUs are recommended for optimal performance.
  • Ensure the latest graphics card drivers are installed.
  • Models should be downloaded and placed within the OneClickLLAMA folder.
  • Launch the appropriate .bat script based on your VRAM and chosen model; once running, the server can be queried through its OpenAI-compatible API, as in the sketch after this list.
  • See Wiki - LinguaGacha_Sakura for setup guides covering the companion apps.
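For illustration, here is a minimal client sketch, assuming the launched server exposes an OpenAI-compatible endpoint at http://localhost:8080/v1 (the port, path, and model identifier are assumptions; check the launch script's output for the actual values):

```python
import requests

# Minimal sketch: query a locally launched OneClickLLAMA server through
# an OpenAI-compatible chat completions endpoint. The port (8080) and
# model identifier are assumptions, not documented in this summary.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "qwen2.5",  # hypothetical model identifier
        "messages": [
            {"role": "user", "content": "Say hello in one sentence."},
        ],
        "temperature": 0.1,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```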

Highlighted Details

  • Supports Qwen2.5 and SakuraLLM models.
  • Offers NVIDIA-specific and Vulkan versions for broader hardware support.
  • Provides model selection guidance based on VRAM (8GB to 24GB).
  • Claims 3-5x performance improvement when used with LinguaGacha or KeywordGacha.

Maintenance & Community

The README provides no information on contributors, sponsorships, or community channels (such as Discord or Slack).

Licensing & Compatibility

The README does not explicitly state a license, so suitability for commercial use or closed-source linking is unspecified.

Limitations & Caveats

The README notes that 8B models deliver poor results, and even 14B models remain significantly inferior to online APIs. Users must also manage VRAM carefully: "out of memory" errors can cause abnormal performance or crashes.
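Because VRAM exhaustion is the main failure mode, a small pre-flight check can help. Below is a minimal sketch using nvidia-smi (NVIDIA GPUs only; the 8 GB threshold mirrors the stated minimum requirement and is otherwise an assumption):

```python
import subprocess

# Pre-flight VRAM check before launching a model (NVIDIA GPUs only).
# Queries free memory via nvidia-smi; the 8 GB threshold mirrors the
# README's stated minimum requirement.
out = subprocess.check_output(
    ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
    text=True,
)
free_mib = int(out.splitlines()[0])  # first GPU only
if free_mib < 8 * 1024:
    print(f"Only {free_mib} MiB free: risk of out-of-memory errors.")
else:
    print(f"{free_mib} MiB free: OK to launch.")
```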

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 5 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Ying Sheng (Coauthor of SGLang).

fastllm by ztxz16

  • Top 0.4% on SourcePulse · 4k stars
  • High-performance C++ LLM inference library
  • Created 2 years ago · Updated 1 week ago

Starred by Tobi Lutke (Cofounder of Shopify), Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), and 36 more.

unsloth by unslothai

  • Top 0.6% on SourcePulse · 46k stars
  • Finetuning tool for LLMs, targeting speed and memory efficiency
  • Created 1 year ago · Updated 16 hours ago