USB-Uncensored-LLM by techjarves

Portable, air-gapped LLM inference environment

Created 1 month ago
1,342 stars

Top 29.5% on SourcePulse

Summary

USB-Uncensored-LLM provides a zero-install, portable, air-gapped local AI environment for running uncensored Large Language Models (LLMs) directly from a USB drive or SSD. It targets users requiring privacy-first, cross-platform LLM access without complex setup or internet connectivity, offering seamless execution across Windows, macOS, and Linux.

How It Works

The project employs a zero-dependency architecture featuring portable Python and isolated engine binaries. It utilizes a custom-compiled Ollama engine that dynamically leverages host hardware acceleration (AVX, CUDA, Metal). A unified "Shared volume" system allows models downloaded once to be accessed natively across different operating systems, minimizing storage footprint. A Python HTTP server provides a dark-mode UI accessible locally or via LAN.
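
The shared-volume idea can be sketched as a launcher that points every OS-specific engine at the same model folder on the drive. This is a minimal sketch, not the project's code. It assumes the bundled engine honors Ollama's standard `OLLAMA_MODELS` and `OLLAMA_HOST` environment variables. The `Shared/models` path mirrors the folder named in the install steps, and the `build_engine_env` helper is hypothetical.

```python
import os
from pathlib import Path


def build_engine_env(drive_root: Path) -> dict[str, str]:
    """Return an environment for launching the bundled engine (hypothetical helper).

    Assumed layout: <drive_root>/Shared/models is the single model store that
    the Windows, macOS, and Linux launchers all point at.
    """
    models_dir = drive_root / "Shared" / "models"
    env = dict(os.environ)  # copy so the caller's environment is untouched
    # OLLAMA_MODELS tells an Ollama engine where to read and write model files.
    env["OLLAMA_MODELS"] = str(models_dir)
    # Assumption: keep the engine on loopback and let the UI server handle LAN access.
    env["OLLAMA_HOST"] = "127.0.0.1:11434"
    return env
```

A launcher could then start the engine with `subprocess.Popen([engine_path, "serve"], env=build_engine_env(drive_root))`, where `engine_path` is the hypothetical OS-specific binary on the drive. Because only the environment differs per OS, models downloaded once stay usable everywhere.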

Quick Start & Requirements

  • Install/Run:
    • Initialize the OS-specific engine with its install script: Windows/install.bat, Mac/install.command, Linux/install.sh, or Android/install.sh.
    • Download models, preferably with the Windows script, or place them manually in Shared/models.
    • Launch with the OS-specific start script.
  • Prerequisites:
    • Storage: 8GB minimum USB 3.0+ drive/SSD (16GB recommended).
    • RAM: 8GB for 2B/4B models, 16GB for 9B/12B models.
    • Android: Termux (F-Droid), 6GB+ RAM (8GB+ recommended), ARM64.
  • Links: Demo Video: https://youtu.be/60PSXsoXc8A
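
The RAM guidance above maps onto a small lookup. This is a minimal sketch that assumes the tiers are cumulative, so a 16GB host can also run the 2B/4B models. The `suitable_model_sizes` helper is hypothetical, and the project may select models differently.

```python
# RAM guidance from the requirements above: (minimum host RAM in GB, model sizes).
# Tiers are checked from largest to smallest; assumed cumulative.
RAM_TIERS = [
    (16, ["2B", "4B", "9B", "12B"]),
    (8, ["2B", "4B"]),
]


def suitable_model_sizes(ram_gb: float) -> list[str]:
    """Return the model sizes the stated guidance allows for `ram_gb` of RAM."""
    for min_ram, sizes in RAM_TIERS:
        if ram_gb >= min_ram:
            return list(sizes)  # copy so callers cannot mutate the table
    return []  # below the 8GB desktop minimum
```

For example, a 12GB laptop falls in the 8GB tier and gets `["2B", "4B"]`.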

Highlighted Details

  • Zero Dependency: No system permissions, registry edits, or package managers required.
  • Cross-Platform: Unified model storage for seamless use across Windows, macOS, and Linux.
  • Censorship Free: Integrates "abliterated" and "heretic" model variants for unfiltered interactions.
  • Network UI: Python HTTP server provides a dark-mode UI accessible via LAN from mobile devices.
  • Hardware Acceleration: Dynamically utilizes AVX, NVIDIA CUDA, or Apple Metal.
  • Model Library: Ships with Gemma 2 2B Abliterated, Gemma 4 E4B Ultra Uncensored Heretic, and Qwen 3.5 9B Uncensored Aggressive; also supports custom GGUF downloads.
  • Android Native: Runs directly on Android via Termux.
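
A LAN-accessible UI like the one described can be approximated with Python's standard library alone, which fits the zero-dependency goal. This is a hypothetical sketch, not the project's server. `make_ui_server` serves a static UI folder on all interfaces. `lan_address` makes a best-effort guess at the host's LAN IP so that a phone on the same network knows which URL to open.

```python
import functools
import socket
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


def make_ui_server(ui_dir: Path, port: int = 8000) -> ThreadingHTTPServer:
    """Serve the static files in `ui_dir` on every IPv4 interface."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=str(ui_dir))
    # "0.0.0.0" binds all interfaces, so devices on the same Wi-Fi can connect.
    return ThreadingHTTPServer(("0.0.0.0", port), handler)


def lan_address() -> str:
    """Best-effort guess of this machine's LAN IP; no packets are sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            # Connecting a UDP socket only selects a route and local address.
            s.connect(("192.168.0.1", 80))
            return s.getsockname()[0]
        except OSError:
            # Air-gapped host with no route: fall back to loopback.
            return "127.0.0.1"
```

For example, call `make_ui_server(Path("ui")).serve_forever()` and then open `http://<lan_address()>:8000` on the phone. The `ui` folder name and port 8000 are assumptions. On Windows, the firewall may still need to allow inbound connections on that port.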

Maintenance & Community

No specific details on contributors, sponsorships, or community channels (e.g., Discord, Slack) are provided in the README.

Licensing & Compatibility

The README does not specify a software license, which may impact commercial use or integration with other projects.

Limitations & Caveats

Performance varies significantly with host hardware; generation on Android is notably slower (~3-10 tokens/sec). If the host lacks enough RAM for the selected model, generation will be slow or fail outright. Windows users may need to adjust firewall settings for LAN access. The project's abliterated models are designed for unfiltered output, so responsible use is left to the user's discretion.
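
Generation speed on a given host can be measured rather than guessed. This is a minimal sketch that assumes the bundled engine exposes Ollama's standard REST API on its default port 11434. A non-streaming `/api/generate` response reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds), and tokens per second follow directly from those two fields. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: the engine listens on Ollama's default local address.
OLLAMA_URL = "http://127.0.0.1:11434/api/generate"


def generate(model: str, prompt: str, url: str = OLLAMA_URL) -> dict:
    """Send one non-streaming generation request and return the parsed JSON."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def tokens_per_second(result: dict) -> float:
    """Tokens/sec from eval_count (tokens) and eval_duration (nanoseconds)."""
    duration_ns = result.get("eval_duration", 0)
    if duration_ns <= 0:
        return 0.0  # missing or zero timing: report no measurable speed
    return result["eval_count"] * 1e9 / duration_ns
```

For example, `tokens_per_second(generate("<model-name>", "Hello"))` reports the host's actual speed, where `<model-name>` stands for whichever model tag is installed. On Android, the result would be expected to land roughly in the 3-10 tokens/sec range quoted above.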

Health Check
Last Commit: 6 days ago
Responsiveness: Inactive
Pull Requests (30d): 5
Issues (30d): 13
Star History: 758 stars in the last 30 days
