dockerLLM by TheBlokeAI

Docker templates for local LLMs

created 2 years ago
305 stars

Top 88.8% on sourcepulse

View on GitHub
Project Summary

This repository provides Dockerfiles for deploying large language models (LLMs) via the text-generation-webui interface, targeting users who want a streamlined, pre-configured environment for running various LLM backends. It simplifies the setup of complex AI inference environments, particularly on platforms like Runpod.

How It Works

The project leverages Docker to encapsulate text-generation-webui and its dependencies, including optimized backends like AutoGPTQ, ExLlama, and GGML. This approach ensures consistent environments and simplifies deployment across different hardware configurations, especially those with NVIDIA GPUs. Key advantages include automatic updates of ExLlama and text-generation-webui on boot, and support for multiple GPU acceleration methods.
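As a rough illustration of the pattern (a minimal sketch, not the repository's actual Dockerfile; the base-image tag and paths are assumptions), a template along these lines bakes the UI and its backends into a CUDA base image and defers updates to a boot-time entrypoint:

    # Hypothetical sketch of the layering approach described above.
    FROM nvidia/cuda:12.1.1-devel-ubuntu22.04

    RUN apt-get update && apt-get install -y git python3 python3-pip

    # Bake text-generation-webui and its dependencies into the image.
    RUN git clone https://github.com/oobabooga/text-generation-webui /app
    WORKDIR /app
    RUN pip3 install -r requirements.txt

    # A boot-time entrypoint can re-pull text-generation-webui and ExLlama,
    # so containers start with current code without an image rebuild.
    COPY entrypoint.sh /entrypoint.sh
    ENTRYPOINT ["/entrypoint.sh"]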

Quick Start & Requirements

  • Install/Run: Deploy via the Runpod template links provided in the README.
  • Prerequisites: An NVIDIA GPU with CUDA 12.1.1 (the container name may reflect an older version).
  • Setup: Runpod deployments are typically quick; model loading is configured via environment variables (see the example run after this list).
  • Docs: the Runpod templates "TheBloke's Local LLMs UI" and "TheBloke's Local LLMs UI & API".
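Outside Runpod, the same environment variables should drive the container. A hypothetical local invocation (the image tag is illustrative, not a published name; MODEL and UI_ARGS follow the behaviour described in this summary):

    # Hypothetical local run; the image name is an assumption.
    docker run --gpus all -p 7860:7860 \
      -e MODEL=TheBloke/Llama-2-7B-GPTQ \
      -e UI_ARGS="--listen --api" \
      thebloke/dockerllm:latest

Here MODEL names a Hugging Face repo to auto-download on boot, and UI_ARGS passes extra flags through to text-generation-webui's server.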

Highlighted Details

  • Supports Mixtral, Llama 2 (including 70B), and GPTQ models.
  • Integrates AutoGPTQ, ExLlama (2x faster for 4-bit Llama GPTQs), and CUDA-accelerated GGML.
  • Includes all text-generation-webui extensions (Chat, SuperBooga, Whisper).
  • Automatic model download/loading via the MODEL env var, and UI parameters via UI_ARGS (a boot-time entrypoint sketch follows this list).
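A minimal entrypoint sketch of that boot-time behaviour (assuming text-generation-webui's bundled download-model.py and server.py, plus the MODEL/UI_ARGS semantics above; this is not the project's actual script):

    #!/bin/bash
    # Hypothetical boot sequence: update, fetch the requested model, start the UI.
    set -e
    cd /app

    # Re-pull text-generation-webui on boot, as described above.
    git pull

    # If MODEL is set, fetch it with the UI's bundled helper.
    if [ -n "$MODEL" ]; then
        python3 download-model.py "$MODEL"
    fi

    # Launch the server, forwarding any extra flags from UI_ARGS.
    exec python3 server.py --listen $UI_ARGS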

Maintenance & Community

The project is maintained by "TheBloke," a prominent figure in the LLM community known for quantizing and distributing models. Updates were frequent during active development, though the health-check data below shows the last commit landed about a year ago.

Licensing & Compatibility

The repository itself consists of Dockerfiles, which are typically shared under permissive terms; however, the software they install (text-generation-webui and the various LLM backends) carries its own licenses. Suitability for commercial use therefore depends on the licenses of the included frameworks and models.

Limitations & Caveats

The container naming convention may not always reflect the CUDA version actually used internally, which can be confusing. Support is focused primarily on Runpod instances, though the Dockerfiles could be adapted for other environments.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 90 days
