CLI tool for running LLMs locally on Cloud Workstations
This repository provides a tool, local-llm, for running quantized Large Language Models (LLMs) locally, primarily targeting Google Cloud Workstations. It simplifies deploying and interacting with LLMs such as Llama 2, giving developers and researchers a managed environment for experimenting with these models without a complex local setup.
How It Works
The project leverages llama-cpp-python's web server to serve quantized LLMs. It provides a Dockerfile for building a custom Cloud Workstations image and a CLI tool (local-llm) for managing model downloads, serving, and interaction. The CLI abstracts away the complexities of model loading and API exposure, allowing users to run models with simple commands.
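Because serving is handled by llama-cpp-python's web server, a running model exposes an OpenAI-compatible HTTP API. As a minimal sketch, assuming a model is already being served on localhost port 8000 (substitute whichever port your instance actually uses), a completion request looks like this:

```sh
# Query the llama-cpp-python web server's OpenAI-compatible completions endpoint.
# Port 8000 is an assumption; use the port your model server was started on.
curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Explain quantization in one sentence.", "max_tokens": 64}'
```

The same server also exposes /v1/chat/completions for chat-style requests.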
Quick Start & Requirements
- Install: clone the repo, then pip3 install ./local-llm/.
- Requirements: gcloud CLI and Docker. Setup involves creating Cloud Workstation clusters and configurations, which can take up to 20 minutes (see the sketch below).
- Recommended machine type: e2-standard-32 (32 vCPU, 16 cores, 128 GB memory).
- Models are cached in ~/.cache/huggingface/hub/, and .gguf files are supported.
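Cluster and configuration creation is driven by gcloud. The following is an illustrative sketch only: the cluster, config, and workstation names, the region, and the Artifact Registry path for the image built from the repo's Dockerfile are all assumptions, so follow the repo's documented gcloud command sequence for the exact flags.

```sh
# Illustrative only: names, region, and image path below are assumptions;
# the repository documents the exact gcloud sequence it expects.
gcloud workstations clusters create my-cluster --region=us-central1

gcloud workstations configs create my-config \
  --cluster=my-cluster --region=us-central1 \
  --machine-type=e2-standard-32 \
  --container-custom-image=us-central1-docker.pkg.dev/MY_PROJECT/my-repo/local-llm:latest

gcloud workstations create my-workstation \
  --cluster=my-cluster --config=my-config --region=us-central1
```

Cluster creation is the long-running step, which is where the up-to-20-minute setup time comes from.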
Highlighted Details
- Environment setup driven by a gcloud command sequence (sketched above).
- local-llm CLI for managing the model lifecycle: run, list, ps, kill, pull, rm (see the example after this list).
- Quantized GGUF models (e.g., Q4_K_S.gguf).
- querylocal.py script for direct model interaction.
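A typical model lifecycle with the CLI might look like the session below. The subcommand names come from the list above, but the model identifier and port arguments are assumptions made for illustration; check local-llm --help for the exact syntax.

```sh
# Subcommands are from the CLI above; the model ID and port arguments
# are illustrative assumptions, not documented syntax.
local-llm pull TheBloke/Llama-2-13B-chat-GGUF       # download a quantized model
local-llm run  TheBloke/Llama-2-13B-chat-GGUF 8000  # serve it locally
local-llm ps                                        # show running model servers
local-llm list                                      # list downloaded models
local-llm kill 8000                                 # stop the server
local-llm rm   TheBloke/Llama-2-13B-chat-GGUF       # remove the cached model
```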
Maintenance & Community
The last commit was about a year ago, and the project is marked inactive.
Licensing & Compatibility
The repository's license applies to the local-llm tool itself.
Limitations & Caveats
The project is designed primarily for Google Cloud Workstations; running it locally outside that environment may require manually setting up its dependencies. The README also includes a disclaimer that content generated by the LLMs should be verified and that no liability is assumed for it.