GoogleCloudPlatform / local-llm: CLI tool for running LLMs locally on Cloud Workstations
Top 26.8% on SourcePulse
This repository provides local-llm, a tool for running quantized Large Language Models (LLMs) locally, primarily targeting Google Cloud Workstations. It simplifies deploying and interacting with LLMs such as Llama-2, giving developers and researchers a managed environment for experimenting with these models without complex local setup.
How It Works
The project leverages llama-cpp-python's webserver to serve quantized LLMs. It provides a Dockerfile for building a custom Cloud Workstations image and a CLI tool (local-llm) for managing model downloads, serving, and interaction. The CLI abstracts away the complexities of model loading and API exposure, allowing users to run models with simple commands.
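As a rough illustration of what the tooling automates, serving a quantized .gguf model with llama-cpp-python's webserver looks approximately like the sketch below; the package extras, model path, and port are placeholders rather than values taken from this repository:

```bash
# Install llama-cpp-python with its server extras.
pip3 install 'llama-cpp-python[server]'

# Serve a quantized model over an OpenAI-compatible HTTP API.
# The .gguf path and port are illustrative placeholders.
python3 -m llama_cpp.server \
  --model path/to/model.Q4_K_S.gguf \
  --host 0.0.0.0 \
  --port 8000
```

The local-llm CLI wraps equivalent steps so users do not have to start the server by hand.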
Quick Start & Requirements
- Install: clone the repository, then run pip3 install ./local-llm/.
- Requirements: gcloud CLI, Docker.
- Setup involves creating Cloud Workstation clusters and configurations, which can take up to 20 minutes.
- Recommended machine type: e2-standard-32 (32 vCPU, 16 core, 128 GB memory).
- Models are cached in ~/.cache/huggingface/hub/; .gguf files are supported.
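For orientation, a minimal install-and-serve sketch; the clone URL, model repository ID, and the exact arguments to run are assumptions, not taken verbatim from the project's documentation:

```bash
# Clone the repository (URL assumed) and install the CLI per the instructions above.
git clone https://github.com/GoogleCloudPlatform/localllm.git
cd localllm
pip3 install ./local-llm/.

# Assumed usage: serve a quantized Hugging Face model on a local port.
# The model repo ID and port argument are illustrative only.
local-llm run TheBloke/Llama-2-7B-Chat-GGUF 8000
```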
Highlighted Details
- Cluster and workstation setup is driven by a gcloud command sequence.
- The local-llm CLI manages the model lifecycle: run, list, ps, kill, pull, rm.
- Quantized models are supported (e.g., Q4_K_S.gguf).
- A querylocal.py script is included for direct model interaction.
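Because models are served through llama-cpp-python's webserver, which exposes an OpenAI-compatible HTTP API, a running model can also be queried directly instead of via querylocal.py; the port and payload below are illustrative:

```bash
# Assumes a model is already being served on port 8000 (see the run example above).
curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Name three uses for Cloud Workstations.", "max_tokens": 64}'
```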
Maintenance & Community
Licensing & Compatibility
The license covers the local-llm tool itself; the models it runs are licensed separately.
Limitations & Caveats
The project is primarily designed for Google Cloud Workstations, and running it "locally" outside this environment might require manual setup of dependencies. The README also includes a disclaimer regarding the verification and liability of content generated by the LLMs.