localllm by GoogleCloudPlatform

CLI tool for running LLMs locally on Cloud Workstations

Created 1 year ago
1,556 stars

Top 26.8% on SourcePulse

View on GitHub
Project Summary

This repository provides a tool, local-llm, for running quantized Large Language Models (LLMs) locally, primarily targeting Google Cloud Workstations. It simplifies deployment of and interaction with LLMs such as Llama-2, offering a managed environment in which developers and researchers can experiment with these models without a complex local setup.

How It Works

The project leverages llama-cpp-python's web server to serve quantized LLMs. It provides a Dockerfile for building a custom Cloud Workstations image and a CLI tool (local-llm) for managing model downloads, serving, and interaction. The CLI abstracts away the complexities of model loading and API exposure, letting users run models with simple commands.
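
As a rough sketch of what the CLI wraps, llama-cpp-python's bundled web server can be pointed at a local .gguf file, and the resulting endpoint speaks an OpenAI-compatible API. The model path and port below are placeholders, not values from the README:

    # Serve a quantized model with llama-cpp-python's built-in web server
    # (the same server that local-llm manages behind the scenes).
    python3 -m llama_cpp.server \
        --model ~/.cache/huggingface/hub/<model-dir>/<file>.gguf \
        --port 8000

    # Smoke-test the OpenAI-compatible completions endpoint.
    curl http://localhost:8000/v1/completions \
        -H "Content-Type: application/json" \
        -d '{"prompt": "Hello,", "max_tokens": 16}'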

Quick Start & Requirements

  • Installation: pip3 install ./local-llm/. after cloning the repo (see the first sketch after this list).
  • Prerequisites: a Google Cloud project, the gcloud CLI, and Docker. Setup involves creating a Cloud Workstations cluster and configuration, which can take up to 20 minutes (see the second sketch after this list).
  • Recommended machine type: e2-standard-32 (32 vCPUs / 16 cores, 128 GB memory).
  • Model cache: models are expected under ~/.cache/huggingface/hub/, and .gguf files are supported.
  • Documentation: served models expose OpenAPI documentation for interaction.
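
A minimal install sketch, assuming the package directory matches the installation line above:

    # Clone the repository and install the CLI from the local package path.
    git clone https://github.com/GoogleCloudPlatform/localllm.git
    cd localllm
    pip3 install ./local-llm/.

The cluster and configuration setup follows the standard Cloud Workstations provisioning flow; the names, region, and image path below are placeholders, and the README's exact command sequence may differ:

    # Provision a Cloud Workstations cluster (this step can take ~20 minutes).
    gcloud workstations clusters create my-cluster --region=us-central1

    # Create a configuration that uses the custom image built from the
    # provided Dockerfile and the recommended machine type.
    gcloud workstations configs create my-config \
        --cluster=my-cluster --region=us-central1 \
        --machine-type=e2-standard-32 \
        --container-custom-image=us-central1-docker.pkg.dev/PROJECT_ID/my-repo/localllm:latest

    # Create a workstation from that configuration.
    gcloud workstations create my-workstation \
        --config=my-config --cluster=my-cluster --region=us-central1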

Highlighted Details

  • Streamlined deployment on Google Cloud Workstations via a comprehensive gcloud command sequence (sketched under Quick Start above).
  • local-llm CLI for managing the model lifecycle: run, list, ps, kill, pull, rm (illustrated after this list).
  • Supports specific model files (e.g., quantized variants such as Q4_K_S.gguf).
  • Includes a querylocal.py script for direct model interaction.
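
A hypothetical session illustrating the lifecycle subcommands listed above; the model ID, port, and argument order are illustrative and should be checked against the README:

    # Download a quantized model from the Hugging Face Hub.
    local-llm pull TheBloke/Llama-2-13B-chat-GGUF

    # Serve it on port 8000, inspect running models, then stop it.
    local-llm run TheBloke/Llama-2-13B-chat-GGUF 8000
    local-llm ps
    local-llm kill 8000

    # List and remove cached models when finished.
    local-llm list
    local-llm rm TheBloke/Llama-2-13B-chat-GGUF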

Maintenance & Community

  • Developed by Google Cloud Platform.
  • No explicit community links (Discord, Slack) or roadmap are mentioned in the README.

Licensing & Compatibility

  • The README does not specify a license for the local-llm tool itself.
  • It facilitates running freely available LLMs; use of each model remains subject to that model's own license.

Limitations & Caveats

The project is designed primarily for Google Cloud Workstations; running it "locally" outside that environment may require setting up its dependencies manually. The README also includes a disclaimer that content generated by the LLMs should be verified by the user and that the project assumes no liability for it.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 30 days

Starred by Eugene Yan (AI Scientist at AWS), Jared Palmer (Ex-VP AI at Vercel; Founder of Turborepo; Author of Formik, TSDX), and 4 more.

Explore Similar Projects

  • seldon-core by SeldonIO: MLOps framework for production model deployment on Kubernetes. Created 7 years ago; updated 13 hours ago; 5k stars (top 0.2% on SourcePulse).