claude-code-local by nicedreamzapp

Run Claude Code locally on Apple Silicon

Created 2 weeks ago


490 stars

Top 62.9% on SourcePulse

View on GitHub
Project Summary

This project enables running Claude Code and other large language models entirely locally on Apple Silicon Macs, eliminating cloud dependencies and API fees. It targets users prioritizing privacy, offline capability, and cost savings, offering a full Claude Code experience powered by on-device AI.

How It Works

The core is a custom MLX server that directly interfaces with local models (Gemma, Llama 3.3, Qwen) using Apple's Metal GPU acceleration. By speaking the Anthropic API natively, it bypasses proxy latency, achieving significantly faster inference. The system supports various models optimized for different needs, from quick coding to complex reasoning.
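Because the server speaks the Anthropic Messages API natively, a client sends it the same JSON shape the cloud API expects. A minimal sketch of such a payload, assuming a hypothetical local endpoint (the actual host and port depend on how the server is configured):

```python
import json

# Hypothetical endpoint -- the real host/port depend on the server's configuration.
LOCAL_ENDPOINT = "http://localhost:8080/v1/messages"

def build_messages_request(prompt: str, model: str = "llama-3.3-70b") -> dict:
    """Build an Anthropic Messages API-style request body.

    A server that speaks the Anthropic API natively accepts this same
    shape, so Claude Code can talk to it without a translation proxy.
    The local model name used here is an illustrative assumption.
    """
    return {
        "model": model,
        "max_tokens": 1024,
        "messages": [
            {"role": "user", "content": prompt},
        ],
    }

payload = build_messages_request("Explain this stack trace.")
print(json.dumps(payload, indent=2))
```

POSTing this body to the local endpoint would stand in for a call to the hosted Anthropic API; eliminating that proxy hop is where the latency savings come from.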

Quick Start & Requirements

  • Install: Run bash setup.sh for a one-command installer, or manually clone, set up Python 3.12+ virtualenv, download models (scripts/download-and-import.sh), and start the server (scripts/start-mlx-server.sh).
  • Prerequisites: Apple Silicon Mac (M1/M2/M3/M4), Python 3.12+, npm install -g @anthropic-ai/claude-code.
  • Resources: Models require 18GB (Gemma) to ~75GB disk space and substantial RAM (32GB minimum for Gemma, 96GB recommended for Llama/Qwen).
  • Docs: Repo Link
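The prerequisites above can be checked programmatically before running the installer. A minimal sketch (the checks mirror the stated requirements; the Claude Code CLI installs a `claude` binary):

```python
import platform
import shutil
import sys

def check_prereqs() -> list[str]:
    """Return a list of unmet prerequisites from the Quick Start section."""
    problems = []
    # Apple Silicon Macs report Darwin with an arm64 machine type.
    if not (platform.system() == "Darwin" and platform.machine() == "arm64"):
        problems.append("requires an Apple Silicon Mac (M1/M2/M3/M4)")
    if sys.version_info < (3, 12):
        problems.append("requires Python 3.12+")
    # Installed via: npm install -g @anthropic-ai/claude-code
    if shutil.which("claude") is None:
        problems.append("Claude Code CLI ('claude') not found on PATH")
    return problems

for p in check_prereqs():
    print("missing:", p)
```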

Highlighted Details

  • Model Flexibility: Choose from Gemma 4 31B (fast, 18GB), Llama 3.3 70B (reasoning, ~75GB, 8-bit abliterated), or Qwen 3.5 122B (max throughput, ~75GB, MoE).
  • Multi-Modal Modes: Includes Code, Browser (autonomous agent), Narrative (TTS), and Phone (iMessage media pipeline).
  • Privacy Focus: Guarantees zero outbound network calls, telemetry, or data leakage, ideal for sensitive code and offline use.
  • Performance: Achieves up to 65 tok/s (Qwen 122B on M5 Max) and cuts task completion time from 133s to 17.6s by eliminating proxy overhead.
  • "Abliterated" Llama: Features a custom 8-bit MLX build of Llama 3.3 70B, suppressing refusals (user responsibility applies).
  • Tool Call Fixes: Enhanced reliability for tool usage through KV cache improvements and recovery logic.
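The RAM guidance above can be turned into a simple selection helper. A hypothetical sketch using the thresholds stated in this summary (32GB minimum for Gemma, 96GB recommended for Llama/Qwen); the mapping is illustrative, not part of the project:

```python
def pick_model(ram_gb: int) -> str:
    """Suggest a model tier from the RAM guidance in the summary.

    Thresholds come from the stated requirements: 32 GB minimum for
    Gemma, 96 GB recommended for the Llama/Qwen builds. The function
    itself is an illustrative sketch, not part of the project.
    """
    if ram_gb >= 96:
        return "Qwen 3.5 122B or Llama 3.3 70B"
    if ram_gb >= 32:
        return "Gemma 4 31B"
    return "insufficient RAM for the supported models"

print(pick_model(36))  # a 36 GB machine lands in the Gemma tier
```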

Maintenance & Community

The README lists no community channels (Discord/Slack) and no contributor details beyond the primary repository owner and the model uploaders.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: Permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

Strictly limited to Apple Silicon Macs. Larger models demand significant RAM (96GB+ recommended for Llama/Qwen). Local models may not match the advanced reasoning capabilities of top-tier cloud offerings. "Abliterated" models require responsible usage and adherence to upstream licenses.

Health Check

  • Last Commit: 4 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 5
  • Issues (30d): 2
  • Star History: 598 stars in the last 17 days

Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), Elie Bursztein (Cybersecurity Lead at Google DeepMind), and 7 more.
