claude-code-local by nicedreamzapp

Run Claude Code locally on Apple Silicon

Created 2 weeks ago


490 stars

Top 62.9% on SourcePulse

View on GitHub
Project Summary

This project enables running Claude Code and other large language models entirely locally on Apple Silicon Macs, eliminating cloud dependencies and API fees. It targets users prioritizing privacy, offline capability, and cost savings, offering a full Claude Code experience powered by on-device AI.

How It Works

The core is a custom MLX server that directly interfaces with local models (Gemma, Llama 3.3, Qwen) using Apple's Metal GPU acceleration. By speaking the Anthropic API natively, it bypasses proxy latency, achieving significantly faster inference. The system supports various models optimized for different needs, from quick coding to complex reasoning.
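Because the server speaks the Anthropic Messages API natively, a client sends it the same JSON shape the cloud API expects. A minimal sketch of such a payload, assuming a hypothetical local endpoint (the actual host and port depend on how the server is configured):

```python
import json

# Hypothetical endpoint -- the real host/port depend on the server's configuration.
LOCAL_ENDPOINT = "http://localhost:8080/v1/messages"

def build_messages_request(prompt: str, model: str = "llama-3.3-70b") -> dict:
    """Build an Anthropic Messages API-style request body.

    A server that speaks the Anthropic API natively accepts this same
    shape, so Claude Code can talk to it without a translation proxy.
    The local model name used here is an illustrative assumption.
    """
    return {
        "model": model,
        "max_tokens": 1024,
        "messages": [
            {"role": "user", "content": prompt},
        ],
    }

payload = build_messages_request("Explain this stack trace.")
print(json.dumps(payload, indent=2))
```

POSTing this body to the local endpoint would stand in for a call to the hosted Anthropic API; eliminating that proxy hop is where the latency savings come from.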

Quick Start & Requirements

  • Install: Run bash setup.sh for a one-command installer, or manually clone, set up Python 3.12+ virtualenv, download models (scripts/download-and-import.sh), and start the server (scripts/start-mlx-server.sh).
  • Prerequisites: Apple Silicon Mac (M1/M2/M3/M4), Python 3.12+, npm install -g @anthropic-ai/claude-code.
  • Resources: Models require 18GB (Gemma) to ~75GB disk space and substantial RAM (32GB minimum for Gemma, 96GB recommended for Llama/Qwen).
  • Docs: Repo Link
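The prerequisites above can be checked programmatically before running the installer. A minimal sketch (the checks mirror the stated requirements; the Claude Code CLI installs a `claude` binary):

```python
import platform
import shutil
import sys

def check_prereqs() -> list[str]:
    """Return a list of unmet prerequisites from the Quick Start section."""
    problems = []
    # Apple Silicon Macs report Darwin with an arm64 machine type.
    if not (platform.system() == "Darwin" and platform.machine() == "arm64"):
        problems.append("requires an Apple Silicon Mac (M1/M2/M3/M4)")
    if sys.version_info < (3, 12):
        problems.append("requires Python 3.12+")
    # Installed via: npm install -g @anthropic-ai/claude-code
    if shutil.which("claude") is None:
        problems.append("Claude Code CLI ('claude') not found on PATH")
    return problems

for p in check_prereqs():
    print("missing:", p)
```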

Highlighted Details

  • Model Flexibility: Choose from Gemma 4 31B (fast, 18GB), Llama 3.3 70B (reasoning, ~75GB, 8-bit abliterated), or Qwen 3.5 122B (max throughput, ~75GB, MoE).
  • Multi-Modal Modes: Includes Code, Browser (autonomous agent), Narrative (TTS), and Phone (iMessage media pipeline).
  • Privacy Focus: Guarantees zero outbound network calls, telemetry, or data leakage, ideal for sensitive code and offline use.
  • Performance: Achieves up to 65 tok/s (Qwen 122B on M5 Max) and cuts task completion time from 133s to 17.6s by eliminating proxy overhead.
  • "Abliterated" Llama: Features a custom 8-bit MLX build of Llama 3.3 70B, suppressing refusals (user responsibility applies).
  • Tool Call Fixes: Enhanced reliability for tool usage through KV cache improvements and recovery logic.
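The RAM guidance above can be turned into a simple selection helper. A hypothetical sketch using the thresholds stated in this summary (32GB minimum for Gemma, 96GB recommended for Llama/Qwen); the mapping is illustrative, not part of the project:

```python
def pick_model(ram_gb: int) -> str:
    """Suggest a model tier from the RAM guidance in the summary.

    Thresholds come from the stated requirements: 32 GB minimum for
    Gemma, 96 GB recommended for the Llama/Qwen builds. The function
    itself is an illustrative sketch, not part of the project.
    """
    if ram_gb >= 96:
        return "Qwen 3.5 122B or Llama 3.3 70B"
    if ram_gb >= 32:
        return "Gemma 4 31B"
    return "insufficient RAM for the supported models"

print(pick_model(36))  # a 36 GB machine lands in the Gemma tier
```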

Maintenance & Community

The README lists no community channels (Discord/Slack) and no contributor details beyond the primary repository owner and the model uploaders.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: Permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

Strictly limited to Apple Silicon Macs. Larger models demand significant RAM (96GB+ recommended for Llama/Qwen). Local models may not match the advanced reasoning capabilities of top-tier cloud offerings. "Abliterated" models require responsible usage and adherence to upstream licenses.

Health Check

  • Last Commit: 4 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 5
  • Issues (30d): 2
  • Star History: 598 stars in the last 17 days

Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), Elie Bursztein (Cybersecurity Lead at Google DeepMind), and 7 more.
