forkd  by deeplethe

Warmed microVM forking in 101ms

Created 3 weeks ago

1,502 stars

Top 26.8% on SourcePulse

Summary

forkd spawns large numbers of isolated sandboxes quickly, with AI agent fan-out workloads as its main target. It is aimed at developers and power users who need to run many tasks concurrently with minimal overhead. It offers KVM-level isolation at speeds approaching native process forking by starting children from a warmed parent VM snapshot and sharing memory copy-on-write.

How It Works

The core innovation lies in a "fork-from-warm" approach. A single parent microVM boots and initializes the runtime environment (e.g., Python with dependencies, JIT-warmed JVM, loaded ML models), then pauses and snapshots its state to disk. Subsequent child microVMs are created as Firecracker processes that memory-map this snapshot using MAP_PRIVATE. The Linux kernel's copy-on-write mechanism ensures that memory pages are shared until modified, drastically reducing spawn time and resource consumption compared to cold-booting each VM.
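The copy-on-write behavior of MAP_PRIVATE can be seen without any VM at all. The sketch below is not forkd code: it assumes a Unix host, uses a temporary file as a stand-in for the parent snapshot, and the function name private_mapping_demo is hypothetical. It maps the same file twice as two "children", writes through one mapping, and reports what the other mapping and the file on disk contain afterwards.

```python
import mmap
import tempfile


def private_mapping_demo(snapshot: bytes) -> dict:
    """Map one stand-in snapshot file twice with MAP_PRIVATE, then write via one mapping."""
    with tempfile.NamedTemporaryFile() as f:
        # Stand-in for the parent's memory snapshot on disk (not a real Firecracker snapshot).
        f.write(snapshot)
        f.flush()

        prot = mmap.PROT_READ | mmap.PROT_WRITE
        # Two private mappings of the same file play the role of two children.
        child_a = mmap.mmap(f.fileno(), len(snapshot), flags=mmap.MAP_PRIVATE, prot=prot)
        child_b = mmap.mmap(f.fileno(), len(snapshot), flags=mmap.MAP_PRIVATE, prot=prot)
        try:
            # This write lands in a private copy of the touched page only.
            child_a[0:4] = b"EDIT"

            f.seek(0)
            on_disk = f.read(4)
            return {
                "child_a": bytes(child_a[0:4]),
                "child_b": bytes(child_b[0:4]),
                "on_disk": on_disk,
            }
        finally:
            child_a.close()
            child_b.close()


if __name__ == "__main__":
    print(private_mapping_demo(b"warm" + b"\x00" * 4092))
```

Only the first mapping sees its own write; the second mapping and the backing file keep the original bytes. That is the property that lets many children share one snapshot while paying only for the pages each one modifies.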

Quick Start & Requirements

  • Installation: Primarily via pip install forkd. Host setup requires sudo bash scripts/setup-host.sh (for KVM, Firecracker, etc.) and sudo bash scripts/netns-setup.sh for network namespaces.
  • Prerequisites: x86_64 Linux with KVM enabled (Ubuntu 22.04+ recommended).
  • Resource Footprint: Pre-built snapshots can be downloaded in seconds. Parent VM size dictates fan-out capacity; a 512 MiB parent supports roughly one active agent per vCPU or 50 idle agents per 8 GiB RAM.
  • Documentation: Key resources include docs/HUB.md (snapshot registry), recipes/README.md (pre-built environments), docs/API.md, and docs/SECURITY.md.
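Before running the setup scripts, the stated prerequisites (x86_64, Linux, KVM enabled) can be checked with a few lines of standard-library Python. This is a minimal sketch and not part of forkd: the function name host_problems is hypothetical, and it assumes that KVM availability shows up as a readable and writable /dev/kvm device, which is the usual Linux convention.

```python
import os
import platform


def host_problems(machine: str, system: str, kvm_path: str = "/dev/kvm") -> list[str]:
    """Return a list of human-readable reasons the host fails the stated prerequisites."""
    problems = []
    if system != "Linux":
        problems.append(f"expected Linux, found {system}")
    if machine != "x86_64":
        problems.append(f"expected x86_64, found {machine}")
    if not os.path.exists(kvm_path):
        problems.append(f"{kvm_path} not found (KVM disabled or module not loaded)")
    elif not os.access(kvm_path, os.R_OK | os.W_OK):
        problems.append(f"{kvm_path} exists but is not readable and writable by this user")
    return problems


if __name__ == "__main__":
    found = host_problems(platform.machine(), platform.system())
    for problem in found:
        print("problem:", problem)
    if not found:
        print("host meets the stated prerequisites")
```

Passing the machine and system strings in as arguments keeps the check easy to exercise on any machine; an empty list means none of the three conditions failed.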

Highlighted Details

  • Performance: Spawns 100 microVMs in 101 ms, with a memory overhead of 0.12 MiB per child.
  • Isolation: Provides hardware-level isolation via KVM, with per-child network namespaces and cgroup v2 memory limits.
  • Warmed Runtimes: Children inherit all initialized states from the parent, eliminating per-sandbox import or JIT costs.
  • Use Cases: Ideal for AI code interpreters, parallel evaluation harnesses (e.g., SWE-bench), per-user code execution at scale, and untrusted code execution in CI environments.
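The performance figures above translate into a rough fan-out budget. The sketch below only does arithmetic on the two published numbers (101 ms per batch of 100, 0.12 MiB per child); the function name fanout_estimate is hypothetical, and it assumes the per-child overhead stays at its initial value, which holds only until a child starts dirtying pages.

```python
def fanout_estimate(
    children: int,
    batch_ms: float = 101.0,      # published: 100 microVMs in 101 ms
    batch_size: int = 100,
    per_child_mib: float = 0.12,  # published: initial memory overhead per child
) -> dict:
    """Estimate amortized spawn time and initial extra memory for a fan-out."""
    return {
        # Wall-clock time of the batch divided by its size; not a per-VM latency.
        "amortized_spawn_ms": batch_ms / batch_size,
        # Lower bound: copy-on-write adds memory as each child modifies pages.
        "initial_extra_memory_mib": children * per_child_mib,
    }


if __name__ == "__main__":
    print(fanout_estimate(100))
```

For 100 children this works out to about 1.01 ms of amortized spawn time per microVM and about 12 MiB of initial extra memory in total. Treat both as starting points rather than guarantees, since real workloads write to memory.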

Maintenance & Community

The project is open source under Apache 2.0. No community channels such as Discord or Slack are listed; a roadmap (docs/ROADMAP.md) and the GitHub issue tracker are the available routes for engagement and future planning.

Licensing & Compatibility

Licensed under Apache 2.0, permitting commercial use and integration with closed-source projects without copyleft restrictions.

Limitations & Caveats

forkd is in alpha, and its API and on-disk formats may change before version 1.0. Key limitations include single-host deployment (no multi-node scheduling), no default-deny egress network policy (egress rules must be configured manually), and no CPU, IO, or PID quotas; memory is the only resource limit enforced. Past security advisories mean users should upgrade to recent versions, and a third-party security audit is still pending.
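Until such quotas exist, an operator could add them by hand through the standard cgroup v2 interface files cpu.max, pids.max, and io.max. The sketch below is an assumption-laden illustration, not a forkd feature: the function names are hypothetical, the cgroup directory of a given child is not documented in this summary and must be supplied by the caller, and the cpu, pids, and io controllers must already be enabled in the parent's cgroup.subtree_control.

```python
from pathlib import Path


def render_quotas(
    cpu_fraction: float,
    max_pids: int,
    block_device: str,
    write_bps: int,
    period_us: int = 100_000,
) -> dict[str, str]:
    """Build the contents of cgroup v2 control files for CPU, PID, and IO limits."""
    quota_us = int(cpu_fraction * period_us)
    return {
        # cpu.max takes "<quota> <period>" in microseconds.
        "cpu.max": f"{quota_us} {period_us}",
        # pids.max takes a plain integer cap on tasks in the cgroup.
        "pids.max": str(max_pids),
        # io.max takes "<major>:<minor> key=value ..."; only write bandwidth is set here.
        "io.max": f"{block_device} wbps={write_bps}",
    }


def apply_quotas(cgroup_dir: Path, quotas: dict[str, str]) -> None:
    """Write each value into the given cgroup directory (needs suitable privileges)."""
    for filename, value in quotas.items():
        (cgroup_dir / filename).write_text(value)


if __name__ == "__main__":
    # Half a CPU, 64 tasks, 10 MiB/s of writes to device 8:0 (an example device number).
    for name, value in render_quotas(0.5, 64, "8:0", 10 * 1024 * 1024).items():
        print(name, "=", value)
```

Separating rendering from writing makes it possible to review the exact values before touching a live cgroup.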

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 187
  • Issues (30d): 41
  • Star History: 1,544 stars in the last 26 days
