3FS by deepseek-ai

Distributed file system for AI training/inference workloads

created 5 months ago
9,176 stars

Top 5.6% on sourcepulse

Project Summary

Fire-Flyer File System (3FS) is a high-performance distributed file system engineered for AI training and inference workloads. It offers a disaggregated architecture, strong consistency via Chain Replication with Apportioned Queries (CRAQ), and familiar file interfaces, simplifying development for distributed applications.

How It Works

3FS uses a disaggregated architecture that combines the throughput of thousands of SSDs with the network bandwidth of hundreds of storage nodes, so applications can access storage in a locality-oblivious manner. Its metadata services are stateless and backed by a transactional key-value store such as FoundationDB, while CRAQ provides strong consistency for stored data. Together these sustain high throughput and keep application code simple to reason about for complex AI workloads.
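
To make the consistency model concrete, here is a minimal in-memory sketch of generic Chain Replication with Apportioned Queries. It is not 3FS code: the class and method names (Node, Chain, start_write, forward, acknowledge, read) are hypothetical, and the real system handles failures, networking, and on-disk storage that this toy ignores. It only illustrates the core idea that writes travel head to tail, the tail is the commit point, and any replica can serve a read, falling back to the tail's committed version while it holds uncommitted data.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One replica in the chain (illustrative, in-memory only)."""
    name: str
    versions: dict = field(default_factory=dict)   # key -> {version: value}
    committed: dict = field(default_factory=dict)  # key -> newest clean version

    def is_dirty(self, key):
        """True if this node holds a version newer than its committed one."""
        seen = self.versions.get(key, {})
        return max(seen, default=0) > self.committed.get(key, 0)


class Chain:
    """A fixed chain: nodes[0] is the head, nodes[-1] is the tail."""

    def __init__(self, names):
        if len(names) < 2:
            raise ValueError("this sketch assumes at least two nodes")
        self.nodes = [Node(n) for n in names]

    def start_write(self, key, value):
        """The head assigns the next version number and stores the value."""
        head = self.nodes[0]
        version = max(head.versions.get(key, {}), default=0) + 1
        head.versions.setdefault(key, {})[version] = value
        return version

    def forward(self, key, version, index):
        """Copy a pending version from node index-1 to node index (index >= 1)."""
        value = self.nodes[index - 1].versions[key][version]
        node = self.nodes[index]
        node.versions.setdefault(key, {})[version] = value
        if index == len(self.nodes) - 1:
            # The tail is the commit point: the write is now visible.
            node.committed[key] = version

    def acknowledge(self, key, version):
        """The ack travels tail -> head; each node marks the version clean
        and drops older versions. Assumes the version reached every node."""
        for node in reversed(self.nodes):
            node.committed[key] = version
            node.versions[key] = {
                v: val for v, val in node.versions[key].items() if v >= version
            }

    def read(self, key, index):
        """Serve a read from any node in the chain."""
        node = self.nodes[index]
        if node.is_dirty(key):
            # Apportioned query: ask the tail which version is committed.
            version = self.nodes[-1].committed.get(key, 0)
        else:
            version = node.committed.get(key, 0)
        return node.versions.get(key, {}).get(version)
```

In this sketch, a write that has reached the head and middle node but not the tail leaves those two replicas dirty. A read against either of them still returns the previously committed value, and switches to the new value once the tail has received the write.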

Quick Start & Requirements

  • Install: Clone the repository, update submodules (git submodule update --init --recursive), and apply patches (./patches/apply.sh).
  • Dependencies: Requires cmake, libuv1-dev, liblz4-dev, liblzma-dev, libdouble-conversion-dev, libdwarf-dev, libunwind-dev, libaio-dev, libgflags-dev, libgoogle-glog-dev, libgtest-dev, libgmock-dev, clang-format-14, clang-14, clang-tidy-14, lld-14, libgoogle-perftools-dev, google-perftools, libssl-dev, gcc-10 or gcc-12, libboost (package version depends on the Ubuntu release), build-essential, libfuse 3.16.1+, FoundationDB 7.1+, and a Rust toolchain (1.75.0+). Docker images are available for TencentOS-4 and OpenCloudOS-9.
  • Build: Configure with cmake, selecting the compilers explicitly (e.g., clang-14 and clang++-14), then run cmake --build build.
  • Docs: Design Notes, Setup Guide, USRBIO API Reference.
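
Because the dependency list is long, a quick pre-flight check can save a failed build. The sketch below only verifies that a few command-line tools are on PATH. The executable names are assumptions based on the packages listed above (your distribution may install them under different names), and it does not check library packages or versions.

```python
import shutil

# Assumed executable names; adjust to what your distribution installs.
REQUIRED_TOOLS = ["git", "cmake", "clang-14", "clang++-14", "cargo", "rustc"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

Run it before configuring the build. An empty result means only that the listed executables exist, not that their versions meet the requirements.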

Highlighted Details

  • Achieved 6.6 TiB/s aggregate read throughput on a cluster of 180 storage nodes, each with 2×200Gbps InfiniBand NICs and NVMe SSDs.
  • GraySort benchmark sorted 110.5 TiB in 30 minutes and 14 seconds (3.66 TiB/min) on a 25-node storage cluster.
  • KVCache for LLM inference demonstrated peak read throughput up to 40 GiB/s.
  • Exposes a familiar file interface, so applications do not need to learn a new storage API.
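
As a sanity check on the figures above, the short sketch below recomputes two derived numbers from the rounded values quoted in this list: the GraySort rate in TiB/min and the average share of the aggregate read throughput per storage node. The function names are illustrative, and even distribution across nodes is an assumption made only for the average.

```python
GIB_PER_TIB = 1024


def sort_rate_tib_per_min(total_tib, minutes, seconds):
    """Average sort rate, given the data volume and the elapsed time."""
    return total_tib / (minutes + seconds / 60)


def per_node_gib_per_s(aggregate_tib_per_s, nodes):
    """Average per-node share of an aggregate throughput figure."""
    return aggregate_tib_per_s * GIB_PER_TIB / nodes


if __name__ == "__main__":
    rate = sort_rate_tib_per_min(110.5, 30, 14)
    share = per_node_gib_per_s(6.6, 180)
    print(f"GraySort rate: {rate:.2f} TiB/min")
    print(f"Read throughput per storage node: {share:.1f} GiB/s")
```

Recomputing from the rounded inputs gives roughly 3.65 TiB/min, within rounding distance of the quoted 3.66 TiB/min, and an average of about 37.5 GiB/s of read throughput per storage node.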

Maintenance & Community

  • Project hosted on GitHub: deepseek-ai/3FS.
  • Issue reporting via GitHub Issues.

Licensing & Compatibility

  • License: Apache License 2.0.
  • Compatible with commercial and closed-source applications.

Limitations & Caveats

Setup requires a large number of system-level dependencies and specific compiler versions, so configuring the build environment can be complex. FoundationDB and a Rust toolchain are mandatory prerequisites.

Health Check

  • Last commit: 5 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 4
  • Issues (30d): 8

Star History

  • 401 stars in the last 90 days
