perplexityai / pplx-garden: High-performance LLM inference engine
Top 94.4% on SourcePulse
Perplexity AI's pplx-garden provides an open-source toolkit for high-performance LLM inference, specifically addressing RDMA point-to-point communication for Mixture-of-Experts (MoE) systems. It enables researchers and engineers to optimize inter-node communication, reducing latency and improving throughput for large-scale AI deployments. The project offers a novel approach to P2P MoE dispatch and combine kernels, aiming to accelerate LLM inference workloads.
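As background, a minimal pure-Python sketch of what "dispatch" and "combine" mean in a Mixture-of-Experts layer (illustrative only, not pplx-garden's API): each token is dispatched to its top-k experts, and the expert outputs are combined with gate weights.

```python
import math

# Illustrative sketch (not the pplx-garden API) of MoE dispatch/combine:
# each token is routed to its top-k experts, processed there, and the
# gate-weighted outputs are summed back together.

def moe_dispatch_combine(tokens, gate_logits, experts, top_k=2):
    """tokens: list of vectors; gate_logits: per-token scores over experts."""
    outputs = []
    for x, logits in zip(tokens, gate_logits):
        # Dispatch: choose the top-k experts for this token.
        chosen = sorted(range(len(logits)), key=lambda e: logits[e])[-top_k:]
        # Softmax over the chosen experts' logits to get gate weights.
        exps = [math.exp(logits[e]) for e in chosen]
        gates = [v / sum(exps) for v in exps]
        # Combine: gate-weighted sum of each chosen expert's output.
        combined = [0.0] * len(x)
        for g, e in zip(gates, chosen):
            y = experts[e](x)
            combined = [c + g * yi for c, yi in zip(combined, y)]
        outputs.append(combined)
    return outputs
```

In a distributed MoE system, dispatch and combine become network transfers, since the chosen experts live on other GPUs or nodes; that communication step is what pplx-garden's kernels optimize.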
How It Works
The core of pplx-garden is its RDMA TransferEngine library, designed for efficient inter-node data transfer in LLM systems. It implements P2P MoE dispatch/combine kernels, optimizing for decode operations while supporting prefill. The system utilizes NVLink for intra-node communication and RDMA for inter-node transfers, supporting NVIDIA ConnectX-7 and AWS EFA NICs. A key design choice is splitting send and receive stages to enable micro-batching and achieve SM-free RDMA transfers, further enhancing performance and efficiency.
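The benefit of splitting send and receive stages can be sketched conceptually (class and function names here are assumptions for illustration, not pplx-garden's API): a non-blocking send lets micro-batch i+1 be posted while micro-batch i is still completing, instead of serializing in one monolithic dispatch call.

```python
from collections import deque

# Conceptual sketch of split send/receive stages enabling micro-batching.
# Names (SplitDispatch, pipelined) are hypothetical, not the library's API.

class SplitDispatch:
    def __init__(self):
        self.in_flight = deque()

    def send(self, micro_batch):
        # Post the transfer and return immediately (non-blocking).
        self.in_flight.append(micro_batch)

    def recv(self):
        # Complete the oldest posted transfer.
        return self.in_flight.popleft()

def pipelined(batches, process):
    """Overlap posting of the next micro-batch with draining the previous one."""
    if not batches:
        return []
    d = SplitDispatch()
    results = []
    it = iter(batches)
    d.send(next(it))                 # prime the pipeline
    for nxt in it:
        d.send(nxt)                  # post next batch before draining
        results.append(process(d.recv()))
    results.append(process(d.recv()))
    return results
```

In the real system the "send" would post RDMA writes handled by the NIC, which is why the transfers can proceed without occupying GPU SMs.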
Quick Start & Requirements
Build the development Docker image with docker build -t pplx-garden-dev - < docker/dev.Dockerfile and start it with ./scripts/run-docker.sh. Python wheels can be built and installed with python3 -m build --wheel and python3 -m pip install /app/dist/*.whl after setting TORCH_CMAKE_PREFIX_PATH. Dependencies include libfabric, libibverbs, and GDRCopy. An RDMA network with GPUDirect RDMA support is required, with at least one dedicated RDMA NIC per GPU; the SYS_PTRACE and SYS_ADMIN capabilities are also needed. An accompanying paper is available at https://arxiv.org/abs/2510.27656.
Highlighted Details
The toolkit centers on the RDMA TransferEngine library and optimized P2P MoE dispatch/combine kernels.
Maintenance & Community
No specific details regarding maintainers, community channels (e.g., Discord, Slack), or roadmap were provided in the README excerpt.
Licensing & Compatibility
No licensing details were provided in the README excerpt.
Limitations & Caveats
The project requires a complex and specific hardware setup, including RDMA-capable network interfaces and GPUDirect RDMA support, which may be a significant barrier to adoption for users without such infrastructure. Specific, recent versions of the Linux kernel and CUDA are also required.
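Before attempting a deployment, it can be useful to check whether RDMA-capable NICs are visible at all. A minimal sketch, assuming the standard Linux sysfs layout (this helper is illustrative and not part of pplx-garden):

```python
import os

# Minimal sketch: RDMA-capable devices (e.g. ConnectX NICs) appear under
# /sys/class/infiniband on Linux when the relevant drivers are loaded.
# Returns an empty list on machines without RDMA hardware.

def list_rdma_devices(sysfs_root="/sys/class/infiniband"):
    if not os.path.isdir(sysfs_root):
        return []
    return sorted(os.listdir(sysfs_root))

print(list_rdma_devices())  # [] on machines without RDMA hardware
```

Note that a visible device is necessary but not sufficient: GPUDirect RDMA additionally requires driver support (e.g. GDRCopy, as listed in the requirements above).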
Related Projects
uccl-project, kubeai-project, ROCm, b4rtaz, llm-d, predibase, kvcache-ai, ai-dynamo, vllm-project