interestingLSY/SwiftLLM: LLM inference system for research
Top 84.7% on SourcePulse
SwiftLLM is a lightweight LLM inference system designed for research purposes, offering performance comparable to vLLM with a significantly smaller codebase. It targets researchers who need to understand, modify, and extend LLM inference systems without the complexity of production-focused frameworks.
How It Works
SwiftLLM employs a master-worker architecture, separating concerns into a control plane for scheduling and a data plane for computation. It leverages Python and OpenAI Triton for efficient CUDA kernel implementation. Key features include iterative scheduling, selective batching, PagedAttention, and FlashAttention, enabling high performance with a minimal code footprint.
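To make the PagedAttention bookkeeping concrete, the sketch below shows how a per-sequence block table can map logical token positions to fixed-size physical KV-cache blocks. The class, method names, and block size are illustrative assumptions, not SwiftLLM's actual data structures.

```python
# Minimal sketch of a paged KV cache: logical token positions map to
# fixed-size physical blocks via a per-sequence block table. Names and
# the block size are illustrative, not taken from SwiftLLM's code.

BLOCK_SIZE = 16  # tokens per KV-cache block (assumed value)

class PagedKVCache:
    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))   # pool of physical block ids
        self.block_tables = {}                       # seq_id -> [physical block ids]
        self.seq_lens = {}                           # seq_id -> tokens stored so far

    def append_token(self, seq_id: int) -> tuple[int, int]:
        """Reserve a KV slot for one new token; returns (block id, offset in block)."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.seq_lens.get(seq_id, 0)
        if length % BLOCK_SIZE == 0:                 # current block is full: allocate a new one
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1
        return table[length // BLOCK_SIZE], length % BLOCK_SIZE

    def free(self, seq_id: int) -> None:
        """Return all blocks of a finished sequence to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)

cache = PagedKVCache(num_blocks=64)
for _ in range(20):                                  # 20 tokens span two blocks
    block, offset = cache.append_token(seq_id=0)
print(cache.block_tables[0], cache.seq_lens[0])
cache.free(0)
```

Allocating the cache in small blocks rather than one contiguous region per sequence is what lets an iterative scheduler pack many requests of varying lengths into GPU memory.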
Quick Start & Requirements
Install dependencies with pip install -r requirements.txt, then install SwiftLLM with pip install -e . and pip install -e csrc. An NVIDIA GPU is required for the Triton kernels. Model weights in .bin and .safetensors formats are supported. Examples cover offline inference (examples/offline.py), online serving (examples/online.py), and a vLLM-like API server (swiftllm/server/api_server.py).
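Once the API server is running, a minimal client might look like the sketch below. The /generate endpoint, JSON field names, and default port are assumptions modeled on vLLM's legacy HTTP API rather than SwiftLLM's documented interface; consult swiftllm/server/api_server.py for the actual routes and schema.

```python
# Hypothetical client for the vLLM-like API server (swiftllm/server/api_server.py).
# The endpoint path, port, and JSON fields below are assumptions modeled on
# vLLM's legacy /generate API; check the server source for the real schema.
import requests

def generate(prompt: str, host: str = "http://localhost:8000") -> str:
    payload = {"prompt": prompt, "max_tokens": 64}     # assumed field names
    resp = requests.post(f"{host}/generate", json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()["text"]                         # assumed response field

if __name__ == "__main__":
    print(generate("Explain PagedAttention in one sentence."))
```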
Maintenance & Community
The project is under active development, with some features potentially incomplete and documentation limited. Future plans include support for tensor and pipeline parallelism.
Licensing & Compatibility
The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
SwiftLLM is explicitly not a production-ready solution. Features like quantization, LoRA, multimodal models, and non-greedy sampling are not supported and would require custom implementation. Support is limited to NVIDIA GPUs, though migration to other hardware is possible if supported by OpenAI Triton.