FastDeploy by PaddlePaddle

Toolkit for LLM deployment

created 3 years ago
3,438 stars

Top 14.4% on sourcepulse

View on GitHub
Project Summary

FastDeploy is a comprehensive toolkit designed for the efficient deployment of large language models (LLMs) and other deep learning models across diverse hardware and operating systems. It targets developers and researchers seeking to optimize inference speed and reduce resource consumption for production environments.

How It Works

FastDeploy provides a unified API for model inference that abstracts away hardware-specific complexities. It supports multiple backend inference engines (e.g., ONNX Runtime, TensorRT, OpenVINO) and ships optimized runtime libraries for CPUs, GPUs (NVIDIA, AMD), and NPUs. This lets users achieve high performance with minimal code changes across different deployment targets.
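
A minimal sketch of that pattern, based on FastDeploy's 1.x-era Python vision examples (module paths, option names, and the model/file names here are assumptions and may differ in current releases):

    # Illustrative only: follows FastDeploy 1.x vision examples; APIs may have changed.
    import cv2
    import fastdeploy as fd

    # Choose device and backend through RuntimeOption; the model code below is unchanged.
    option = fd.RuntimeOption()
    option.use_gpu(0)             # or option.use_cpu()
    option.use_trt_backend()      # or option.use_ort_backend() / option.use_openvino_backend()

    # Load a pre-trained detector (file paths are placeholders).
    model = fd.vision.detection.PPYOLOE(
        "model.pdmodel", "model.pdiparams", "infer_cfg.yml",
        runtime_option=option)

    im = cv2.imread("test.jpg")
    result = model.predict(im)    # same call regardless of the backend selected above
    print(result)

Swapping the backend or device is a one-line change to the RuntimeOption, which is how FastDeploy keeps deployment code portable across targets.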

Quick Start & Requirements

Highlighted Details

  • Supports 200+ pre-trained models and 10+ inference backends.
  • Offers quantization and pruning tools for model compression.
  • Provides optimized inference for LLMs, computer vision, and speech models.
  • Includes a unified C++ and Python API for cross-platform compatibility.

Maintenance & Community

  • Actively maintained by the PaddlePaddle team.
  • Community support available via GitHub Issues.
  • Roadmap and updates are typically posted on the GitHub repository.

Licensing & Compatibility

  • Apache 2.0 License.
  • Permissive license allows for commercial use and integration into closed-source projects.

Limitations & Caveats

The project focuses on inference optimization; model training capabilities are not included. And while many backends are supported, achieving optimal performance may require specific hardware configurations and per-backend tuning.
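
As an illustration of the backend tuning mentioned above, here is a sketch of TensorRT-specific settings, assuming FastDeploy 1.x RuntimeOption method names (these options may have moved or been renamed in newer releases):

    # Illustrative TensorRT tuning; option names follow FastDeploy 1.x examples.
    import fastdeploy as fd

    option = fd.RuntimeOption()
    option.use_gpu(0)
    option.use_trt_backend()

    # Pin dynamic input shapes (min / opt / max) so TensorRT can build an
    # optimized engine for the input tensor named "image".
    option.set_trt_input_shape("image",
                               min_shape=[1, 3, 320, 320],
                               opt_shape=[1, 3, 640, 640],
                               max_shape=[1, 3, 1280, 1280])

    # Cache the built engine so it is not rebuilt on every process start.
    option.set_trt_cache_file("detector.trt")

Without this kind of per-backend configuration, default settings may fall well short of a given accelerator's peak throughput.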

Health Check
Last commit

1 day ago

Responsiveness

1 day

Pull Requests (30d)
426
Issues (30d)
33
Star History
294 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 5 more.

TensorRT-LLM by NVIDIA

0.6%
11k stars
LLM inference optimization SDK for NVIDIA GPUs
created 1 year ago
updated 19 hours ago
Starred by Aravind Srinivas (Cofounder of Perplexity), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 12 more.

DeepSpeed by deepspeedai

0.2%
40k stars
Deep learning optimization library for distributed training and inference
created 5 years ago
updated 1 day ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Nat Friedman (Former CEO of GitHub), and 32 more.

llama.cpp by ggml-org

0.4%
84k stars
C/C++ library for local LLM inference
created 2 years ago
updated 15 hours ago