Lianmin Zheng

Coauthor of SGLang, vLLM

Authored Projects (1)

Starred by

Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Wei-Lin Chiang (Cofounder of LMArena), Ying Sheng (Coauthor of SGLang), Luis Capelo (Cofounder of Lightning AI), and 10 more.

awesome-tensor-compilers by merrymercy

Curated list of tensor compiler projects and papers

Collects work on optimizing deep learning computations across diverse hardware through compilation techniques.
Features projects on intermediate representations, auto-tuning, cost models, and polyhedral optimization.
Covers dynamic shapes, quantization, sparsity, and distributed computing for ML workloads.
Highlights key frameworks such as TVM, MLIR, and Triton for research and development.

Created 5 years ago

Updated 1 year ago

Starred Projects (99)

mini-sglang by sgl-project

Lightweight LLM inference framework with advanced optimizations

bryanhelmig:

Ying1123:

Created 4 months ago

Updated 5 days ago

SpecForge by sgl-project

Train speculative decoding models for faster inference

xiezhq-hermann:

Ying1123:

zhyncs:

Created 7 months ago

Updated 14 hours ago

miles by radixark

Enterprise RL for large-scale MoE models

wsxiaoys:

Ying1123:

lewtun:

ekzhang:

Created 3 months ago

Updated 1 day ago

DeepSeek-V3.2-Exp by deepseek-ai

Experimental LLM boosting long-context efficiency

Jiayi-Pan:

zhyncs:

Created 3 months ago

Updated 1 month ago

GLM-4.5 by zai-org

Foundation models for intelligent agents

Created 5 months ago

Updated 2 weeks ago

Kimi-K2 by MoonshotAI

State-of-the-art MoE language model

krisr:

lucidrains:

shizhediao:

vincentweisser:

Created 6 months ago

Updated 2 months ago

ome by sgl-project

Kubernetes operator for LLM serving

Ying1123:

zhyncs:

Created 7 months ago

Updated 1 day ago

genai-bench by sgl-project

LLM serving performance benchmarking

Ying1123:

zhyncs:

Created 6 months ago

Updated 2 days ago

slime by THUDM

LLM post-training framework for RL scaling

jiamings:

hammer:

pgarbacki:

pcmoritz:

Created 6 months ago

Updated 16 hours ago

RL2 by ChenmienTan

Reinforcement learning for large language models

CodeCreator:

winglian:

willccbb:

jaredpalmer:

Created 9 months ago

Updated 2 days ago

Mooncake by kvcache-ai

Research paper on a disaggregated architecture for LLM serving

jiamings:

luiscape:

WoosukKwon:

hiyouga:

Created 1 year ago

Updated 1 day ago

AReaL by inclusionAI

Distributed RL system for LLM reasoning

yiranwu0:

casper-hansen:

zhyncs:

Created 10 months ago

Updated 16 hours ago

LlamaFactory by hiyouga

Unified fine-tuning tool for 100+ LLMs & VLMs (ACL 2024)

patrickvonplaten:

alexchen4ai:

teetone:

lysandrejik:

Created 2 years ago

Updated 14 hours ago

verl by volcengine

RL training library for LLMs

WoosukKwon:

hammer:

yiranwu0:

luiscape:

Created 1 year ago

Updated 16 hours ago

DeepSeek-V3 by deepseek-ai

MoE language model research paper with 671B total parameters

tobi:

shimmyshimmer:

jiamings:

syrusakbary:

Created 1 year ago

Updated 4 months ago

gemlite by dropbox

Triton kernels for efficient low-bit matrix multiplication

winglian:

Created 1 year ago

Updated 3 weeks ago

HunyuanVideo by Tencent-Hunyuan

PyTorch code for video generation research

sxyu:

didierrlopes:

jiamings:

Created 1 year ago

Updated 1 month ago

MiniCPM by OpenBMB

Ultra-efficient LLMs for end devices, achieving 5x+ speedup

chiphuyen:

casper-hansen:

Created 1 year ago

Updated 3 months ago

nunchaku by nunchaku-ai

High-performance 4-bit diffusion model inference engine

luiscape:

parano:

chiphuyen:

alexchen4ai:

Created 1 year ago

Updated 13 hours ago

deepcompressor by nunchaku-tech

Model compression toolbox for LLMs and diffusion models

alexchen4ai:

zhiyuan8:

Created 1 year ago

Updated 5 months ago

xgrammar by mlc-ai

Library for efficient structured generation

ogabrielluiz:

youkaichao:

simonw:

simon-mo:

Created 1 year ago

Updated 13 hours ago

Awesome-ML-SYS-Tutorial by zhaochenyang20

ML SYS learning notes and code

lilianweng:

yiranwu0:

hiyouga:

shizhediao:

Created 1 year ago

Updated 3 days ago

OpenRLHF by OpenRLHF

RLHF framework for scalable training of large language models

beyang:

parano:

vincentweisser:

binarybana:

Created 2 years ago

Updated 3 days ago

SageAttention by thu-ml

Attention kernel for plug-and-play inference acceleration

chiphuyen:

philschmid:

winglian:

Created 1 year ago

Updated 2 weeks ago

sgl-learning-materials by sgl-project

Learning materials for SGLang, an efficient LLM serving engine

Ying1123:

Created 1 year ago

Updated 6 days ago

fish-speech by fishaudio

Open-source TTS for multilingual speech synthesis

jiamings:

chiphuyen:

parano:

Created 2 years ago

Updated 3 days ago

Liger-Kernel by linkedin

Triton kernels for efficient LLM training

karpathy:

chiphuyen:

pgarbacki:

Jiayi-Pan:

Created 1 year ago

Updated 4 days ago

ao by pytorch

PyTorch library for quantization and sparsity in training/inference

danielhanchen:

shimmyshimmer:

parano:

willccbb:

Created 2 years ago

Updated 1 day ago

appl by appl-team

A prompt programming language for Python

Created 1 year ago

Updated 10 months ago

ttt-lm-jax by test-time-training

JAX implementation of test-time training RNN research paper

agajews:

parano:

gakonst:

Created 1 year ago

Updated 2 months ago

RouteLLM by lm-sys

Framework for LLM routing and cost reduction (research paper)

chiphuyen:

ebursztein:

hammer:

ogabrielluiz:

Created 1 year ago

Updated 1 year ago

DistServe by LLMServe

Disaggregated serving system for LLMs

Created 2 years ago

Updated 9 months ago

gpt-fast by meta-pytorch

PyTorch text generation for efficient transformer inference

karpathy:

antiagainst:

jamesr66a:

chiphuyen:

Created 2 years ago

Updated 4 months ago

lmdeploy by InternLM

Toolkit for LLM compression, deployment, and serving

shimmyshimmer:

wsxiaoys:

luiscape:

jn2clark:

Created 2 years ago

Updated 2 days ago

LLaVA-NeXT by LLaVA-VL

Multimodal model for image, video, and 3D understanding

Jiayi-Pan:

Created 1 year ago

Updated 3 months ago

arena-hard-auto by lmarena

Automatic LLM benchmark for instruction-tuned models, correlating with human preference

hiyouga:

pgarbacki:

mlabonne:

zhuohan123:

Created 2 years ago

Updated 6 months ago

llama3 by meta-llama

*Deprecated* minimal example for loading and running Llama 3 models

tobi:

mckaywrigley:

osanseviero:

simonw:

Created 1 year ago

Updated 11 months ago

llama.cpp by ggml-org

C/C++ library for local LLM inference

karpathy:

nat:

tobi:

hiyouga:

Created 2 years ago

Updated 14 hours ago

SWE-agent by SWE-agent

Agent for automated software engineering (NeurIPS 2024)

truell20:

hugs:

chiphuyen:

ogabrielluiz:

Created 1 year ago

Updated 1 week ago

ollama by ollama

CLI tool for running LLMs locally

tobi:

jmorganca:

domoritz:

ekzhu:

Created 2 years ago

Updated 21 hours ago

scattermoe by shawntan

Triton-based Sparse Mixture-of-Experts for efficient deep learning

winglian:

casper-hansen:

hammer:

Created 1 year ago

Updated 3 months ago

Consistency_LLM by hao-ai-lab

Parallel decoder for efficient LLM inference

comaniac:

chiphuyen:

zhuohan123:

Created 2 years ago

Updated 1 year ago

grok-1 by xai-org

JAX example code for loading and running Grok-1 open-weights model

geohot:

yiranwu0:

omarsar:

handotdev:

Created 1 year ago

Updated 1 year ago

flashinfer by flashinfer-ai

Kernel library for LLM serving

chiphuyen:

hammer:

JustinLin610:

luiscape:

Created 2 years ago

Updated 15 hours ago

distrifuser by mit-han-lab

Research paper for distributed parallel inference of high-resolution diffusion models

philschmid:

xiezhq-hermann:

Created 1 year ago

Updated 1 year ago

LWM by LargeWorldModel

Multimodal autoregressive model for long-context video/text

mateiz:

chiphuyen:

Jiayi-Pan:

pgarbacki:

Created 1 year ago

Updated 1 year ago

Qwen3 by QwenLM

Large language model series by Qwen team, Alibaba Cloud

teetone:

vincentweisser:

ggerganov:

sxyu:

Created 1 year ago

Updated 2 days ago

cutlass by NVIDIA

CUDA C++ and Python DSLs for high-performance linear algebra

tridao:

chiphuyen:

joker-eph:

mattjj:

Created 8 years ago

Updated 2 days ago

search_with_lepton by leptonai

Conversational search engine demo

chiphuyen:

zhiyuan8:

soumith:

khou22:

Created 1 year ago

Updated 1 month ago

sglang by sgl-project

Fast serving framework for LLMs and vision language models

shimmyshimmer:

beyang:

samlambert:

ebursztein:

Created 2 years ago

Updated 14 hours ago

EAGLE by SafeAILab

Speculative decoding research paper for faster LLM inference

shizhediao:

zhyncs:

philschmid:

pgarbacki:

Created 2 years ago

Updated 3 weeks ago

llm-decontaminator by lm-sys

LLM contamination detector for quantifying rephrased samples

casper-hansen:

huybery:

Ying1123:

infwinston:

Created 2 years ago

Updated 2 years ago

ChatRTX by NVIDIA

Demo app for local RAG chatbot on Windows

omarsar:

JustinLin610:

jerryjliu:

Ying1123:

Created 2 years ago

Updated 9 months ago

S-LoRA by S-LoRA

System for scalable LoRA adapter serving

chiphuyen:

JohannesHa:

winglian:

osanseviero:

Created 2 years ago

Updated 2 years ago

Yi by 01-ai

Open-source bilingual LLMs trained from scratch

chiphuyen:

simonw:

yiranwu0:

pgarbacki:

Created 2 years ago

Updated 1 year ago

CTranslate2 by OpenNMT

Fast inference engine for Transformer models

simonw:

eugeneyan:

jph00:

bryanhelmig:

Created 6 years ago

Updated 14 hours ago

DeepSpeed-MII by deepspeedai

Python library for high-throughput, low-latency, and cost-effective model inference

hiyouga:

Ying1123:

Sanger2000:

casper-hansen:

Created 3 years ago

Updated 6 months ago

ChatGLM3 by zai-org

Bilingual chat LLM for complex scenarios (tool use, code execution, agents)

victortaelin:

Created 2 years ago

Updated 1 year ago

AgentTuning by THUDM

Agent tuning for generalized LLM agent abilities

JohannesHa:

vincentweisser:

omarsar:

Created 2 years ago

Updated 2 years ago

TensorRT-LLM by NVIDIA

LLM inference optimization SDK for NVIDIA GPUs

beyang:

hammer:

zhyncs:

shizhediao:

Created 2 years ago

Updated 15 hours ago

guidance by guidance-ai

Programming paradigm for steering LLMs

tobi:

ekzhu:

stas00:

Ying1123:

Created 3 years ago

Updated 5 days ago

Medusa by FasterDecoding

Framework for accelerating LLM generation using multiple decoding heads

osanseviero:

parano:

luiscape:

zhyncs:

Created 2 years ago

Updated 1 year ago

codellama by meta-llama

Inference code for CodeLlama models

chiphuyen:

vincentweisser:

hiyouga:

shizhediao:

Created 2 years ago

Updated 1 year ago

llm-attacks by llm-attacks

Attack framework for aligned LLMs, based on a research paper

ebursztein:

chiphuyen:

jph00:

hiyouga:

Created 2 years ago

Updated 1 year ago

rerope by bojone

Position embeddings research paper

winglian:

jph00:

Ying1123:

Created 2 years ago

Updated 1 year ago

LightLLM by ModelTC

Python framework for LLM inference and serving

zhyncs:

chiphuyen:

pgarbacki:

JustinLin610:

Created 2 years ago

Updated 1 day ago

long_llama by CStanKonrad

LLM for long context handling, fine-tuned with Focused Transformer

Ying1123:

pgarbacki:

simonw:

Created 2 years ago

Updated 2 years ago

text-generation-inference by huggingface

Rust/Python/gRPC server for fast LLM text generation

tobi:

clmnt:

tjbck:

chiphuyen:

Created 3 years ago

Updated 3 days ago

gorilla-cli by gorilla-llm

CLI tool using LLMs to generate commands

luiscape:

pirroh:

infwinston:

zhuohan123:

Created 2 years ago

Updated 1 year ago

LongChat by DachengLi1

Long-context LLM chatbot training and evaluation framework

casper-hansen:

hiyouga:

huybery:

pgarbacki:

Created 2 years ago

Updated 1 year ago

vllm by vllm-project

LLM serving engine for high-throughput, memory-efficient inference

karpathy:

clmnt:

tobi:

danielhanchen:

Created 2 years ago

Updated 14 hours ago

WizardLM by nlpxucan

LLMs built using Evol-Instruct for complex instruction following

vincentweisser:

chiphuyen:

WoosukKwon:

ishaan-jaff:

Created 2 years ago

Updated 7 months ago

mlc-llm by mlc-ai

Universal LLM deployment engine with ML compilation

tobi:

osanseviero:

zhiyuan8:

zhuohan123:

Created 2 years ago

Updated 1 week ago

LLaVA by haotian-liu

Multimodal assistant with GPT-4 level capabilities

shizhediao:

zhiyuan8:

transitive-bullshit:

patrickvonplaten:

Created 2 years ago

Updated 1 year ago

MiniGPT-4 by Vision-CAIR

Vision-language model for multi-task learning

pgarbacki:

forresti:

jn2clark:

hiyouga:

Created 2 years ago

Updated 1 year ago

web-llm by mlc-ai

In-browser LLM inference engine using WebGPU for hardware acceleration

tobi:

didierrlopes:

doriandarko:

osanseviero:

Created 2 years ago

Updated 1 month ago

EasyLM by young-geng

LLM training/finetuning framework in JAX/Flax

Jiayi-Pan:

chiphuyen:

shizhediao:

zhuohan123:

Created 3 years ago

Updated 1 year ago

zkml by ddkang

Framework for proofs of ML model execution in ZK-SNARKs

gakonst:

Created 2 years ago

Updated 1 year ago

llama by meta-llama

Inference code for Llama 2 models (deprecated)

froystig:

xiezhq-hermann:

fabhed:

borzunov:

Created 2 years ago

Updated 11 months ago

FastChat by lm-sys

Open platform for training, serving, and evaluating LLM-based chatbots

zjasper666:

aangelopoulos:

osanseviero:

natolambert:

Created 2 years ago

Updated 7 months ago

web-stable-diffusion by mlc-ai

Browser-based Stable Diffusion demo with no server support

pgarbacki:

gakonst:

syrusakbary:

binarybana:

Created 2 years ago

Updated 1 year ago

FlexLLMGen by FMInference

High-throughput generation engine for LLMs with limited GPU memory

chiphuyen:

jrk:

jph00:

parano:

Created 2 years ago

Updated 1 year ago

smoothquant by mit-han-lab

Post-training quantization research paper for large language models

zhyncs:

zhiyuan8:

gakonst:

forresti:

Created 3 years ago

Updated 1 year ago

dpm-solver by LuChengTHU

Fast ODE solver for diffusion probabilistic model sampling

Edward-Sun:

Ying1123:

patrickvonplaten:

PiotrDabkowski:

Created 3 years ago

Updated 1 year ago

GLM-130B by zai-org

Bilingual model for research and evaluation

soldni:

wassemgtk:

jiamings:

mckaywrigley:

Created 3 years ago

Updated 2 years ago

AITemplate by facebookincubator

Generate high-performance inference engines

nat:

hammer:

transitive-bullshit:

jrk:

Created 3 years ago

Updated 3 weeks ago

compiler-and-arch by KnowingNothing

Compiler/architecture resources for emerging domains

zhuohan123:

WoosukKwon:

Created 3 years ago

Updated 1 year ago

alpa by alpa-projects

Auto-parallelization framework for large-scale neural network training and serving

chiphuyen:

Jiayi-Pan:

transitive-bullshit:

soldni:

Created 4 years ago

Updated 2 years ago

skypilot by skypilot-org

Framework for cloud AI/batch jobs, unifying execution across diverse infrastructure

karpathy:

tobi:

amin3141:

luiscape:

Created 4 years ago

Updated 15 hours ago

flexflow-train by flexflow

Accelerating distributed deep learning training

parano:

jph00:

Edward-Sun:

NikolaBorisov:

Created 7 years ago

Updated 1 day ago

aqueduct by RunLLM

MLOps framework for cloud deployment of LLM/ML workloads

ShishirPatil:

hammer:

jheer:

spencerkimball:

Created 3 years ago

Updated 2 years ago

paper-reading by mli

Deep learning paper readings

parano:

omarsar:

Ying1123:

Jiayi-Pan:

Created 4 years ago

Updated 9 months ago

vision_transformer by google-research

Vision Transformer and MLP-Mixer models in JAX/Flax

aravindsrinivas:

gakonst:

jn2clark:

chiphuyen:

Created 5 years ago

Updated 2 days ago

flax by google

NN library for JAX, designed for flexibility in neural network research

charliermarsh:

codekansas:

Jiayi-Pan:

jaredpalmer:

Created 6 years ago

Updated 2 days ago

awesome-tensor-compilers by merrymercy

Curated list of tensor compiler projects and papers

chiphuyen:

infwinston:

Ying1123:

luiscape:

Created 5 years ago

Updated 1 year ago

antares by microsoft

Compiler solution for PyTorch operator optimization on diverse accelerators

parasj:

comaniac:

xiezhq-hermann:

Created 5 years ago

Updated 8 months ago

dgl by dmlc

Python package for deep learning on graphs

osanseviero:

jiamings:

xiezhq-hermann:

parasj:

Created 7 years ago

Updated 5 months ago

tvm by apache

Compiler stack for deep learning systems

aravindsrinivas:

transitive-bullshit:

guberti:

wesm:

Created 9 years ago

Updated 1 day ago

MARL-Papers by LantaoYu

Paper list for multi-agent reinforcement learning (MARL)

aravindsrinivas:

chenlin9:

jiamings:

Created 8 years ago

Updated 1 month ago
