Woosuk Kwon

Coauthor of vLLM

Starred Projects (67)

torchtitan by pytorch

PyTorch platform for generative AI model training research

karpathy:

pgarbacki:

lewtun:

zhuohan123:

Created 2 years ago

Updated 1 day ago

vllm-omni by vllm-project

Omni-modality model inference and serving framework

omarsar:

hiyouga:

Created 4 months ago

Updated 19 hours ago

verl by volcengine

RL training library for LLMs

hammer:

yiranwu0:

luiscape:

binarybana:

Created 1 year ago

Updated 16 hours ago

SkyRL by NovaSky-AI

RL training pipeline for multi-turn tool use LLMs, optimized for real-world tasks

lewtun:

hiyouga:

zhuohan123:

JohannesHa:

Created 8 months ago

Updated 1 day ago

tinker-cookbook by thinking-machines-lab

Advanced LLM fine-tuning SDK and example cookbook

ekzhu:

joschu:

JohannesHa:

gakonst:

Created 6 months ago

Updated 3 days ago

batch_invariant_ops by thinking-machines-lab

Enhance LLM inference determinism

Edward-Sun:

zhyncs:

ekzhang:

willccbb:

Created 4 months ago

Updated 2 months ago

recipes by vllm-project

LLM inference recipes

simon-mo:

Created 5 months ago

Updated 2 days ago

openevolve by algorithmicsuperintelligence

Coding agent for scientific/algorithmic discovery, based on AlphaEvolve paper

chiphuyen:

vincentweisser:

taranjeet:

suquark:

Created 8 months ago

Updated 2 weeks ago

nano-vllm by GeeeekExplorer

Lightweight vLLM implementation from scratch

jrk:

ekzhang:

jiamings:

luiscape:

Created 7 months ago

Updated 2 months ago

llm-d by llm-d

Kubernetes-native framework for distributed LLM inference

parano:

vanpelt:

hammer:

Created 8 months ago

Updated 2 days ago

dynamo by ai-dynamo

Inference framework for distributed generative AI model serving

vincentweisser:

willingc:

chiphuyen:

luiscape:

Created 10 months ago

Updated 16 hours ago

ArcticInference by snowflakedb

vLLM plugin for high-throughput, low-latency LLM and embedding inference

stas00:

luiscape:

Created 9 months ago

Updated 5 days ago

MiMo by XiaomiMiMo

LLM for reasoning, pre-trained and post-trained for math/code tasks

alexchen4ai:

zhyncs:

omarsar:

Created 8 months ago

Updated 7 months ago

rllm by rllm-org

Framework for post-training language agents via reinforcement learning

yiranwu0:

pgarbacki:

vincentweisser:

shizhediao:

Created 11 months ago

Updated 1 day ago

chatgpt_system_prompt by LouisShark

GPT system prompt collection for prompt engineering and security education

simonw:

thomwolf:

chiphuyen:

lewtun:

Created 2 years ago

Updated 2 days ago

fairseq2 by facebookresearch

Sequence modeling toolkit for content generation research

Created 3 years ago

Updated 2 days ago

Mooncake by kvcache-ai

Research paper on a disaggregated architecture for LLM serving

jiamings:

luiscape:

merrymercy:

hiyouga:

Created 1 year ago

Updated 1 day ago

xgrammar by mlc-ai

Library for efficient structured generation

merrymercy:

ogabrielluiz:

youkaichao:

simonw:

Created 1 year ago

Updated 13 hours ago

Liger-Kernel by linkedin

Triton kernels for efficient LLM training

karpathy:

chiphuyen:

pgarbacki:

Jiayi-Pan:

Created 1 year ago

Updated 4 days ago

Nanoflow by efeslab

LLM serving framework for high throughput

Ying1123:

zhyncs:

zhuohan123:

Created 1 year ago

Updated 2 months ago

xla by pytorch

PyTorch on XLA devices

aravindsrinivas:

Jiayi-Pan:

comaniac:

ankane:

Created 7 years ago

Updated 3 weeks ago

ao by pytorch

PyTorch library for quantization and sparsity in training/inference

danielhanchen:

shimmyshimmer:

parano:

willccbb:

Created 2 years ago

Updated 1 day ago

Model-Optimizer by NVIDIA

Library for optimizing deep learning models for GPU inference

luiscape:

mfuntowicz:

Created 1 year ago

Updated 1 day ago

llm-compressor by vllm-project

Transformers-compatible library for LLM compression, optimized for vLLM deployment

chiphuyen:

dguido:

hammer:

patrickvonplaten:

Created 1 year ago

Updated 18 hours ago

intel-extension-for-pytorch by intel

PyTorch extension for performance boost on Intel platforms

hammer:

lantiga:

Created 5 years ago

Updated 2 days ago

ThunderKittens by HazyResearch

CUDA kernel framework for fast deep learning primitives

karpathy:

vincentweisser:

zhyncs:

gakonst:

Created 1 year ago

Updated 16 hours ago

mirage by mirage-project

Tool for fast GPU kernel generation via superoptimization

geohot:

zhuohan123:

binarybana:

Ying1123:

Created 1 year ago

Updated 3 days ago

AutoAWQ by casper-hansen

Tool for 4-bit quantized LLM inference

vincentweisser:

peakji:

winglian:

codekansas:

Created 2 years ago

Updated 8 months ago

grok-1 by xai-org

JAX example code for loading and running Grok-1 open-weights model

geohot:

yiranwu0:

omarsar:

handotdev:

Created 1 year ago

Updated 1 year ago

lm-evaluation-harness by EleutherAI

Framework for few-shot language model evaluation

aravindsrinivas:

zjasper666:

zhuohan123:

shizhediao:

Created 5 years ago

Updated 4 days ago

LLMSys-PaperList by AmberLJC

Curated list of LLM systems papers

pcmoritz:

Created 2 years ago

Updated 5 days ago

aici by microsoft

AICI constrains LLM output using (Wasm) programs

tobi:

hammer:

omarsar:

chiphuyen:

Created 2 years ago

Updated 11 months ago

mlc-llm by mlc-ai

Universal LLM deployment engine with ML compilation

tobi:

osanseviero:

zhiyuan8:

zhuohan123:

Created 2 years ago

Updated 1 week ago

mscclpp by microsoft

GPU-driven communication stack for scalable AI applications

zhyncs:

jrk:

Created 2 years ago

Updated 1 day ago

sglang by sgl-project

Fast serving framework for LLMs and vision language models

shimmyshimmer:

beyang:

samlambert:

ebursztein:

Created 2 years ago

Updated 14 hours ago

flashinfer by flashinfer-ai

Kernel library for LLM serving

chiphuyen:

hammer:

JustinLin610:

luiscape:

Created 2 years ago

Updated 15 hours ago

punica by punica-ai

LoRA serving system (research paper) for multi-tenant LLM inference

winglian:

hammer:

chiphuyen:

rodrigosnader:

Created 2 years ago

Updated 1 year ago

LLMCompiler by SqueezeAILab

LLM compiler for parallel function calling

hammer:

rodrigosnader:

pgarbacki:

ogabrielluiz:

Created 2 years ago

Updated 1 year ago

gpt-fast by meta-pytorch

PyTorch text generation for efficient transformer inference

karpathy:

antiagainst:

jamesr66a:

merrymercy:

Created 2 years ago

Updated 4 months ago

TensorRT-LLM by NVIDIA

LLM inference optimization SDK for NVIDIA GPUs

beyang:

hammer:

zhyncs:

shizhediao:

Created 2 years ago

Updated 15 hours ago

WizardLM by nlpxucan

LLMs built using Evol-Instruct for complex instruction following

vincentweisser:

chiphuyen:

ishaan-jaff:

thomwolf:

Created 2 years ago

Updated 7 months ago

outlines by dottxt-ai

SDK for structured LLM text generation

tobi:

kerollmops:

willingc:

jn2clark:

Created 2 years ago

Updated 2 days ago

Awesome-LLM by Hannibal046

Curated list of Large Language Model resources

rodrigosnader:

shizhediao:

zhiyuan8:

vincentweisser:

Created 2 years ago

Updated 5 months ago

gorilla by ShishirPatil

LLM tool-use framework for API invocation and function calling

lewtun:

gakonst:

chiphuyen:

parano:

Created 2 years ago

Updated 1 week ago

LLMSurvey by RUCAIBox

Survey paper for large language models

winglian:

transitive-bullshit:

omarsar:

Ying1123:

Created 2 years ago

Updated 10 months ago

CTranslate2 by OpenNMT

Fast inference engine for Transformer models

merrymercy:

simonw:

eugeneyan:

jph00:

Created 6 years ago

Updated 14 hours ago

SqueezeLLM by SqueezeAILab

Quantization framework for efficient LLM serving (ICML 2024 paper)

JustinLin610:

casper-hansen:

Created 2 years ago

Updated 1 year ago

vllm by vllm-project

LLM serving engine for high-throughput, memory-efficient inference

karpathy:

clmnt:

tobi:

danielhanchen:

Created 2 years ago

Updated 14 hours ago

Awesome-LLMOps by tensorchord

Curated list of LLMOps tools for developers

marcklingen:

shyamal-anadkat:

wsxiaoys:

hammer:

Created 3 years ago

Updated 1 week ago

FastChat by lm-sys

Open platform for training, serving, and evaluating LLM-based chatbots

zjasper666:

aangelopoulos:

osanseviero:

natolambert:

Created 2 years ago

Updated 7 months ago

llama by meta-llama

Inference code for Llama 2 models (deprecated)

froystig:

xiezhq-hermann:

fabhed:

borzunov:

Created 2 years ago

Updated 11 months ago

Megatron-LM by NVIDIA

Framework for training transformer models at scale

jiamings:

gravicle:

alexchen4ai:

parasj:

Created 6 years ago

Updated 18 hours ago

flash-attention by Dao-AILab

Fast, memory-efficient attention implementation

karpathy:

Jiayi-Pan:

zhiyuan8:

alexchen4ai:

Created 3 years ago

Updated 1 day ago

TransformerEngine by NVIDIA

Library for Transformer model acceleration on NVIDIA GPUs

luiscape:

sxyu:

pgarbacki:

hammer:

Created 3 years ago

Updated 1 day ago

AITemplate by facebookincubator

Generate high-performance inference engines

nat:

hammer:

transitive-bullshit:

jrk:

Created 3 years ago

Updated 3 weeks ago

x-transformers by lucidrains

Transformer library with extensive experimental features

chiphuyen:

hammer:

chenlin9:

Jiayi-Pan:

Created 5 years ago

Updated 5 days ago

compiler-and-arch by KnowingNothing

Compiler/architecture resources for emerging domains

zhuohan123:

merrymercy:

Created 3 years ago

Updated 1 year ago

skypilot by skypilot-org

Framework for cloud AI/batch jobs, unifying execution across diverse infrastructure

karpathy:

tobi:

amin3141:

luiscape:

Created 4 years ago

Updated 15 hours ago

metaseq by facebookresearch

Codebase for large-scale transformer model development and deployment

chiphuyen:

gakonst:

xiezhq-hermann:

Ying1123:

Created 3 years ago

Updated 1 year ago

FasterTransformer by NVIDIA

Optimized transformer library for inference

nat:

chiphuyen:

JustinLin610:

mfuntowicz:

Created 4 years ago

Updated 1 year ago

alpa by alpa-projects

Auto-parallelization framework for large-scale neural network training and serving

chiphuyen:

Jiayi-Pan:

transitive-bullshit:

soldni:

Created 4 years ago

Updated 2 years ago

transformers by huggingface

ML library for pretrained model inference and training

clmnt:

lilianweng:

karpathy:

tjbck:

Created 7 years ago

Updated 2 days ago

ray by ray-project

AI compute engine for scaling Python and AI applications

beyang:

hsbt:

gregpr07:

hiyouga:

Created 9 years ago

Updated 19 hours ago

awesome-tensor-compilers by merrymercy

Curated list of tensor compiler projects and papers

chiphuyen:

infwinston:

Ying1123:

luiscape:

Created 5 years ago

Updated 1 year ago

tvm by apache

Compiler stack for deep learning systems

aravindsrinivas:

transitive-bullshit:

guberti:

wesm:

Created 9 years ago

Updated 1 day ago

cutlass by NVIDIA

CUDA C++ and Python DSLs for high-performance linear algebra

tridao:

chiphuyen:

joker-eph:

mattjj:

Created 8 years ago

Updated 2 days ago

DeepLearningExamples by NVIDIA

Deep learning examples for training and deployment

codekansas:

pgarbacki:

khou22:

omarsar:

Created 7 years ago

Updated 1 year ago
