Zhuohan Li

Coauthor of vLLM

Starred Projects (106)

TileGym by NVIDIA

CUDA Tile kernel library for efficient GPU programming

zhyncs:

Created 1 month ago

Updated 3 days ago

SkyRL by NovaSky-AI

RL training pipeline for multi-turn, tool-use LLMs, optimized for real-world tasks

lewtun:

hiyouga:

WoosukKwon:

JohannesHa:

Created 8 months ago

Updated 1 day ago

checkpoint-engine by MoonshotAI

Middleware for efficient LLM weight updates during inference

pgarbacki:

luiscape:

zhyncs:

didierrlopes:

Created 4 months ago

Updated 2 days ago

batch_invariant_ops by thinking-machines-lab

Enhance LLM inference determinism

Edward-Sun:

zhyncs:

ekzhang:

willccbb:

Created 4 months ago

Updated 2 months ago

wechat-bot by wangrongding

WeChat bot integrating multiple AI services

Created 4 years ago

Updated 3 days ago

harmony by openai

Renderer for OpenAI's harmony response format

chiphuyen:

shyamal-anadkat:

t3dotgg:

hiyouga:

Created 5 months ago

Updated 3 weeks ago

gpt-oss by openai

Open-weight LLMs for reasoning and agents

karpathy:

danielhanchen:

borzunov:

chiphuyen:

Created 6 months ago

Updated 2 months ago

mirage by mirage-project

Tool for fast GPU kernel generation via superoptimization

geohot:

binarybana:

Ying1123:

simon-mo:

Created 1 year ago

Updated 3 days ago

Hunyuan3D-2.1 by Tencent-Hunyuan

Image to 3D asset generation with PBR materials

zhyncs:

Created 7 months ago

Updated 2 months ago

torchtitan by pytorch

PyTorch platform for generative AI model training research

karpathy:

WoosukKwon:

pgarbacki:

lewtun:

Created 2 years ago

Updated 1 day ago

tilelang by tile-ai

DSL for high-performance GPU/CPU kernel development (GEMM, attention, etc.)

parano:

hiyouga:

JustinLin610:

luiscape:

Created 1 year ago

Updated 1 day ago

3FS by deepseek-ai

Distributed file system for AI training/inference workloads

chiphuyen:

joewalnes:

wesm:

alexey-milovidov:

Created 10 months ago

Updated 5 days ago

DeepGEMM by deepseek-ai

CUDA library for efficient FP8 GEMM kernels with fine-grained scaling

chiphuyen:

ekzhang:

natolambert:

patrickvonplaten:

Created 11 months ago

Updated 5 days ago

open-infra-index by deepseek-ai

AI infrastructure tools for efficient AGI development

lattner:

vincentweisser:

hammer:

pankajroark:

Created 10 months ago

Updated 8 months ago

mochi by genmoai

Video generation model

transitive-bullshit:

flaque:

s9xie:

parano:

Created 1 year ago

Updated 1 month ago

llm-compressor by vllm-project

Transformers-compatible library for LLM compression, optimized for vLLM deployment

chiphuyen:

dguido:

hammer:

patrickvonplaten:

Created 1 year ago

Updated 18 hours ago

Nanoflow by efeslab

LLM serving framework for high throughput

WoosukKwon:

Ying1123:

zhyncs:

Created 1 year ago

Updated 2 months ago

vattention by microsoft

Memory manager for LLM serving systems

Created 1 year ago

Updated 7 months ago

lm-evaluation-harness by EleutherAI

Framework for few-shot language model evaluation

aravindsrinivas:

zjasper666:

shizhediao:

simonw:

Created 5 years ago

Updated 4 days ago

mistral.rs by EricLBuehler

LLM inference engine for blazing fast performance

binarybana:

osanseviero:

tjbck:

chiphuyen:

Created 1 year ago

Updated 2 days ago

OpenHands by OpenHands

AI platform for software development agents

truell20:

pgarbacki:

khou22:

yiranwu0:

Created 1 year ago

Updated 18 hours ago

ThunderKittens by HazyResearch

CUDA kernel framework for fast deep learning primitives

karpathy:

vincentweisser:

zhyncs:

gakonst:

Created 1 year ago

Updated 16 hours ago

arena-hard-auto by lmarena

Automatic LLM benchmark for instruction-tuned models, correlating with human preference

hiyouga:

pgarbacki:

mlabonne:

merrymercy:

Created 2 years ago

Updated 6 months ago

simple-evals by openai

Lightweight library for evaluating language models

hammer:

patrickvonplaten:

simonw:

zhyncs:

Created 1 year ago

Updated 5 months ago

calm by zeux

Single-GPU inference engine for rapid LLM prototyping

winglian:

Created 2 years ago

Updated 7 months ago

dspy by stanfordnlp

Framework for programming, rather than prompting, language models

tobi:

mateiz:

vincentweisser:

pgarbacki:

Created 3 years ago

Updated 3 days ago

Consistency_LLM by hao-ai-lab

Parallel decoder for efficient LLM inference

comaniac:

chiphuyen:

merrymercy:

Created 2 years ago

Updated 1 year ago

grok-1 by xai-org

JAX example code for loading and running Grok-1 open-weights model

geohot:

yiranwu0:

omarsar:

handotdev:

Created 1 year ago

Updated 1 year ago

mlc-llm by mlc-ai

Universal LLM deployment engine with ML compilation

tobi:

osanseviero:

zhiyuan8:

WoosukKwon:

Created 2 years ago

Updated 1 week ago

kserve by kserve

Kubernetes CRD for scalable ML model serving

hammer:

chiphuyen:

luiscape:

salanki:

Created 6 years ago

Updated 4 days ago

LMFlow by OptimalScale

Toolkit for finetuning and inference of large foundation models

tobi:

shizhediao:

ebursztein:

chiphuyen:

Created 2 years ago

Updated 3 days ago

TransformerEngine by NVIDIA

Library for Transformer model acceleration on NVIDIA GPUs

luiscape:

sxyu:

pgarbacki:

hammer:

Created 3 years ago

Updated 1 day ago

LWM by LargeWorldModel

Multimodal autoregressive model for long-context video/text

mateiz:

chiphuyen:

Jiayi-Pan:

pgarbacki:

Created 1 year ago

Updated 1 year ago

search_with_lepton by leptonai

Conversational search engine demo

chiphuyen:

zhiyuan8:

soumith:

khou22:

Created 1 year ago

Updated 1 month ago

llama_index by run-llama

Data framework for building LLM-powered agents

karpathy:

atroyn:

JustinLin610:

ogabrielluiz:

Created 3 years ago

Updated 3 days ago

marlin by IST-DASLab

FP16xINT4 kernel for fast LLM inference

jph00:

Created 2 years ago

Updated 1 year ago

sglang by sgl-project

Fast serving framework for LLMs and vision language models

shimmyshimmer:

beyang:

samlambert:

ebursztein:

Created 2 years ago

Updated 14 hours ago

LLaVA by haotian-liu

Multimodal assistant built towards GPT-4V-level capabilities

shizhediao:

zhiyuan8:

transitive-bullshit:

patrickvonplaten:

Created 2 years ago

Updated 1 year ago

megablocks by databricks

Lightweight library for mixture-of-experts (MoE) training

mateiz:

hiyouga:

jfrankle:

CodeCreator:

Created 3 years ago

Updated 6 months ago

flashinfer by flashinfer-ai

Kernel library for LLM serving

chiphuyen:

hammer:

JustinLin610:

luiscape:

Created 2 years ago

Updated 14 hours ago

gpt-fast by meta-pytorch

PyTorch text generation for efficient transformer inference

karpathy:

antiagainst:

jamesr66a:

merrymercy:

Created 2 years ago

Updated 4 months ago

LookaheadDecoding by hao-ai-lab

Parallel decoding algorithm for faster LLM inference

chiphuyen:

Ying1123:

bryanhelmig:

Created 2 years ago

Updated 10 months ago

axolotl by axolotl-ai-cloud

CLI tool for streamlined post-training of AI models

tobi:

beyang:

zhyncs:

patrickvonplaten:

Created 2 years ago

Updated 2 days ago

TensorRT-LLM by NVIDIA

LLM inference optimization SDK for NVIDIA GPUs

beyang:

hammer:

zhyncs:

shizhediao:

Created 2 years ago

Updated 15 hours ago

letta by letta-ai

Agent framework for stateful agents with memory, reasoning, and context management

hiyouga:

xiezhq-hermann:

joewalnes:

c4pt0r:

Created 2 years ago

Updated 1 week ago

streaming-llm by mit-han-lab

Framework for efficient LLM streaming

gakonst:

chiphuyen:

ValentaTomas:

omarsar:

Created 2 years ago

Updated 1 year ago

llm-engine by scaleapi

Open-source engine for fine-tuning and serving LLMs

alexandr:

hammer:

bryanhelmig:

vincentweisser:

Created 2 years ago

Updated 1 day ago

scalene by plasma-umass

Python profiler with AI-powered optimization proposals

luiscape:

xiezhq-hermann:

Ying1123:

trishume:

Created 6 years ago

Updated 2 weeks ago

Medusa by FasterDecoding

Framework for accelerating LLM generation using multiple decoding heads

osanseviero:

parano:

luiscape:

zhyncs:

Created 2 years ago

Updated 1 year ago

outlines by dottxt-ai

SDK for structured LLM text generation

tobi:

kerollmops:

willingc:

jn2clark:

Created 2 years ago

Updated 2 days ago

llm-awq by mit-han-lab

Weight quantization research paper for LLM compression/acceleration

hiyouga:

chiphuyen:

jph00:

lysandrejik:

Created 2 years ago

Updated 5 months ago

llama-cookbook by meta-llama

Guide for building with Llama models

alexchen4ai:

didierrlopes:

transitive-bullshit:

ValentaTomas:

Created 2 years ago

Updated 2 months ago

openchat by imoneoi

Open-source LLM fine-tuned with C-RLFT, inspired by offline reinforcement learning

vincentweisser:

philschmid:

chiphuyen:

transitive-bullshit:

Created 2 years ago

Updated 1 year ago

flash-attention by Dao-AILab

Fast, memory-efficient attention implementation

karpathy:

Jiayi-Pan:

zhiyuan8:

alexchen4ai:

Created 3 years ago

Updated 1 day ago

Dromedary by IBM

Self-aligned language model research paper with minimal human supervision

JustinLin610:

lewtun:

Edward-Sun:

Created 2 years ago

Updated 3 months ago

LLMSurvey by RUCAIBox

Survey paper for large language models

winglian:

transitive-bullshit:

omarsar:

Ying1123:

Created 2 years ago

Updated 10 months ago

vllm by vllm-project

LLM serving engine for high-throughput, memory-efficient inference

karpathy:

clmnt:

tobi:

danielhanchen:

Created 2 years ago

Updated 14 hours ago

tabby by TabbyML

Self-hosted AI coding assistant for on-prem code completion

tobi:

joewalnes:

wsxiaoys:

pgarbacki:

Created 2 years ago

Updated 5 days ago

LongChat by DachengLi1

Long-context LLM chatbot training and evaluation framework

casper-hansen:

hiyouga:

huybery:

pgarbacki:

Created 2 years ago

Updated 1 year ago

gorilla by ShishirPatil

LLM tool-use framework for API invocation and function calling

lewtun:

gakonst:

chiphuyen:

parano:

Created 2 years ago

Updated 1 week ago

gorilla-cli by gorilla-llm

CLI tool using LLMs to generate commands

luiscape:

merrymercy:

pirroh:

infwinston:

Created 2 years ago

Updated 1 year ago

llama.cpp by ggml-org

C/C++ library for local LLM inference

karpathy:

nat:

tobi:

hiyouga:

Created 2 years ago

Updated 14 hours ago

ray-llm by ray-project

LLM deployment framework on Ray (now upstreamed to Ray)

casper-hansen:

marcklingen:

transitive-bullshit:

hammer:

Created 2 years ago

Updated 10 months ago

peft by huggingface

Parameter-efficient fine-tuning (PEFT) library

tobi:

gakonst:

chiphuyen:

Ying1123:

Created 3 years ago

Updated 2 days ago

bitsandbytes by bitsandbytes-foundation

PyTorch library for k-bit quantization, enabling accessible LLMs

tjbck:

alexchen4ai:

danielhanchen:

shimmyshimmer:

Created 4 years ago

Updated 3 days ago

ctransformers by marella

Python bindings for fast Transformer model inference

tobi:

chiphuyen:

lukas:

JustinLin610:

Created 2 years ago

Updated 1 year ago

CTranslate2 by OpenNMT

Fast inference engine for Transformer models

merrymercy:

simonw:

eugeneyan:

jph00:

Created 6 years ago

Updated 14 hours ago

EasyLM by young-geng

LLM training/finetuning framework in JAX/Flax

Jiayi-Pan:

chiphuyen:

shizhediao:

hammer:

Created 3 years ago

Updated 1 year ago

open_llama by openlm-research

Open-source reproduction of LLaMA models

chiphuyen:

shizhediao:

ebursztein:

winglian:

Created 2 years ago

Updated 2 years ago

text-generation-inference by huggingface

Rust/Python/gRPC server for fast LLM text generation

tobi:

clmnt:

tjbck:

chiphuyen:

Created 3 years ago

Updated 3 days ago

langchain by langchain-ai

Framework for building LLM-powered applications

karpathy:

MagMueller:

gregpr07:

willingc:

Created 3 years ago

Updated 1 day ago

web-llm by mlc-ai

In-browser LLM inference engine using WebGPU for hardware acceleration

tobi:

didierrlopes:

doriandarko:

osanseviero:

Created 2 years ago

Updated 1 month ago

FasterTransformer by NVIDIA

Optimized transformer library for inference

nat:

chiphuyen:

JustinLin610:

mfuntowicz:

Created 4 years ago

Updated 1 year ago

FastChat by lm-sys

Open platform for training, serving, and evaluating LLM-based chatbots

zjasper666:

aangelopoulos:

osanseviero:

natolambert:

Created 2 years ago

Updated 7 months ago

llama by meta-llama

Inference code for Llama 2 models (deprecated)

froystig:

xiezhq-hermann:

fabhed:

borzunov:

Created 2 years ago

Updated 11 months ago

FlexLLMGen by FMInference

High-throughput generation engine for LLMs with limited GPU memory

chiphuyen:

jrk:

jph00:

parano:

Created 2 years ago

Updated 1 year ago

PiPPy by pytorch

PyTorch tool for pipeline parallelism

yang-song:

jph00:

JohannesHa:

stas00:

Created 4 years ago

Updated 1 year ago

AITemplate by facebookincubator

Generate high-performance inference engines

nat:

hammer:

transitive-bullshit:

jrk:

Created 3 years ago

Updated 3 weeks ago

compiler-and-arch by KnowingNothing

Compiler/architecture resources for emerging domains

WoosukKwon:

merrymercy:

Created 3 years ago

Updated 1 year ago

server by triton-inference-server

AI model inference serving optimized for cloud and edge

hammer:

tjbck:

Edward-Sun:

jn2clark:

Created 7 years ago

Updated 2 days ago

cutlass by NVIDIA

CUDA C++ and Python DSLs for high-performance linear algebra

tridao:

chiphuyen:

joker-eph:

mattjj:

Created 8 years ago

Updated 2 days ago

skypilot by skypilot-org

Framework for cloud AI/batch jobs, unifying execution across diverse infrastructure

karpathy:

tobi:

amin3141:

luiscape:

Created 4 years ago

Updated 14 hours ago

paxml by google

JAX-based ML framework for large-scale model training and experimentation

hammer:

Created 3 years ago

Updated 3 weeks ago

metaseq by facebookresearch

Codebase for large-scale transformer model development and deployment

chiphuyen:

gakonst:

xiezhq-hermann:

Ying1123:

Created 3 years ago

Updated 1 year ago

alpa by alpa-projects

Auto-parallelization framework for large-scale neural network training and serving

chiphuyen:

Jiayi-Pan:

transitive-bullshit:

soldni:

Created 4 years ago

Updated 2 years ago

DeepSpeed by deepspeedai

Deep learning optimization library for distributed training and inference

aravindsrinivas:

ValentaTomas:

winglian:

stas00:

Created 6 years ago

Updated 15 hours ago

DPR by facebookresearch

Dense Passage Retriever for open-domain Q&A research

lilianweng:

hammer:

huybery:

parasj:

Created 5 years ago

Updated 2 years ago

flexflow-train by flexflow

Accelerating distributed deep learning training

parano:

jph00:

Edward-Sun:

NikolaBorisov:

Created 7 years ago

Updated 1 day ago

pytorch-lightning by Lightning-AI

Deep learning framework for pretraining, finetuning, and deploying AI models

albertfgu:

soldni:

omarsar:

zhangce:

Created 6 years ago

Updated 3 days ago

faiss by facebookresearch

Similarity search library for dense vectors

lilianweng:

aravindsrinivas:

hsbt:

khou22:

Created 9 years ago

Updated 4 days ago

tvm by apache

Compiler stack for deep learning systems

aravindsrinivas:

transitive-bullshit:

guberti:

wesm:

Created 9 years ago

Updated 1 day ago

universal-triggers by Eric-Wallace

NLP attack/analysis research paper (EMNLP 2019)

omarsar:

hiyouga:

thomwolf:

Created 6 years ago

Updated 1 year ago

gdrcopy by NVIDIA

GPU memory copy library using GPUDirect RDMA

luiscape:

trishume:

hammer:

Created 11 years ago

Updated 3 weeks ago

Megatron-LM by NVIDIA

Framework for training transformer models at scale

jiamings:

gravicle:

alexchen4ai:

parasj:

Created 6 years ago

Updated 18 hours ago

DeepLearningExamples by NVIDIA

Deep learning examples for training and deployment

codekansas:

pgarbacki:

khou22:

omarsar:

Created 7 years ago

Updated 1 year ago

gpt-2 by openai

Code for research paper "Language Models are Unsupervised Multitask Learners"

aravindsrinivas:

simonw:

0hq:

shyamal-anadkat:

Created 7 years ago

Updated 1 year ago

fairseq by facebookresearch

Sequence modeling toolkit for translation, language modeling, and text generation research

lilianweng:

aravindsrinivas:

tjbck:

pathak22:

Created 8 years ago

Updated 3 months ago

rl_a3c_pytorch by dgriff777

PyTorch implementation of A3C for Atari games

junxiaosong:

ppwwyyxx:

jn2clark:

jwyang:

Created 8 years ago

Updated 2 years ago

ray by ray-project

AI compute engine for scaling Python and AI applications

beyang:

hsbt:

gregpr07:

hiyouga:

Created 9 years ago

Updated 19 hours ago

bert by google-research

TensorFlow code and pre-trained models for BERT

aravindsrinivas:

pgarbacki:

jn2clark:

evhub:

Created 7 years ago

Updated 1 year ago

awesome-ai-residency by dangkhoasdc

Curated list of AI residency programs

shizhediao:

tjbck:

omarsar:

Created 7 years ago

Updated 9 months ago

3D-Machine-Learning by timzhang642

Resource list for 3D machine learning

ebursztein:

chuanli11:

snavely:

jeffchuber:

Created 8 years ago

Updated 1 year ago

tensor2tensor by tensorflow

Deprecated library for deep learning models/datasets, succeeded by Trax

lilianweng:

aravindsrinivas:

eiso:

JohannesHa:

Created 8 years ago

Updated 2 years ago

kit by HugoBlox

AI-powered static site builder for technical content

stefanv:

hammer:

omarsar:

Created 9 years ago

Updated 22 hours ago

generating-reviews-discovering-sentiment by openai

Language model code for generating reviews and discovering sentiment

aravindsrinivas:

shyamal-anadkat:

evhub:

vincentweisser:

Created 8 years ago

Updated 2 years ago

tensorflow by tensorflow

Open-source ML framework

norvig:

aravindsrinivas:

karpathy:

bcherny:

Created 10 years ago

Updated 13 hours ago
