Ying Sheng

Coauthor of SGLang

Starred Projects (131)

mini-sglang by sgl-project

Lightweight LLM inference framework with advanced optimizations

bryanhelmig:

merrymercy:

Created 4 months ago

Updated 5 days ago

TileRT by tile-ai

Ultra-low-latency LLM inference runtime

Created 2 months ago

Updated 2 weeks ago

miles by radixark

Enterprise RL for large-scale MoE models

wsxiaoys:

lewtun:

ekzhang:

bryanhelmig:

Created 3 months ago

Updated 1 day ago

SpecForge by sgl-project

Train speculative decoding models for faster inference

merrymercy:

xiezhq-hermann:

zhyncs:

Created 7 months ago

Updated 9 hours ago

genai-bench by sgl-project

LLM serving performance benchmarking

merrymercy:

zhyncs:

Created 6 months ago

Updated 2 days ago

ome by sgl-project

Kubernetes operator for LLM serving

merrymercy:

zhyncs:

Created 7 months ago

Updated 1 day ago

ChatLearn by alibaba

Training framework for large-scale alignment tasks

Created 2 years ago

Updated 2 months ago

OpenRLHF by OpenRLHF

RLHF framework for scalable training of large language models

beyang:

parano:

vincentweisser:

binarybana:

Created 2 years ago

Updated 3 days ago

verl by volcengine

RL training library for LLMs

WoosukKwon:

hammer:

yiranwu0:

luiscape:

Created 1 year ago

Updated 11 hours ago

how-to-optim-algorithm-in-cuda by BBuf

CUDA optimization guide for common algorithms

peakji:

Created 7 years ago

Updated 3 days ago

sgl-learning-materials by sgl-project

Learning materials for SGLang, an efficient LLM serving engine

merrymercy:

Created 1 year ago

Updated 6 days ago

glake by antgroup

GPU optimization library for memory management and IO

Created 2 years ago

Updated 9 months ago

xgrammar by mlc-ai

Library for efficient structured generation

merrymercy:

ogabrielluiz:

youkaichao:

simonw:

Created 1 year ago

Updated 9 hours ago

NeMo by NVIDIA-NeMo

Scalable generative AI framework for LLMs, multimodal, and speech AI research

alexchen4ai:

shizhediao:

robinjhuang:

tjbck:

Created 6 years ago

Updated 14 hours ago

Nanoflow by efeslab

LLM serving framework for high throughput

WoosukKwon:

zhyncs:

zhuohan123:

Created 1 year ago

Updated 2 months ago

InfiniteBench by OpenBMB

Benchmark for evaluating language models on super-long contexts (100k+ tokens)

casper-hansen:

osanseviero:

Created 2 years ago

Updated 1 year ago

simple-evals by openai

Lightweight library for evaluating language models

hammer:

patrickvonplaten:

simonw:

zhyncs:

Created 1 year ago

Updated 5 months ago

ScaleLLM by vectorch-ai

LLM inference system for production environments

Created 2 years ago

Updated 3 weeks ago

mergekit by arcee-ai

CLI tool for merging pretrained language models, combining strengths without retraining

shizhediao:

transitive-bullshit:

thomwolf:

andreasjansson:

Created 2 years ago

Updated 1 week ago

RouteLLM by lm-sys

Framework for LLM routing and cost reduction (research paper)

chiphuyen:

ebursztein:

hammer:

ogabrielluiz:

Created 1 year ago

Updated 1 year ago

GPTQModel by ModelCloud

LLM compression toolkit for accelerated CPU/GPU inference

hiyouga:

Created 1 year ago

Updated 1 day ago

Mooncake by kvcache-ai

Research paper on a disaggregated architecture for LLM serving

jiamings:

luiscape:

merrymercy:

WoosukKwon:

Created 1 year ago

Updated 1 day ago

SWE-bench by SWE-bench

Benchmark for evaluating LLMs on real-world GitHub issues

patrickvonplaten:

pgarbacki:

shizhediao:

transitive-bullshit:

Created 2 years ago

Updated 6 days ago

inspect_ai by UKGovernmentBEIS

Framework for large language model evaluations

JohannesHa:

pgarbacki:

simonw:

eugeneyan:

Created 2 years ago

Updated 1 day ago

Quest by mit-han-lab

Inference framework for efficient long-context LLM inference

Created 1 year ago

Updated 6 months ago

dspy by stanfordnlp

Framework for programming, rather than prompting, language models

tobi:

mateiz:

vincentweisser:

pgarbacki:

Created 3 years ago

Updated 3 days ago

DoRA by NVlabs

PyTorch code for weight-decomposed low-rank adaptation (DoRA)

chiphuyen:

hiyouga:

Created 1 year ago

Updated 1 year ago

RULER by NVIDIA

Evaluation suite for long-context language models (research paper)

chiphuyen:

didierrlopes:

pgarbacki:

Created 1 year ago

Updated 1 month ago

mirage by mirage-project

Tool for fast GPU kernel generation via superoptimization

geohot:

zhuohan123:

binarybana:

simon-mo:

Created 1 year ago

Updated 3 days ago

beir by beir-cellar

IR benchmark for evaluating NLP retrieval models

jerryjliu:

jn2clark:

omarsar:

hammer:

Created 5 years ago

Updated 2 months ago

storm by stanford-oval

LLM system for automated knowledge curation and article generation

chiphuyen:

casper-hansen:

pgarbacki:

ogabrielluiz:

Created 1 year ago

Updated 3 months ago

llm.c by karpathy

LLM training in pure C/CUDA, no PyTorch needed

norvig:

alexey-milovidov:

didierrlopes:

hiyouga:

Created 1 year ago

Updated 6 months ago

LeetCUDA by xlite-dev

CUDA learning notes for beginners using PyTorch

Created 3 years ago

Updated 4 days ago

candle by huggingface

Minimalist ML framework for Rust, emphasizing performance and ease of use

tobi:

binarybana:

alexchen4ai:

jn2clark:

Created 2 years ago

Updated 3 days ago

Open-Sora-Plan by PKU-YuanGroup

Open-source project aiming to reproduce Sora-like T2V model

omarsar:

pgarbacki:

parasj:

chiphuyen:

Created 1 year ago

Updated 2 months ago

MiniGPT4-video by Vision-CAIR

Video-language model for short and long video understanding

Created 1 year ago

Updated 1 year ago

adapters by adapter-hub

Unified library for parameter-efficient transfer learning in NLP

chiphuyen:

ogabrielluiz:

osanseviero:

rodrigosnader:

Created 5 years ago

Updated 3 months ago

lmdeploy by InternLM

Toolkit for LLM compression, deployment, and serving

shimmyshimmer:

wsxiaoys:

luiscape:

jn2clark:

Created 2 years ago

Updated 2 days ago

cutlass by NVIDIA

CUDA C++ and Python DSLs for high-performance linear algebra

tridao:

chiphuyen:

joker-eph:

mattjj:

Created 8 years ago

Updated 2 days ago

guidance by guidance-ai

Programming paradigm for steering LLMs

tobi:

ekzhu:

stas00:

JustinLin610:

Created 3 years ago

Updated 5 days ago

trl by huggingface

Library for transformer RL

jeffchuber:

vincentweisser:

tjbck:

alexchen4ai:

Created 5 years ago

Updated 2 days ago

sglang by sgl-project

Fast serving framework for LLMs and vision language models

shimmyshimmer:

beyang:

samlambert:

ebursztein:

Created 2 years ago

Updated 9 hours ago

MultiPL-E by nuprl

Benchmark for evaluating code generation LLMs across multiple programming languages

Created 3 years ago

Updated 6 days ago

rags by run-llama

Streamlit app for building RAG pipelines via natural language

luiscape:

jerryjliu:

Created 2 years ago

Updated 1 year ago

llm-reasoners by maitrix-org

Library for advanced LLM reasoning with search algorithms

lewtun:

chiphuyen:

CodeCreator:

mlabonne:

Created 2 years ago

Updated 7 months ago

ToolBench by OpenBMB

Open platform for LLM tool learning (ICLR'24 spotlight)

pgarbacki:

chiphuyen:

andreasjansson:

parano:

Created 2 years ago

Updated 7 months ago

autogen by microsoft

Agentic framework for multi-agent AI applications

wesm:

chiphuyen:

ebursztein:

gagb:

Created 2 years ago

Updated 3 months ago

webarena by web-arena-x

Web environment for autonomous agent development

Jiayi-Pan:

transitive-bullshit:

Created 2 years ago

Updated 1 month ago

WebGLM by THUDM

Web-enhanced question answering system using a 10B GLM

Created 2 years ago

Updated 9 months ago

DeepSpeed-MII by deepspeedai

Python library for high-throughput, low-latency, and cost-effective model inference

hiyouga:

merrymercy:

Sanger2000:

casper-hansen:

Created 3 years ago

Updated 6 months ago

megablocks by databricks

Lightweight library for mixture-of-experts (MoE) training

mateiz:

hiyouga:

jfrankle:

CodeCreator:

Created 3 years ago

Updated 6 months ago

EAGLE by SafeAILab

Speculative decoding research paper for faster LLM inference

shizhediao:

zhyncs:

philschmid:

pgarbacki:

Created 2 years ago

Updated 3 weeks ago

gpt-fast by meta-pytorch

PyTorch text generation for efficient transformer inference

karpathy:

antiagainst:

jamesr66a:

merrymercy:

Created 2 years ago

Updated 4 months ago

VLM_survey by jingyi0000

VLM survey paper with links to models/methods for vision tasks

Created 2 years ago

Updated 2 months ago

dify by langgenius

Open-source LLM app development platform

tobi:

shizhediao:

ekzhu:

handotdev:

Created 2 years ago

Updated 12 hours ago

LookaheadDecoding by hao-ai-lab

Parallel decoding algorithm for faster LLM inference

chiphuyen:

bryanhelmig:

zhuohan123:

Created 2 years ago

Updated 10 months ago

ARES by stanford-futuredata

RAG evaluation framework

gregpr07:

transitive-bullshit:

Created 2 years ago

Updated 9 months ago

llm-analysis by cli99

CLI tool for LLM latency/memory analysis during training/inference

stas00:

Created 2 years ago

Updated 8 months ago

Awesome-LLM-Inference by xlite-dev

Curated list of LLM/VLM inference research papers with code

Created 2 years ago

Updated 1 month ago

gpt_paper_assistant by tatsu-lab

ArXiv scanner using GPT-4 for personalized paper recommendations

rodrigosnader:

hiyouga:

Edward-Sun:

soldni:

Created 2 years ago

Updated 1 year ago

MergeLM by yule-BUAA

Codebase for merging language models via parameter averaging

JohannesHa:

winglian:

hiyouga:

Created 2 years ago

Updated 1 year ago

ChatRTX by NVIDIA

Demo app for local RAG chatbot on Windows

omarsar:

JustinLin610:

jerryjliu:

merrymercy:

Created 2 years ago

Updated 9 months ago

DeepSeek-Coder by deepseek-ai

Code LLM for code completion and generation

willingc:

transitive-bullshit:

junxiaosong:

guoday:

Created 2 years ago

Updated 2 months ago

llm-decontaminator by lm-sys

LLM contamination detector for quantifying rephrased samples

casper-hansen:

huybery:

merrymercy:

infwinston:

Created 2 years ago

Updated 2 years ago

S-LoRA by S-LoRA

System for scalable LoRA adapter serving

chiphuyen:

JohannesHa:

winglian:

osanseviero:

Created 2 years ago

Updated 2 years ago

flashinfer by flashinfer-ai

Kernel library for LLM serving

chiphuyen:

hammer:

JustinLin610:

luiscape:

Created 2 years ago

Updated 10 hours ago

skypilot by skypilot-org

Framework for cloud AI/batch jobs, unifying execution across diverse infrastructure

karpathy:

tobi:

amin3141:

luiscape:

Created 4 years ago

Updated 10 hours ago

gpu_poor by RahulSChand

CLI tool for LLM memory and throughput estimation

Created 2 years ago

Updated 1 year ago

ggml by ggml-org

Tensor library for machine learning

tunguz:

alexchen4ai:

zhiyuan8:

hugs:

Created 3 years ago

Updated 10 hours ago

tensorrtllm_backend by triton-inference-server

Triton backend for serving TensorRT-LLM models

zhyncs:

NikolaBorisov:

guberti:

tuhins:

Created 2 years ago

Updated 2 days ago

TensorRT-LLM by NVIDIA

LLM inference optimization SDK for NVIDIA GPUs

beyang:

hammer:

zhyncs:

shizhediao:

Created 2 years ago

Updated 11 hours ago

leptonai by leptonai

Python framework for simplifying AI service building

hammer:

chiphuyen:

zhiyuan8:

JustinLin610:

Created 2 years ago

Updated 2 days ago

punica by punica-ai

LoRA serving system (research paper) for multi-tenant LLM inference

winglian:

hammer:

chiphuyen:

WoosukKwon:

Created 2 years ago

Updated 1 year ago

modular by modular

AI toolchain unifying fragmented AI deployment workflows

lattner:

tobi:

zhyncs:

flaque:

Created 2 years ago

Updated 19 hours ago

llm-numbers by ray-project

LLM developer's reference for key numbers

cournape:

chiphuyen:

pirroh:

vnivargi:

Created 2 years ago

Updated 2 years ago

LLaMA2-Accessory by Alpha-VLLM

Open-source toolkit for LLM development, pretraining, finetuning, and deployment

jn2clark:

chiphuyen:

transitive-bullshit:

omarsar:

Created 2 years ago

Updated 1 year ago

rerope by bojone

Position embeddings research paper

winglian:

jph00:

merrymercy:

Created 2 years ago

Updated 1 year ago

LightLLM by ModelTC

Python framework for LLM inference and serving

zhyncs:

chiphuyen:

pgarbacki:

JustinLin610:

Created 2 years ago

Updated 1 day ago

GPTCache by zilliztech

Semantic cache for LLM queries, integrated with LangChain and LlamaIndex

hiyouga:

chiphuyen:

hammer:

ogabrielluiz:

Created 2 years ago

Updated 6 months ago

llama by meta-llama

Inference code for Llama 2 models (deprecated)

froystig:

xiezhq-hermann:

fabhed:

borzunov:

Created 2 years ago

Updated 11 months ago

LLM-Training-Puzzles by srush

Hands-on puzzles for large language model training

Jiayi-Pan:

albertfgu:

willccbb:

hiyouga:

Created 2 years ago

Updated 2 years ago

ringattention by haoliuhl

Jax implementation of RingAttention for large context models (research paper)

vincentweisser:

JohannesHa:

Created 2 years ago

Updated 3 months ago

metal-flash-attention by philipturner

Metal port of FlashAttention for Apple silicon

winglian:

osanseviero:

jph00:

tridao:

Created 2 years ago

Updated 1 year ago

companion-app by a16z-infra

AI companion stack for personalized chatbots

tobi:

chiphuyen:

didierrlopes:

mckaywrigley:

Created 2 years ago

Updated 1 year ago

bitsandbytes by bitsandbytes-foundation

PyTorch library for k-bit quantization, enabling accessible LLMs

tjbck:

alexchen4ai:

danielhanchen:

shimmyshimmer:

Created 4 years ago

Updated 3 days ago

long_llama by CStanKonrad

LLM for long context handling, fine-tuned with Focused Transformer

merrymercy:

pgarbacki:

simonw:

Created 2 years ago

Updated 2 years ago

scalene by plasma-umass

Python profiler with AI-powered optimization proposals

luiscape:

xiezhq-hermann:

zhuohan123:

trishume:

Created 6 years ago

Updated 2 weeks ago

LLMSurvey by RUCAIBox

Survey paper for large language models

winglian:

transitive-bullshit:

omarsar:

shizhediao:

Created 2 years ago

Updated 10 months ago

Awesome-LLM-Compression by HuangOwen

LLM compression papers and tools for efficient training/inference

Created 2 years ago

Updated 2 months ago

fastllm by ztxz16

High-performance C++ LLM inference library

chiphuyen:

Created 2 years ago

Updated 1 month ago

H2O by FMInference

KV cache eviction research paper for efficient LLM inference

hiyouga:

pgarbacki:

Created 2 years ago

Updated 1 year ago

LongChat by DachengLi1

Long-context LLM chatbot training and evaluation framework

casper-hansen:

hiyouga:

huybery:

pgarbacki:

Created 2 years ago

Updated 1 year ago

xgen by salesforce

LLM research release with 8k sequence length

JustinLin610:

huybery:

jaredpalmer:

shizhediao:

Created 2 years ago

Updated 11 months ago

flexflow-train by flexflow

Accelerating distributed deep learning training

parano:

jph00:

Edward-Sun:

NikolaBorisov:

Created 7 years ago

Updated 1 day ago

peft by huggingface

Parameter-efficient fine-tuning (PEFT) library

tobi:

gakonst:

chiphuyen:

zhuohan123:

Created 3 years ago

Updated 2 days ago

AutoGPT by Significant-Gravitas

AI agent platform for building, deploying, and running autonomous workflows

lilianweng:

julien-c:

deshraj:

sxyu:

Created 2 years ago

Updated 18 hours ago

CTranslate2 by OpenNMT

Fast inference engine for Transformer models

merrymercy:

simonw:

eugeneyan:

jph00:

Created 6 years ago

Updated 9 hours ago

text-generation-inference by huggingface

Rust/Python/gRPC server for fast LLM text generation

tobi:

clmnt:

tjbck:

chiphuyen:

Created 3 years ago

Updated 3 days ago

mlc-llm by mlc-ai

Universal LLM deployment engine with ML compilation

tobi:

osanseviero:

zhiyuan8:

zhuohan123:

Created 2 years ago

Updated 1 week ago

awesome-mixture-of-experts by XueFuzhao

Curated list of resources for mixture-of-experts (MoE) research

Created 3 years ago

Updated 1 year ago

InternLM-techreport by InternLM

Research paper on a multilingual LLM with 104B parameters

hiyouga:

JustinLin610:

Created 2 years ago

Updated 2 years ago

awesome-chatgpt-dataset by voidful

Dataset repo for LLM training

infwinston:

Created 2 years ago

Updated 2 months ago

RWKV-LM by BlinkDL

RNN for LLM, transformer-level performance, parallelizable training

geohot:

danielgross:

nat:

sxyu:

Created 4 years ago

Updated 3 weeks ago

qdrant by qdrant

Vector database for similarity search in AI applications

tobi:

jeffchuber:

hsbt:

gregpr07:

Created 5 years ago

Updated 1 day ago

alpaca-lora by tloen

LoRA fine-tuning for LLaMA

JustinLin610:

vincentweisser:

nirga:

chiphuyen:

Created 2 years ago

Updated 1 year ago

trlx by CarperAI

Distributed RLHF for LLMs

nat:

chiphuyen:

eugeneyan:

huybery:

Created 3 years ago

Updated 2 years ago

MOSS by OpenMOSS

Open-source tool-augmented conversational language model

chiphuyen:

osanseviero:

hammer:

JustinLin610:

Created 2 years ago

Updated 1 year ago

CodeGeeX by zai-org

Code generation model for multilingual programming

Created 3 years ago

Updated 1 year ago

baize-chatbot by project-baize

Chat model trained via LoRA, using ChatGPT-generated dialogs

winglian:

pgarbacki:

tunguz:

teknium1:

Created 2 years ago

Updated 1 year ago

ChatGDB by pgosar

CLI tool for debugging with natural language via LLM

xiezhq-hermann:

handotdev:

Created 2 years ago

Updated 1 year ago

ByteTransformer by bytedance

High-performance BERT transformer inference on NVIDIA GPUs

Created 2 years ago

Updated 1 year ago

FastChat by lm-sys

Open platform for training, serving, and evaluating LLM-based chatbots

zjasper666:

aangelopoulos:

osanseviero:

natolambert:

Created 2 years ago

Updated 7 months ago

fastertransformer_backend by triton-inference-server

Triton backend for optimized transformer inference

Created 4 years ago

Updated 2 years ago

LMFlow by OptimalScale

Toolkit for finetuning and inference of large foundation models

tobi:

shizhediao:

ebursztein:

zhuohan123:

Created 2 years ago

Updated 2 days ago

langchain by langchain-ai

Framework for building LLM-powered applications

karpathy:

MagMueller:

gregpr07:

willingc:

Created 3 years ago

Updated 1 day ago

llama_index by run-llama

Data framework for building LLM-powered agents

karpathy:

atroyn:

zhuohan123:

JustinLin610:

Created 3 years ago

Updated 3 days ago

FlexLLMGen by FMInference

High-throughput generation engine for LLMs with limited GPU memory

chiphuyen:

jrk:

jph00:

parano:

Created 2 years ago

Updated 1 year ago

PiPPy by pytorch

PyTorch tool for pipeline parallelism

yang-song:

jph00:

JohannesHa:

stas00:

Created 4 years ago

Updated 1 year ago

best_AI_papers_2022 by louisfb01

AI paper list (2022) with video explanations and code

vincentweisser:

omarsar:

JustinLin610:

Created 4 years ago

Updated 2 years ago

metaseq by facebookresearch

Codebase for large-scale transformer model development and deployment

chiphuyen:

gakonst:

xiezhq-hermann:

soldni:

Created 3 years ago

Updated 1 year ago

dpm-solver by LuChengTHU

Fast ODE solver for diffusion probabilistic model sampling

Edward-Sun:

merrymercy:

patrickvonplaten:

PiotrDabkowski:

Created 3 years ago

Updated 1 year ago

FasterTransformer by NVIDIA

Optimized transformer library for inference

nat:

chiphuyen:

JustinLin610:

mfuntowicz:

Created 4 years ago

Updated 1 year ago

CodeGen by salesforce

Open-source model family for program synthesis

nat:

hiyouga:

hammer:

omarsar:

Created 3 years ago

Updated 2 months ago

GLM-130B by zai-org

Bilingual model for research and evaluation

soldni:

wassemgtk:

jiamings:

mckaywrigley:

Created 3 years ago

Updated 2 years ago

Megatron-LM by NVIDIA

Framework for training transformer models at scale

jiamings:

gravicle:

alexchen4ai:

parasj:

Created 6 years ago

Updated 14 hours ago

llm-seminar by craffel

Course reading list for large language models

hammer:

JohannesHa:

Created 3 years ago

Updated 3 years ago

paper-reading by mli

Deep learning paper readings

parano:

omarsar:

Jiayi-Pan:

merrymercy:

Created 4 years ago

Updated 9 months ago

CodeT5 by salesforce

Code LLMs for code understanding and generation research

hammer:

omarsar:

beyang:

eugeneyan:

Created 4 years ago

Updated 2 years ago

awesome-tensor-compilers by merrymercy

Curated list of tensor compiler projects and papers

chiphuyen:

infwinston:

luiscape:

suquark:

Created 5 years ago

Updated 1 year ago

Dive-into-DL-PyTorch by ShusenTang

PyTorch rewrite of "Dive into Deep Learning" book

Created 6 years ago

Updated 4 years ago

alpa by alpa-projects

Auto-parallelization framework for large-scale neural network training and serving

chiphuyen:

Jiayi-Pan:

transitive-bullshit:

soldni:

Created 4 years ago

Updated 2 years ago
