Competition toolkit for efficient LLM inference on a single GPU
This repository provides the framework and guidelines for the NeurIPS Large Language Model Efficiency Challenge, targeting researchers and engineers aiming to optimize LLM performance under strict resource constraints (1 LLM, 1 GPU, 1 Day). It facilitates reproducible submissions via Dockerfiles and evaluation using the HELM benchmark suite.
How It Works
Submissions are packaged as Dockerfiles containing all necessary code and dependencies. These Dockerfiles expose HTTP endpoints (/process and /tokenize) that are queried by the HELM evaluation framework. The approach emphasizes reproducibility and standardized evaluation, allowing participants to build on the provided sample submissions (Lit-GPT, llama-recipes) or bring custom frameworks for fine-tuning and deployment.
Quick Start & Requirements
See main.py for the FastAPI server setup.
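Once the server is running (e.g. via uvicorn), you can smoke-test both endpoints locally before packaging the Dockerfile. The sketch below is hypothetical: the port (8080) and JSON field names carry over from the illustrative server above and should be adjusted to match your actual implementation and the official OpenAPI spec.

```python
# Hypothetical local smoke test; port and field names are assumptions.
import requests

BASE = "http://localhost:8080"

resp = requests.post(f"{BASE}/tokenize", json={"text": "Hello, world"})
resp.raise_for_status()
print("tokens:", resp.json())

resp = requests.post(
    f"{BASE}/process",
    json={"prompt": "The capital of France is", "max_new_tokens": 8},
)
resp.raise_for_status()
print("generation:", resp.json())
```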
Highlighted Details
An evaluation bot (evalbot#4372) allows for early testing and performance feedback.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The exact evaluation tasks are not disclosed until after the submission deadline. Participants must ensure their Dockerfile correctly builds and runs the HTTP server according to the provided OpenAPI specification.