Competition toolkit for efficient LLM inference on a single GPU
This repository provides the framework and guidelines for the NeurIPS Large Language Model Efficiency Challenge, targeting researchers and engineers aiming to optimize LLM performance under strict resource constraints (1 LLM, 1 GPU, 1 Day). It facilitates reproducible submissions via Dockerfiles and evaluation using the HELM benchmark suite.
How It Works
Submissions are packaged as Dockerfiles containing all necessary code and dependencies. These Dockerfiles expose HTTP endpoints (/process and /tokenize) that are queried by the HELM evaluation framework. The approach emphasizes reproducibility and standardized evaluation, allowing participants to build on the provided sample submissions (Lit-GPT, llama-recipes) or bring custom frameworks for fine-tuning and deployment.
Quick Start & Requirements
See main.py for the FastAPI server setup.
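Once the server is running (e.g. via uvicorn), you can smoke-test both endpoints locally before packaging the Dockerfile. The sketch below is hypothetical: the port (8080) and JSON field names carry over from the illustrative server above and should be adjusted to match your actual implementation and the official OpenAPI spec.

```python
# Hypothetical local smoke test; port and field names are assumptions.
import requests

BASE = "http://localhost:8080"

resp = requests.post(f"{BASE}/tokenize", json={"text": "Hello, world"})
resp.raise_for_status()
print("tokens:", resp.json())

resp = requests.post(
    f"{BASE}/process",
    json={"prompt": "The capital of France is", "max_new_tokens": 8},
)
resp.raise_for_status()
print("generation:", resp.json())
```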
Highlighted Details
An evaluation bot (evalbot#4372) allows for early testing and performance feedback.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The exact evaluation tasks are not disclosed until after the submission deadline. Participants must ensure their Dockerfile correctly builds and runs the HTTP server according to the provided OpenAPI specification.