HPMLL/BurstGPT: Dataset for LLM serving optimization
Top 99.6% on SourcePulse
Summary
HPMLL/BurstGPT offers a real-world workload trace dataset for LLM serving systems, specifically capturing interactions with ChatGPT (GPT-3.5) and GPT-4. This resource benefits researchers and engineers by providing realistic usage patterns to optimize the performance and efficiency of LLM inference infrastructure.
How It Works
The project releases detailed CSV traces collected over 110–121 consecutive days, encompassing millions of requests. Each record captures key fields such as the timestamp, session ID, elapsed response time, model type (GPT-3.5 or GPT-4), token counts, and log type (conversation or API). This data supports modeling and simulation of diverse LLM serving workloads, enabling evaluation and improvement of system throughput and latency.
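To illustrate how such a trace might be processed, here is a minimal sketch that aggregates per-model request and token statistics from a CSV in the shape described above. The column names and the inline sample are assumptions for illustration, not the confirmed schema of the released files:

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt in the trace schema sketched above; column names
# ("Timestamp", "Model", "Request tokens", "Response tokens", "Log Type")
# are assumptions, not confirmed by the README.
SAMPLE = """Timestamp,Model,Request tokens,Response tokens,Log Type
0,GPT-3.5,120,250,Conversation log
3,GPT-4,80,400,API log
7,GPT-3.5,200,0,Conversation log
12,GPT-4,150,320,API log
"""

def summarize(trace_csv):
    """Aggregate request counts, token totals, and failures per model."""
    stats = defaultdict(lambda: {"requests": 0, "tokens": 0, "failed": 0})
    for row in csv.DictReader(io.StringIO(trace_csv)):
        s = stats[row["Model"]]
        s["requests"] += 1
        s["tokens"] += int(row["Request tokens"]) + int(row["Response tokens"])
        # The README notes that some trace files include failed requests,
        # identifiable by zero response tokens.
        if int(row["Response tokens"]) == 0:
            s["failed"] += 1
    return dict(stats)

print(summarize(SAMPLE))
```

The same aggregation scales to the full multi-file dataset by streaming each CSV through `csv.DictReader` rather than loading it into memory at once.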
Quick Start & Requirements
The dataset is distributed as multiple CSV files, including versions with and without failed requests (those with zero response tokens). A simple request generator demo is provided in the example/ directory. No software installation is required, since the primary artifact is the data itself; users only need standard tools for CSV processing.
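The README does not describe how the request generator in example/ works, but a minimal trace-replay loop might look like the sketch below. The timestamp units, the `speedup` parameter, and the `handler` callback are placeholders, not part of the actual demo:

```python
import time

def replay(timestamps, handler, speedup=1000.0):
    """Fire `handler` at trace-relative times, compressed by `speedup`.

    `timestamps` are assumed to be monotonically increasing arrival times
    (e.g. seconds since the trace start); `speedup` lets a days-long trace
    be replayed in seconds for benchmarking.
    """
    start = time.monotonic()
    t0 = timestamps[0]
    for i, ts in enumerate(timestamps):
        target = (ts - t0) / speedup
        delay = target - (time.monotonic() - start)
        if delay > 0:
            time.sleep(delay)
        handler(i, ts)  # e.g. issue an inference request to the system under test

# Usage: record when each simulated request is dispatched.
sent = []
replay([0, 3, 7, 12], lambda i, ts: sent.append(ts), speedup=100.0)
print(sent)  # [0, 3, 7, 12]
```

Replaying real inter-arrival times (rather than a synthetic Poisson process) is what makes burst-heavy traces like this one useful for stress-testing serving systems.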
Highlighted Details
Maintenance & Community
Users can report issues or ask questions via a provided mailing list. No other community channels or explicit contributor information are detailed in the README.
Licensing & Compatibility
The README does not specify a software license or data usage terms. The absence of explicit licensing may pose a risk for commercial use or integration into proprietary systems.
Limitations & Caveats
The README does not detail specific limitations of the dataset or its intended use. It focuses on its utility for optimizing LLM serving systems.