LLMCompass by PrincetonUniversity

LLM inference hardware design and simulation framework

Created 2 years ago

260 stars

Top 97.5% on SourcePulse

Project Summary

LLMCompass is a research framework for designing and evaluating hardware accelerators for Large Language Model (LLM) inference. It targets researchers and engineers in computer architecture and machine learning hardware who need to explore the design space of LLM inference systems. Its main benefit is helping them identify hardware designs that are more performant and energy-efficient.

How It Works

This repository contains the implementation accompanying the LLMCompass paper presented at ISCA 2024. The core approach is simulation-driven: hardware architectures for LLM inference are described as configurations, and the framework estimates their performance. Using the scalesim library, it models different hardware configurations so that design trade-offs can be explored systematically. Because everything runs in simulation, many design choices can be evaluated cheaply before any physical implementation.
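To make the idea of simulation-driven design-space exploration concrete, here is a minimal sketch in Python. It is not LLMCompass code and does not use its API or scalesim: `HardwareConfig`, `matmul_latency_s`, and `sweep` are hypothetical names, and the latency estimate is a simple roofline-style bound (the larger of compute time and memory-transfer time) chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class HardwareConfig:
    """Hypothetical accelerator description (not an LLMCompass class)."""
    peak_tflops: float        # peak compute throughput in TFLOP/s
    mem_bandwidth_gbs: float  # off-chip memory bandwidth in GB/s


def matmul_latency_s(m, k, n, hw, bytes_per_elem=2):
    """Roofline-style latency bound for an (m x k) @ (k x n) matmul.

    Assumes each operand and the result cross the memory interface once.
    """
    flops = 2 * m * k * n
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    compute_time = flops / (hw.peak_tflops * 1e12)
    memory_time = bytes_moved / (hw.mem_bandwidth_gbs * 1e9)
    return max(compute_time, memory_time)


def sweep(m, k, n, tflops_options, bandwidth_options):
    """Evaluate every (compute, bandwidth) pair and sort by estimated latency."""
    results = []
    for tflops, bandwidth in product(tflops_options, bandwidth_options):
        hw = HardwareConfig(tflops, bandwidth)
        results.append((matmul_latency_s(m, k, n, hw), hw))
    return sorted(results, key=lambda item: item[0])


if __name__ == "__main__":
    # A decode-style matmul (one token against a 4096 x 4096 weight matrix).
    for latency, hw in sweep(1, 4096, 4096, [100, 200], [1000, 2000]):
        print(f"{hw}: {latency * 1e6:.2f} us")
```

Under these assumptions the single-token matmul is memory-bound, so doubling bandwidth halves the estimated latency while doubling peak compute changes nothing. A detailed simulator exists to surface this kind of trade-off with far more fidelity than this toy bound.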

Quick Start & Requirements

Installation: Recommended installation involves cloning the repository from GitHub using the ISCA_AE branch (git clone -b ISCA_AE https://github.com/PrincetonUniversity/LLMCompass) and initializing Git submodules (git submodule init && git submodule update --recursive). A Dockerfile is also available for a self-contained environment.
Prerequisites: Requires Python 3.9, PyTorch 2.0.0 (installed via conda), scalesim, matplotlib, seaborn, and scipy. An external cost model component (cost_model/supply_chain) must be downloaded separately from a Zenodo repository.
Experiments: The repository includes a suite of bash scripts under ae/, one per figure, that run the simulations and generate figures 5 through 12 of the paper. Estimated execution times range from approximately 1 minute for run_figure6.sh to 4 hours for run_figure12.sh, so reproducing every figure takes several hours of compute.
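Before starting a multi-hour experiment, it can help to confirm that the environment matches the prerequisites listed above. The following is a small sketch, not part of the repository: `check_environment` is a hypothetical helper, and the import names (`torch`, `scalesim`, `matplotlib`, `seaborn`, `scipy`) are assumed to match the packages named in the prerequisites. It checks only the interpreter version and whether each module can be found, not the installed PyTorch version.

```python
import importlib.util
import sys

REQUIRED_PYTHON = (3, 9)
# Import names assumed from the prerequisites list; adjust if they differ.
REQUIRED_MODULES = ["torch", "scalesim", "matplotlib", "seaborn", "scipy"]


def check_environment(version_info=sys.version_info,
                      find_spec=importlib.util.find_spec):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if tuple(version_info[:2]) != REQUIRED_PYTHON:
        found = ".".join(str(part) for part in version_info[:2])
        problems.append(f"Python 3.9 expected, found {found}")
    for name in REQUIRED_MODULES:
        # find_spec returns None when a top-level module cannot be located.
        if find_spec(name) is None:
            problems.append(f"missing module: {name}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print(issue)
    sys.exit(1 if issues else 0)
```

The `version_info` and `find_spec` parameters exist only so the check can be exercised without changing the real environment.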

Highlighted Details

The project provides a direct implementation of the LLMCompass framework, enabling users to reproduce or extend the research findings presented in the ISCA 2024 paper.
Includes executable scripts for generating specific figures (5-12), offering concrete examples of how the simulation framework can be used to analyze hardware design choices for LLM inference.
The dependency on scalesim suggests a focus on architectural simulation and performance modeling rather than direct hardware synthesis or deployment.

Limitations & Caveats

The setup requires specific versions of Python (3.9) and PyTorch (2.0.0), potentially limiting compatibility with newer environments. The external dependency on a separate Zenodo download for the cost model adds an extra step and potential point of failure in the setup process. The extensive runtime for experiments (up to 4 hours) suggests significant computational resources are needed for thorough evaluation.
