LLMCompass by PrincetonUniversity

LLM inference hardware design and simulation framework

Created 2 years ago
257 stars

Top 98.3% on SourcePulse

Project Summary

Summary

LLMCompass is a research implementation for designing efficient hardware accelerators for Large Language Model (LLM) inference. It targets researchers and engineers in computer architecture and machine learning hardware who need to explore and optimize the large design space of LLM inference systems. Its primary benefit is that candidate designs can be evaluated before they are built, guiding the development of higher-performance, more energy-efficient hardware for the growing demands of LLM deployment.

How It Works

This repository contains the implementation accompanying the LLMCompass paper presented at ISCA 2024. The core approach is simulation-driven: candidate hardware architectures for LLM inference are modeled and evaluated in software. Building on the scalesim library, the framework models different hardware configurations and their performance characteristics, so that design trade-offs can be explored systematically. Because evaluation happens in simulation, many design choices can be compared cheaply before any physical implementation.
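The trade-off exploration described above can be illustrated with a deliberately simplified sketch. It is not LLMCompass's performance model or API: the names HardwareConfig, gemm_latency_s, sweep, TFLOPS_OPTIONS and BANDWIDTH_OPTIONS are hypothetical, and the estimate is a plain roofline (latency is the larger of compute time and memory-transfer time, assuming ideal operand reuse) rather than a detailed simulation.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class HardwareConfig:
    """Hypothetical accelerator description (not LLMCompass's config format)."""
    peak_tflops: float  # peak dense matmul throughput, TFLOP/s
    mem_bw_gbps: float  # main-memory bandwidth, GB/s


def gemm_latency_s(hw: HardwareConfig, m: int, k: int, n: int,
                   bytes_per_elem: int = 2) -> float:
    """Roofline latency estimate for C[m, n] = A[m, k] @ B[k, n].

    Assumes every operand is read from memory once and the result written once
    (ideal on-chip reuse), and that compute and data movement fully overlap.
    """
    flops = 2 * m * k * n  # one multiply and one add per MAC
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    compute_s = flops / (hw.peak_tflops * 1e12)
    memory_s = bytes_moved / (hw.mem_bw_gbps * 1e9)
    return max(compute_s, memory_s)  # the slower resource bounds latency


# Hypothetical design points to sweep.
TFLOPS_OPTIONS = [100.0, 200.0, 400.0]
BANDWIDTH_OPTIONS = [1000.0, 2000.0, 4000.0]


def sweep(m: int, k: int, n: int) -> list[tuple[float, HardwareConfig]]:
    """Evaluate every (compute, bandwidth) pair; return (latency, config), fastest first."""
    results = [
        (gemm_latency_s(HardwareConfig(t, bw), m, k, n), HardwareConfig(t, bw))
        for t, bw in product(TFLOPS_OPTIONS, BANDWIDTH_OPTIONS)
    ]
    return sorted(results, key=lambda r: r[0])  # stable sort keeps sweep order on ties
```

For a decode-style matrix-vector product (m = 1, k = n = 4096) this estimate is memory-bound, so quadrupling peak compute leaves latency unchanged while extra bandwidth helps; a prefill-sized GEMM (m = 2048) behaves the other way. Separating such regimes before committing to silicon is the kind of question simulation-based design exploration addresses, with far more detailed models than this sketch.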

Quick Start & Requirements

  • Installation: The recommended route is to clone the ISCA_AE branch of the repository (git clone -b ISCA_AE https://github.com/PrincetonUniversity/LLMCompass) and then, from inside the cloned directory, initialize its Git submodules (git submodule init && git submodule update --recursive). A Dockerfile is also provided for a self-contained environment.
  • Prerequisites: Python 3.9, PyTorch 2.0.0 (installed via conda), scalesim, matplotlib, seaborn, and scipy. The cost model component (cost_model\supply_chain) is not included in the repository and must be downloaded separately from Zenodo.
  • Experiments: The repository includes bash scripts (ae/figureX.sh) that run the simulations and regenerate Figures 5 through 12 of the paper. Runtimes vary widely, from about 1 minute (run_figure6.sh) to about 4 hours (run_figure12.sh), so reproducing the full set of figures takes substantial compute time.
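Because the prerequisites pin exact versions, a quick environment check before launching the experiment scripts can save a failed multi-hour run. The following is a hypothetical helper, not part of the repository; it assumes the distribution names torch and scalesim, and it ignores local build tags such as +cu117 when comparing the PyTorch version.

```python
import sys
from importlib import metadata

# Pins from the prerequisites list; distribution names are assumptions.
REQUIRED_PYTHON = (3, 9)
PINNED_PACKAGES = {"torch": "2.0.0"}
UNPINNED_PACKAGES = ["scalesim", "matplotlib", "seaborn", "scipy"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def check_environment(python_version=None, installed=None):
    """Return a list of problems; an empty list means all checks passed.

    Both arguments default to the running interpreter, but can be supplied
    explicitly (e.g. for testing).
    """
    names = list(PINNED_PACKAGES) + UNPINNED_PACKAGES
    if python_version is None:
        python_version = tuple(sys.version_info[:2])
    if installed is None:
        installed = installed_versions(names)

    problems = []
    if tuple(python_version) != REQUIRED_PYTHON:
        want = ".".join(map(str, REQUIRED_PYTHON))
        got = ".".join(map(str, python_version))
        problems.append(f"Python {want} expected, found {got}")

    for name in names:
        version = installed.get(name)
        if version is None:
            problems.append(f"{name} is not installed")
            continue
        pinned = PINNED_PACKAGES.get(name)
        # Drop local build tags such as "+cu117" before comparing.
        if pinned is not None and version.split("+")[0] != pinned:
            problems.append(f"{name} {pinned} expected, found {version}")
    return problems
```

Calling check_environment() with no arguments inspects the current interpreter through importlib.metadata; it only reports problems and does not install anything.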

Highlighted Details

  • The project provides a direct implementation of the LLMCompass framework, enabling users to reproduce or extend the research findings presented in the ISCA 2024 paper.
  • Includes executable scripts for generating specific figures (5-12), offering concrete examples of how the simulation framework can be used to analyze hardware design choices for LLM inference.
  • The dependency on scalesim suggests a focus on architectural simulation and performance modeling rather than direct hardware synthesis or deployment.
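scalesim (SCALE-Sim) is a simulator for systolic-array accelerators. As a rough illustration of the quantities such a simulator refines, the sketch below computes a first-order cycle count and PE utilization for a matrix multiply on an output-stationary systolic array. The helper names os_systolic_cycles and pe_utilization are hypothetical, and the model ignores output drain, memory stalls and overlap between folds, so it should not be read as SCALE-Sim's or LLMCompass's method.

```python
import math


def os_systolic_cycles(rows: int, cols: int, m: int, k: int, n: int) -> int:
    """First-order cycle count for C[m, n] = A[m, k] @ B[k, n] on a rows x cols
    output-stationary systolic array (hypothetical helper, not SCALE-Sim's API).

    Each PE owns one output element, so the output is covered in
    ceil(m / rows) * ceil(n / cols) folds. Within a fold every PE performs k MACs,
    and skewed operand injection delays the last PE by (rows - 1) + (cols - 1)
    cycles. Output drain, memory stalls and overlap between folds are ignored.
    """
    folds = math.ceil(m / rows) * math.ceil(n / cols)
    cycles_per_fold = k + (rows - 1) + (cols - 1)
    return folds * cycles_per_fold


def pe_utilization(rows: int, cols: int, m: int, k: int, n: int) -> float:
    """Fraction of PE-cycles spent on useful multiply-accumulates."""
    useful_macs = m * k * n
    total_pe_cycles = rows * cols * os_systolic_cycles(rows, cols, m, k, n)
    return useful_macs / total_pe_cycles
```

With a 128 × 128 array, a prefill-sized GEMM (m = 2048, k = n = 4096) keeps about 94% of PE-cycles busy under this model, whereas a single-token decode GEMV (m = 1) uses under 1%, illustrating how strongly array utilization depends on the shape of the workload.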

Limitations & Caveats

The setup pins Python 3.9 and PyTorch 2.0.0, which may cause compatibility problems in newer environments. The cost model must be downloaded separately from Zenodo, adding an extra setup step and a potential point of failure. Some experiments run for up to 4 hours, so a thorough evaluation requires significant compute time.

Health Check
Last Commit

7 months ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
11 stars in the last 30 days

Explore Similar Projects

Starred by Wing Lian (Founder of Axolotl AI) and Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems").

airllm by lyogavin

Top 1.5%
20k stars
Inference optimization for LLMs on low-resource hardware
Created 3 years ago
Updated 3 months ago