Research paper and dataset for LLM service behavior analysis over time
This repository addresses the opacity surrounding updates to large language models (LLMs) like GPT-4 and GPT-3.5 by providing datasets and historical generations. It enables researchers and users to track and understand behavioral shifts in LLM services over time, highlighting performance variations and potential degradations.
How It Works
The project assembles diverse evaluation datasets, prompts the LLM services with them, and archives the responses at different points in time. Comparing these snapshots enables quantitative analysis of behavioral drift, such as GPT-4's drop in accuracy on prime-number identification between its March 2023 and June 2023 versions. Because the generations are archived, the drift can be studied empirically without re-querying the APIs; a minimal collection sketch follows.
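As an illustration only, here is a minimal sketch of the collection step. It assumes the openai>=1.0 Python client and an OPENAI_API_KEY in the environment; the snapshot names, prompt, and output file are illustrative and not taken from the repository.

```python
# Query dated model snapshots and record responses for later drift comparison.
import csv
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
SNAPSHOTS = ["gpt-4-0314", "gpt-4-0613"]  # March vs. June 2023 snapshots (illustrative)
prompt = "Is 17077 a prime number? Think step by step and answer [Yes] or [No]."

with open("drift_log.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["model", "query", "generated answer", "latency_s"])
    for model in SNAPSHOTS:
        start = time.time()
        resp = client.chat.completions.create(
            model=model,
            temperature=0.0,
            messages=[{"role": "user", "content": prompt}],
        )
        writer.writerow(
            [model, prompt, resp.choices[0].message.content, time.time() - start]
        )
```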
Quick Start & Requirements
Pre-computed generations are stored as CSV files under the generation/ directory; each CSV contains the model, query parameters, query, reference answer, generated answer, and latency. A loading sketch is shown below.
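A minimal loading sketch with pandas; the file names under generation/ and the exact CSV column names are assumptions based on the fields listed above.

```python
# Load two archived generation files and compare a simple accuracy proxy across dates.
import pandas as pd

march = pd.read_csv("generation/prime_gpt4_march2023.csv")  # hypothetical file name
june = pd.read_csv("generation/prime_gpt4_june2023.csv")    # hypothetical file name

def accuracy(df):
    # Count a generation as correct when it contains the reference answer
    # (an illustrative heuristic, not the paper's evaluation protocol).
    hits = df.apply(
        lambda r: str(r["reference answer"]).strip().lower()
        in str(r["generated answer"]).lower(),
        axis=1,
    )
    return hits.mean()

print("March accuracy:", accuracy(march))
print("June accuracy:", accuracy(june))
print("Mean latency (March vs. June):", march["latency"].mean(), june["latency"].mean())
```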
Highlighted Details
Maintenance & Community
The project is associated with researchers from Stanford University. Updates are logged in the changelog, with the initial release in July 2023.
Licensing & Compatibility
The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
Obtaining new generations requires an OpenAI API key, and the specific LLM versions tested are tied to the dates of generation. The README does not detail the underlying infrastructure or specific hardware used for the original generations.