mend by eric-mitchell

Fast model editing for LLMs

Created 4 years ago
251 stars

Top 99.8% on SourcePulse

Project Summary

MEND (Model Editing Networks using Gradient Decomposition) offers a method for efficiently editing large language models at scale. It targets researchers and practitioners needing to modify model behavior without full retraining, providing a faster alternative for knowledge injection or correction.

How It Works

The project implements model editing via gradient decomposition, enabling targeted modifications to model parameters. It supports various algorithms (MEND, EFK, ENN) and experiments, including text generation (gen), fact-checking (fc), and question-answering (qa), accommodating different model architectures like GPT, seq2seq, and BERT.
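The algorithm, experiment, and model axes described above are selected via command-line overrides. A minimal sketch follows; the `+alg=mend +experiment=gen +model=distilgpt2` spelling comes from the documented run command, while the `efk`, `fc`, and BERT model values on the second line are hypothetical illustrations of the other combinations:

```shell
# MEND on text generation with a GPT-style model (documented combination)
python -m run +alg=mend +experiment=gen +model=distilgpt2

# Hypothetical: EFK on fact-checking with a BERT model (flag values assumed)
python -m run +alg=efk +experiment=fc +model=bert-base
```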

Quick Start & Requirements

  • Installation: Developed with Python 3.7.9 (other versions may work). Set up a virtual environment (python -m venv env, source env/bin/activate) and install dependencies (pip install -r requirements.txt).
  • Data: Download from a provided Google Drive link and extract into the mend/data directory.
  • Execution: Run training/evaluation via python -m run +alg=mend +experiment=gen +model=distilgpt2 data.wiki_webtext=False.
  • Configuration: Specific data flags (data.wiki_webtext, data.zsre_nq) may need adjustment based on the model and experiment. Multi-edit experiments require careful batch size configuration (e.g., data.n_edits=5 batch_size=6).
  • Dependencies: Pre-trained BERT and BART models are required for fc and qa experiments, respectively, sourced from De Cao et al.
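The setup and run steps above can be sketched as a single shell session. The repository URL is inferred from the project name and is an assumption; the data download link is only given in the README and is elided here:

```shell
# Clone the repository (URL assumed from the project name) and enter it
git clone https://github.com/eric-mitchell/mend.git
cd mend

# Create and activate a virtual environment (the authors used Python 3.7.9)
python -m venv env
source env/bin/activate

# Install dependencies
pip install -r requirements.txt

# Download the data archive from the Google Drive link in the README
# and extract it into mend/data before running.

# Train/evaluate MEND on text generation with distilGPT-2
python -m run +alg=mend +experiment=gen +model=distilgpt2 data.wiki_webtext=False
```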

Highlighted Details

  • Implements the paper "Fast Model Editing at Scale"; speed and scalability of edits are the core claims.
  • Supports multiple editing algorithms (MEND, EFK, ENN) and diverse downstream tasks (generation, fact-checking, QA).
  • Accommodates various model architectures, including GPT-style, seq2seq, and BERT.
  • Enables multi-edit experiments with configurable batching strategies for applying multiple edits.
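A multi-edit run can be sketched by combining the documented basic command with the multi-edit flags from the configuration notes; only `data.n_edits=5` and `batch_size=6` are taken from the source, and appropriate batch sizes for other edit counts are not documented here:

```shell
# Apply 5 edits at a time; batch_size must be configured compatibly
# (here 6, as given in the configuration notes above)
python -m run +alg=mend +experiment=gen +model=distilgpt2 \
    data.wiki_webtext=False data.n_edits=5 batch_size=6
```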

Maintenance & Community

  • Primary contact for issues is via GitHub issues or direct email to the author (eric.mitchell@cs.stanford.edu). No community forums, sponsorships, or active development signals are present in the README.

Licensing & Compatibility

  • The README does not specify a software license. Compatibility for commercial use or integration into closed-source projects is undetermined.

Limitations & Caveats

  • Developed against Python 3.7.9; compatibility with newer Python environments is untested and may be limited.
  • Model and experiment compatibility is not universal; specific combinations are required (e.g., GPT for gen, seq2seq for qa, BERT for fc).
  • Data configuration requires careful attention to avoid incorrect drawdown computations.
  • Absence of a stated license poses a significant adoption blocker for many use cases.
Health Check
Last Commit

2 years ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
2 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 1 more.

FastEdit by hiyouga

0%
1k
Tool for fast edits to large language models
Created 2 years ago
Updated 2 years ago
Starred by Yaowei Zheng (Author of LLaMA-Factory), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 2 more.

rome by kmeng01

0.3%
673
Model editing research paper for GPT-2 and GPT-J
Created 3 years ago
Updated 1 year ago
Starred by Alex Atallah (Cofounder of OpenRouter, OpenSea), Shyamal Anadkat (Research Scientist at OpenAI), and 1 more.

gpt-llm-trainer by mshumer

0.0%
4k
LLM fine-tuning pipeline
Created 2 years ago
Updated 5 months ago