mend by eric-mitchell

Fast model editing for LLMs

Created 4 years ago
251 stars

Top 99.8% on SourcePulse

Project Summary

MEND (Model Editing Networks using Gradient Decomposition) offers a method for efficiently editing large language models at scale. It targets researchers and practitioners needing to modify model behavior without full retraining, providing a faster alternative for knowledge injection or correction.

How It Works

The project implements model editing via gradient decomposition, enabling targeted modifications to model parameters. It supports various algorithms (MEND, EFK, ENN) and experiments, including text generation (gen), fact-checking (fc), and question-answering (qa), accommodating different model architectures like GPT, seq2seq, and BERT.
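The algorithm, experiment, and model axes described above are selected via command-line overrides. A minimal sketch follows; the `+alg=mend +experiment=gen +model=distilgpt2` spelling comes from the documented run command, while the `efk`, `fc`, and BERT model values on the second line are hypothetical illustrations of the other combinations:

```shell
# MEND on text generation with a GPT-style model (documented combination)
python -m run +alg=mend +experiment=gen +model=distilgpt2

# Hypothetical: EFK on fact-checking with a BERT model (flag values assumed)
python -m run +alg=efk +experiment=fc +model=bert-base
```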

Quick Start & Requirements

  • Installation: Developed with Python 3.7.9 (other versions may work). Set up a virtual environment (python -m venv env, source env/bin/activate) and install dependencies (pip install -r requirements.txt).
  • Data: Download from a provided Google Drive link and extract into the mend/data directory.
  • Execution: Run training/evaluation via python -m run +alg=mend +experiment=gen +model=distilgpt2 data.wiki_webtext=False.
  • Configuration: Specific data flags (data.wiki_webtext, data.zsre_nq) may need adjustment based on the model and experiment. Multi-edit experiments require careful batch size configuration (e.g., data.n_edits=5 batch_size=6).
  • Dependencies: Pre-trained BERT and BART models are required for fc and qa experiments, respectively, sourced from De Cao et al.
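The setup and run steps above can be sketched as a single shell session. The repository URL is inferred from the project name and is an assumption; the data download link is only given in the README and is elided here:

```shell
# Clone the repository (URL assumed from the project name) and enter it
git clone https://github.com/eric-mitchell/mend.git
cd mend

# Create and activate a virtual environment (the authors used Python 3.7.9)
python -m venv env
source env/bin/activate

# Install dependencies
pip install -r requirements.txt

# Download the data archive from the Google Drive link in the README
# and extract it into mend/data before running.

# Train/evaluate MEND on text generation with distilGPT-2
python -m run +alg=mend +experiment=gen +model=distilgpt2 data.wiki_webtext=False
```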

Highlighted Details

  • Implements the paper "Fast Model Editing at Scale"; speed and scalability of edits are the core claims.
  • Supports multiple editing algorithms (MEND, EFK, ENN) and diverse downstream tasks (generation, fact-checking, QA).
  • Accommodates various model architectures, including GPT-style, seq2seq, and BERT.
  • Enables multi-edit experiments with configurable batching strategies for applying multiple edits.
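A multi-edit run can be sketched by combining the documented basic command with the multi-edit flags from the configuration notes; only `data.n_edits=5` and `batch_size=6` are taken from the source, and appropriate batch sizes for other edit counts are not documented here:

```shell
# Apply 5 edits at a time; batch_size must be configured compatibly
# (here 6, as given in the configuration notes above)
python -m run +alg=mend +experiment=gen +model=distilgpt2 \
    data.wiki_webtext=False data.n_edits=5 batch_size=6
```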

Maintenance & Community

  • Primary contact for issues is via GitHub issues or direct email to the author (eric.mitchell@cs.stanford.edu). No community forums, sponsorships, or active development signals are present in the README.

Licensing & Compatibility

  • The README does not specify a software license. Compatibility for commercial use or integration into closed-source projects is undetermined.

Limitations & Caveats

  • Developed against Python 3.7.9; compatibility with newer Python environments is untested and may be limited.
  • Model and experiment compatibility is not universal; specific combinations are required (e.g., GPT for gen, seq2seq for qa, BERT for fc).
  • Data configuration requires careful attention to avoid incorrect drawdown computations.
  • Absence of a stated license poses a significant adoption blocker for many use cases.
Health Check
Last Commit

2 years ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
2 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 1 more.

FastEdit by hiyouga

0%
1k
Tool for fast edits to large language models
Created 2 years ago
Updated 2 years ago
Starred by Yaowei Zheng (Author of LLaMA-Factory), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 2 more.

rome by kmeng01

0.3%
673
Model editing research paper for GPT-2 and GPT-J
Created 3 years ago
Updated 1 year ago
Starred by Alex Atallah (Cofounder of OpenRouter, OpenSea), Shyamal Anadkat (Research Scientist at OpenAI), and 1 more.

gpt-llm-trainer by mshumer

0.0%
4k
LLM fine-tuning pipeline
Created 2 years ago
Updated 5 months ago