doc-to-lora  by SakanaAI

Instantly internalize factual context into LLMs using hypernetworks

Created 3 weeks ago

New!

397 stars

Top 72.9% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

Doc-to-LoRA (D2L) lets Large Language Models (LLMs) instantly internalize factual information from documents without full retraining. Using a hypernetwork that maps a document to a weight update, it allows an LLM to recall specific, dynamic contexts on demand, benefiting applications that need up-to-date or specialized knowledge. It targets researchers and developers seeking efficient LLM adaptation for factual recall.

How It Works

D2L uses a hypernetwork to map a document directly to an update of the LLM's weights, letting the model "instantly internalize" that context without per-document fine-tuning. The core ModulatedPretrainedModel internalizes a context via model.internalize(doc) and removes it again via model.reset(), so the learned information influences generation outputs only while it is active. This design offers a novel way to manage LLM knowledge dynamically.
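The internalize/reset lifecycle can be sketched with a toy model. This is a minimal, self-contained illustration of the pattern only: the class name mirrors the repository's ModulatedPretrainedModel and the internalize/reset method names come from the README, but the weight-delta logic here is a stand-in for the trained hypernetwork and does not reflect the real implementation.

```python
class ToyModulatedModel:
    """Toy sketch: frozen base weights plus a removable, document-derived delta."""

    def __init__(self, base_weights):
        self.base = list(base_weights)   # frozen "pretrained" weights
        self.delta = None                # active context adapter, if any

    def internalize(self, doc):
        # Stand-in for the hypernetwork: derive a deterministic, nonzero
        # per-weight offset from the document text. The real D2L instead
        # runs a trained hypernetwork over the document to produce a
        # LoRA-style update.
        scale = (sum(map(ord, doc)) % 100 + 1) / 100.0
        self.delta = [scale] * len(self.base)

    def reset(self):
        # Completely remove the internalized context.
        self.delta = None

    def effective_weights(self):
        # Weights used for generation: base, plus the delta if one is active.
        if self.delta is None:
            return list(self.base)
        return [w + d for w, d in zip(self.base, self.delta)]


model = ToyModulatedModel([0.1, 0.2, 0.3])
before = model.effective_weights()
model.internalize("The capital of Freedonia is Fredville.")
during = model.effective_weights()   # context now influences the weights
model.reset()
after = model.effective_weights()    # original weights fully restored
```

The key property the sketch demonstrates is that reset() restores the model exactly: the internalized information is additive and removable, not baked in by gradient updates.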

Quick Start & Requirements

  • Installation: Install the uv package manager with curl -LsSf https://astral.sh/uv/install.sh | sh, then run ./install.sh.
  • Pre-trained Models: Download via Hugging Face CLI after logging in: uv run huggingface-cli login and uv run huggingface-cli download SakanaAI/doc-to-lora --local-dir trained_d2l --include "*/".
  • Prerequisites: uv package manager, Hugging Face account and CLI access, PyTorch.
  • Links: GitHub repository: https://github.com/SakanaAI/doc-to-lora. The README also links to a paper, Hugging Face models, and an interactive demo.
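Taken together, the quick-start steps above condense to the following shell session (commands as given in the README; the trained_d2l directory name follows the download instruction):

```shell
# Install the uv package manager, then the project's dependencies
curl -LsSf https://astral.sh/uv/install.sh | sh
./install.sh

# Authenticate with Hugging Face and fetch the pre-trained D2L models
uv run huggingface-cli login
uv run huggingface-cli download SakanaAI/doc-to-lora \
    --local-dir trained_d2l --include "*/"
```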

Highlighted Details

  • Features an interactive web demo and a video demonstration.
  • Includes experimental scripts for the main experiments, NIAH (Needle-In-A-Haystack) data generation, and a self-generated data viewer.
  • Supports dynamic context management, allowing information to be added and then completely removed from the model's active memory.

Maintenance & Community

The project is associated with Sakana AI. No specific community channels (e.g., Discord, Slack) or details on active contributors/sponsorships are provided in the README.

Licensing & Compatibility

The README does not specify the software license. This omission requires further investigation for compatibility, especially for commercial use.

Limitations & Caveats

The Python API for ModulatedPretrainedModel currently supports only non-batched (single-input) usage. The "experimental" label on several scripts suggests ongoing development and potential instability.

Health Check
Last Commit

3 days ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
2
Star History
408 stars in the last 22 days
