KBLaM  by microsoft

Knowledge augmentation research paper

created 1 year ago
1,343 stars

Top 30.5% on sourcepulse

Project Summary

KBLaM is a research project that augments Large Language Models (LLMs) with external knowledge bases without a separate retrieval module. It targets researchers who want to improve LLM grounding and factual accuracy. Its computational overhead scales linearly with knowledge base size, whereas placing the same knowledge in the prompt for in-context learning scales quadratically.
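
The scaling claim above can be made concrete with a rough count of attention scores. The sketch below is a back-of-the-envelope illustration, not code from the KBLaM repository. It assumes that each KB entry is compressed into a single knowledge token, that knowledge tokens do not attend to one another, and that in-context learning needs `tokens_per_entry` text tokens per entry. The function names and parameters are hypothetical, and causal masking is ignored for simplicity.

```python
def in_context_attention_pairs(num_entries, tokens_per_entry, prompt_len):
    """Attention scores when the whole KB is pasted into the prompt.

    Assumption: every token attends to every other token, so the cost
    grows with the square of the total sequence length.
    """
    seq_len = num_entries * tokens_per_entry + prompt_len
    return seq_len * seq_len


def knowledge_token_attention_pairs(num_entries, prompt_len):
    """Attention scores when each KB entry is one knowledge token.

    Assumption: only prompt tokens issue queries; each one attends to
    all knowledge tokens plus the prompt itself, so the cost grows
    linearly with the number of KB entries.
    """
    return prompt_len * (num_entries + prompt_len)
```

Under these assumptions, 10 entries of 5 tokens each with a 4-token prompt give 2916 scores in context versus 56 with knowledge tokens, and each additional KB entry adds a constant `prompt_len` scores in the second case.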

How It Works

KBLaM integrates knowledge by transforming knowledge base entries into special "knowledge tokens" that the LLM ingests via trained adapters. This approach leaves the base LLM's text input processing unmodified, so that without a knowledge base the model behaves identically to its base counterpart. The method's advantages are that its cost scales linearly with KB size and that responses can be grounded in the provided knowledge.
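
The idea described above can be illustrated with a toy, single-query attention function. This is a simplified sketch and not KBLaM's implementation: it assumes that knowledge tokens enter the model as extra key/value pairs placed alongside the text keys and values, and it omits the trained adapters entirely. The helper names (`attend`, `attend_with_kb`) are hypothetical.

```python
import math


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def attend_with_kb(query, text_keys, text_values, kb_keys=(), kb_values=()):
    """Attention over text tokens plus optional knowledge tokens.

    Assumption: each knowledge token is a (key, value) pair that the
    query can attend to. With an empty KB this reduces exactly to
    ordinary attention over the text tokens.
    """
    keys = list(kb_keys) + list(text_keys)
    values = list(kb_values) + list(text_values)
    return attend(query, keys, values)
```

Calling `attend_with_kb` with no knowledge tokens returns the same result as `attend`, which mirrors the property that the model behaves like its base counterpart when no knowledge base is supplied.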

Quick Start & Requirements

  • Install: pip install -e .
  • Prerequisites: Hugging Face account and token for Llama models (huggingface-cli login). Azure OpenAI endpoint required for synthetic dataset generation. Supports text-embedding-ada-002 and all-MiniLM-L6-v2 for KB embeddings.
  • Links: Official Implementation, Hugging Face Hub

Highlighted Details

  • Supports Hugging Face models: meta-llama/Meta-Llama-3-8B-Instruct, meta-llama/Llama-3.2-1B-Instruct, Phi-3-mini-4k-instruct.
  • Evaluated on accuracy, refusal rate, and answer precision/recall against the knowledge base.
  • Training script train.py allows customization of dataset, batch size, steps, encoder, and embedding source.
  • Dataset generation scripts are available for synthetic KBs and QA pairs.
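
The evaluation metrics listed above can be sketched in a few lines. The summary does not give the repository's exact metric definitions, so the code below uses common stand-ins as assumptions: token-overlap precision and recall between a generated answer and the reference KB value, and a refusal rate based on a hypothetical marker phrase. None of these names come from the KBLaM codebase.

```python
from collections import Counter


def token_precision_recall(answer, reference):
    """Token-overlap precision and recall (an assumed definition)."""
    ans = answer.lower().split()
    ref = reference.lower().split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(ans) & Counter(ref)).values())
    precision = overlap / len(ans) if ans else 0.0
    recall = overlap / len(ref) if ref else 0.0
    return precision, recall


def refusal_rate(answers, marker="i cannot find"):
    """Fraction of answers containing a refusal marker.

    The marker phrase is hypothetical; a real evaluation would use
    whatever refusal wording the model was trained to produce.
    """
    if not answers:
        return 0.0
    refused = sum(1 for a in answers if marker in a.lower())
    return refused / len(answers)
```

A verbose but correct answer scores high recall and low precision under this definition, which is one reason to report the two numbers separately.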

Maintenance & Community

  • Contributions are welcomed via pull requests, requiring agreement to a Contributor License Agreement (CLA).
  • Feedback can be provided by opening issues in the repository.
  • Follows the Microsoft Open Source Code of Conduct.

Licensing & Compatibility

  • The repository does not explicitly state a license in the provided README.

Limitations & Caveats

KBLaM is intended for research use and is not recommended for production settings. When used with knowledge bases that differ significantly from its training data, it may produce incomplete, reworded, or incorrect answers. For effective use, the knowledge base used at inference time should be sufficiently similar to the one used for training.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: 1 day
  • Pull requests (30d): 5
  • Issues (30d): 7
  • Star history: 73 stars in the last 90 days
