VardaGPT by ixaxaar

Associative memory-enhanced GPT-2 model

created 2 years ago
336 stars

Top 83.0% on sourcepulse

Project Summary

VardaGPT enhances GPT-2 with an associative memory powered by FAISS, aiming to improve context retrieval and text generation. It's designed for researchers and developers interested in memory-augmented language models.

How It Works

VardaGPT integrates a FAISS-based associative memory with a GPT-2 model. During inference and training, it retrieves relevant information from the memory based on input embeddings. This retrieved information is concatenated with the original input embeddings before being processed by the GPT-2 transformer. This approach allows the model to access and utilize a larger, external knowledge base, potentially leading to more coherent and contextually relevant text generation.
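
The core mechanism is easiest to see in code. The following is a minimal sketch of the retrieve-and-concatenate idea, not VardaGPT's actual implementation: it assumes the Hugging Face transformers GPT-2, random vectors standing in for the stored memory, and mean pooling of the input as the query strategy (the project's real pooling and retrieval choices live in its source).

```python
# Minimal sketch of retrieve-and-concatenate memory augmentation.
# Assumptions (not from the VardaGPT README): mean-pooled queries,
# random placeholder memory vectors, Hugging Face GPT2Model.
import faiss
import numpy as np
import torch
from transformers import GPT2Model

d = 768  # GPT-2 small hidden size
k = 4    # memory slots retrieved per query

# Placeholder memory: random vectors stand in for stored knowledge embeddings.
memory = np.random.rand(10_000, d).astype("float32")
index = faiss.IndexFlatL2(d)
index.add(memory)

model = GPT2Model.from_pretrained("gpt2")

def forward_with_memory(input_embeds: torch.Tensor) -> torch.Tensor:
    """input_embeds: (batch, seq_len, d) -> hidden states of the augmented sequence."""
    # Query the memory with the mean input embedding (one plausible pooling choice).
    query = input_embeds.mean(dim=1).detach().cpu().numpy().astype("float32")
    _, ids = index.search(query, k)            # (batch, k) nearest-neighbour ids
    retrieved = torch.from_numpy(memory[ids])  # (batch, k, d)
    # Prepend the retrieved vectors to the inputs before the transformer runs.
    augmented = torch.cat([retrieved, input_embeds], dim=1)
    return model(inputs_embeds=augmented).last_hidden_state
```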

Quick Start & Requirements

  • Primary install / run commands: pip install -r requirements.txt, then python train_varda_gpt_associative.py
  • Non-default prerequisites: Python 3.7+, PyTorch 1.8.1+, FAISS (CPU version specified).
  • Links: GitHub Repo

Highlighted Details

  • Leverages FAISS for efficient similarity search in the associative memory (a write-path sketch follows this list).
  • Modifies GPT-2 architecture to incorporate memory retrieval and concatenation.
  • Includes custom loss functions and training scripts for the memory-enhanced model.
  • Supports training on datasets like WikiText-2.
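
The bullets above describe the read path; an associative memory also needs a write path. Here is a hedged sketch of how new entries could be encoded and added to the FAISS index. The remember helper, the mean pooling, and the example sentence are illustrative, not taken from VardaGPT's code.

```python
# Hedged sketch of a memory "write" path: encode text with GPT-2 and store
# its pooled embedding in a FAISS index so later queries can retrieve it.
# The `remember` helper is hypothetical, not part of VardaGPT's API.
import faiss
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
encoder = GPT2Model.from_pretrained("gpt2")
index = faiss.IndexFlatL2(768)  # 768 = GPT-2 small hidden size

def remember(text: str, index: faiss.Index) -> None:
    """Encode `text` and append its mean-pooled embedding to the memory index."""
    tokens = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**tokens).last_hidden_state       # (1, seq_len, 768)
    vector = hidden.mean(dim=1).numpy().astype("float32")  # (1, 768)
    index.add(vector)

remember("FAISS enables fast nearest-neighbour search over dense vectors.", index)
```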

Maintenance & Community

  • The repository is maintained by ixaxaar.
  • No specific community channels (Discord/Slack) or roadmap are mentioned in the README.

Licensing & Compatibility

  • License: MIT.
  • Compatibility: the permissive MIT license allows commercial use and integration with closed-source projects.

Limitations & Caveats

  • The README focuses on training the VardaGPTAssociative model with the CPU build of FAISS; GPU usage is not documented (a generic FAISS GPU sketch follows this list).
  • The project appears to be focused on GPT-2, and compatibility with newer or larger models is not discussed.
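
For reference, stock FAISS does provide a GPU path via the faiss-gpu package. The snippet below is standard FAISS usage, shown only to illustrate what a GPU index looks like; whether VardaGPT's training scripts accept one is not stated in the README.

```python
# Generic FAISS GPU usage (requires the faiss-gpu package, not faiss-cpu).
# Standard FAISS API; nothing here is specific to, or documented by, VardaGPT.
import faiss
import numpy as np

d = 768
cpu_index = faiss.IndexFlatL2(d)
cpu_index.add(np.random.rand(1_000, d).astype("float32"))

# Clone the index onto all available GPUs; searches then run on-device.
gpu_index = faiss.index_cpu_to_all_gpus(cpu_index)
distances, ids = gpu_index.search(np.random.rand(4, d).astype("float32"), 5)
```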
Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 90 days
