VardaGPT by ixaxaar

Associative memory-enhanced GPT-2 model

Created 2 years ago
337 stars

Top 81.6% on SourcePulse

View on GitHub
Project Summary

VardaGPT enhances GPT-2 with an associative memory powered by FAISS, aiming to improve context retrieval and text generation. It's designed for researchers and developers interested in memory-augmented language models.

How It Works

VardaGPT pairs a GPT-2 model with a FAISS-backed associative memory. During both training and inference, it queries the memory with the input embeddings, retrieves the most relevant stored vectors, and concatenates them with the original input embeddings before they enter the GPT-2 transformer. This gives the model access to a larger external knowledge store, potentially yielding more coherent and contextually relevant text generation.
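The retrieve-and-concatenate step described above can be sketched as follows. This is a minimal NumPy illustration of the idea, not code from the repository; the function and variable names are hypothetical, and the real implementation uses PyTorch and FAISS.

```python
import numpy as np

def retrieve(memory: np.ndarray, query: np.ndarray, k: int = 1) -> np.ndarray:
    """Return the k memory rows closest to the query (squared L2 distance)."""
    d2 = ((memory - query) ** 2).sum(axis=1)
    return memory[np.argsort(d2)[:k]]

rng = np.random.default_rng(0)
memory = rng.normal(size=(1000, 64))   # external associative memory
token_emb = rng.normal(size=(64,))     # a GPT-2 input embedding for one token

retrieved = retrieve(memory, token_emb, k=1)[0]
# concatenate memory content with the input embedding before the transformer
augmented = np.concatenate([token_emb, retrieved])
print(augmented.shape)  # (128,)
```

In practice the concatenated vector would be projected back to the model's hidden size (or handled by a widened first layer) before entering the transformer blocks.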

Quick Start & Requirements

  • Primary install / run command: pip install -r requirements.txt followed by python train_varda_gpt_associative.py
  • Non-default prerequisites: Python 3.7+, PyTorch 1.8.1+, FAISS (CPU version specified).
  • Links: GitHub Repo

Highlighted Details

  • Leverages FAISS for efficient similarity search in the associative memory.
  • Modifies GPT-2 architecture to incorporate memory retrieval and concatenation.
  • Includes custom loss functions and training scripts for the memory-enhanced model.
  • Supports training on datasets like WikiText-2.

Maintenance & Community

  • The repository is maintained by ixaxaar.
  • No specific community channels (Discord/Slack) or roadmap are mentioned in the README.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: Permissive MIT license allows for commercial use and integration with closed-source projects.

Limitations & Caveats

  • The README covers training the VardaGPTAssociative model with the CPU build of FAISS; GPU support for FAISS is not documented.
  • The project targets GPT-2 only; compatibility with newer or larger models is not discussed.
Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 30 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n) and Georgios Konstantopoulos (CTO, General Partner at Paradigm).

mlx-gpt2 by pranavjad

  • 397 stars
  • Minimal GPT-2 implementation for educational purposes
  • Created 1 year ago; updated 1 year ago
  • Starred by George Hotz (author of tinygrad; founder of the tiny corp, comma.ai), Casper Hansen (author of AutoAWQ), and 1 more.

GPT2 by ConnorJL

  • 1k stars
  • GPT2 training implementation, supporting TPUs and GPUs
  • Created 6 years ago; updated 2 years ago