Optimus by ChunyuanLI

VAE language model for latent space sentence manipulation

created 5 years ago
390 stars

Top 74.7% on sourcepulse

Project Summary

Optimus is a pre-trained Variational Autoencoder (VAE) language model designed for organizing and manipulating sentences within a compact, smooth latent space. It targets researchers and practitioners in Natural Language Processing (NLP) looking to explore latent space properties for tasks like sentence interpolation, analogy, and guided generation. The primary benefit is enabling structured control and understanding of sentence semantics.

How It Works

Optimus employs a VAE architecture, pairing a BERT-based encoder for representation learning with a GPT-2-based decoder for generation. Sentences are mapped into a pre-trained latent space, where they can be organized and manipulated. The resulting representation is smooth and disentangled, which is what makes semantically meaningful operations on text possible.
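
In practice, an operation like interpolation reduces to simple arithmetic on latent codes. Below is a minimal sketch of the encode-manipulate-decode loop, assuming hypothetical encode and decode helpers that wrap the pre-trained encoder and decoder; the repository's real entry points are its training and evaluation scripts, so treat this as the shape of the computation, not its API.

```python
import torch

def interpolate(z_a: torch.Tensor, z_b: torch.Tensor, steps: int = 5):
    """Yield latent codes on the line segment between z_a and z_b."""
    for t in torch.linspace(0.0, 1.0, steps):
        yield (1.0 - t) * z_a + t * z_b

# Hypothetical usage -- encode/decode are stand-ins, not repo functions:
# z_a = encode("a girl makes a silly face")
# z_b = encode("two soccer players are playing")
# for z in interpolate(z_a, z_b):
#     print(decode(z))   # each intermediate z decodes to an intermediate sentence
```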

Quick Start & Requirements

  • Installation: the Docker image chunyl/pytorch-transformers:v2 is recommended (docker pull chunyl/pytorch-transformers:v2). Detailed environment setup instructions are in doc/env.md.
  • Prerequisites: Python and PyTorch; specific dependencies are detailed in doc/env.md (see the backbone-loading sketch after this list).
  • Data: Datasets need to be downloaded or prepared following instructions in data/download_datasets.md.
  • Resources: Pre-training was conducted on Microsoft's internal Philly compute cluster, suggesting significant multi-node, multi-GPU resources are required for reproduction.
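
For orientation, Optimus pre-trains a BERT encoder and a GPT-2 decoder. The sketch below loads comparable backbones with the modern transformers package purely for illustration; the checkpoint names are assumptions, and the repository itself pins the older pytorch-transformers stack shipped in the Docker image above.

```python
# Illustration only: comparable backbones via the modern transformers package.
# Checkpoint names are assumptions; Optimus pins the older pytorch-transformers
# stack from the chunyl/pytorch-transformers:v2 image and adds its own VAE code.
from transformers import BertModel, BertTokenizer, GPT2LMHeadModel, GPT2Tokenizer

bert_tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
encoder = BertModel.from_pretrained("bert-base-cased")  # sentence -> hidden features
gpt2_tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
decoder = GPT2LMHeadModel.from_pretrained("gpt2")       # generator; Optimus conditions it on a latent code
```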

Highlighted Details

  • Enables latent space manipulation, including sentence interpolation and analogy (see the analogy sketch after this list).
  • Provides fine-tuning code for language modeling and guided language generation.
  • Includes scripts for low-resource language understanding tasks.
  • Offers tools to collect and plot results, with an IPython notebook for visualization.
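
The analogy operation above is plain vector arithmetic in the latent space: to resolve "A is to B as C is to ?", shift C's latent code by the offset from A to B. A minimal sketch, again with hypothetical encode and decode wrappers standing in for the repo's scripts:

```python
import torch

def analogy(z_a: torch.Tensor, z_b: torch.Tensor, z_c: torch.Tensor) -> torch.Tensor:
    """Latent analogy 'A : B :: C : D' -- apply the A -> B offset to C."""
    return z_c + (z_b - z_a)

# Hypothetical usage: decode(analogy(encode(a), encode(b), encode(c)))
```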

Maintenance & Community

The project comes from Microsoft Research and accompanies the EMNLP 2020 paper "Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space." Contact information for questions is provided.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The pre-training code is specialized for Microsoft's internal Philly compute cluster, requiring adjustments for other distributed training environments. The README does not specify a license, which may impact commercial adoption.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 90 days
