Optimus by ChunyuanLI

VAE language model for latent space sentence manipulation

Created 5 years ago
391 stars

Top 73.5% on SourcePulse

View on GitHub
Project Summary

Optimus is a pre-trained Variational Autoencoder (VAE) language model designed for organizing and manipulating sentences within a compact, smooth latent space. It targets researchers and practitioners in Natural Language Processing (NLP) looking to explore latent space properties for tasks like sentence interpolation, analogy, and guided generation. The primary benefit is enabling structured control and understanding of sentence semantics.

How It Works

Optimus employs a VAE architecture: a BERT-based encoder maps sentences into a pre-trained latent space, and a GPT-2-based decoder generates text from latent vectors. Because the latent space is smooth and relatively disentangled, vector operations on sentence embeddings translate into semantically meaningful operations on text.
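
To make the latent-space idea concrete, here is a minimal, hedged sketch of sentence interpolation. The encode/decode calls are assumptions standing in for the repo's BERT encoder and GPT-2 decoder (its real entry points live in the fine-tuning scripts), and random tensors stand in for actual sentence embeddings so the snippet runs on its own.

```python
import torch

# Minimal sketch of latent-space interpolation (assumed names, not the repo's API).
# In Optimus, a BERT-based encoder would produce z_a and z_b from two sentences,
# and a GPT-2-based decoder would turn each interpolated vector back into text.
latent_dim = 32  # assumption: a small latent size in the spirit of the paper

z_a = torch.randn(latent_dim)  # stand-in for encode(sentence_a)
z_b = torch.randn(latent_dim)  # stand-in for encode(sentence_b)

for t in torch.linspace(0.0, 1.0, steps=5):
    z_t = (1 - t) * z_a + t * z_b  # walk a straight line through the latent space
    # text_t = decode(z_t)         # hypothetical decoder call
    print(f"t={t.item():.2f}", z_t[:4])
```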

Quick Start & Requirements

  • Installation: Docker image chunyl/pytorch-transformers:v2 is recommended. Detailed environment setup instructions are in doc/env.md.
  • Prerequisites: PyTorch, Python. Specific dependencies are detailed in doc/env.md.
  • Data: Datasets need to be downloaded or prepared following instructions in data/download_datasets.md.
  • Resources: Pre-training was conducted on Microsoft's internal Philly compute cluster, suggesting significant multi-node, multi-GPU resources are required for reproduction.

Highlighted Details

  • Enables latent space manipulation, including sentence interpolation and analogy (a sketch follows this list).
  • Provides fine-tuning code for language modeling and guided language generation.
  • Includes scripts for low-resource language understanding tasks.
  • Offers tools to collect and plot results, with an IPython notebook for visualization.
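
The analogy operation above reduces to vector arithmetic on sentence embeddings: given sentences A, B, and C, the target latent code is z_D = z_B - z_A + z_C, which the decoder then turns back into text. A hedged sketch, again with random tensors standing in for encoder outputs and a hypothetical decode step:

```python
import torch

# "A is to B as C is to D": solve for D's latent code by vector arithmetic,
# then (in the real pipeline) decode it with the GPT-2 decoder.
latent_dim = 32
z_a, z_b, z_c = (torch.randn(latent_dim) for _ in range(3))  # stand-ins for encoded sentences

z_d = z_b - z_a + z_c       # latent code of the analogical sentence D
# sentence_d = decode(z_d)  # hypothetical decoder call
print(z_d.shape)
```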

Maintenance & Community

The project is associated with Microsoft Research and the EMNLP 2020 paper "Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space." Contact information for questions is provided.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The pre-training code is specialized for Microsoft's internal Philly compute cluster, requiring adjustments for other distributed training environments. The README does not specify a license, which may impact commercial adoption.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
Star History

1 star in the last 30 days

Explore Similar Projects

Starred by Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

pytorch-nlp-notebooks by scoutbee

PyTorch tutorials for NLP tasks
419 stars · 0% · Created 6 years ago · Updated 5 years ago
Starred by Andrew Kane (Author of pgvector), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 11 more.

xlnet by zihangdai

Language model research paper using generalized autoregressive pretraining
6k stars · 0.0% · Created 6 years ago · Updated 2 years ago
Starred by Boris Cherny (Creator of Claude Code; MTS at Anthropic), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 18 more.

lectures by oxford-cs-deepnlp-2017

NLP course (lecture slides) for deep learning approaches to language
16k stars · 0.0% · Created 8 years ago · Updated 2 years ago