unilm by microsoft

Foundation models for language, vision, speech, and multimodal tasks

Created 6 years ago
21,730 stars

Top 2.0% on SourcePulse

Project Summary

This repository serves as a central hub for Microsoft's foundational AI research, focusing on large-scale self-supervised pre-training across diverse tasks, languages, and modalities. It offers a comprehensive collection of models and architectures for NLP, computer vision, speech, and multimodal AI, targeting researchers and developers building advanced AI systems.

How It Works

The project's core strength lies in its "Big Convergence" philosophy, unifying pre-training methodologies across text, vision, speech, and their combinations. It leverages novel architectures like RetNet and BitNet for improved efficiency and scalability, and explores multimodal grounding with models like Kosmos-2.5. This unified approach aims for greater generality and capability in foundation models.
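
To make the "BitNet (1-bit Transformers)" idea mentioned above concrete, the sketch below shows one way weight binarization is commonly expressed in PyTorch. The helper name and the per-tensor absolute-mean scaling are illustrative assumptions for this summary, not code taken from the repository.

    import torch

    def binarize_weights(w):
        """Hypothetical sketch of 1-bit weight quantization: weights are
        collapsed to {-1, +1} (0 for exact zeros) with a per-tensor scale so
        the binarized matrix keeps roughly the original magnitude. Not the
        repository's implementation."""
        alpha = w.abs().mean()     # per-tensor scaling factor
        w_bin = torch.sign(w)      # 1-bit weights
        return w_bin, alpha

    w = torch.randn(1024, 1024)
    w_bin, alpha = binarize_weights(w)
    x = torch.randn(1, 1024)
    y_approx = (x @ w_bin.t()) * alpha   # approximates x @ w.t() with 1-bit weights

Replacing full-precision weight matrices with sign-only weights in this way is what lets 1-bit models trade a small accuracy loss for large memory and compute savings.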

Quick Start & Requirements

  • Installation typically involves cloning the repository and following each model's own instructions, which generally require PyTorch (see the loading sketch after this list).
  • Many models require significant GPU resources (e.g., multiple A100s) and large datasets for training or fine-tuning.
  • Specific models may have unique dependencies detailed in their respective sub-directories.
  • Links to model releases and demos are provided throughout the README.
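
As a worked illustration of the typical workflow, the minimal sketch below loads one released checkpoint through the Hugging Face transformers library. It assumes torch, transformers, and Pillow are installed and uses the microsoft/beit-base-patch16-224 checkpoint from the Hub as an example; individual models in this repository may ship their own loading code instead.

    from transformers import BeitImageProcessor, BeitForImageClassification
    from PIL import Image

    # Download the BEiT image-classification checkpoint and its preprocessor.
    processor = BeitImageProcessor.from_pretrained("microsoft/beit-base-patch16-224")
    model = BeitForImageClassification.from_pretrained("microsoft/beit-base-patch16-224")

    image = Image.open("example.jpg").convert("RGB")        # any local image file
    inputs = processor(images=image, return_tensors="pt")
    logits = model(**inputs).logits                         # ImageNet-1k class scores
    print(model.config.id2label[logits.argmax(-1).item()])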

Highlighted Details

  • Features efficiency-oriented architectures such as RetNet (Retentive Network) and BitNet (1-bit Transformers).
  • Includes state-of-the-art multimodal models such as Kosmos-2.5 and BEiT-3.
  • Offers a wide array of pre-trained models for over 100 languages and various modalities (vision, speech, document AI).
  • Provides toolkits for sequence-to-sequence fine-tuning and efficient decoding.

Maintenance & Community

  • Actively updated with recent releases (e.g., RedStone, LongNet, TextDiffuser-2).
  • The primary contact for inquiries is Furu Wei (fuwei@microsoft.com); GitHub issues are the channel for model-specific support.

Licensing & Compatibility

  • The project's license is specified in the LICENSE file.
  • Portions of the code are based on the Hugging Face transformers project. Specific model licenses may vary.

Limitations & Caveats

  • Many models are research prototypes and may require substantial computational resources and expertise for effective use or reproduction.
  • The sheer volume of models and research areas means some may be less actively maintained than others.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 2
  • Star history: 115 stars in the last 30 days

Explore Similar Projects

Starred by Elvis Saravia (Founder of DAIR.AI) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

awesome-transformer-nlp by cedrickchee

  • 1k stars
  • Curated list of NLP resources for Transformer networks
  • Created 6 years ago; updated 10 months ago
  • Starred by Boris Cherny (Creator of Claude Code; MTS at Anthropic), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 18 more.

lectures by oxford-cs-deepnlp-2017

  • 16k stars
  • NLP course (lecture slides) for deep learning approaches to language
  • Created 8 years ago; updated 2 years ago