AliceMind by alibaba

Collection of pre-trained encoder-decoder models and optimization techniques

created 4 years ago
2,045 stars

Top 22.2% on sourcepulse

Project Summary

AliceMind is a comprehensive collection of pre-trained encoder-decoder models and optimization techniques from Alibaba's MinD Lab, targeting researchers and developers in NLP and multimodal AI. It offers a wide array of models for tasks spanning text, image, and video understanding and generation, alongside efficient fine-tuning and compression methods.

How It Works

AliceMind provides a modularized foundation for large multimodal language models (LMMs), enabling modal collaboration. Its models are pre-trained on large-scale datasets using both discriminative and generative objectives. Key innovations include parameter-efficient fine-tuning methods like ChildTuning and PST, and compression techniques like ContrastivePruning, all designed to enhance generalization and reduce resource requirements.
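The repository's papers describe these methods in detail; as a rough illustration only, here is a minimal NumPy sketch in the spirit of ChildTuning-F, where each update step masks gradients so that only a random "child" subset of parameters is touched, with the surviving gradients rescaled by 1/p to keep the expected update unchanged. The function name and dict-of-arrays layout are hypothetical, not AliceMind's actual API:

```python
import numpy as np

def child_tuning_f_step(params, grads, lr=0.1, p=0.3, rng=None):
    """One ChildTuning-F-style update: drop each gradient entry with
    probability (1 - p) and rescale the kept entries by 1/p, so the
    update is sparse but unbiased in expectation."""
    rng = rng or np.random.default_rng(0)
    new_params = {}
    for name, g in grads.items():
        mask = rng.random(g.shape) < p        # keep roughly a p-fraction
        masked_g = np.where(mask, g / p, 0.0)  # rescale survivors by 1/p
        new_params[name] = params[name] - lr * masked_g
    return new_params
```

Because most entries receive a zero gradient at each step, only the sampled child network moves, which is the intuition behind the method's improved generalization on small downstream datasets.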

Quick Start & Requirements

  • Installation and usage are facilitated by the SOFA modeling toolkit, designed for easy distribution and access to AliceMind models.
  • Specific model requirements (e.g., GPU, CUDA versions) are not explicitly detailed in the README but are typical for large language models.
  • Links to official resources: AliceMind Official Website, AliceMind Open Platform.
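Since exact install commands for SOFA are not reproduced in this summary, the safest starting point is simply cloning the repository itself; each model ships in its own subdirectory with its own README. This sketch assumes only that git and standard GitHub access are available:

```shell
# Clone the AliceMind monorepo; models such as StructBERT, PALM, and
# mPLUG each live in a subdirectory with model-specific setup notes.
git clone https://github.com/alibaba/AliceMind.git
cd AliceMind
ls
```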

Highlighted Details

  • Features mPLUG-Owl2, a multimodal LLM for LLM/MLLM collaboration, accepted by CVPR 2024.
  • Includes mPLUG-DocOwl, an OCR-free multimodal LLM for document understanding, accepted by EMNLP 2023.
  • Offers Youku-mPLUG, a large Chinese video-language dataset and model.
  • Provides a diverse range of models including PLUG (Chinese LLM), mPLUG-2 (multimodal), SDCUP (table understanding), LatticeBERT (Chinese multi-granularity), StructuralLM (document-image), StructVBERT (vision-language), VECO (cross-lingual), PALM (NLG), and StructBERT (NLU).

Maintenance & Community

  • Active development with recent updates and publications (CVPR 2024, EMNLP 2023, ICML 2023).
  • Support is available via GitHub issues. A DingTalk group (ID: 35738533) is provided for user interaction. Business inquiries can be directed to nlp-support@list.alibaba-inc.com.

Licensing & Compatibility

  • Released under the Apache 2.0 license.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The README catalogs many models and techniques, but installation instructions, hardware requirements, and detailed benchmarks are not consolidated in one place; users may need to consult each model's paper or subdirectory documentation.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 5 stars in the last 90 days
