AliceMind by alibaba

Collection of pre-trained encoder-decoder models and optimization techniques

created 4 years ago
2,045 stars

Top 22.2% on sourcepulse

Project Summary

AliceMind is a comprehensive collection of pre-trained encoder-decoder models and optimization techniques from Alibaba's MinD Lab, targeting researchers and developers in NLP and multimodal AI. It offers a wide array of models for tasks spanning text, image, and video understanding and generation, alongside efficient fine-tuning and compression methods.

How It Works

AliceMind provides a modularized foundation for large multimodal language models (LMMs), enabling modal collaboration. Its models are pre-trained on large-scale datasets using both discriminative and generative objectives. Key innovations include parameter-efficient fine-tuning methods like ChildTuning and PST, and compression techniques like ContrastivePruning, all designed to enhance generalization and reduce resource requirements.
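The repository's papers describe these methods in detail; as a rough illustration only, here is a minimal NumPy sketch in the spirit of ChildTuning-F, where each update step masks gradients so that only a random "child" subset of parameters is touched, with the surviving gradients rescaled by 1/p to keep the expected update unchanged. The function name and dict-of-arrays layout are hypothetical, not AliceMind's actual API:

```python
import numpy as np

def child_tuning_f_step(params, grads, lr=0.1, p=0.3, rng=None):
    """One ChildTuning-F-style update: drop each gradient entry with
    probability (1 - p) and rescale the kept entries by 1/p, so the
    update is sparse but unbiased in expectation."""
    rng = rng or np.random.default_rng(0)
    new_params = {}
    for name, g in grads.items():
        mask = rng.random(g.shape) < p        # keep roughly a p-fraction
        masked_g = np.where(mask, g / p, 0.0)  # rescale survivors by 1/p
        new_params[name] = params[name] - lr * masked_g
    return new_params
```

Because most entries receive a zero gradient at each step, only the sampled child network moves, which is the intuition behind the method's improved generalization on small downstream datasets.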

Quick Start & Requirements

  • Installation and usage are facilitated by the SOFA modeling toolkit, designed for easy distribution and access to AliceMind models.
  • Specific model requirements (e.g., GPU, CUDA versions) are not explicitly detailed in the README but are typical for large language models.
  • Links to official resources: AliceMind Official Website, AliceMind Open Platform.
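Since exact install commands for SOFA are not reproduced in this summary, the safest starting point is simply cloning the repository itself; each model ships in its own subdirectory with its own README. This sketch assumes only that git and standard GitHub access are available:

```shell
# Clone the AliceMind monorepo; models such as StructBERT, PALM, and
# mPLUG each live in a subdirectory with model-specific setup notes.
git clone https://github.com/alibaba/AliceMind.git
cd AliceMind
ls
```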

Highlighted Details

  • Features mPLUG-Owl2, a multimodal LLM for LLM/MLLM collaboration, accepted by CVPR 2024.
  • Includes mPLUG-DocOwl, an OCR-free multimodal LLM for document understanding, accepted by EMNLP 2023.
  • Offers Youku-mPLUG, a large Chinese video-language dataset and model.
  • Provides a diverse range of models including PLUG (Chinese LLM), mPLUG-2 (multimodal), SDCUP (table understanding), LatticeBERT (Chinese multi-granularity), StructuralLM (document-image), StructVBERT (vision-language), VECO (cross-lingual), PALM (NLG), and StructBERT (NLU).

Maintenance & Community

  • Active development with recent updates and publications (CVPR 2024, EMNLP 2023, ICML 2023).
  • Support is available via GitHub issues. A DingTalk group (ID: 35738533) is provided for user interaction. Business inquiries can be directed to nlp-support@list.alibaba-inc.com.

Licensing & Compatibility

  • Released under the Apache 2.0 license.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The README catalogs many models and techniques, but installation instructions, hardware requirements, and detailed benchmarks are not consolidated in one place; users may need to consult each model's paper or subdirectory documentation.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 5 stars in the last 90 days
