AliceMind by alibaba

Collection of pre-trained encoder-decoder models and optimization techniques

Created 4 years ago
2,048 stars

Top 21.7% on SourcePulse

View on GitHub
Project Summary

AliceMind is a comprehensive collection of pre-trained encoder-decoder models and optimization techniques from Alibaba's MinD Lab, targeting researchers and developers in NLP and multimodal AI. It offers a wide array of models for tasks spanning text, image, and video understanding and generation, alongside efficient fine-tuning and compression methods.

How It Works

AliceMind provides a modularized foundation for multimodal large language models (MLLMs), enabling collaboration across modalities. Its models are pre-trained on large-scale datasets using both discriminative and generative objectives. Key innovations include parameter-efficient fine-tuning methods such as ChildTuning and PST, and compression techniques such as ContrastivePruning, all designed to improve generalization while reducing resource requirements.
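
As a concrete illustration of the parameter-efficient fine-tuning idea, the sketch below shows a ChildTuning-F-style update step in plain PyTorch: gradients are masked so that only a randomly chosen "child" subset of parameters changes at each step. This is a minimal sketch for intuition, not AliceMind's actual ChildTuning implementation; the function name and keep probability p are illustrative.

    import torch

    def child_tuning_step(model, loss, optimizer, p=0.3):
        # One update in which only a random "child" subnetwork is trained
        # (task-free variant). p is the probability of keeping a gradient entry.
        optimizer.zero_grad()
        loss.backward()
        for param in model.parameters():
            if param.grad is not None:
                # A Bernoulli mask selects this step's child network; rescaling
                # by 1/p keeps the expected gradient magnitude unchanged.
                mask = torch.bernoulli(torch.full_like(param.grad, p))
                param.grad.mul_(mask).div_(p)
        optimizer.step()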

Quick Start & Requirements

  • Installation and usage go through the SOFA modeling toolkit, which is designed for easy distribution of and access to AliceMind models; a minimal usage sketch follows this list.
  • Specific model requirements (e.g., GPU, CUDA versions) are not explicitly detailed in the README but are typical for large language models.
  • Links to official resources: AliceMind Official Website, AliceMind Open Platform.
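
Because the exact SOFA API is not documented in this summary, the snippet below is a hypothetical quick-start sketch that assumes a released AliceMind checkpoint can be loaded through a Hugging Face transformers-style interface; the checkpoint name is a placeholder, not a real model identifier.

    # Hypothetical sketch: assumes a transformers-compatible checkpoint.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model_name = "path/to/alicemind-structbert-checkpoint"  # placeholder, not a real ID
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)

    inputs = tokenizer("AliceMind bundles pre-trained encoder models.", return_tensors="pt")
    print(model(**inputs).logits.shape)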

Highlighted Details

  • Features mPLUG-Owl2, a multimodal LLM built around modality collaboration, accepted at CVPR 2024.
  • Includes mPLUG-DocOwl, an OCR-free multimodal LLM for document understanding, accepted at EMNLP 2023.
  • Offers Youku-mPLUG, a large Chinese video-language dataset and model.
  • Provides a diverse range of models including PLUG (Chinese LLM), mPLUG-2 (multimodal), SDCUP (table understanding), LatticeBERT (Chinese multi-granularity), StructuralLM (document-image), StructVBERT (vision-language), VECO (cross-lingual), PALM (NLG), and StructBERT (NLU).

Maintenance & Community

  • Development has tracked a steady stream of publications (CVPR 2024, EMNLP 2023, ICML 2023), though commit activity has since slowed (see the Health Check below).
  • Support is available via GitHub issues. A DingTalk group (ID: 35738533) is provided for user interaction. Business inquiries can be directed to nlp-support@list.alibaba-inc.com.

Licensing & Compatibility

  • Released under the Apache 2.0 license.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The README lists numerous models and techniques, but specific installation instructions, hardware requirements, and detailed benchmarks for each are not consolidated in one place, potentially requiring users to consult individual model papers or documentation.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 2 stars in the last 30 days

Explore Similar Projects

Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 8 more.

EAGLE by SafeAILab

  • Top 10.6% on SourcePulse, 2k stars
  • Speculative decoding research paper for faster LLM inference
  • Created 1 year ago, updated 1 week ago
  • Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Elvis Saravia (Founder of DAIR.AI).

NExT-GPT by NExT-GPT

  • Top 0.1% on SourcePulse, 4k stars
  • Any-to-any multimodal LLM research paper
  • Created 2 years ago, updated 4 months ago