mllm_interview_note by wdndev

Multimodal LLM technical notes and resources

Created 2 years ago

268 stars


Project Summary

This repository compiles essential knowledge for Multimodal Large Language Model (MLLM) algorithm and application engineers. It serves as a curated collection of concepts, research papers, and practical techniques, aimed at aiding engineers in understanding and applying MLLMs, particularly in interview preparation and low-resource environments.

How It Works

The project aggregates and organizes information on key MLLM topics, including foundational concepts, specific model architectures such as Qwen VL, and advanced techniques such as fine-tuning with LoRA. It also covers recent developments such as Sora, with references to the relevant papers and notes on training preparation. A related project, tiny-llm-zh, demonstrates how to build small-parameter Chinese LLMs for hands-on practice in resource-constrained settings.

Quick Start & Requirements

Online Reading: Access compiled notes via the provided online reading link.
Demo Experience: Interact with a deployed small-parameter LLM at ModelScope Tiny LLM.
Prerequisites: Mainly an interest in MLLMs. Following the fine-tuning guides in practice may additionally require suitable hardware.

Highlighted Details

In-depth coverage of Sora, including technical principles, the transformers_diffusion paper, and training preparation.
Detailed notes on MLLM papers, specifically "From Visual Representation to Multimodal Large Models" and the Qwen VL model.
Practical guidance on fine-tuning MLLMs, focusing on the LoRA (Low-Rank Adaptation) technique.
Development and deployment of tiny-llm-zh for low-resource LLM experimentation.
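To make the LoRA item concrete, here is a minimal plain-Python sketch of the commonly used LoRA formulation, h = W0·x + (alpha / r)·B·A·x, where the pretrained weight W0 (d × k) stays frozen and only the low-rank factors A (r × k) and B (d × r) are trained. The function names (`matmul`, `lora_forward`, `trainable_params`) are hypothetical illustrations, not code from the repository's fine-tuning guides, and real fine-tuning would use a tensor library rather than nested lists.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product: a is (n x m), b is (m x p), result is (n x p)."""
    inner = len(b)
    return [
        [sum(a[i][t] * b[t][j] for t in range(inner)) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_forward(w0: Matrix, a: Matrix, b: Matrix, x: Matrix,
                 alpha: float, r: int) -> Matrix:
    """Hypothetical LoRA forward pass: h = W0 x + (alpha / r) * B (A x).

    w0: frozen pretrained weight, shape (d x k)
    a:  trainable down-projection, shape (r x k)
    b:  trainable up-projection, shape (d x r); usually zero-initialized
        so that training starts from the unmodified pretrained model
    x:  input column vector, shape (k x 1)
    """
    base = matmul(w0, x)              # frozen path
    delta = matmul(b, matmul(a, x))   # low-rank update path
    scale = alpha / r
    return [
        [base[i][j] + scale * delta[i][j] for j in range(len(base[0]))]
        for i in range(len(base))
    ]


def trainable_params(d: int, k: int, r: int) -> Tuple[int, int]:
    """Trainable parameter counts: full fine-tuning of W0 vs. LoRA factors A and B."""
    full = d * k
    lora = r * (d + k)
    return full, lora
```

For example, `trainable_params(4096, 4096, 8)` returns `(16777216, 65536)`, so a rank-8 adapter on a 4096 × 4096 projection trains roughly 0.4% of that layer's parameters. Because B starts at zero, `lora_forward` initially reproduces `W0 x` exactly; the scale `alpha / r` keeps the update's magnitude comparable when the rank is changed.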

Maintenance & Community

The project is maintained by the author, who welcomes feedback and corrections. Updates on LLM content and interview experiences are shared via a WeChat public account.

Licensing & Compatibility

The repository does not explicitly state a software license. Users should exercise caution regarding the use of any code or content, especially in commercial or closed-source applications, until licensing is clarified.

Limitations & Caveats

The content is the author's personal compilation and interpretation of online resources. The answers and explanations are self-written and may contain inaccuracies or gaps, so readers should verify them independently.


Explore Similar Projects

DreamLLM by RunpeiDong

LAMM by OpenGVLab

Awesome-Multimodal-LLM by HenryHZY

Awesome_Multimodel_LLM by Atomic-man007

Awesome-LLMs-meet-Multimodal-Generation by YingqingHe

ml_timeline by osanseviero

Awesome-Multimodal-Large-Language-Models by yfzhang114

intro-llm-code by intro-llm

TinyLLaVA_Factory by TinyLLaVA

LLM-PowerHouse-A-Curated-Guide-for-Large-Language-Models-with-Custom-Training-and-Inferencing by ghimiresunil

ms-swift by modelscope

self-llm by datawhalechina