MiniCPM-CookBook by OpenBMB

Small language models for edge deployment

Created 1 year ago
297 stars

Top 89.4% on SourcePulse

View on GitHub
Project Summary

This repository is a comprehensive user guide for the MiniCPM and MiniCPM-V series of small language models (SLMs) from ModelBest. It targets developers and researchers who want to deploy, fine-tune, and build applications with these lightweight, high-performance models on edge devices, where they offer capabilities that rival larger models.

How It Works

The MiniCPM series focuses on achieving exceptional performance on edge devices through efficient architecture and optimization techniques. The models are designed for low-resource environments, enabling applications on smartphones, computers, and other smart terminals. The guide details various deployment methods, including transformers, vLLM, llama.cpp, and MLX, catering to diverse hardware and software stacks.

Quick Start & Requirements

  • Installation: Primarily involves cloning the repository and following the deployment instructions for the chosen model and framework (e.g., Transformers, vLLM, llama.cpp).
  • Prerequisites: Vary by deployment method; common requirements include Python, specific CUDA versions for GPU acceleration, and libraries such as transformers, vllm, llama-cpp-python, or mlx. Hardware support spans GPU, CPU, and NPU across Linux, macOS, Windows, Android, and iOS.
  • Resources: Model sizes range from 1.2B to 4B parameters, with specific hardware requirements (e.g., 4GB VRAM for RAG with Langchain) detailed within the guide.
  • Links: MiniCPM Repo, MiniCPM-V Repo, Knowledge Base, English Readme, Discord, WeChat Group.
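As a rough sanity check on the resource figures above, a model's weight memory can be estimated as parameter count times bytes per parameter (a back-of-envelope sketch only; real usage also includes activations and the KV cache, which this ignores):

```python
def estimate_weights_gb(num_params_billion: float, bits_per_param: float) -> float:
    """Rough weight-memory estimate: params * bits / 8, converted to GiB."""
    bytes_total = num_params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 2**30

# MiniCPM sizes from the guide (1.2B-4B) at common precisions.
for params in (1.2, 4.0):
    for name, bits in (("fp16", 16), ("int8", 8), ("int4", 4)):
        print(f"{params}B @ {name}: {estimate_weights_gb(params, bits):.2f} GiB")
```

This is consistent with the guide's figures: a 1.2B model in fp16 fits comfortably under 4GB of VRAM, and a 4B model needs 4-bit or 8-bit quantization to do the same.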

Highlighted Details

  • Comprehensive guides for inference, quantization (AWQ, GGUF, GPTQ, BNB), fine-tuning (SFT, RLHF), and multimodal applications.
  • Demonstrates advanced use cases like RAG, function calling, agent construction, and real-time video/image understanding.
  • Supports a wide array of hardware including GPUs, CPUs, and NPUs across multiple operating systems and mobile platforms.
  • Provides technical reports detailing model architecture, attention mechanisms, and decoding principles.
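The quantization formats listed above (AWQ, GGUF, GPTQ, BNB) all trade precision for memory in different ways; the core idea behind the simplest variant, symmetric absmax integer quantization, can be sketched in plain Python (an illustrative toy, not the actual algorithm of any of those formats):

```python
def quantize_absmax(weights, bits=4):
    """Symmetric absmax quantization: map floats to signed ints in
    [-(2**(bits-1) - 1), 2**(bits-1) - 1] using one scale per group."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for int4
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

w = [0.12, -0.53, 0.31, 0.07]
q, s = quantize_absmax(w)
w_hat = dequantize(q, s)
# Reconstruction error is bounded by half a quantization step (scale / 2).
err = max(abs(a - b) for a, b in zip(w, w_hat))
```

The production formats above improve on this sketch with per-group scales, activation-aware weight selection, and calibration, but the memory arithmetic is the same: int4 storage is one quarter the size of fp16.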

Maintenance & Community

The project is a collaborative effort involving ModelBest, OpenBMB, and Tsinghua NLP Lab. It actively encourages community contributions for tutorials, usage experiences, and ecosystem adaptations. Links to Discord and WeChat groups are provided for community engagement.

Licensing & Compatibility

The repository itself is open-source, but specific model licenses should be verified. Compatibility for commercial use or closed-source linking depends on the individual model licenses, which are not explicitly detailed in the README.

Limitations & Caveats

While the guide covers numerous deployment scenarios, users must consult specific model documentation for detailed hardware requirements and potential performance variations. Some advanced features or specific model versions might require particular software versions or configurations.

Health Check

  • Last Commit: 6 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Ying Sheng (coauthor of SGLang).

fastllm by ztxz16
0.1% · 4k stars · High-performance C++ LLM inference library · Created 2 years ago · Updated 1 month ago
Starred by Wing Lian (founder of Axolotl AI) and Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems").

airllm by lyogavin
4.6% · 7k stars · Inference optimization for LLMs on low-resource hardware · Created 2 years ago · Updated 4 months ago
Starred by Luis Capelo (cofounder of Lightning AI), Patrick von Platen (author of Hugging Face Diffusers; research engineer at Mistral), and 4 more.

ktransformers by kvcache-ai
0.2% · 16k stars · Framework for LLM inference optimization experimentation · Created 1 year ago · Updated 1 day ago