Open multimodal LLM framework for vision-language tasks
Top 85.5% on SourcePulse
HPT (Hyper-Pretrained Transformers) is a multimodal LLM framework from HyperGAI, designed for vision-language understanding. It offers several open-source models, including HPT 1.5 Edge (<5B parameters) for edge devices and HPT 1.5 Air (8B parameters) built with Llama 3, both achieving competitive results on benchmarks like MMMU.
How It Works
HPT models are built by hyper-pretraining existing large language models with visual encoders. This approach leverages established LLM architectures (like Llama 3, Phi-3, Yi) and visual encoders (like SigLIP, CLIP) to create efficient and capable vision-language models. The framework focuses on achieving state-of-the-art performance on multimodal benchmarks with relatively smaller model sizes.
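To make the "visual encoder + existing LLM" composition concrete, below is a minimal sketch of the generic encoder-projector-decoder pattern such frameworks follow. The class name, projector design, dimensions, and stand-in modules are illustrative assumptions, not HPT's actual implementation.

# Illustrative sketch of a generic vision-language composition
# (visual encoder -> projector -> LLM). Names and dimensions are
# assumptions for illustration, not HPT's actual code.
import torch
import torch.nn as nn

class VisionLanguageModel(nn.Module):
    def __init__(self, vision_encoder, language_model, vision_dim, llm_dim):
        super().__init__()
        self.vision_encoder = vision_encoder      # e.g. a SigLIP/CLIP-style backbone
        # A small MLP projector maps visual features into the LLM embedding space.
        self.projector = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )
        self.language_model = language_model      # e.g. a Llama 3 / Phi-3 / Yi decoder

    def forward(self, image_patches, text_embeds):
        image_feats = self.vision_encoder(image_patches)     # (B, patches, vision_dim)
        image_tokens = self.projector(image_feats)           # (B, patches, llm_dim)
        # Visual tokens are prepended to the text embeddings and fed to the decoder.
        inputs = torch.cat([image_tokens, text_embeds], dim=1)
        return self.language_model(inputs)

# Tiny stand-ins so the sketch runs end to end.
vision_dim, llm_dim, patch_dim = 64, 128, 48
encoder = nn.Linear(patch_dim, vision_dim)   # stand-in for a pretrained visual encoder
decoder = nn.Linear(llm_dim, llm_dim)        # stand-in for a pretrained LLM
model = VisionLanguageModel(encoder, decoder, vision_dim, llm_dim)

image_patches = torch.randn(1, 196, patch_dim)   # pretend image patch features
text_embeds = torch.randn(1, 10, llm_dim)        # pretend text token embeddings
print(model(image_patches, text_embeds).shape)   # (1, 206, 128): 196 visual + 10 text tokens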
Quick Start & Requirements
Install dependencies:
pip install -r requirements.txt
pip install -e .
Download the model weights (Git LFS required):
git lfs install
git clone https://huggingface.co/HyperGAI/HPT1_5-Edge [Local Path]
Run the demo:
python demo/demo.py --image_path [Image] --text [Text] --model [Config]
Evaluate on a benchmark (8 GPUs):
torchrun --nproc-per-node=8 run.py --data [Dataset] --model [Config]
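The demo command can also be scripted. The snippet below is a small, hypothetical wrapper that batches demo.py over a folder of images using only the CLI flags shown above; the paths, prompt, and config argument are placeholders.

# Hypothetical helper that batches the demo CLI shown above over a folder of
# images. Paths, the prompt, and the config argument are placeholders.
import subprocess
from pathlib import Path

def run_demo_on_folder(image_dir, prompt, config):
    for image_path in sorted(Path(image_dir).glob("*.jpg")):
        # Mirrors: python demo/demo.py --image_path [Image] --text [Text] --model [Config]
        subprocess.run(
            ["python", "demo/demo.py",
             "--image_path", str(image_path),
             "--text", prompt,
             "--model", config],
            check=True,
        )

# Example call (all arguments are placeholders):
# run_demo_on_folder("samples/", "Describe this image.", "[Config]")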
Highlighted Details
Maintenance & Community
Last updated about 1 year ago; the repository is currently inactive.
Licensing & Compatibility
Limitations & Caveats
The models have no built-in moderation mechanisms and come with no guarantees on results; real-world applications need their own guardrails, which the project leaves to the community to implement.
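As one generic, assumed approach to such guardrails (not anything provided by the repository), a deployment could at minimum wrap model output in a simple blocklist filter; production systems would use a proper moderation model or service.

# Generic, assumed guardrail sketch: a simple blocklist check around model
# output. Not part of the HPT repository; terms below are placeholders.
BLOCKED_TERMS = {"example_banned_phrase"}

def moderate(generated_text):
    lowered = generated_text.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        return "[response withheld by moderation filter]"
    return generated_text

# Usage: wrap whatever text the model returns before showing it to a user.
# safe_text = moderate(model_output)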