UForm by unum-cloud

Multimodal AI library for content understanding and generation

Created 2 years ago
1,188 stars

Top 32.8% on SourcePulse

Project Summary

UForm is a compact, multimodal AI library designed for efficient content understanding and generation across text, images, and video. It targets developers and researchers needing to deploy AI capabilities on diverse platforms, from servers to smartphones, offering significant speedups and reduced resource footprints compared to larger models.

How It Works

UForm leverages custom-trained, compact transformer models. Its embedding models utilize Matryoshka-style embeddings, allowing for flexible dimensionality (64-768) and efficient retrieval. Generative models are built on efficient architectures like Qwen and LLaMA, enabling tasks such as chat, image captioning, and Visual Question Answering (VQA). The library emphasizes portability via ONNX and native support for quantization (f32 to i8, b1) and embedding slicing, integrating tightly with SimSIMD for numerical operations and USearch for vector indexing.
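
The embedding slicing described above can be illustrated with a short, dependency-free sketch. It is not UForm's implementation: the helper names (`l2_normalize`, `slice_embedding`, `cosine_similarity`) are hypothetical, and the vectors are random stand-ins rather than real model output. The sketch assumes the usual Matryoshka convention that a shorter embedding is the leading prefix of the full one, re-normalized before comparison.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0.0 else list(vec)


def slice_embedding(vec, dims):
    """Keep the first `dims` components and re-normalize the result."""
    if not 0 < dims <= len(vec):
        raise ValueError(f"dims must be in 1..{len(vec)}, got {dims}")
    return l2_normalize(vec[:dims])


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


# Stand-in data: random 768-dimensional vectors, NOT real UForm embeddings.
rng = random.Random(0)
full_a = l2_normalize([rng.gauss(0.0, 1.0) for _ in range(768)])
full_b = l2_normalize([rng.gauss(0.0, 1.0) for _ in range(768)])

# Compare the same pair of vectors at several prefix lengths.
for dims in (64, 256, 768):
    a, b = slice_embedding(full_a, dims), slice_embedding(full_b, dims)
    print(dims, round(cosine_similarity(a, b), 4))
```

With random vectors the similarity at 64 dimensions has no particular relationship to the similarity at 768. The point of Matryoshka-style training is that, for real embeddings, the leading dimensions are expected to preserve most of the ranking signal, so the 64-dimensional prefix can serve as a cheap first-pass filter.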

Quick Start & Requirements

  • Install via pip: pip install uform
  • Requires Python 3.10+.
  • Generative models require transformers and torch.
  • ONNX export is supported for broader deployment.
  • Official Python, JavaScript, and Swift documentation is available.

Highlighted Details

  • Offers 64-dimensional Matryoshka embeddings for fast search.
  • Claims 2-4x faster inference than competitors due to small model size.
  • Supports ONNX for cross-platform deployment and quantization (f32 to i8, b1).
  • Integrates with SimSIMD for up to 2500x speedup in numerical operations and USearch for high-throughput vector search.
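
To make the quantization levels listed above (f32 to i8, b1) concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the scheme UForm, SimSIMD, or USearch actually use: it assumes unit-normalized input so every component lies in [-1, 1], and the function names are hypothetical.

```python
def quantize_i8(vec):
    """Map floats in [-1, 1] to integers in [-127, 127], clamping outliers."""
    return [max(-127, min(127, round(x * 127))) for x in vec]


def quantize_b1(vec):
    """Keep one sign bit per dimension, packed into a Python int.

    The first dimension ends up in the highest bit.
    """
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits


def dot_i8(a, b):
    """Integer dot product, a proxy for similarity between i8 vectors."""
    return sum(x * y for x, y in zip(a, b))


def hamming_distance(a, b):
    """Number of differing bits between two packed b1 vectors."""
    return bin(a ^ b).count("1")


# Toy 4-dimensional vectors standing in for real embeddings.
query = [0.6, -0.8, 0.0, 0.25]
doc = [0.5, 0.5, -0.5, 0.5]

print(quantize_i8(query), quantize_i8(doc))
print(dot_i8(quantize_i8(query), quantize_i8(doc)))
print(hamming_distance(quantize_b1(query), quantize_b1(doc)))
```

The storage savings follow from arithmetic alone: a 768-dimensional embedding takes 3,072 bytes as f32, 768 bytes as i8, and 96 bytes as b1. The cost is precision, so binary vectors are typically compared with Hamming distance for a coarse first pass.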

Maintenance & Community

  • Developed by unum-cloud.
  • Links to Python, JavaScript, and Swift documentation are provided.

Licensing & Compatibility

  • The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README flags the uform-gen model with a ⚠️ emoji, which suggests it may be unstable or deprecated. Video support is marked with 🔜 emojis, indicating it is planned but not yet available. Because the README does not explicitly state a license, commercial adoption may be blocked until licensing is clarified.

Health Check

  • Last commit: 4 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 10 stars in the last 30 days
