OmAgent by om-ai-lab

Python library for multimodal agent building

created 1 year ago
2,534 stars

Top 18.9% on sourcepulse

Project Summary

OmAgent is a Python library designed for building multimodal language agents, targeting developers and researchers who need to prototype and deploy agents capable of processing text, image, video, and audio inputs. It simplifies complex agent engineering by abstracting worker orchestration and task queues, offering reusable agent components and native multimodal support.

How It Works

OmAgent employs a graph-based workflow orchestration engine with various memory types for contextual reasoning. Its core advantage lies in its native multimodal interaction capabilities, including support for Vision-Language Models (VLMs), real-time APIs, computer vision models, and mobile device connections. This approach allows agents to go beyond text-based reasoning, incorporating diverse data modalities.
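To make "graph-based workflow orchestration" concrete, here is a minimal sketch of the general idea in plain Python. Workers are nodes in a dependency graph and run in topological order, and they share a dict that stands in for the agent's memory. The `run_workflow` helper, the worker functions, and the memory layout are all hypothetical. This is not OmAgent's worker or workflow API, and the image-description step is a stub where a real agent would call a VLM.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, Iterable

# Hypothetical worker type: a callable that reads from and writes to a shared
# "memory" dict. Illustrates graph-based orchestration in general; it is NOT
# OmAgent's actual worker or workflow API.
Worker = Callable[[Dict[str, Any]], None]


def run_workflow(workers: Dict[str, Worker],
                 deps: Dict[str, Iterable[str]],
                 memory: Dict[str, Any]) -> Dict[str, Any]:
    """Run every worker once, only after all of its upstream workers have run."""
    # TopologicalSorter expects {node: predecessors} and raises CycleError on cycles.
    for name in TopologicalSorter(deps).static_order():
        memory.setdefault("trace", []).append(name)  # record execution order
        workers[name](memory)
    return memory


# Stub workers; in a real multimodal agent, describe_image would call a VLM.
def load_input(mem: Dict[str, Any]) -> None:
    mem["question"] = "What animal is in the picture?"
    mem["image_path"] = "cat.jpg"


def describe_image(mem: Dict[str, Any]) -> None:
    mem["caption"] = "a cat sitting on a sofa"  # placeholder for VLM output


def answer(mem: Dict[str, Any]) -> None:
    mem["answer"] = f"Q: {mem['question']} A: based on '{mem['caption']}', a cat."


result = run_workflow(
    workers={"load_input": load_input, "describe_image": describe_image, "answer": answer},
    deps={"load_input": [], "describe_image": ["load_input"], "answer": ["describe_image"]},
    memory={},
)
print(result["trace"])  # ['load_input', 'describe_image', 'answer']
```

Declaring the dependencies separately from the workers lets an orchestrator reorder, parallelize, or distribute independent workers without changing their code. A task-queue-based engine generalizes this same idea.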

Quick Start & Requirements

  • Install via pip: pip install omagent-core
  • Requires Python >= 3.10.
  • Configuration involves generating a container.yaml file and setting the LLM configuration (e.g., supplying an OpenAI API key via environment variables).
  • Demo execution: cd examples/step1_simpleVQA && python run_webpage.py
  • Documentation: https://om-ai-lab.github.io/OmAgent/
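Supplying credentials through environment variables usually means the application reads them at startup and fails early if they are missing. The sketch below shows that pattern in plain Python. The variable names (`OPENAI_API_KEY`, `LLM_MODEL`, `LLM_ENDPOINT`), the defaults, and the `LLMConfig`/`load_llm_config` names are assumptions for illustration. The names OmAgent itself reads may differ, so check the documentation linked above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class LLMConfig:
    api_key: str
    model: str
    endpoint: str


def load_llm_config(env: Mapping[str, str] = os.environ) -> LLMConfig:
    """Build an LLM config from environment variables.

    Variable names and defaults are hypothetical, not OmAgent's own.
    """
    key = env.get("OPENAI_API_KEY")
    if not key:
        # Fail fast with a clear message instead of erroring on the first API call.
        raise RuntimeError("Set OPENAI_API_KEY before running the demo.")
    return LLMConfig(
        api_key=key,
        model=env.get("LLM_MODEL", "gpt-3.5-turbo"),           # assumed default
        endpoint=env.get("LLM_ENDPOINT", "https://api.openai.com/v1"),  # assumed default
    )


cfg = load_llm_config({"OPENAI_API_KEY": "sk-test"})
print(cfg.model)  # gpt-3.5-turbo
```

Passing a plain mapping instead of always reading `os.environ` keeps the loader easy to test. Overriding the endpoint is also how a local OpenAI-compatible server could be targeted.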

Highlighted Details

  • Supports local model deployment via Ollama or LocalAI.
  • Offers a fully distributed architecture with custom scaling and a Lite mode.
  • Includes implementations of agent reasoning algorithms such as ReAct, Chain-of-Thought (CoT), and Self-Consistency CoT (SC-CoT).
  • Reported benchmarks show SC-CoT achieving an average score of 73.69 on GSM8K and 67.32 on AQuA with gpt-3.5-turbo.
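SC-CoT (self-consistency over chain-of-thought) samples several independent reasoning paths for the same question and returns the most frequent final answer. The sketch below shows only the aggregation step in plain Python. The sampled outputs are hard-coded placeholders for what an LLM would return at nonzero temperature, and the "The answer is N" extraction format is an assumption. This illustrates the general technique and is not OmAgent's implementation.

```python
import re
from collections import Counter
from typing import List, Optional


def extract_answer(reasoning: str) -> Optional[str]:
    """Return the number following 'The answer is', or None if absent (assumed format)."""
    match = re.search(r"The answer is\s*(-?\d+(?:\.\d+)?)", reasoning)
    return match.group(1) if match else None


def self_consistency(samples: List[str]) -> Optional[str]:
    """Majority vote over final answers extracted from sampled chains of thought."""
    answers = [a for a in (extract_answer(s) for s in samples) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


# Hypothetical sampled outputs for one prompt; one path contains an arithmetic slip.
samples = [
    "3 boxes of 4 apples is 12, plus 2 loose apples is 14. The answer is 14.",
    "3 * 4 = 12 and 12 + 2 = 14. The answer is 14.",
    "3 * 4 = 12 and 12 + 3 = 15. The answer is 15.",
]
print(self_consistency(samples))  # 14
```

Voting on the extracted answer rather than the full text lets different but correct reasoning paths reinforce each other, while a single faulty path is outvoted.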

Maintenance & Community

  • The project encourages community engagement via X and Discord.
  • Related research includes papers on detection generalization and multimodal pre-training.

Licensing & Compatibility

  • The library's license is not explicitly stated in the README.

Limitations & Caveats

  • The README does not specify the license, which could impact commercial use or closed-source integration.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 65 stars in the last 90 days
