Python library for multimodal agent building
OmAgent is a Python library designed for building multimodal language agents, targeting developers and researchers who need to prototype and deploy agents capable of processing text, image, video, and audio inputs. It simplifies complex agent engineering by abstracting worker orchestration and task queues, offering reusable agent components and native multimodal support.
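To make "reusable agent components" concrete, here is a minimal sketch of a worker, modeled on the pattern in the project's hello-world example; the module paths, the registry decorator, and the _run hook are assumptions that may differ across versions, and GreetWorker is a hypothetical name.

    from omagent_core.engine.worker.base import BaseWorker
    from omagent_core.utils.registry import registry

    # Hypothetical worker; registering it lets the orchestration engine
    # discover it by class name. Paths/names assumed from project examples.
    @registry.register_worker()
    class GreetWorker(BaseWorker):
        def _run(self, user_name: str, *args, **kwargs):
            # Workers return a dict whose keys downstream tasks can consume.
            return {"greeting": f"Hello, {user_name}!"}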
How It Works
OmAgent employs a graph-based workflow orchestration engine with various memory types for contextual reasoning. Its core advantage lies in its native multimodal interaction capabilities, including support for Vision-Language Models (VLMs), real-time APIs, computer vision models, and mobile device connections. This approach allows agents to go beyond text-based reasoning, incorporating diverse data modalities.
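As a sketch of the graph-based orchestration, the snippet below chains the hypothetical GreetWorker from above into a one-node workflow, assuming the Conductor-backed API (ConductorWorkflow, simple_task) used in the project's examples; treat the exact imports and signatures as assumptions.

    from omagent_core.engine.workflow.conductor_workflow import ConductorWorkflow
    from omagent_core.engine.workflow.task.simple_task import simple_task

    # Declare the workflow graph; the >> operator chains tasks into a DAG.
    workflow = ConductorWorkflow(name="greet_workflow")
    greet = simple_task(
        task_def_name="GreetWorker",       # the worker registered earlier (hypothetical)
        task_reference_name="greet",
        inputs={"user_name": workflow.input("user_name")},
    )
    workflow >> greet
    workflow.register(overwrite=True)      # publish the graph to the task-queue backend

Because tasks are plain nodes in the graph, swapping a text worker for a VLM-backed one changes a node, not the orchestration around it.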
Quick Start & Requirements
pip install omagent-core
Setup involves configuring each example's container.yaml file and setting LLM configurations (e.g., an OpenAI API key via environment variables). Then run an example:
cd examples/step1_simpleVQA && python run_webpage.py
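The demo reads LLM credentials from the environment before launch; the variable names below follow the repository's README for its OpenAI-compatible setup, but treat them as assumptions and check the example's container.yaml for the keys it actually expects.

    export custom_openai_key="sk-..."                          # your API key (placeholder)
    export custom_openai_endpoint="https://api.openai.com/v1"  # or a compatible endpoint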
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats