BBuf: Agent-ready playbooks for AI infrastructure operations
AI-Infra-Auto-Driven-SKILLS provides agent-ready playbooks for AI infrastructure engineers to automate LLM serving benchmarks, profiler triage, SGLang optimization, production incident debugging, and model PR intelligence. It equips agents with operational memory to perform complex tasks, aiming to reduce manual effort in performance tuning and incident resolution.
How It Works
This repository offers a collection of focused "skills" or playbooks designed for AI agents. The core approach emphasizes automation for critical AI infrastructure tasks. Key differentiators include a stage-separated profiler workflow that isolates prefill and decode evidence, a framework-neutral benchmark schema for consistent comparisons across serving frameworks (SGLang, vLLM, TensorRT-LLM), and a replay-first incident triage methodology that prioritizes evidence preservation and reproduction before code changes.
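As a rough illustration of what a framework-neutral benchmark schema could look like, the Python sketch below defines one record type shared by all serving frameworks, then compares results that were measured under the same model and request rate. The field names (framework, model, request_rate, ttft_ms_p50, output_tokens_per_s) and the helper best_by_throughput are assumptions made for this example; the repository's actual schema is not shown in this summary, and the numbers are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkResult:
    """Hypothetical framework-neutral record; all field names are illustrative."""
    framework: str              # any serving framework label
    model: str
    request_rate: float         # requests per second offered to the server
    ttft_ms_p50: float          # median time to first token, in milliseconds
    output_tokens_per_s: float  # aggregate output token throughput


def best_by_throughput(results):
    """Return the highest-throughput result for each (model, request_rate) pair."""
    best = {}
    for r in results:
        key = (r.model, r.request_rate)
        if key not in best or r.output_tokens_per_s > best[key].output_tokens_per_s:
            best[key] = r
    return best


# Placeholder values, used only to exercise the code; not real measurements.
results = [
    BenchmarkResult("framework-a", "example-model", 8.0, 120.0, 2400.0),
    BenchmarkResult("framework-b", "example-model", 8.0, 135.0, 2250.0),
]
winner = best_by_throughput(results)[("example-model", 8.0)]
```

Because every framework reports into the same record shape, the comparison groups results by workload and never branches on the framework name.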
Quick Start & Requirements
To install a skill, copy it into the agent's skill directory (e.g., cp -r skills/llm-serving-auto-benchmark <agent-skill-dir>/llm-serving-auto-benchmark). The README lists no software prerequisites beyond an agent environment that can run these Python-based skills. The H100 operator runbooks additionally require a configured remote environment, including SSH aliases, container names, and workspace paths.
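The copy step can also be scripted. The sketch below is a minimal Python equivalent of the cp -r command, assuming the repository keeps each skill under skills/<name>; the function name install_skill and its parameters are hypothetical, and the agent skill directory remains whatever your agent expects.

```python
import shutil
from pathlib import Path


def install_skill(repo_root, skill_name, agent_skill_dir):
    """Copy skills/<skill_name> from a repo checkout into the agent's skill directory."""
    src = Path(repo_root) / "skills" / skill_name
    if not src.is_dir():
        raise FileNotFoundError(f"skill not found: {src}")
    dest = Path(agent_skill_dir) / skill_name
    # dirs_exist_ok=True (Python 3.8+) lets a re-run refresh an existing copy
    # instead of failing because the destination already exists.
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return dest
```

Unlike a repeated cp -r, which nests a second copy inside an existing destination directory, this version merges into the existing directory, so re-running it updates the installed skill in place.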
Highlighted Details
Maintenance & Community
The README gives no details on maintainers, community channels (such as Discord or Slack), sponsorships, or a roadmap.
Licensing & Compatibility
The README does not explicitly state a software license. This omission presents a significant caveat for potential adoption, especially for commercial use or integration into closed-source projects.
Limitations & Caveats
The H100-specific skills necessitate careful configuration of remote environments and adherence to security practices for handling secrets. The absence of a declared license is a primary limitation for widespread or commercial adoption.