KataCR by wty-yy

Non-embedded AI for Clash Royale using RL and CV

Created 1 year ago
293 stars

Top 91.2% on sourcepulse

Project Summary

This project provides a non-embedded AI agent for the mobile game Clash Royale, built on Reinforcement Learning (RL) and Computer Vision (CV). "Non-embedded" means the agent does not hook into the game client: it operates solely by processing screen input from a mobile device. It is aimed at researchers and advanced players interested in AI game agents.

How It Works

The AI employs a multi-stage pipeline: generative dataset construction for object recognition, YOLOv8 for object detection, and offline RL for decision-making. It processes the video stream from a mobile device, identifying game elements such as cards and elixir levels with CV models (ResNet for classification, YOLOv8 for detection). These perceptions are then fed into RL models (StARformer and DT, i.e., Decision Transformer) to predict and execute actions, as sketched below.
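For orientation, here is a minimal sketch of that perceive-decide-act loop. All object and method names (grab_frame, detect, classify, select_action, tap) are hypothetical placeholders, not KataCR's actual API, and the sleep interval only echoes the reported decision latency.

```python
# A minimal sketch of the perceive-decide-act loop, assuming hypothetical
# wrapper objects; none of these names come from the KataCR codebase.
import time

def run_agent(device, detector, classifier, policy):
    """One perception-decision-action cycle per captured frame."""
    while True:
        frame = device.grab_frame()                # V4L2 video stream from the phone
        units = detector.detect(frame)             # YOLOv8: troops, towers, elixir bar
        cards = classifier.classify(frame)         # ResNet: cards currently in hand
        state = {"units": units, "cards": cards}   # fused perception features
        action = policy.select_action(state)       # StARformer/DT picks card + placement
        if action is not None:
            x, y, card_slot = action
            device.tap(x, y, card_slot)            # play the card on the device screen
        time.sleep(0.12)                           # ~120 ms reported decision latency
```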

Quick Start & Requirements

  • Installation: Requires miniforge for environment management. Create an environment (conda create -n katacr python==3.11), install CUDA (e.g., conda install -c conda-forge cudatoolkit=11.8 cudnn=8.9), JAX with CUDA support (e.g., pip install "jax[cuda11]==0.4.25" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html), PyTorch 2.2.2, PaddlePaddle 2.6.1, and the remaining dependencies via pip install -r requirements.txt. A quick post-install sanity check is sketched after this list.
  • Prerequisites: Linux (Ubuntu 24.04 LTS recommended) is mandatory, both because the mobile video stream is captured via V4L2 and because JAX's CUDA builds target Linux. An NVIDIA GPU is required.
  • Configuration: Screen resolution and aspect ratio must be set in constant.py if the device screen is not 1080x2400.
  • Resources: Setup involves installing multiple ML frameworks and CUDA. Inference time is reported as ~120 ms for decision-making and ~240 ms for feature fusion.
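After installing, it is worth confirming that both JAX and PyTorch actually see the GPU. This check is a common-sense suggestion, not a KataCR script; it uses only standard JAX and PyTorch calls.

```python
# A post-install sanity check: confirm that both JAX and PyTorch
# can see the NVIDIA GPU before running any KataCR components.
import jax
import torch

print("JAX devices:", jax.devices())               # should list a cuda/gpu device
print("PyTorch CUDA available:", torch.cuda.is_available())
print("PyTorch device count:", torch.cuda.device_count())
```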

Highlighted Details

  • Uses a combination of YOLOv8 models for object detection, trained on a custom dataset.
  • Implements three decision-making models: continuous action prediction (with delay), discrete action prediction (no delay), and continuous action prediction for all cards.
  • Supports offline dataset creation from video replays using OCR and frame extraction (a frame-extraction sketch follows this list).
  • Includes tools for dataset generation, model training (YOLOv8, RL), validation, and inference.
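To make the replay-to-dataset step concrete, here is a minimal frame-extraction sketch using OpenCV. The sampling interval and output layout are illustrative assumptions; KataCR's actual pipeline also applies OCR, which is omitted here.

```python
# A minimal sketch of replay frame extraction with OpenCV. The sampling
# interval and output layout are illustrative assumptions, not KataCR's
# actual pipeline (which additionally runs OCR on the frames).
import cv2
from pathlib import Path

def extract_frames(video_path: str, out_dir: str, every_n: int = 30) -> int:
    """Save every `every_n`-th frame of the replay as a PNG; return the count."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    saved = idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:                                  # end of video (or read error)
            break
        if idx % every_n == 0:
            cv2.imwrite(f"{out_dir}/frame_{idx:06d}.png", frame)
            saved += 1
        idx += 1
    cap.release()
    return saved
```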

Maintenance & Community

The project is presented as undergraduate thesis code. Links to Bilibili for demonstration videos and GitHub for the detection dataset are provided. No explicit community channels (Discord, Slack) or roadmap are mentioned.

Licensing & Compatibility

The repository does not explicitly state a license. The code is presented as undergraduate thesis work, implying potential restrictions on commercial use or derivative works without explicit permission.

Limitations & Caveats

The project is tied to specific hardware configurations and screen resolutions, requiring manual adjustments for different setups. The dependency on Linux and specific CUDA versions can be a barrier. Some components (CRNN, original YOLOv5 implementation) have been deprecated in favor of newer versions.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 39 stars in the last 90 days
