DeepLearningVideoGames by nikitasrivatsan

Deep Q learning research paper for video game strategy

created 9 years ago
1,091 stars

Top 35.5% on sourcepulse

Project Summary

This project implements Deep Q-Networks (DQN) to enable AI agents to learn strategies for playing video games like Pong and Tetris directly from raw pixel input. It targets researchers and developers interested in applying reinforcement learning to complex visual environments without prior game knowledge. The primary benefit is demonstrating human-level performance in Pong, showcasing the power and generalizability of deep learning for control tasks.

How It Works

The project utilizes a deep convolutional neural network (CNN) to approximate the action-value (Q) function. This Q-function estimates the expected future reward for taking a specific action in a given game state. The CNN processes raw pixel data, preprocessed into grayscale, resized, and stacked frames, to learn relevant features. Training employs Q-learning with experience replay and target networks, sampling minibatches from a memory of past transitions to stabilize learning and improve data efficiency.
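
The update described above can be sketched roughly as follows. This is a minimal illustration in TensorFlow, not the repository's code; the names (q_net, target_net, replay_memory), the discount factor, and the batch size are assumptions made for the example, and the states are taken to be the preprocessed frame stacks described above.

```python
import random

import numpy as np
import tensorflow as tf

GAMMA = 0.99       # assumed discount factor for future rewards
BATCH_SIZE = 32    # assumed minibatch size sampled from replay memory

def train_step(q_net, target_net, optimizer, replay_memory, num_actions):
    """One Q-learning update on a minibatch of stored (s, a, r, s', done) transitions."""
    batch = random.sample(replay_memory, BATCH_SIZE)
    states, actions, rewards, next_states, dones = map(np.array, zip(*batch))

    # Bootstrapped target: r + gamma * max_a' Q_target(s', a') for non-terminal s'.
    next_q = target_net(next_states).numpy().max(axis=1)
    targets = (rewards + GAMMA * next_q * (1.0 - dones)).astype(np.float32)

    with tf.GradientTape() as tape:
        q_values = q_net(states)                                  # Q(s, .) for the whole batch
        action_mask = tf.one_hot(actions, num_actions)
        chosen_q = tf.reduce_sum(q_values * action_mask, axis=1)  # Q(s, a) for the action taken
        loss = tf.reduce_mean(tf.square(targets - chosen_q))      # mean squared TD error

    grads = tape.gradient(loss, q_net.trainable_variables)
    optimizer.apply_gradients(zip(grads, q_net.trainable_variables))
    return loss
```

Sampling the minibatch at random from the replay memory breaks the correlation between consecutive frames, which is what stabilizes learning and improves data efficiency.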

Quick Start & Requirements

  • Install: Requires TensorFlow.
  • Prerequisites: an Amazon Web Services G2 large instance (a GPU is recommended for efficient training).
  • Setup: the replay memory is first populated over 50,000 time steps, and epsilon is annealed linearly over 500,000 frames (see the schedule sketch after this list). Training on Pong reached good results after ~1.38 million time steps (~25 hours).
  • Links: videos of the DQN in action; visualizations of the convolutional layers and the Q function.
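
The observation and exploration schedule above can be expressed as a simple linear ramp. Only the 50,000-step observation phase and the 500,000-frame annealing window come from the summary; the starting and final epsilon values below are assumptions for illustration.

```python
OBSERVE_STEPS = 50_000    # replay memory is filled with random play before training updates
ANNEAL_FRAMES = 500_000   # window over which epsilon is annealed linearly
EPSILON_START = 1.0       # assumed initial exploration rate
EPSILON_FINAL = 0.05      # assumed final exploration rate

def epsilon_at(step: int) -> float:
    """Exploration rate at a given time step: constant while observing, then a linear ramp."""
    if step < OBSERVE_STEPS:
        return EPSILON_START
    frac = min(1.0, (step - OBSERVE_STEPS) / ANNEAL_FRAMES)
    return EPSILON_START + frac * (EPSILON_FINAL - EPSILON_START)
```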

Highlighted Details

  • Achieved better-than-human performance in Pong.
  • CNN architecture includes 8x8, 4x4, and 3x3 convolutional layers with max pooling, followed by fully connected layers (see the sketch after this list).
  • Uses Adam optimizer with a learning rate of 0.000001.
  • Replay memory size of 500,000 observations.
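
Put together, the network could look roughly like the sketch below (tf.keras is used for brevity). The kernel sizes, max pooling, and Adam learning rate follow the bullets above; the filter counts, strides, 80x80x4 input shape, and dense-layer width are assumptions, not the repository's exact configuration.

```python
import tensorflow as tf

def build_q_network(num_actions: int, input_shape=(80, 80, 4)) -> tf.keras.Model:
    """Q-network sketch: stacked grayscale frames in, one Q-value per action out."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        tf.keras.layers.Conv2D(32, 8, strides=4, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(64, 4, strides=2, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(64, 3, strides=1, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(num_actions),   # linear output: one Q-value per action
    ])

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-6)  # learning rate from the summary
```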

Maintenance & Community

  • Based on the seminal work by Mnih et al. (2015).
  • Project appears to be a research demonstration rather than an actively maintained library.

Licensing & Compatibility

  • The README does not explicitly state a license. The underlying research paper is published in Nature.

Limitations & Caveats

  • Tetris implementation is still under development.
  • Max pooling might discard useful information; further parameter tuning is suggested.
  • Convergence speed may vary significantly based on reward frequency in different game genres.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 90 days
