streetlearn by google-deepmind

Street navigation environment for agent training, based on Street View images

Created 7 years ago

315 stars

Top 85.8% on SourcePulse

1 Expert Loves This Project

Aravind Srinivas

Cofounder of Perplexity

Project Summary

This repository provides the StreetLearn C++ engine and Python environment for training navigation agents on real-world Google Street View imagery. It supports research into goal-driven navigation, instruction following, and spatial reasoning, and builds on prior work in deep reinforcement learning.

How It Works

StreetLearn uses a C++ engine to load, cache, and project Google Street View panoramas efficiently. It models the urban environment as a graph of interconnected panoramas, and agents navigate by moving between adjacent panoramas. The Python interface exposes this engine, follows the OpenAI Gym specification, and supports game modes such as coin collection, goal-based navigation, and instruction following. Agents are trained with TensorFlow, using architectures such as IMPALA for scalable distributed deep reinforcement learning.
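To make the graph-of-panoramas idea concrete, here is a minimal sketch of a toy environment with a Gym-style reset/step loop. It is not the StreetLearn API: the class ToyPanoGraphEnv, the pano_id observation key, the reward scheme, and the three-node graph are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyPanoGraphEnv:
    """Toy stand-in for a panorama-graph environment (not the StreetLearn API)."""

    # Hypothetical adjacency list: panorama id -> ids of neighbouring panoramas.
    neighbors: Dict[str, List[str]]
    start: str
    goal: str
    current: str = field(init=False)

    def __post_init__(self) -> None:
        self.current = self.start

    def reset(self) -> Dict[str, str]:
        """Return the agent to the start panorama and emit an observation."""
        self.current = self.start
        return {"pano_id": self.current}

    def step(self, action: int) -> Tuple[Dict[str, str], float, bool, dict]:
        """Move to the neighbour selected by `action`; invalid actions stay put."""
        options = self.neighbors[self.current]
        if 0 <= action < len(options):
            self.current = options[action]
        done = self.current == self.goal
        reward = 1.0 if done else 0.0  # assumed sparse reward at the goal
        return {"pano_id": self.current}, reward, done, {}


# Three panoramas along a street: a <-> b <-> c.
env = ToyPanoGraphEnv(
    neighbors={"a": ["b"], "b": ["a", "c"], "c": ["b"]},
    start="a",
    goal="c",
)
obs = env.reset()
obs, reward, done, _ = env.step(0)  # a -> b
obs, reward, done, _ = env.step(1)  # b -> c, which is the goal
```

In the real engine the observation would carry a projected image rather than an id, but the control flow (pick an adjacent node, receive an observation and reward) has the same shape.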

Quick Start & Requirements

Installation: Building requires Bazel (version 0.24.0 or earlier). Compilation also involves installing Protocol Buffers, CLIF, OpenCV 2.4.13, and the Python dependencies.
Prerequisites: Tested on Ubuntu 18.04. Requires g++, cmake, python-virtualenv, tensorflow-gpu, and pygame.
Dataset: Google Street View panoramas must be requested separately from the StreetLearn project website (e.g., Manhattan, Pittsburgh datasets).
Running: Use bazel run commands to launch agents (e.g., bazel run streetlearn/python/ui:human_agent -- --dataset_path=<dataset_path>).
Documentation: Links to cited papers ([1], [2], [3], [4]) and the scalable_agent repository are provided for detailed understanding.
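The version constraint and the launch command above could be wrapped in a small helper script, sketched below. The function names are hypothetical, the version check assumes plain dotted numeric version strings, and the dataset path in the usage lines is a placeholder; only the Bazel target and the --dataset_path flag come from the text above.

```python
from typing import List, Tuple

MAX_BAZEL_VERSION = (0, 24, 0)  # upper bound stated in the requirements above


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '0.24.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def bazel_is_supported(version: str) -> bool:
    """True if the given Bazel version does not exceed the stated upper bound."""
    return parse_version(version) <= MAX_BAZEL_VERSION


def human_agent_command(dataset_path: str) -> List[str]:
    """Build the argument list for launching the interactive human agent."""
    return [
        "bazel",
        "run",
        "streetlearn/python/ui:human_agent",
        "--",  # everything after this is passed to the agent, not to Bazel
        f"--dataset_path={dataset_path}",
    ]


# Placeholder path; substitute the directory holding the requested dataset.
command = human_agent_command("/path/to/dataset")
supported = bazel_is_supported("0.24.0")
```

The list form can be handed to subprocess.run(command) without shell quoting concerns; the sketch stops short of running it because that needs a built workspace and a dataset.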

Highlighted Details

Supports multiple game modes: coin_game, courier_game, curriculum_courier_game, and various instruction-following games.
Offers rich observation spaces including view_image, graph_image, latlng, instructions, and ground_truth_direction.
Includes interactive agents (human_agent, oracle_agent) for testing and visualization.
The C++ engine is optimized for panorama projection and graph traversal.
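As a rough illustration of how the game modes and observation names listed above might be checked before a run, the sketch below validates a configuration dictionary. The config keys game and observations and the function validate_config are hypothetical, and the set of game modes contains only the names given above; the instruction-following game names are not listed here, so they are left out.

```python
from typing import Any, Dict, List

# Names taken from the list above; instruction-following games are omitted
# because their identifiers are not given in this summary.
GAME_MODES = {"coin_game", "courier_game", "curriculum_courier_game"}
OBSERVATIONS = {
    "view_image",
    "graph_image",
    "latlng",
    "instructions",
    "ground_truth_direction",
}


def validate_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a hypothetical run configuration."""
    problems = []
    game = config.get("game")
    if game not in GAME_MODES:
        problems.append(f"unknown game mode: {game!r}")
    for name in config.get("observations", []):
        if name not in OBSERVATIONS:
            problems.append(f"unknown observation: {name!r}")
    return problems


ok = validate_config({"game": "courier_game", "observations": ["view_image", "latlng"]})
bad = validate_config({"game": "maze", "observations": ["depth"]})
```

Returning a list of problems rather than raising on the first one lets a caller report every misspelled name in a single pass.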

Maintenance & Community

This project is associated with Google DeepMind. No specific community channels (like Discord/Slack) or active maintenance indicators are mentioned in the README.

Licensing & Compatibility

The project's core components and dependencies (like Abseil C++) are licensed under the Apache License. However, the use of Google Street View data may be subject to separate terms. Compatibility with closed-source applications is not explicitly detailed.

Limitations & Caveats

The build process is complex and has only been tested on Ubuntu 18.04, requiring specific versions of dependencies like Bazel and OpenCV. Obtaining and integrating the Street View datasets is a necessary prerequisite. The project is not an officially supported Google product.

Health Check

Last Commit

5 years ago

Responsiveness

Inactive

Star History

1 star in the last 30 days