cloudless by BradNeuberg

Deep learning pipeline for orbital satellite imagery analysis

Created 10 years ago

301 stars

Top 88.7% on SourcePulse

1 Expert Loves This Project

Eric Jang

VP AI at 1X

Project Summary

This project provides a deep learning pipeline for detecting clouds in orbital satellite imagery. It is aimed at users and startups, such as Planet Labs, that need to pre-process large volumes of imagery. Automated cloud detection and localization supports downstream tasks such as change detection and deforestation monitoring, and the pipeline can be adapted to other satellite imagery analysis tasks.

How It Works

The system has three core components. An annotation tool bootstraps training data by letting users draw bounding boxes on satellite images. A training pipeline uses Caffe to fine-tune an AlexNet model on the annotated data, running on GPU instances (e.g., AWS EC2). A bounding box inference system applies the trained model to new imagery to identify and localize clouds. Because the components are modular, the pipeline can be customized for satellite detection tasks beyond cloud identification.
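The inference component described above can be pictured as a filter over candidate regions. The following is a minimal sketch, assuming the inference step scores each candidate box with the trained classifier and keeps those above a threshold. The names `detect_clouds`, `toy_score`, and the threshold value are hypothetical and are not taken from the project; the stand-in scorer replaces the Caffe model so the example runs without any dependencies.

```python
from typing import Callable, List, Tuple

# A box is (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[int, int, int, int]


def detect_clouds(
    proposals: List[Box],
    score_fn: Callable[[Box], float],
    threshold: float = 0.5,
) -> List[Tuple[Box, float]]:
    """Return (box, score) pairs whose cloud score meets the threshold.

    Hypothetical helper: score_fn stands in for the fine-tuned classifier,
    and proposals stands in for the output of a region-proposal step.
    """
    scored = [(box, score_fn(box)) for box in proposals]
    kept = [(box, score) for box, score in scored if score >= threshold]
    # Highest-confidence detections first.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept


def toy_score(box: Box) -> float:
    """Stand-in classifier (assumption): larger boxes score higher."""
    x_min, y_min, x_max, y_max = box
    area = (x_max - x_min) * (y_max - y_min)
    return min(area / 10000.0, 1.0)


candidate_boxes = [(0, 0, 50, 50), (10, 10, 110, 110), (5, 5, 85, 85)]
cloud_boxes = detect_clouds(candidate_boxes, toy_score, threshold=0.5)
```

In a real run, `score_fn` would wrap a forward pass of the trained model on the image crop for each box; the filtering and sorting logic would stay the same.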

Quick Start & Requirements

Annotation Tool: Install GDAL (brew install gdal), virtualenv, and virtualenvwrapper, then run pip install -r requirements.txt inside a dedicated virtual environment (annotate-django). To import data, download Planet Labs imagery with download_planetlabs.py and populate the database with populate_db.
Training Pipeline: Install Caffe with its Python bindings and run pip install -r requirements.txt in the root directory. PYTHONPATH must include both Caffe's Python bindings and ./src. Prepare the data as LevelDB files with prepare_data.py, then train with train.py.
Bounding Box/Inference System: Requires Caffe and a Python 2.7 fork of Selective Search. The CAFFE_HOME and SELECTIVE_SEARCH environment variables must be set.
Prerequisites: A GPU with CUDA (for training and inference), a Planet Labs API key, and an AWS account with EC2 and S3 configured for distributed training. Raw satellite imagery is not provided due to copyright.
Links: Technical report available in the announcement blog post.
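Because the setup depends on several environment variables, a small pre-flight check can catch a misconfigured shell before a long training or inference run. The sketch below is not part of the project. The variable names PYTHONPATH, CAFFE_HOME, and SELECTIVE_SEARCH come from the requirements above, while the function `missing_settings` and the rule that PYTHONPATH should contain an entry ending in `src` are assumptions made for illustration. It is written for Python 3, although the project itself targets Python 2.7.

```python
import os
from typing import List, Mapping

# Variables named in the requirements above.
REQUIRED_VARS = ("PYTHONPATH", "CAFFE_HOME", "SELECTIVE_SEARCH")


def missing_settings(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems found in the given environment.

    Hypothetical helper: an empty list means every check passed.
    """
    problems = []
    for name in REQUIRED_VARS:
        if not env.get(name):
            problems.append(f"{name} is not set")

    python_path = env.get("PYTHONPATH", "")
    if python_path:
        entries = [p for p in python_path.split(os.pathsep) if p]
        # Assumption: the ./src entry shows up as a path whose last part is "src".
        has_src = any(
            os.path.basename(os.path.normpath(p)) == "src" for p in entries
        )
        if not has_src:
            problems.append("PYTHONPATH does not include a src directory")
    return problems


if __name__ == "__main__":
    for problem in missing_settings(os.environ):
        print(problem)
```

The check takes the environment as a parameter rather than reading `os.environ` directly, so it can be exercised with a plain dictionary.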

Highlighted Details

Achieves 89.69% accuracy and a 0.91 F1 score with a fine-tuned AlexNet model.
Starts from the BVLC AlexNet model pre-trained on ImageNet.
The pipeline is designed to be generalizable to other satellite detection tasks with minor modifications.
Includes detailed scripts for data preparation, training, testing, and inference, with options for AWS integration.
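For readers unfamiliar with how accuracy and F1 relate, the sketch below computes both from the four confusion-matrix counts of a binary cloud/no-cloud classifier. The counts are illustrative only and are not the project's evaluation data, and the function name `classification_metrics` is hypothetical.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 from confusion counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives for the "cloud" class.
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("at least one sample is required")

    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative counts only (assumption), not results from the project.
example = classification_metrics(tp=8, fp=2, fn=2, tn=8)
```

Accuracy counts every correct prediction, while F1 ignores true negatives and balances precision against recall for the cloud class, so the two figures generally differ on the same test set.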

Maintenance & Community

Contributors include Brad Neuberg, Johann Hauswald, and Max Nova. Parts of the project originated from Dropbox's Hack Week. It is released as version 1.0, with special thanks to Dropbox and Planet Labs. No specific community channels (like Discord/Slack) or roadmap are detailed in the README.

Licensing & Compatibility

The project is available under the Apache 2.0 license. Although the code is permissively licensed, the Planet Labs imagery remains the property of Planet Labs, and the raw imagery is not publicly available.

Limitations & Caveats

The setup process is complex. It requires specific environment configuration, including Caffe, Python 2.7 for a critical dependency (the Selective Search fork), and potentially AWS infrastructure for efficient training. Raw training data is not included due to copyright restrictions, so users must acquire and prepare their own data.

Health Check

Last Commit

9 years ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days