Seeing the Roads Through the Trees: A Benchmark for Modeling Spatial Dependencies with Aerial Imagery
Caleb Robinson1 · Isaac Corley2 · Anthony Ortiz1 · Rahul Dodhia1 · Juan M. Lavista Ferres1 · Peyman Najafirad (Paul Rad)2
1Microsoft AI for Good Research Lab 2University of Texas at San Antonio
Figure 1. Example images and labels from the dataset. Labels are shown over the corresponding NAIP aerial imagery with the "Road" class colored in blue and the "Tree Canopy over Road" class in red.
We introduce a novel remote sensing dataset for evaluating how well a geospatial machine learning model learns long-range dependencies and uses spatial context. As a proxy task, we train models to extract roads that appear broken into disjoint pieces because tree canopy occludes large portions of them.
The dataset consists of 30,000 RGBN NAIP images and land cover annotations from the Chesapeake Conservancy containing significant amounts of the "Tree Canopy Over Road" category.
Models are trained to perform semantic segmentation to extract roads from the background, but are additionally evaluated on how they perform on the "Tree Canopy Over Road" class. Furthermore, we weight each "Tree Canopy Over Road" pixel based on its L1 distance to the nearest "Road" pixel, resulting in a distance-weighted recall (DWR) metric, which we propose as a better proxy for long-range modeling performance.
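To make the idea behind DWR concrete, here is a minimal pure-Python sketch. It is not the implementation used in the paper: the label encoding (0 = background, 1 = road, 2 = tree canopy over road), the function names, and the choice to weight each occluded pixel linearly by its distance are all assumptions made for illustration, and the paper's exact weighting may differ.

```python
from collections import deque

# Hypothetical label encoding, chosen only for this sketch.
BACKGROUND, ROAD, CANOPY_OVER_ROAD = 0, 1, 2


def l1_distance_to_road(labels):
    """L1 (Manhattan) distance from every pixel to the nearest ROAD pixel.

    Uses a multi-source breadth-first search over the 4-connected grid.
    Entries stay None if the label mask contains no ROAD pixel at all.
    """
    h, w = len(labels), len(labels[0])
    dist = [[None] * w for _ in range(h)]
    queue = deque()
    for r in range(h):
        for c in range(w):
            if labels[r][c] == ROAD:
                dist[r][c] = 0
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


def distance_weighted_recall(labels, preds):
    """Recall over CANOPY_OVER_ROAD pixels, weighted by distance to visible road.

    `preds` is a binary mask where 1 means the model predicted road.
    Assumed form: sum(d_i * hit_i) / sum(d_i) over occluded pixels i.
    """
    dist = l1_distance_to_road(labels)
    hit = total = 0
    for r, row in enumerate(labels):
        for c, label in enumerate(row):
            if label == CANOPY_OVER_ROAD and dist[r][c] is not None:
                total += dist[r][c]
                hit += dist[r][c] * (preds[r][c] == 1)
    return hit / total if total else float("nan")


# A single road row with three occluded pixels in the middle.
labels = [[ROAD, CANOPY_OVER_ROAD, CANOPY_OVER_ROAD, CANOPY_OVER_ROAD, ROAD]]
preds = [[1, 1, 0, 1, 1]]  # the model misses the pixel farthest from visible road
dwr = distance_weighted_recall(labels, preds)
```

In this toy case the unweighted recall on the occluded pixels is 2/3, while the distance-weighted value is lower because the missed pixel is the one farthest from any visible road. That is the behavior the metric is meant to capture: errors deep inside an occluded stretch cost more than errors at its edges.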
We have included the download_dataset.py script, which demonstrates how we created the aligned NAIP / land cover patches. This script uses the pre-sampled locations in data/patches.gpkg and the Maryland land cover dataset, and it expects data/md_lc_2018_2022-Edition/md_lc_2018_2022-Edition.tif to exist.
We provide a train.py script for reproducing the experiments in the paper.
See below for train.py usage and arguments:
usage: train.py [-h] [--batch_size BATCH_SIZE] [--model {deeplabv3+,fcn,custom_fcn,unet,unet++}] [--num_epochs NUM_EPOCHS]
[--num_filters NUM_FILTERS]
[--backbone {resnet18,resnet34,resnet50,resnet101,resnet152,resnext50_32x4d,resnext101_32x8d}] [--lr LR] [--tmax TMAX]
[--experiment_name EXPERIMENT_NAME] [--gpu_id GPU_ID] [--root_dir ROOT_DIR]
Train a semantic segmentation model.
options:
-h, --help show this help message and exit
--batch_size BATCH_SIZE
Size of each mini-batch.
--model {deeplabv3+,fcn,custom_fcn,unet,unet++}
Model architecture to use.
--num_epochs NUM_EPOCHS
Number of epochs to train for.
--num_filters NUM_FILTERS
Number of filters to use with FCN models.
--backbone {resnet18,resnet34,resnet50,resnet101,resnet152,resnext50_32x4d,resnext101_32x8d}
Backbone architecture to use.
--lr LR Learning rate to use for training.
--tmax TMAX Cycle size for cosine lr scheduler.
--experiment_name EXPERIMENT_NAME
Name of the experiment to run.
--gpu_id GPU_ID GPU ID to use (defaults to all GPUs if none).
--root_dir ROOT_DIR Root directory of the dataset.
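The --lr and --tmax options interact through the cosine learning rate schedule. As a rough illustration, the sketch below evaluates the standard cosine annealing formula (the closed form documented for PyTorch's CosineAnnealingLR). Whether train.py uses exactly this scheduler is an assumption; the function name cosine_lr and the eta_min parameter are hypothetical and not part of the script's interface.

```python
import math


def cosine_lr(step, base_lr, t_max, eta_min=0.0):
    """Learning rate at `step` under cosine annealing.

    Decays from `base_lr` at step 0 to `eta_min` at step `t_max`.
    Assumes --lr maps to `base_lr` and --tmax maps to `t_max`.
    """
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max))


# Example values chosen for illustration; they are not the script's defaults.
schedule = [cosine_lr(step, base_lr=1e-3, t_max=50) for step in (0, 25, 50)]
```

Under this assumption, a larger --tmax stretches the decay over more steps, so the learning rate stays near --lr for longer before falling toward its minimum.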
We provide an eval.py script for evaluating a pretrained checkpoint on the test set. The notebooks directory contains Jupyter notebooks for reproducing the figures.
See below for eval.py usage and arguments:
usage: eval.py [-h] --model_fn MODEL_FN [--three_class] [--gpu GPU] [--eval_set {test,val}] [--quiet]
options:
-h, --help show this help message and exit
--model_fn MODEL_FN Model checkpoint to load
--three_class Whether to use three classes metrics
--gpu GPU GPU to use for inference (default: 0)
--eval_set {test,val}
Which set to run over
--quiet Whether to use TQDM progress bar
If you use this dataset in your work please cite our paper:
@misc{robinson2024seeingroadstreesbenchmark,
title={Seeing the roads through the trees: A benchmark for modeling spatial dependencies with aerial imagery},
author={Caleb Robinson and Isaac Corley and Anthony Ortiz and Rahul Dodhia and Juan M. Lavista Ferres and Peyman Najafirad},
year={2024},
eprint={2401.06762},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2401.06762},
}