The WorldStrat Dataset
This is the companion code repository for the WorldStrat dataset and its article, used to generate the dataset and train several super-resolution benchmarks on it. The associated article and datasheet for the dataset are available on arXiv.
Clone the repository: git clone https://github.com/worldstrat/worldstrat
Create the environment: mamba env create -n worldstrat --file environment.yml
Run the Dataset Exploration notebook using the worldstrat environment.

Create a dataset folder in the repository root (worldstrat/dataset) and unpack the dataset there.
Run the Dataset Exploration notebook, or any of the other notebooks, using the worldstrat environment.

Nearly 10,000 km² of free high-resolution satellite imagery of unique locations which ensure stratified representation of all types of land-use across the world: from agriculture to ice caps, from forests to multiple urbanization densities.
These locations are complemented by ones that are typically under-represented in ML datasets: sites of humanitarian interest, illegal mining sites, and settlements of persons at risk.
Each high-resolution image (1.5 m/pixel) comes with multiple temporally-matched low-resolution images from the freely accessible lower-resolution Sentinel-2 satellites (10 m/pixel).
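To make the resolution gap concrete, the short sketch below works out the ratio between the two ground sampling distances quoted above. The function and constant names are illustrative only and are not part of the WorldStrat package.

```python
# Ground sampling distances quoted in the text, in metres per pixel.
HIGH_RES_GSD_M = 1.5   # high-resolution imagery
LOW_RES_GSD_M = 10.0   # Sentinel-2 imagery


def linear_scale_factor(low_res_gsd: float, high_res_gsd: float) -> float:
    """Return how many high-res pixels span one low-res pixel along one axis."""
    return low_res_gsd / high_res_gsd


def pixels_per_km2(gsd_m: float) -> float:
    """Return the number of pixels needed to cover one square kilometre."""
    pixels_per_km = 1000.0 / gsd_m
    return pixels_per_km ** 2


scale = linear_scale_factor(LOW_RES_GSD_M, HIGH_RES_GSD_M)
print(f"Linear scale factor: {scale:.2f}")      # about 6.67
print(f"Area scale factor:   {scale ** 2:.1f}")  # about 44.4
print(f"High-res pixels per km²: {pixels_per_km2(HIGH_RES_GSD_M):,.0f}")
print(f"Low-res pixels per km²:  {pixels_per_km2(LOW_RES_GSD_M):,.0f}")
```

In other words, each low-resolution pixel covers roughly 44 high-resolution pixels, which gives a sense of how much detail a super-resolution model is asked to recover.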
We accompany this dataset with a paper, a datasheet for datasets, and an open-source Python package that lets you rebuild or extend the WorldStrat dataset, train and run inference with baseline algorithms, and learn from abundant tutorials, all compatible with the popular EO-learn toolbox.
We hope to foster broad-spectrum applications of ML to satellite imagery, and possibly to obtain from free public low-resolution Sentinel-2 imagery the same power of analysis that costly private high-resolution imagery allows. We illustrate this specific point by training and releasing several highly compute-efficient baselines on the task of Multi-Frame Super-Resolution.
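As a rough illustration of what Multi-Frame Super-Resolution means, the sketch below fuses several low-resolution frames of the same scene by averaging them pixel by pixel, then upsamples the result by pixel repetition. This is a deliberately naive stand-in written for this example, not one of the released baselines. The function name, the nested-list image layout, and the integer scale factor are all assumptions, and plain Python lists are used only to keep the sketch dependency-free.

```python
from statistics import fmean


def fuse_and_upsample(frames, scale):
    """Average co-registered frames, then upsample by pixel repetition.

    frames: list of T images, each a list of rows of floats (same size).
    scale:  integer upsampling factor (hypothetical; the real ratio between
            10 m and 1.5 m pixels is not an integer).
    """
    height, width = len(frames[0]), len(frames[0][0])
    # Temporal fusion: mean over the T observations of each pixel.
    fused = [
        [fmean(frame[r][c] for frame in frames) for c in range(width)]
        for r in range(height)
    ]
    upsampled = []
    for row in fused:
        wide_row = [value for value in row for _ in range(scale)]  # repeat columns
        upsampled.extend(list(wide_row) for _ in range(scale))     # repeat rows
    return upsampled


# Two tiny 2x2 "frames" of the same scene.
frames = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[3.0, 4.0], [5.0, 6.0]],
]
result = fuse_and_upsample(frames, scale=2)
for row in result:
    print(row)
# [2.0, 2.0, 3.0, 3.0]
# [2.0, 2.0, 3.0, 3.0]
# [4.0, 4.0, 5.0, 5.0]
# [4.0, 4.0, 5.0, 5.0]
```

A learned model replaces both steps with trainable components, but the input and output contract is the same: many low-resolution views in, one higher-resolution image out.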
The main repository for this dataset is Zenodo. It contains:
Due to Kaggle's size limitation of ~107 GB, we've uploaded what we call the "core dataset" there, which consists of:
We used this core dataset to train the models we used as benchmarks in our paper, and which we distribute as pre-trained models.
We recommend starting by downloading and unpacking the dataset, then using the Dataset Exploration notebook to explore the data.
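The unpacking step can be scripted. The sketch below extracts an archive into a dataset folder under the repository root and lists what landed there. The archive name and the helper function are hypothetical, and the sketch assumes the download is in a format that Python's shutil.unpack_archive recognises (zip or tar variants).

```python
import shutil
from pathlib import Path


def unpack_into_dataset(archive: Path, repo_root: Path) -> Path:
    """Extract `archive` into `<repo_root>/dataset` and return that folder."""
    dataset_dir = repo_root / "dataset"
    dataset_dir.mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(str(archive), str(dataset_dir))
    return dataset_dir


if __name__ == "__main__":
    # Hypothetical paths: adjust to wherever the repository and download live.
    repo_root = Path("worldstrat")
    archive = Path("downloads/worldstrat_archive.zip")  # placeholder name
    if archive.exists():
        dataset_dir = unpack_into_dataset(archive, repo_root)
        for entry in sorted(dataset_dir.iterdir()):
            print(entry.name)
    else:
        print(f"Archive not found: {archive}")
```

Once the folder is populated, the Dataset Exploration notebook can be opened with the worldstrat environment.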
After that, you can also check out our source code, which contains notebooks that demonstrate:
If you use this package or the associated dataset, please cite the following BibTeX entries:
@misc{cornebise_open_2022,
title = {Open {{High-Resolution Satellite Imagery}}: {{The WorldStrat Dataset}} -- {{With Application}} to {{Super-Resolution}}},
author = {Cornebise, Julien and Or{\v s}oli{\'c}, Ivan and Kalaitzis, Freddie},
year = {2022},
month = jul,
number = {arXiv:2207.06418},
eprint = {2207.06418},
eprinttype = {arxiv},
publisher = {{arXiv}},
doi = {10.48550/arXiv.2207.06418},
archiveprefix = {arXiv}
}
@article{cornebise_worldstrat_zenodo_2022,
title = {The {{WorldStrat Dataset}}},
author = {Cornebise, Julien and Orsolic, Ivan and Kalaitzis, Freddie},
year = {2022},
month = jul,
journal = {Dataset on Zenodo},
doi = {10.5281/zenodo.6810792}
}