On-Demand Earth System Data Cubes (ESDCs) in Python
GitHub: https://github.com/davemlz/cubo
Documentation: https://cubo.readthedocs.io/
PyPI: https://pypi.org/project/cubo/
Conda-forge: https://anaconda.org/conda-forge/cubo
Tutorials: https://cubo.readthedocs.io/en/latest/tutorials.html
Paper: https://arxiv.org/abs/2404.13105
SpatioTemporal Asset Catalogs (STAC) provide a standardized format for describing geospatial information. Multiple platforms, such as Planetary Computer, use this standard to provide clients with a wide range of datasets. Additionally, Google Earth Engine (GEE) provides a very large catalogue that users can harness for different tasks in Python.
cubo is a Python package that gives users of STAC and GEE an easy way to create On-Demand Earth System Data Cubes (ESDCs), which makes it well suited to Deep Learning (DL) tasks. You can create many ESDCs knowing only a pair of coordinates and the edge size of the cube in pixels!
Check the simple usage of cubo with STAC here:
import cubo
import xarray as xr
da = cubo.create(
    lat=4.31, # Central latitude of the cube
    lon=-76.2, # Central longitude of the cube
    collection="sentinel-2-l2a", # Name of the STAC collection
    bands=["B02","B03","B04"], # Bands to retrieve
    start_date="2021-06-01", # Start date of the cube
    end_date="2021-06-10", # End date of the cube
    edge_size=64, # Edge size of the cube (px)
    resolution=10, # Pixel size of the cube (m)
)
This chunk of code creates an xr.DataArray object given a pair of coordinates, the edge size of the cube (in pixels), and the additional information needed to get the data from STAC (Planetary Computer by default, but you can use another provider!). Note that you can also choose the resolution (in meters) and the bands that you require.
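To make the relationship between edge_size, resolution, and the spatial extent of the cube concrete, here is a minimal sketch. It assumes the cube is a square centred on a point that is already expressed in projected coordinates (in meters). The `cube_bounds` helper is hypothetical and written only for illustration; it is not part of cubo, and cubo's internal handling (projection, pixel snapping) may differ.

```python
def cube_bounds(x, y, edge_size, resolution):
    """Return (min_x, min_y, max_x, max_y) of a square cube footprint.

    x, y       : centre of the cube in projected coordinates (meters)
    edge_size  : edge size of the cube in pixels
    resolution : pixel size in meters
    """
    half = edge_size * resolution / 2  # half of the edge length in meters
    return (x - half, y - half, x + half, y + half)


# Illustrative centre point: 64 px at 10 m per pixel spans 640 m per side.
bounds = cube_bounds(x=500_000.0, y=4_000_000.0, edge_size=64, resolution=10)
width_m = bounds[2] - bounds[0]
print(width_m)
```

Doubling edge_size or resolution doubles the side length of the footprint, so the two arguments together control how much ground each cube covers.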
Now check the simple usage of cubo with GEE here:
import cubo
import xarray as xr
da = cubo.create(
    lat=51.079225, # Central latitude of the cube
    lon=10.452173, # Central longitude of the cube
    collection="COPERNICUS/S2_SR_HARMONIZED", # Id of the GEE collection
    bands=["B2","B3","B4"], # Bands to retrieve
    start_date="2016-06-01", # Start date of the cube
    end_date="2017-07-01", # End date of the cube
    edge_size=128, # Edge size of the cube (px)
    resolution=10, # Pixel size of the cube (m)
    gee=True # Use GEE instead of STAC
)
This chunk of code is very similar to the STAC-based cubo code. Note that the collection is now the ID of the GEE collection to use, and that the gee argument must be set to True.
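One practical difference between the two snippets is the band naming: the STAC example requests "B02", "B03", "B04", while the GEE example requests "B2", "B3", "B4". The sketch below converts between the two styles. `stac_to_gee_band` is a hypothetical helper, and the rule it applies (dropping zero padding) is an assumption that matches the Sentinel-2 names used above; other collections may follow different conventions.

```python
def stac_to_gee_band(band):
    """Convert a zero-padded band name such as "B02" to "B2".

    Assumes the name is a one-letter prefix followed by the band number.
    """
    prefix, rest = band[0], band[1:]
    return prefix + rest.lstrip("0")  # strip leading zeros only


stac_bands = ["B02", "B03", "B04", "B8A", "B11"]
gee_bands = [stac_to_gee_band(b) for b in stac_bands]
print(gee_bands)  # ['B2', 'B3', 'B4', 'B8A', 'B11']
```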
The process is simple. For STAC, the cube is retrieved as an xr.DataArray using stackstac and pystac_client. In the case of GEE, the cube is retrieved via xee. That is everything cubo does for you: you just need to provide the coordinates, the edge size, and the additional information required to get the cube.
Install the latest version from PyPI:
pip install cubo
Install cubo with the required GEE dependencies from PyPI:
pip install cubo[ee]
Upgrade cubo by running:
pip install -U cubo
Install the latest version from conda-forge:
conda install -c conda-forge cubo
Install the latest dev version from GitHub by running:
pip install git+https://github.com/davemlz/cubo
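After installing, you may want to confirm which version is available in the current environment. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name, not something cubo provides.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


cubo_version = installed_version("cubo")
print(cubo_version if cubo_version else "cubo is not installed")
```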
create()
cubo is pretty straightforward: everything you need is in the create() function:
da = cubo.create(
    lat=4.31,
    lon=-76.2,
    collection="sentinel-2-l2a",
    bands=["B02","B03","B04"],
    start_date="2021-06-01",
    end_date="2021-06-10",
    edge_size=64,
    resolution=10,
)
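Because a cube is fully described by a pair of coordinates plus a few shared settings, creating many cubes can be reduced to building one set of keyword arguments per point. The following is a minimal sketch of that idea; `build_requests` is a hypothetical helper, and the actual call to create() is left as a comment because it needs cubo installed and network access.

```python
def build_requests(points, **common):
    """Return one dict of create() keyword arguments per (lat, lon) point."""
    return [dict(lat=lat, lon=lon, **common) for lat, lon in points]


# The two points reuse the coordinates from the examples above.
points = [(4.31, -76.2), (51.079225, 10.452173)]
cube_requests = build_requests(
    points,
    collection="sentinel-2-l2a",
    bands=["B02", "B03", "B04"],
    start_date="2021-06-01",
    end_date="2021-06-10",
    edge_size=64,
    resolution=10,
)

# With cubo installed, each request could then be executed as:
# cubes = [cubo.create(**kwargs) for kwargs in cube_requests]
print(len(cube_requests))
```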
edge_size
By default, the units of edge_size are pixels, but you can change this with the units argument:
da = cubo.create(
    lat=4.31,
    lon=-76.2,
    collection="sentinel-2-l2a",
    bands=["B02","B03","B04"],
    start_date="2021-06-01",
    end_date="2021-06-10",
    edge_size=1500,
    units="m",
    resolution=10,
)
[!TIP] You can use "px" (pixels), "m" (meters), or any unit available in scipy.constants. For example:
da = cubo.create(
    lat=4.31,
    lon=-76.2,
    collection="sentinel-2-l2a",
    bands=["B02","B03","B04"],
    start_date="2021-06-01",
    end_date="2021-06-10",
    edge_size=1.5,
    units="kilo",
    resolution=10,
)
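The two examples above describe the same cube size in different units: 1500 m and 1.5 "kilo" (kilometers) at a 10 m resolution. The sketch below shows the arithmetic under stated assumptions. The multipliers mirror the values of scipy.constants.kilo (1000.0) and plain meters; `edge_size_in_pixels` is a hypothetical helper, and the exact conversion and rounding that cubo performs internally are assumptions here.

```python
# Multipliers that convert a unit to meters.
# "kilo" mirrors scipy.constants.kilo; only two units are listed for brevity.
UNIT_TO_METERS = {"m": 1.0, "kilo": 1000.0}


def edge_size_in_pixels(edge_size, units, resolution):
    """Convert an edge size in the given units to a whole number of pixels."""
    if units == "px":
        return int(edge_size)  # already expressed in pixels
    meters = edge_size * UNIT_TO_METERS[units]
    return round(meters / resolution)  # assumed rounding to the nearest pixel


print(edge_size_in_pixels(1500, "m", 10))    # 150
print(edge_size_in_pixels(1.5, "kilo", 10))  # 150
```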
By default, cubo uses Planetary Computer, but you can use another STAC provider endpoint if you want:
da = cubo.create(
    lat=4.31,
    lon=-76.2,
    collection="sentinel-s2-l2a-cogs",
    bands=["B05","B06","B07"],
    start_date="2020-01-01",
    end_date="2020-06-01",
    edge_size=128,
    resolution=20,
    stac="https://earth-search.aws.element84.com/v0"
)
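Note that switching the endpoint also changes the collection name: "sentinel-2-l2a" in the default examples versus "sentinel-s2-l2a-cogs" here. One way to keep such pairs together is a small lookup table, sketched below. The provider labels and the `provider_kwargs` helper are hypothetical; only the collection names and the endpoint URL come from the examples in this document.

```python
STAC_PROVIDERS = {
    # Default provider: no `stac` argument is needed.
    "planetary-computer": {"collection": "sentinel-2-l2a"},
    # Alternative endpoint used in the example above.
    "earth-search-v0": {
        "collection": "sentinel-s2-l2a-cogs",
        "stac": "https://earth-search.aws.element84.com/v0",
    },
}


def provider_kwargs(name):
    """Return a copy of the create() keyword arguments for a provider label."""
    return dict(STAC_PROVIDERS[name])


kwargs = provider_kwargs("earth-search-v0")
print(kwargs["collection"], kwargs["stac"])
```

The returned dictionary could then be unpacked into create() alongside the remaining arguments (lat, lon, bands, dates, edge_size, resolution).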
You can pass kwargs to pystac_client.Client.search() if required:
da = cubo.create(
    lat=4.31,
    lon=-76.2,
    collection="sentinel-2-l2a",
    bands=["B02","B03","B04"],
    start_date="2021-01-01",
    end_date="2021-06-10",
    edge_size=64,
    resolution=10,
    query={"eo:cloud_cover": {"lt": 10}} # kwarg to pass
)
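The query argument in the example above is a plain dictionary, so it can be built programmatically when the threshold varies between runs. This is a minimal sketch; `cloud_cover_query` is a hypothetical helper, and the "lt" (less than) operator is the one used in the example above.

```python
def cloud_cover_query(max_cloud_cover):
    """Build a query dict that keeps items below a cloud cover percentage."""
    return {"eo:cloud_cover": {"lt": max_cloud_cover}}


query = cloud_cover_query(10)
print(query)  # {'eo:cloud_cover': {'lt': 10}}
```

The result can be passed as query=query in the call to create().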
The project is licensed under the MIT license.
If you use this work, please consider citing the following paper:
@misc{montero2024ondemand,
    title={On-Demand Earth System Data Cubes},
    author={David Montero and César Aybar and Chaonan Ji and Guido Kraemer and Maximilian Söchting and Khalil Teber and Miguel D. Mahecha},
    year={2024},
    eprint={2404.13105},
    archivePrefix={arXiv},
    primaryClass={cs.DB}
}
The logo and images were created using dice icons created by Freepik - Flaticon.