Example Notebooks

Skim_Features_Visualize_Queries_Distributions

`medaprep.skim` and `medaprep.visualize` modules to investigate Sentinel-2 data¶

In this notebook, we will show how using medaprep.skim.features,medaprep.visualize.query, and medaprep.visualize.distributions can help prepare and visualize large sets of data. We will load a set of Sentinel-2 imagery over Austin, TX and quickly retreive statistics about the features within to assist with downstream processing. We will use the following process:

Initialize Dask cluster to load xarray Dataset from a public S3 bucket.
visualize.query bounding boxes to check geographic coverage.
skim.features from the dataset for a quick set of summary statistics.
visualize.distributions of variables within the dataset.

In [1]:

from bokeh.io import show, output_notebook
from bokeh.layouts import layout
import dask.distributed
import folium
import geopandas as gpd
from IPython.display import HTML, display
from odc.stac import configure_rio, stac_load
import pandas as pd
from medaprep import skim, visualize
from pystac_client import Client
import shapely.geometry
import xarray

client = dask.distributed.Client()
configure_rio(cloud_defaults=True, aws={"aws_unsigned": True}, client=client)

/Users/dylan/opt/anaconda3/envs/odc-stac/lib/python3.10/site-packages/distributed/node.py:179: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 49607 instead
  warnings.warn(

Load Sentinel-2 data from public STAC catalog¶

The following cell loads an xarray Dataset containing Sentinel-2 imagery from a STAC catalog. We initialize the center point of our query to be the center of Austin, TX and search for imagery from July, 2022 within a 100 km radius.

In [2]:

config = {
    "sentinel-s2-l2a-cogs": {
        "assets": {
            "*": {"data_type": "uint16", "nodata": 0},
            "SCL": {"data_type": "uint8", "nodata": 0},
            "visual": {"data_type": "uint8", "nodata": 0},
        },
        "aliases": {"red": "B04", "green": "B03", "blue": "B02"},
    },
    "*": {"warnings": "ignore"},
}

km2deg = 1.0 / 111

x, y = (-97.744, 30.266)

r = 100 * km2deg

bbox = (x - r, y - r, x + r, y + r)

catalog = Client.open("https://earth-search.aws.element84.com/v0")

query = catalog.search(
    collections=["sentinel-s2-l2a-cogs"], datetime="2022-07-01/2022-07-31", limit=100, bbox=bbox
)

items = list(query.get_items())
stac_json = query.get_all_items_as_dict()

crs = "epsg:3857"
zoom = 2**5

data = stac_load(
    items,
    crs=crs,
    resolution=10*zoom,
    chunks={},
    groupby="solar_day",
    stac_cfg=config,
)
data

Out[2]:

<xarray.Dataset>
Dimensions:      (y: 1142, x: 1137, time: 12)
Coordinates:
  * y            (y) float64 3.716e+06 3.715e+06 ... 3.351e+06 3.351e+06
  * x            (x) float64 -1.102e+07 -1.102e+07 ... -1.066e+07 -1.066e+07
    spatial_ref  int32 3857
  * time         (time) datetime64[ns] 2022-07-03T17:25:22 ... 2022-07-30T17:...
Data variables: (12/16)
    visual       (time, y, x) uint8 dask.array<chunksize=(1, 1142, 1137), meta=np.ndarray>
    B01          (time, y, x) uint16 dask.array<chunksize=(1, 1142, 1137), meta=np.ndarray>
    B02          (time, y, x) uint16 dask.array<chunksize=(1, 1142, 1137), meta=np.ndarray>
    B03          (time, y, x) uint16 dask.array<chunksize=(1, 1142, 1137), meta=np.ndarray>
    B04          (time, y, x) uint16 dask.array<chunksize=(1, 1142, 1137), meta=np.ndarray>
    B05          (time, y, x) uint16 dask.array<chunksize=(1, 1142, 1137), meta=np.ndarray>
    ...           ...
    B09          (time, y, x) uint16 dask.array<chunksize=(1, 1142, 1137), meta=np.ndarray>
    B11          (time, y, x) uint16 dask.array<chunksize=(1, 1142, 1137), meta=np.ndarray>
    B12          (time, y, x) uint16 dask.array<chunksize=(1, 1142, 1137), meta=np.ndarray>
    AOT          (time, y, x) uint16 dask.array<chunksize=(1, 1142, 1137), meta=np.ndarray>
    WVP          (time, y, x) uint16 dask.array<chunksize=(1, 1142, 1137), meta=np.ndarray>
    SCL          (time, y, x) uint8 dask.array<chunksize=(1, 1142, 1137), meta=np.ndarray>

`visualize` the bounding boxes of the input query and the returned data using `visualize.query`¶

visualize.query returns a visualization of the input and returned bounding boxes.

In [3]:

gdf = gpd.GeoDataFrame.from_features(stac_json, 3857)

bbox2 = (gdf.bounds['minx'].min(),gdf.bounds['miny'].min(),gdf.bounds['maxx'].max(),gdf.bounds['maxy'].max())
bbox3 = tuple([b+2 for b in bbox2])

visualize.query(bbox=[bbox,bbox2],
                    name=["Query","Returned"],
                    folium_map=folium.Map(),
                    color=["red","green"])

Out[3]:

Make this Notebook Trusted to load map: File -> Trust Notebook

`skim` through `features` of the data using `skim.features`¶

skim.features returns basic statistics and properties of an xarray Dataset.

In [4]:

small_data = data.sel(time='2022-07-03').compute()

df = skim.features(small_data)
df

Out[4]:

	variables	data_types	NaNs	mean	std	maximums
0	visual	uint8	False	83.206896	85.090139	255
1	B01	uint16	False	694.111545	1057.890177	12671
2	B02	uint16	False	715.116638	1019.690505	11615
3	B03	uint16	False	831.311709	1033.465046	10787
4	B04	uint16	False	897.211163	1064.686861	10284
5	B05	uint16	False	1144.481014	1258.273113	11399
6	B06	uint16	False	1453.429606	1447.058901	10691
7	B07	uint16	False	1585.919234	1543.215470	10361
8	B08	uint16	False	1592.048660	1549.807719	10519
9	B8A	uint16	False	1737.786114	1662.727114	10352
10	B09	uint16	False	2445.116097	3098.090223	15918
11	B11	uint16	False	1869.718284	1790.813024	8431
12	B12	uint16	False	1358.351765	1357.018681	8239
13	AOT	uint16	False	59.987095	57.874945	173
14	WVP	uint16	False	1847.768434	1681.527584	6201
15	SCL	uint8	False	3.550229	3.664158	10

`visualize` the distributions of the variables using `visualize.distributions`¶

visualize.distributions returns a list of bokeh figures containing estimated distributions of the variables.

In [5]:

output_notebook()
show(layout(visualize.distributions(small_data, df, 500)))

Loading BokehJS ...

Example Notebooks

medaprep.skim and medaprep.visualize modules to investigate Sentinel-2 data¶

Load Sentinel-2 data from public STAC catalog¶

visualize the bounding boxes of the input query and the returned data using visualize.query¶

skim through features of the data using skim.features¶

visualize the distributions of the variables using visualize.distributions¶

`medaprep.skim` and `medaprep.visualize` modules to investigate Sentinel-2 data¶

`visualize` the bounding boxes of the input query and the returned data using `visualize.query`¶

`skim` through `features` of the data using `skim.features`¶

`visualize` the distributions of the variables using `visualize.distributions`¶