medaprep package

Submodules

medaprep.skim module

Skim module.

This module implements utility functions that provide high level details of raster data.

This module is part of the core medaprep library and is intended to be called by user code.

medaprep.skim.features(indata: Dataset) → DataFrame

This function returns a dataframe summarizing each variable in the given dataset: its name, data type, whether it contains null values, and its mean, standard deviation, maximum, and minimum.

Parameters: indata (xarray.Dataset) – dataset to be skimmed.
Returns: table containing basic information about the dataset.
Return type: (pandas.DataFrame)

Example

>>> import numpy as np
>>> import pandas as pd
>>> import xarray as xr
>>> from medaprep import skim

>>> temp = 15 + 8 * np.random.randn(2, 2, 3)
>>> precip = 10 * np.random.rand(2, 2, 3)
>>> lon = [[-99.83, -99.32], [-99.79, -99.23]]
>>> lat = [[42.25, 42.21], [42.63, 42.59]]
>>> ds = xr.Dataset(
...     {
...         "temperature": (["x", "y", "time"], temp),
...         "precipitation": (["x", "y", "time"], precip),
...     },
...     coords={
...         "lon": (["x", "y"], lon),
...         "lat": (["x", "y"], lat),
...         "time": pd.date_range("2014-09-06", periods=3),
...         "reference_time": pd.Timestamp("2014-09-05"),
...     },
... )

>>> df = skim.features(ds)
>>> df
    variables       data_types  NaNs    mean    std     maximums    minimums
0   temperature     float64     False   14.3177 9.08339 30.3361     -7.76803
1   precipitation   float64     False   4.62568 3.03081 9.89768     0.147005

medaprep.skim.memory(indata: Dataset) → Series

This function uses utilities from pandas and dask (for dask-backed datasets) to check the memory size of the input dataset.

Parameters: indata (xarray.Dataset) – dataset to be skimmed.
Returns: series containing the number of bytes used by each column.
Return type: (pandas.Series)

Example

>>> print(data)
<xarray.Dataset>
Dimensions:      (time: 1, y: 1142, x: 1137)
Coordinates:
  * y            (y) float64 3.716e+06 3.715e+06 ... 3.351e+06 3.351e+06
  * x            (x) float64 -1.102e+07 -1.102e+07 ... -1.066e+07 -1.066e+07
    spatial_ref  int32 3857
  * time         (time) datetime64[ns] 2022-07-03T17:25:22
Data variables:
    visual       (time, y, x) uint8 dask.array<chunksize=(1, 1142, 1137), meta=np.ndarray>
    B01          (time, y, x) uint16 dask.array<chunksize=(1, 1142, 1137), meta=np.ndarray>
>>> print(skim.memory(data))
Index          6576899
visual         1298454
B01            2596908
spatial_ref    5193816
dtype: int64
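
The visual and B01 values in the output above match the number of array elements multiplied by the bytes per element for each data type. The sketch below checks that arithmetic in plain Python. It does not call medaprep, the helper name expected_nbytes is hypothetical, and the remaining rows (Index, spatial_ref) are not derived here.

```python
from math import prod


def expected_nbytes(shape, itemsize):
    """Bytes needed for a dense array: element count times bytes per element."""
    return prod(shape) * itemsize


# Shape taken from the Dimensions line of the example: (time: 1, y: 1142, x: 1137).
shape = (1, 1142, 1137)

visual_bytes = expected_nbytes(shape, 1)  # uint8 uses 1 byte per element
b01_bytes = expected_nbytes(shape, 2)     # uint16 uses 2 bytes per element

print(visual_bytes)  # 1298454
print(b01_bytes)     # 2596908
```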

medaprep.visualize module

Visualize module for use with data processed by medaprep.

This module implements visualization functionality that enables displaying the results of data processing outputs from medaprep.

This module is part of the core medaprep library and is intended to be called by user code.

medaprep.visualize.distributions(indata: Dataset, skim_table: DataFrame, sample_size: int) → list

This function returns a list of Bokeh figures, each showing the estimated distribution of a variable in the dataset.

Parameters

indata (xarray.Dataset) – input data containing the variables whose distributions will be estimated.
skim_table (pandas.DataFrame) – dataframe containing basic information about the dataset.

Returns

List of Bokeh figures showing the estimated distribution of each variable.
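
The text above does not say how sample_size is used. One plausible reading, assumed here and not confirmed by the source, is that at most sample_size values are drawn at random from each variable before its distribution is estimated. The sketch below illustrates that idea with a simple histogram in plain Python. The function sample_histogram and its bins and seed parameters are hypothetical, and the code calls neither medaprep nor Bokeh.

```python
import random


def sample_histogram(values, sample_size, bins=10, seed=0):
    """Estimate a distribution from at most `sample_size` randomly drawn values.

    Assumption: this mirrors one possible use of `sample_size`; it is not
    the medaprep implementation.
    """
    rng = random.Random(seed)
    k = min(sample_size, len(values))
    sample = rng.sample(values, k)  # draw without replacement

    lo, hi = min(sample), max(sample)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    counts = [0] * bins
    for v in sample:
        # Clamp so that the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1

    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


# Synthetic data resembling the temperature variable in the skim example.
data_rng = random.Random(42)
values = [data_rng.gauss(15, 8) for _ in range(1000)]

counts, edges = sample_histogram(values, sample_size=200)
print(sum(counts))  # 200
```

Sampling keeps the cost of the estimate bounded for large rasters, at the price of some noise in the resulting shape.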

medaprep.visualize.query(bbox: [tuple | list[tuple]], name: [str | list[str]], folium_map: Map, color: [str | list[str]]) → Map

query takes one or more bounding boxes (bbox), the names corresponding to those bounding boxes (name), and a folium map (folium_map). It adds each bounding box to the map under its corresponding name and colors it according to the matching entry in color. It then sets the bounds of the map from the largest bounding box provided and returns the map.
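
The two steps that do not depend on folium, accepting either a single value or a list and picking the largest bounding box, can be sketched in plain Python. The source does not state the tuple layout, so (min_lon, min_lat, max_lon, max_lat) is an assumption here, as is measuring "largest" by area. The helpers as_list, bbox_area, and largest_bbox are hypothetical names, not part of medaprep.

```python
def as_list(value):
    """Wrap a single item so single and list inputs are handled uniformly."""
    return value if isinstance(value, list) else [value]


def bbox_area(bbox):
    """Area of a box, assuming (min_lon, min_lat, max_lon, max_lat) order."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return (max_lon - min_lon) * (max_lat - min_lat)


def largest_bbox(bboxes):
    """Return the bounding box with the greatest area."""
    return max(as_list(bboxes), key=bbox_area)


boxes = [(-99.83, 42.21, -99.23, 42.63), (-99.60, 42.30, -99.40, 42.40)]
names = as_list(["study area", "subset"])
colors = as_list(["blue", "red"])

# Pair each box with its name and color, as the description above implies.
for box, name, color in zip(boxes, names, colors):
    print(name, color, box)

biggest = largest_bbox(boxes)

# Reorder into [[south, west], [north, east]] corner pairs.
fit_bounds = [[biggest[1], biggest[0]], [biggest[3], biggest[2]]]
print(fit_bounds)  # [[42.21, -99.83], [42.63, -99.23]]
```

The corner-pair layout in fit_bounds is the form folium's Map.fit_bounds accepts, so the result could be passed to it directly if the assumed tuple order holds.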