medaprep package

Subpackages

Submodules

medaprep.skim module

Skim module.

This module implements utility functions that provide high-level details of raster data.

This module is part of the core medaprep library and is intended to be called by user code.

medaprep.skim.features(indata: Dataset) → DataFrame

Returns a dataframe with one row per variable in the dataset, listing its name, data type, whether it contains null values, mean, standard deviation, maximum, and minimum.

Parameters

indata (xarray.Dataset) – dataset to be skimmed.

Returns

table containing basic information about the dataset.

Return type

(pandas.DataFrame)
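The per-variable summary described above can be approximated with NumPy and pandas alone. The following is a minimal sketch, not medaprep's implementation: the helper name `skim_arrays` is hypothetical, a plain mapping of names to arrays stands in for an xarray.Dataset, and the use of NaN-aware statistics is an assumption.

```python
import numpy as np
import pandas as pd


def skim_arrays(variables):
    """Summarise a mapping of name -> numeric array, one row per variable.

    Hypothetical helper for illustration; column names mirror the
    table that skim.features is documented to return.
    """
    rows = []
    for name, values in variables.items():
        arr = np.asarray(values)
        rows.append(
            {
                "variables": name,
                "data_types": str(arr.dtype),
                "NaNs": bool(np.isnan(arr).any()),
                # NaN-aware reductions are an assumption of this sketch.
                "mean": float(np.nanmean(arr)),
                "std": float(np.nanstd(arr)),
                "maximums": float(np.nanmax(arr)),
                "minimums": float(np.nanmin(arr)),
            }
        )
    return pd.DataFrame(rows)


table = skim_arrays(
    {
        "temperature": np.array([[10.0, 20.0], [30.0, 40.0]]),
        "precipitation": np.array([[0.0, np.nan], [2.0, 4.0]]),
    }
)
print(table)
```

With an actual xarray.Dataset, the same loop could iterate over `ds.data_vars.items()` and apply the equivalent DataArray reductions.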

Example

>>> import numpy as np
>>> import pandas as pd
>>> import xarray as xr
>>> from medaprep import skim

>>> temp = 15 + 8 * np.random.randn(2, 2, 3)
>>> precip = 10 * np.random.rand(2, 2, 3)
>>> lon = [[-99.83, -99.32], [-99.79, -99.23]]
>>> lat = [[42.25, 42.21], [42.63, 42.59]]
>>> ds = xr.Dataset(
...     {
...         "temperature": (["x", "y", "time"], temp),
...         "precipitation": (["x", "y", "time"], precip),
...     },
...     coords={
...         "lon": (["x", "y"], lon),
...         "lat": (["x", "y"], lat),
...         "time": pd.date_range("2014-09-06", periods=3),
...         "reference_time": pd.Timestamp("2014-09-05"),
...     },
... )

>>> df = skim.features(ds)
>>> df
       variables data_types   NaNs     mean      std  maximums  minimums
0    temperature    float64  False  14.3177  9.08339   30.3361  -7.76803
1  precipitation    float64  False  4.62568  3.03081   9.89768  0.147005

medaprep.skim.memory(indata: Dataset) → Series

This function uses utilities from pandas and dask (for dask-backed datasets) to check the memory size of the input dataset.

Parameters

indata (xarray.Dataset) – dataset to be skimmed.

Returns

series containing the number of bytes for each column.

Return type

(pandas.Series)

Example

>>> print(data)
    <xarray.Dataset>
    Dimensions:      (time: 1, y: 1142, x: 1137)
    Coordinates:
        * y            (y) float64 3.716e+06 3.715e+06 ... 3.351e+06 3.351e+06
        * x            (x) float64 -1.102e+07 -1.102e+07 ... -1.066e+07 -1.066e+07
        spatial_ref  int32 3857
        * time         (time) datetime64[ns] 2022-07-03T17:25:22
    Data variables:
        visual       (time, y, x) uint8 dask.array<chunksize=(1, 1142, 1137), meta=np.ndarray>
        B01          (time, y, x) uint16 dask.array<chunksize=(1, 1142, 1137), meta=np.ndarray>
>>> print(skim.memory(data))
Index          6576899
visual         1298454
B01            2596908
spatial_ref    5193816
dtype: int64
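The `visual` and `B01` entries in the example output follow directly from the array shape and element size shown in the dataset repr: the number of elements times the bytes per element. A short arithmetic check, using only the shape and dtypes printed above (the `Index` and `spatial_ref` entries depend on how the dataset is converted and are not derived here):

```python
from math import prod

# Shape taken from the dataset repr: (time, y, x)
shape = (1, 1142, 1137)

# Bytes per element for each dtype in the repr: uint8 -> 1, uint16 -> 2
itemsize = {"visual": 1, "B01": 2}

nbytes = {name: prod(shape) * size for name, size in itemsize.items()}
print(nbytes)  # {'visual': 1298454, 'B01': 2596908}
```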

medaprep.visualize module

Visualize module for use with data processed by medaprep.

This module implements visualization functionality that enables displaying the results of data processing outputs from medaprep.

This module is part of the core medaprep library and is intended to be called by user code.

medaprep.visualize.distributions(indata: Dataset, skim_table: DataFrame, sample_size: int) → list

Returns a list of Bokeh figures, each showing the estimated distribution of one variable in the dataset.

Parameters
  • indata (xarray.Dataset) – input data containing the variables whose distributions will be estimated.

  • skim_table (pandas.DataFrame) – dataframe containing basic info about the dataset.

Returns

Bokeh figures containing estimated distributions of each variable.
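How the distributions are estimated is not documented here. One common approach, shown as a sketch under assumptions, is to draw a random sample of values and compute a normalized histogram. The helper name `estimate_density`, the `bins` and `seed` arguments, and the reading of `sample_size` as the number of values sampled per variable are all assumptions, not medaprep's confirmed behavior.

```python
import numpy as np


def estimate_density(values, sample_size, bins=20, seed=0):
    """Estimate a distribution from a random sample of the input values.

    Hypothetical helper: returns histogram densities and bin edges.
    """
    rng = np.random.default_rng(seed)
    flat = np.asarray(values, dtype=float).ravel()
    flat = flat[~np.isnan(flat)]  # ignore missing values
    n = min(sample_size, flat.size)  # never sample more than is available
    sample = rng.choice(flat, size=n, replace=False)
    # density=True scales the bars so their total area is 1
    density, edges = np.histogram(sample, bins=bins, density=True)
    return density, edges


data = np.random.default_rng(1).normal(15, 8, size=(10, 10, 30))
density, edges = estimate_density(data, sample_size=500)
print(len(density), len(edges))
```

The resulting densities and bin edges could then be drawn on a Bokeh figure, for example with its `quad` glyph, using consecutive edges as the left and right sides of each bar.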

medaprep.visualize.query(bbox: [tuple | list[tuple]], name: [str | list[str]], folium_map: Map, color: [str | list[str]]) → Map

Adds one or more bounding boxes (bbox) to a folium map (folium_map), labels each box with the corresponding entry in name, and colors it with the corresponding entry in color. The map bounds are then set from the largest bounding box provided, and the map is returned.

Parameters
  • bbox (tuple or list of tuples) – (x1, y1, x2, y2) latitude and longitude coordinates of each bounding box.

  • name (str or list of str) – name for each bounding box.

  • folium_map (folium.Map) – map to plot the boxes on.

  • color (str or list of str) – color for each bounding box.

Returns

folium.Map containing bounding boxes
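The step of fitting the map to the largest bounding box can be sketched in plain Python. This is an illustration under assumptions, not the library's code: the helper names are hypothetical, "largest" is taken to mean largest area, and x is assumed to be longitude and y latitude. The output uses the `[[south, west], [north, east]]` layout that `folium.Map.fit_bounds` accepts.

```python
def bbox_area(bbox):
    """Area of an (x1, y1, x2, y2) box in squared degrees."""
    x1, y1, x2, y2 = bbox
    return abs(x2 - x1) * abs(y2 - y1)


def largest_bbox(bboxes):
    """Pick the box with the greatest area (assumed meaning of 'largest')."""
    return max(bboxes, key=bbox_area)


def to_fit_bounds(bbox):
    """Convert (x1, y1, x2, y2) to [[south, west], [north, east]].

    Assumes x is longitude and y is latitude.
    """
    x1, y1, x2, y2 = bbox
    west, east = sorted((x1, x2))
    south, north = sorted((y1, y2))
    return [[south, west], [north, east]]


boxes = [
    (-99.83, 42.21, -99.23, 42.63),
    (-100.0, 42.0, -99.9, 42.1),
]
bounds = to_fit_bounds(largest_bbox(boxes))
print(bounds)  # [[42.21, -99.83], [42.63, -99.23]]
```

The `bounds` value could then be passed to `fit_bounds` on the folium map.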

Module contents