medaprep package
Submodules
medaprep.skim module
Skim module.
This module implements utility functions that provide high-level summaries of raster datasets.
This module is part of the core medaprep library and is intended to be called by user code.
- medaprep.skim.features(indata: Dataset) → DataFrame
This function returns a dataframe with one row per data variable of the given dataset. Each row reports the variable's name, its data type, whether it contains NaN values, and its mean, standard deviation, maximum, and minimum.
- Parameters
indata (xarray.Dataset) – dataset to be skimmed.
- Returns
table containing basic information about the dataset.
- Return type
(pandas.DataFrame)
Example
>>> import numpy as np
>>> import pandas as pd
>>> import xarray as xr
>>> from medaprep import skim
>>> temp = 15 + 8 * np.random.randn(2, 2, 3)
>>> precip = 10 * np.random.rand(2, 2, 3)
>>> lon = [[-99.83, -99.32], [-99.79, -99.23]]
>>> lat = [[42.25, 42.21], [42.63, 42.59]]
>>> ds = xr.Dataset(
...     {
...         "temperature": (["x", "y", "time"], temp),
...         "precipitation": (["x", "y", "time"], precip),
...     },
...     coords={
...         "lon": (["x", "y"], lon),
...         "lat": (["x", "y"], lat),
...         "time": pd.date_range("2014-09-06", periods=3),
...         "reference_time": pd.Timestamp("2014-09-05"),
...     },
... )
>>> df = skim.features(ds)
>>> df
       variables data_types   NaNs     mean      std  maximums  minimums
0    temperature    float64  False  14.3177  9.08339   30.3361  -7.76803
1  precipitation    float64  False  4.62568  3.03081   9.89768  0.147005
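A summary like the table above can be computed with plain NumPy reductions over each data variable. The sketch below does this, assuming numeric variables only. The function name features_sketch is hypothetical and this is not medaprep's implementation. It uses the population standard deviation (ddof=0), so its std column may differ from what skim.features reports.

```python
import numpy as np
import pandas as pd
import xarray as xr


def features_sketch(ds: xr.Dataset) -> pd.DataFrame:
    """Hypothetical skim-style summary: one row per data variable.

    Column names follow the skim.features example output; numeric
    variables only. This is an illustration, not medaprep's code.
    """
    rows = []
    for name, da in ds.data_vars.items():
        values = da.values  # loads data into memory (computes dask arrays)
        rows.append(
            {
                "variables": name,
                "data_types": da.dtype,
                "NaNs": bool(np.isnan(values).any()),  # any missing values?
                "mean": float(np.nanmean(values)),     # NaN-aware reductions
                "std": float(np.nanstd(values)),       # population std (ddof=0)
                "maximums": float(np.nanmax(values)),
                "minimums": float(np.nanmin(values)),
            }
        )
    return pd.DataFrame(rows)
```

Because the reductions are NaN-aware, a variable that contains missing values still gets finite statistics, and the NaNs column flags it separately.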
- medaprep.skim.memory(indata: Dataset) → Series
This function uses utilities from pandas (and from dask, for dask-backed datasets) to report the memory footprint of the input dataset.
- Parameters
indata (xarray.Dataset) – dataset to be skimmed.
- Returns
series containing the number of bytes used by the index and by each column (data variables and non-index coordinates) of the dataset.
- Return type
(pandas.Series)
Example
>>> print(data)
<xarray.Dataset>
Dimensions:      (time: 1, y: 1142, x: 1137)
Coordinates:
  * y            (y) float64 3.716e+06 3.715e+06 ... 3.351e+06 3.351e+06
  * x            (x) float64 -1.102e+07 -1.102e+07 ... -1.066e+07 -1.066e+07
    spatial_ref  int32 3857
  * time         (time) datetime64[ns] 2022-07-03T17:25:22
Data variables:
    visual       (time, y, x) uint8 dask.array<chunksize=(1, 1142, 1137), meta=np.ndarray>
    B01          (time, y, x) uint16 dask.array<chunksize=(1, 1142, 1137), meta=np.ndarray>
>>> print(skim.memory(data))
Index          6576899
visual         1298454
B01            2596908
spatial_ref    5193816
dtype: int64
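The byte counts in the example match a table with one row per pixel. There are 1142 × 1137 = 1,298,454 rows, so visual (uint8, 1 byte) uses 1,298,454 bytes and B01 (uint16, 2 bytes) uses 2,596,908. The scalar int32 spatial_ref, broadcast to every row, uses 5,193,816. The sketch below reproduces that kind of report, assuming the dataset is flattened with xarray's Dataset.to_dataframe and measured with pandas' DataFrame.memory_usage. The function name memory_sketch is hypothetical and this is not necessarily how medaprep does it. Note that to_dataframe loads dask-backed data into memory. xarray also offers Dataset.to_dask_dataframe for a lazy conversion.

```python
import numpy as np
import pandas as pd
import xarray as xr


def memory_sketch(ds: xr.Dataset) -> pd.Series:
    """Hypothetical sketch: bytes used by the index and each column.

    Flattens the dataset to a pandas DataFrame (one row per grid cell,
    dimension coordinates become the index, other coordinates become
    columns) and asks pandas for the memory of each part.
    """
    df = ds.to_dataframe()  # computes dask-backed variables into memory
    return df.memory_usage(index=True, deep=True)
```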
medaprep.visualize module
Visualize module for use with data processed by medaprep.
This module provides functions for visualizing the outputs of medaprep's data-processing routines.
This module is part of the core medaprep library and is intended to be called by user code.
- medaprep.visualize.distributions(indata: Dataset, skim_table: DataFrame, sample_size: int) → list
This function returns a list of Bokeh figures showing the estimated distribution of each variable in the dataset.
- Parameters
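indata (xarray.Dataset) – input data containing the variables whose distributions will be estimated.
skim_table (pandas.DataFrame) – dataframe containing basic information about the dataset, such as the table returned by medaprep.skim.features.
sample_size (int) – number of values sampled from each variable to estimate its distribution.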
- Returns
Bokeh figures containing the estimated distribution of each variable.
- Return type
(list)
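The sketch below estimates a variable's distribution from a random sample. It assumes the estimate is a normalized histogram of up to sample_size non-NaN values drawn without replacement. medaprep may use a different estimator, such as a kernel density estimate, and the helper name sampled_density and its bins and seed parameters are hypothetical. The returned arrays are the kind of input a Bokeh figure could draw as bars, for example with figure.quad(top=density, bottom=0, left=edges[:-1], right=edges[1:]).

```python
import numpy as np


def sampled_density(values, sample_size: int, bins: int = 30, seed: int = 0):
    """Hypothetical sketch: histogram density estimate from a random sample.

    Returns (density, edges) as produced by np.histogram(density=True),
    so the bar areas sum to 1.
    """
    flat = np.asarray(values, dtype=float).ravel()
    flat = flat[~np.isnan(flat)]               # drop missing values
    rng = np.random.default_rng(seed)          # reproducible sampling
    n = min(sample_size, flat.size)            # cannot sample more than exist
    sample = rng.choice(flat, size=n, replace=False)
    density, edges = np.histogram(sample, bins=bins, density=True)
    return density, edges
```

Sampling keeps the cost of the estimate independent of the raster size, at the price of some noise in the tails.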
- medaprep.visualize.query(bbox: [tuple | list[tuple]], name: [str | list[str]], folium_map: Map, color: [str | list[str]]) → Map
query takes one or more bounding boxes (bbox), a name for each bounding box (name), a folium map (folium_map), and a color for each bounding box (color). It draws each box on the map, labeled with its name and drawn in its color. It then sets the map bounds from the largest of the provided bounding boxes and returns the map.
- Parameters
bbox (tuple or list of tuples) – one or more bounding boxes, each given as (x1, y1, x2, y2) in latitude and longitude coordinates.
name (str or list of str) – name for each bounding box.
folium_map (folium.Map) – map on which to draw the bounding boxes.
color (str or list of str) – color for each bounding box.
- Returns
the folium map with the bounding boxes added.
- Return type
(folium.Map)
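The signature above accepts either a single box, name, and color or lists of them. The sketch below shows one way to normalize those arguments and pick the "largest" box. It assumes largest means greatest area in coordinate units and that each box is (x1, y1, x2, y2). Both helper names are hypothetical, and medaprep's own selection rule may differ.

```python
def normalize_query_args(bbox, name, color):
    """Hypothetical sketch: pair each box with its name and color.

    Wraps a single tuple/str in a list, mirroring the tuple | list[tuple]
    and str | list[str] unions in the query() signature.
    """
    boxes = [bbox] if isinstance(bbox, tuple) else list(bbox)
    names = [name] if isinstance(name, str) else list(name)
    colors = [color] if isinstance(color, str) else list(color)
    if not (len(boxes) == len(names) == len(colors)):
        raise ValueError("bbox, name and color must have the same length")
    return list(zip(boxes, names, colors))


def largest_box(boxes):
    """Return the (x1, y1, x2, y2) box with the largest area."""
    return max(boxes, key=lambda b: abs(b[2] - b[0]) * abs(b[3] - b[1]))
```

The chosen box could then be passed to folium's Map.fit_bounds, which expects [[south, west], [north, east]] corner pairs. How (x1, y1, x2, y2) maps onto those corners depends on the coordinate order medaprep uses.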