Standardizer | Container for dict of mean (μ) and variance (σ2) for every parameter.
Methods
from_DataFrame() | Construct from wide-form DataFrame
stdz() | Transforms, mean-centers, and scales a parameter, distribution, or Series
transform() | Transforms a parameter, distribution, or Series
unstdz() | Untransforms, un-centers, and un-scales a parameter, distribution, or Series
untransform() | Untransforms a parameter, distribution, or Series
validate() | Ensures provided dictionary has all required attributes
Attributes
log_vars | List of log-normal variables
logit_vars | List of logit-normal variables
mean_transforms | Function that transforms the mean of a distribution.
transforms | Collection of forward and reverse transform functions for each variable
var_transforms | Function that transforms the variance of a distribution.
- class gumbi.aggregation.Standardizer(log_vars=None, logit_vars=None, **kwargs)
Bases: dict
Container for dict of mean (μ) and variance (σ2) for every parameter.
Standardizer objects allow transformation and normalization of datasets. The main methods are stdz(), which attempts to coerce the values of a given variable to a standard normal distribution (z-scores), and its complement unstdz(). The steps are
\[\mathbf{\text{tidy}} \rightarrow \text{transform} \rightarrow \text{mean-center} \rightarrow \text{scale} \rightarrow \mathbf{\text{tidy.z}}\]
For example, a reaction rate must be strictly positive, so we apply a log transformation so that it behaves as a normally distributed random variable. We then mean-center and scale this transformed value to obtain z-scores indicating how similar a given estimate is to all the other estimates we’ve observed. Standardizer stores the transforms and the population mean and variance for every parameter, allowing us to convert back and forth between natural space (\(rate\)), transformed space (\(\text{ln}\; rate\)), and standardized space (\(\left( \text{ln}\; rate - \mu_{\text{ln}\; rate} \right)/\sigma_{\text{ln}\; rate}\)).
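The pipeline above can be sketched in plain Python for a single log-normal variable. This is a minimal illustration and not gumbi's implementation: the names `mu_ln`, `var_ln`, `stdz`, and `unstdz` are hypothetical stand-ins, and the population statistics are made-up values expressed in log space.

```python
import math

# Hypothetical population statistics for a log-normal variable ("rate"),
# expressed in log space. These values are illustrative assumptions.
mu_ln = 0.0    # mean of ln(rate)
var_ln = 0.1   # variance of ln(rate)


def stdz(value):
    """Natural space -> z-score: transform, mean-center, then scale."""
    transformed = math.log(value)        # transform
    centered = transformed - mu_ln       # mean-center
    return centered / math.sqrt(var_ln)  # scale


def unstdz(z):
    """z-score -> natural space: un-scale, un-center, then untransform."""
    return math.exp(z * math.sqrt(var_ln) + mu_ln)


z = stdz(2.0)          # z-score of a natural-space value
roundtrip = unstdz(z)  # recovers the natural-space value
```

The reverse function applies the inverse operations in the opposite order, which is why the round trip recovers the original natural-space value up to floating-point error.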
Typically, a Standardizer will be constructed from a dataframe (from_DataFrame()), but the individual means and variances can be provided at instantiation as well. Note, however, that these should be the mean and variance of the transformed variable. For example, if d should be treated as log-normal with a natural-space mean of 1 and a log-space variance of 0.1, the right way to instantiate the class would be Standardizer(d={'μ': 0, 'σ2': 0.1}, log_vars=['d']).
Notes
Standardizer is just a dictionary with some extra methods and defaults, so standard dictionary methods like dict.update() still work.
- Parameters:
log_vars (list, optional) – List of input and output variables to be treated as log-normal.
logit_vars (list, optional) – List of input and output variables to be treated as logit-normal.
**kwargs – Mean and variance of each variable as a dictionary, e.g. d={'μ': 0, 'σ2': 0.1}
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from gumbi import Standardizer
>>> stdzr = Standardizer(x={'μ': 1, 'σ2': 0.1}, d={'μ': 0, 'σ2': 0.1}, log_vars=['d'])
Transforming and standardizing a single parameter:
>>> stdzr.transform('x', μ=1)
1
>>> stdzr.stdz('x', 1)
0.0
>>> stdzr.unstdz('x', 0)
1.0
>>> stdzr.stdz('x', 1+0.1**0.5)
1.0  # approximately
>>> stdzr.unstdz('x', 1)
1.316227766016838
>>> stdzr.stdz('d', 1)
0.0
>>> stdzr.stdz('d', np.exp(0.1**0.5))
1.0  # approximately
Transforming and standardizing a distribution:
>>> stdzr.transform('x', μ=1., σ2=0.1)
(1, 0.1)
>>> stdzr.stdz('x', 1, 0.1)
(0.0, 1.0)
>>> stdzr.stdz('d', 1, 0.1)
(0.0, 1.0)
>>> stdzr.transform('d', 1, 0.1)
(0.0, 0.1)
Standardizing a series:
>>> x_series = pd.Series(np.arange(1,5), name='x')
>>> stdzr.stdz(x_series)
0    0.000000
1    3.162278
2    6.324555
3    9.486833
Name: x, dtype: float64
>>> d_series = pd.Series(np.arange(1,5), name='d')
>>> stdzr.stdz(d_series)
0    0.000000
1    2.191924
2    3.474117
3    4.383848
Name: d, dtype: float64
- classmethod from_DataFrame(df: DataFrame, log_vars=None, logit_vars=None)
Construct from wide-form DataFrame
- property log_vars: list[str]
List of log-normal variables
- property logit_vars: list[str]
List of logit-normal variables
- property mean_transforms
Function that transforms the mean of a distribution.
These transforms should follow scipy’s conventions such that a distribution can be defined in the given space by passing (loc=μ, scale=σ2**0.5). For a lognormal variable, an RV defined as lognorm(loc=μ, scale=σ2**0.5) in “natural” space is equivalent to norm(loc=np.log(μ), scale=σ2**0.5) in log space, so this transform should return np.log(μ) when converting from natural to log space, and np.exp(μ) when converting from log to natural space. Similarly, for a logit-normal variable, an RV defined as logitnorm(loc=μ, scale=σ2**0.5) in natural space is equivalent to norm(loc=logit(μ), scale=σ2**0.5) in logit space, so this transform should return logit(μ) when converting from natural to logit space, and expit(μ) when converting from logit to natural space.
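As a rough illustration of the forward and reverse mean transforms described above, the sketch below pairs each variable type with a (forward, reverse) function tuple. The dictionary `mean_transform_pairs` and the hand-written `logit` and `expit` helpers are assumptions made for this example; they are not gumbi's internal data structures.

```python
import math


def logit(p):
    """Map a value in (0, 1) onto the real line."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Inverse of logit: map a real number back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical lookup of (forward, reverse) mean transforms by variable type.
mean_transform_pairs = {
    "log": (math.log, math.exp),           # natural <-> log space
    "logit": (logit, expit),               # natural <-> logit space
    "normal": (lambda m: m, lambda m: m),  # untransformed variables
}

forward, reverse = mean_transform_pairs["logit"]
mu_natural = 0.25
mu_logit = forward(mu_natural)  # mean expressed in logit space
mu_back = reverse(mu_logit)     # back to natural space
```

Keeping each forward function next to its inverse makes it hard to apply a mismatched pair when converting a mean back to natural space.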
- stdz(name: str | pd.Series, μ: float = None, σ2: float = None) → float | tuple | pd.Series
Transforms, mean-centers, and scales a parameter, distribution, or Series
- Parameters:
name (str or pd.Series) – Name of parameter. If a Series is supplied, the name of the series must be the parameter name.
μ (float, optional) – Value of parameter or mean of parameter distribution. Only optional if first argument is a Series.
σ2 (float, optional) – Variance of parameter distribution.
- Returns:
Standardized parameter, (mean, variance) of standardized distribution, or standardized Series
- Return type:
float, tuple, or pd.Series
- transform(name: str | pd.Series, μ: float = None, σ2: float = None) → float | tuple | pd.Series
Transforms a parameter, distribution, or Series
- Parameters:
name (str or pd.Series) – Name of parameter. If a Series is supplied, the name of the series must be the parameter name.
μ (float, optional) – Value of parameter or mean of parameter distribution. Only optional if first argument is a Series.
σ2 (float, optional) – Variance of parameter distribution.
- Returns:
Transformed parameter, (mean, variance) of transformed distribution, or transformed Series
- Return type:
float, tuple, or pd.Series
- property transforms: dict
Collection of forward and reverse transform functions for each variable
- unstdz(name: str | pd.Series, μ: float = None, σ2: float = None) → float | tuple | pd.Series
Untransforms, un-centers, and un-scales a parameter, distribution, or Series
- Parameters:
name (str or pd.Series) – Name of parameter. If a Series is supplied, the name of the series must be the parameter name.
μ (float, optional) – Value of parameter or mean of parameter distribution. Only optional if first argument is a Series.
σ2 (float, optional) – Variance of parameter distribution.
- Returns:
Unstandardized parameter, (mean, variance) of unstandardized distribution, or unstandardized Series
- Return type:
float, tuple, or pd.Series
- untransform(name: str | pd.Series, μ: float = None, σ2: float = None) → float | tuple | pd.Series
Untransforms a parameter, distribution, or Series
- Parameters:
name (str or pd.Series) – Name of parameter. If a Series is supplied, the name of the series must be the parameter name.
μ (float, optional) – Value of parameter or mean of parameter distribution. Only optional if first argument is a Series.
σ2 (float, optional) – Variance of parameter distribution.
- Returns:
Untransformed parameter, (mean, variance) of untransformed distribution, or untransformed Series
- Return type:
float, tuple, or pd.Series
- classmethod validate(dct: dict)
Ensures provided dictionary has all required attributes
- property var_transforms
Function that transforms the variance of a distribution.
These transforms should follow scipy’s conventions such that a distribution can be defined in the given space by passing (loc=μ, scale=σ2**0.5). Accordingly, since both log-normal and logit-normal variables are defined in terms of the scale (standard deviation) in their respective transformed spaces, this function simply returns the variance unchanged in these cases.
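To make the variance handling concrete, here is a small sketch of transforming and standardizing a (mean, variance) pair for a log-normal variable. It assumes that standardization divides the variance by the population variance, which is consistent with the stdz('d', 1, 0.1) example on this page but is not confirmed against gumbi's source. The names `pop_mu`, `pop_var`, `transform_dist`, and `stdz_dist` are hypothetical.

```python
import math

# Hypothetical population statistics for a log-normal variable "d",
# expressed in log space (illustrative values only).
pop_mu, pop_var = 0.0, 0.1


def transform_dist(mu, var):
    """Natural -> log space: the mean is log-transformed, while the
    variance is passed through unchanged."""
    return math.log(mu), var


def stdz_dist(mu, var):
    """Transform, then mean-center and scale the mean, and scale the
    variance by the population variance (an assumption of this sketch)."""
    t_mu, t_var = transform_dist(mu, var)
    z_mu = (t_mu - pop_mu) / math.sqrt(pop_var)
    z_var = t_var / pop_var
    return z_mu, z_var


transformed = transform_dist(1.0, 0.1)
standardized = stdz_dist(1.0, 0.1)
```

With these inputs, `transformed` is (0.0, 0.1) and `standardized` is (0.0, 1.0), mirroring the distribution examples shown earlier for the variable d.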