UncertainParameterArray
Structured array of parameter means and variances, allowing transformation with uncertainty handling.
Methods
mean | The natural-space distribution parameters which represent the mean of the transformed-space distributions
sum | Summation with uncertainty propagation
Attributes
dist | Array of scipy.stats.rv_continuous() objects
t | Transformed values
z | Standardized values
- class gumbi.arrays.UncertainParameterArray(name: str, μ: ndarray, σ2: ndarray, stdzr: Standardizer, stdzd=False)
Bases:
UncertainArray

Structured array of parameter means and variances, allowing transformation with uncertainty handling.

The primary role of this class is to compactly store the outputs of our regression models (e.g., gumbi.GP). We typically use these models to produce parameter predictions or estimates, but under some transformation. For example, reaction rate must clearly be strictly positive, so we fit a GP to the log of rate in order to more appropriately conform to the assumption of normality. For prediction and visualization, however, we often need to switch back and forth between natural space (\(rate\)), transformed space (\(\text{ln}\; rate\)), and standardized space (\(\left( \text{ln}\; rate - \mu_{\text{ln}\; rate} \right)/\sigma_{\text{ln}\; rate}\)), meanwhile calculating summary statistics such as means and percentiles. This class is intended to facilitate switching between those different contexts.

UncertainParameterArray, also accessible through the alias uparray, combines the functionality of ParameterArray and UncertainArray. A uparray stores the mean and variance of the variable itself as well as a Standardizer instance. This makes it simple to switch between the natural scale of the parameter and its transformed and standardized values through the t and z properties, respectively, with the accompanying variance transformed and scaled appropriately. This uncertainty is propagated under transformation, as with UncertainArray, and a scipy distribution object can be created at each point through the dist property, allowing access to that object's methods such as rvs(), ppf(), pdf(), etc.

Notes
The name argument is intended to be the general name of the value held, not unique to this instance. Combining two UncertainParameterArray objects with the same name results in a new object with that name; combining two objects with different names results in a new name that reflects this combination (so 'A'+'B' becomes '(A+B)').

The behavior of this object depends on the transformation associated with it, as indicated by its name in its stored Standardizer instance. If this transformation is np.log(), the parameter is treated as a LogNormal variable; otherwise it’s treated as a Normal variable. This affects which distribution is returned by dist (lognorm vs norm) and also the interpretation of μ and σ2.

For a Normal random variable, these are simply the parameter’s mean and variance in unstandardized space, t.μ and t.σ2 are identical to μ and σ2, and z.μ and z.σ2 are the parameter’s mean and variance in standardized space.

For a LogNormal random variable Y, however, t.μ and t.σ2 are the mean and variance of a Normal variable X such that exp(X)=Y (z.μ and z.σ2 are this mean and variance in standardized space). In this case, μ and σ2 are the scale and shape descriptors of Y, so self.μ = np.exp(self.t.μ) and self.σ2 = self.t.σ2. Thus, μ and σ2 are not strictly the mean and variance of the random variable in natural space; those can be obtained from dist.

This behavior is most important, and potentially most confusing, when calculating the mean(). Averaging is performed in transformed space, where the random variable exhibits a Normal distribution and the mean also exhibits a Normal distribution, allowing error propagation to be applied analytically. The μ and σ2 returned are the descriptors of the LogNormal distribution that represents the reverse transformation of this new Normal distribution. Therefore, the result is more akin to marginalizing out the given dimensions in the underlying model than a true natural-space average.
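The averaging rule described in these notes can be sketched with plain Python. This is only an illustration of the arithmetic, not gumbi's implementation: `lognormal_mean` is a hypothetical helper, and it assumes the points being averaged are independent.

```python
import math

def lognormal_mean(mus, sigma2s):
    """Hypothetical helper: average LogNormal descriptors in log space.

    Assumes independent points, so the variance of the mean of n Normal
    variables is sum(variances) / n**2.
    """
    n = len(mus)
    # Average the underlying Normal means in transformed (log) space.
    t_mu = sum(math.log(m) for m in mus) / n
    # Propagate the log-space variances through the average.
    t_sigma2 = sum(sigma2s) / n**2
    # Map the mean back to natural space; σ2 stays a log-space variance.
    return math.exp(t_mu), t_sigma2

mu, sigma2 = lognormal_mean([0.1, 0.2, 0.3, 0.4], [0.01, 0.02, 0.03, 0.04])
print(round(mu, 8), round(sigma2, 5))  # 0.22133638 0.00625
```

Under these assumptions the returned μ is the geometric mean of the inputs, which is why it differs from the arithmetic mean of the μ values.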
- Parameters:
name (str) – Name of variable.
μ (array) – Mean at each point
σ2 (array) – Variance at each point
stdzr (Standardizer) – An instance of Standardizer, converted internally to Standardizer
stdzd (bool, default False) – Whether the supplied values are on the standardized scale instead of the natural scale
Examples
Create a LogNormal random variable, as indicated by its Standardizer
>>> from gumbi import uparray, Standardizer
>>> import numpy as np
>>> stdzr = Standardizer(m = {'μ': -5.30, 'σ': 0.582}, log_vars=['m'])
>>> upa = uparray('m', np.arange(1,5)/10, np.arange(1,5)/100, stdzr)
>>> upa
m['μ', 'σ2']: [(0.1, 0.01) (0.2, 0.02) (0.3, 0.03) (0.4, 0.04)]
>>> stdzr.transforms['m']
[<ufunc 'log'>, <ufunc 'exp'>]
Mean and variance of the parameter in standardized space:
>>> upa.z
m_z['μ', 'σ2']: [(5.15019743, 0.02952256) (6.34117197, 0.05904512) (7.03784742, 0.08856768) (7.53214651, 0.11809024)]
Verify round-trip transformation:
>>> upa.stdzr.unstdz(upa.name, upa.z.μ, upa.z.σ2)
(array([0.1, 0.2, 0.3, 0.4]), array([0.01, 0.02, 0.03, 0.04]))
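The standardized values above can be reproduced by hand from the Standardizer statistics used in this example (μ = -5.30 and σ = 0.582 for the log of the variable). The sketch below is a plain-Python illustration of that arithmetic; `to_standardized` and `from_standardized` are hypothetical helpers, not gumbi functions.

```python
import math

# Statistics of the log-transformed variable, taken from the example Standardizer.
LOG_MEAN, LOG_STD = -5.30, 0.582

def to_standardized(mu, sigma2):
    """Hypothetical helper: natural-space descriptors -> standardized space."""
    z_mu = (math.log(mu) - LOG_MEAN) / LOG_STD  # shift and scale the log-space mean
    z_sigma2 = sigma2 / LOG_STD**2              # variance scales with 1/σ²
    return z_mu, z_sigma2

def from_standardized(z_mu, z_sigma2):
    """Hypothetical helper: inverse of to_standardized."""
    return math.exp(z_mu * LOG_STD + LOG_MEAN), z_sigma2 * LOG_STD**2

z_mu, z_sigma2 = to_standardized(0.1, 0.01)
print(round(z_mu, 6), round(z_sigma2, 6))  # 5.150197 0.029523

# Round trip back to natural space recovers the original descriptors.
mu, sigma2 = from_standardized(z_mu, z_sigma2)
print(round(mu, 6), round(sigma2, 6))  # 0.1 0.01
```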
Create a uparray from already-standardized values and verify round-trip transformation:
>>> uparray('m', np.arange(-2,3), np.arange(1,6)/10, stdzr, stdzd=True).z
m_z['μ', 'σ2']: [(-2., 0.1) (-1., 0.2) ( 0., 0.3) ( 1., 0.4) ( 2., 0.5)]
For LogNormal parameters, uparray follows the scipy.stats convention of parameterizing a lognormal random variable in terms of its natural-space scale (the exponential of the log-space mean, which is the median of the distribution) and its log-space spread, stored here as the variance σ2. Thus, a LogNormal uparray defined as m['μ', 'σ2']: (0.1, 0.01) represents exp(Normal(log(0.1), 0.01)), where 0.01 is the log-space variance.
Note that the mean is not simply the mean of each component. It returns the parameters of the LogNormal distribution that corresponds to the mean of the underlying Normal distributions in log (transformed) space.
>>> upa.μ.mean()
0.25
>>> upa.σ2.mean()
0.025
>>> upa.mean()
m['μ', 'σ2']: (0.22133638, 0.00625)
You can verify the mean and variance returned by averaging over the random variable explicitly.
>>> upa.mean().dist.mean()
2.2202914201059437e-01
>>> np.exp(upa.t.mean().dist.rvs(10000, random_state=2021).mean())
2.2133371283050837e-01
>>> upa.mean().dist.var()
3.0907071428047016e-04
>>> np.log(upa.mean().dist.rvs(10000, random_state=2021)).var()
6.304628046829242e-03
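The analytic values of dist.mean() and dist.var() above follow from the standard LogNormal moment formulas. A minimal sketch, assuming the descriptors (0.22133638, 0.00625) returned by upa.mean(); `lognormal_moments` is a hypothetical helper, not part of gumbi:

```python
import math

def lognormal_moments(mu, sigma2):
    """Natural-space mean and variance of exp(Normal(log(mu), sigma2))."""
    mean = mu * math.exp(sigma2 / 2)
    var = (math.exp(sigma2) - 1) * mu**2 * math.exp(sigma2)
    return mean, var

mean, var = lognormal_moments(0.22133638, 0.00625)
print(mean, var)  # approximately 0.222029 and 3.0907e-04
```

The natural-space mean is slightly larger than μ because μ is the median of the LogNormal distribution, and the distribution is right-skewed.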
Calculate percentiles
>>> upa.dist.ppf(0.025)
array([0.08220152, 0.1515835 , 0.21364308, 0.27028359])
>>> upa.dist.ppf(0.975)
array([0.12165225, 0.26388097, 0.42126336, 0.59197082])
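These percentiles can be cross-checked without scipy: a LogNormal quantile is the exponential of the corresponding Normal quantile in log space. The sketch below uses only the standard library; `lognormal_ppf` is a hypothetical helper introduced for illustration.

```python
import math
from statistics import NormalDist

def lognormal_ppf(q, mu, sigma2):
    """Hypothetical helper: quantile q of exp(Normal(log(mu), sigma2))."""
    z = NormalDist().inv_cdf(q)  # standard Normal quantile
    return math.exp(math.log(mu) + z * math.sqrt(sigma2))

mus = [0.1, 0.2, 0.3, 0.4]
sigma2s = [0.01, 0.02, 0.03, 0.04]

lower = [lognormal_ppf(0.025, m, s2) for m, s2 in zip(mus, sigma2s)]
upper = [lognormal_ppf(0.975, m, s2) for m, s2 in zip(mus, sigma2s)]
print([round(v, 6) for v in lower])
print([round(v, 6) for v in upper])
```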
Draw samples
>>> upa.dist.rvs([3, *upa.shape], random_state=2021)
array([[0.11605116, 0.22006429, 0.27902589, 0.34041327],
       [0.10571616, 0.1810085 , 0.36491077, 0.45507622],
       [0.10106982, 0.21230397, 0.3065239 , 0.33827997]])
You can compose the variable with numpy functions, though you may get a warning if the operation is poorly defined for the distribution, which is the case for most transforms on LogNormal distributions. Transformations are applied in transformed space.
>>> (upa+1+np.tile(upa, (3,1))[2,3]).mean().t.dist.ppf(0.5)
UserWarning: Transform is poorly defined for <ufunc 'log'>; results may be unexpected.
-1.8423623672812148
- name
Name of variable.
- Type:
str
- μ
Mean at each point
- Type:
array
- σ2
Variance at each point
- Type:
array
- fields
Names of each level held in the array
- Type:
list of str
- stdzr
An instance of Standardizer created from the supplied Standardizer object
- Type:
Standardizer
- property dist: rv_continuous
Array of scipy.stats.rv_continuous() objects.

If the transformation associated with the array’s parameter is log/exp, this is a lognorm distribution object with scale=self.μ and s=self.t.σ. Otherwise it is a norm distribution with loc=self.μ and scale=self.σ. See the scipy documentation on LogNormal and Normal random variables for more explanation and a list of methods.
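The two parameterizations described for dist can be sketched directly with scipy.stats. This mirrors the description above rather than gumbi's internals; the arrays `mu` and `sigma2` are illustrative values, with `sigma2` read as a log-space variance in the LogNormal case and a natural-space variance in the Normal case.

```python
import numpy as np
from scipy import stats

mu = np.array([0.1, 0.2, 0.3, 0.4])
sigma2 = np.array([0.01, 0.02, 0.03, 0.04])

# LogNormal case: μ is the scale, the shape s is the log-space standard deviation.
lognormal = stats.lognorm(s=np.sqrt(sigma2), scale=mu)

# Normal case: μ is the location, the scale is the standard deviation.
normal = stats.norm(loc=mu, scale=np.sqrt(sigma2))

print(lognormal.median())  # equals mu: the scale is the median, not the mean
print(lognormal.mean())    # equals mu * exp(sigma2 / 2)
print(normal.mean())       # equals mu
```

Both frozen distributions broadcast over the parameter arrays, so methods such as ppf() and rvs() return one value per point.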
- mean(axis=None, dtype=None, out=None, keepdims=False, **kwargs)
The natural-space distribution parameters which represent the mean of the transformed-space distributions
- sum(axis=None, dtype=None, out=None, keepdims=False, **kwargs)
Summation with uncertainty propagation
- property t: UncertainArray
Transformed values
- property z: UncertainArray
Standardized values