UncertainParameterArray
UncertainParameterArray | Structured array of parameter means and variances, allowing transformation with uncertainty handling.
Methods
mean | The natural-space distribution parameters which represent the mean of the transformed-space distributions
sum | Summation with uncertainty propagation
Attributes
dist | Array of scipy.stats.rv_continuous objects
t | Transformed values
z | Standardized values
- class gumbi.arrays.UncertainParameterArray(name: str, μ: ndarray, σ2: ndarray, stdzr: Standardizer, stdzd=False)
Bases: UncertainArray
Structured array of parameter means and variances, allowing transformation with uncertainty handling.
The primary role of this class is to compactly store the outputs of our regression models (e.g., gumbi.GP). We typically use these models to produce parameter predictions or estimates, but under some transformation. For example, reaction rate must clearly be strictly positive, so we fit a GP to the log of rate in order to more appropriately conform to the assumption of normality. For prediction and visualization, however, we often need to switch back and forth between natural space (\(rate\)), transformed space (\(\text{ln}\; rate\)), and standardized space (\(\left( \text{ln}\; rate - \mu_{\text{ln}\; rate} \right)/\sigma_{\text{ln}\; rate}\)), meanwhile calculating summary statistics such as means and percentiles. This class is intended to facilitate switching between those different contexts.
UncertainParameterArray, also accessible through the alias uparray, combines the functionality of ParameterArray and UncertainArray. A uparray stores the mean and variance of the variable itself as well as a Standardizer instance. This makes it simple to switch between the natural scale of the parameter and its transformed and standardized values through the t and z properties, respectively, with the accompanying variance transformed and scaled appropriately. This uncertainty is propagated under transformation, as with UncertainArray, and a scipy distribution object can be created at each point through the dist property, allowing access to that object's methods such as rvs(), ppf(), pdf(), etc.
Notes
The name argument is intended to be the general name of the value held, not unique to this instance. Combining two UncertainParameterArray objects with the same name results in a new object with that name; combining two objects with different names results in a new name that reflects this combination (so 'A'+'B' becomes '(A+B)').
The behavior of this object depends on the transformation associated with it, as indicated by its name in its stored Standardizer instance. If this transformation is np.log(), the parameter is treated as a LogNormal variable; otherwise it is treated as a Normal variable. This affects which distribution is returned by dist (lognorm vs norm) and also the interpretation of μ and σ2.
For a Normal random variable, these are simply the parameter's mean and variance in unstandardized space, t.μ and t.σ2 are identical to μ and σ2, and z.μ and z.σ2 are the parameter's mean and variance in standardized space.
For a LogNormal random variable Y, however, t.μ and t.σ2 are the mean and variance of a Normal variable X such that exp(X)=Y (z.μ and z.σ2 are this mean and variance in standardized space). In this case, μ and σ2 are the scale and shape descriptors of Y, so self.μ = np.exp(self.t.μ) and self.σ2 = self.t.σ2. Thus, μ and σ2 are not strictly the mean and variance of the random variable in natural space; these can be obtained from the dist.
This behavior is most important, and potentially most confusing, when calculating the mean(). Averaging is performed in transformed space, where the random variable exhibits a Normal distribution and the mean also exhibits a Normal distribution, allowing error propagation to be applied analytically. The μ and σ2 returned are the descriptors of the LogNormal distribution that represents the reverse transformation of this new Normal distribution. Therefore, the result is more akin to marginalizing out the given dimensions in the underlying model than a true natural-space average.
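The averaging rule described above can be checked by hand. The following is a minimal sketch in plain Python, not gumbi's implementation: it assumes the elements are treated as independent, and all variable names are illustrative. It uses the same four values as the Examples section.

```python
import math

# Illustrative LogNormal descriptors (same values as in the Examples section).
mu = [0.1, 0.2, 0.3, 0.4]          # scale descriptors, exp of the log-space means
sigma2 = [0.01, 0.02, 0.03, 0.04]  # log-space variances
n = len(mu)

# Average in transformed (log) space. For n independent Normal variables,
# the mean has mean = average of the means and variance = sum of variances / n**2.
t_mean = sum(math.log(m) for m in mu) / n
t_var = sum(sigma2) / n**2

# Map back: the descriptors of the LogNormal corresponding to that Normal.
mean_mu = math.exp(t_mean)
mean_sigma2 = t_var

# For comparison, the naive arithmetic mean of the natural-space descriptors.
naive_mu = sum(mu) / n

print(mean_mu, mean_sigma2, naive_mu)
```

Under these assumptions the result agrees with the (0.22133638, 0.00625) reported by upa.mean() in the Examples, while the naive arithmetic mean of μ is 0.25.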
- Parameters:
name (str) – Name of variable.
μ (array) – Mean at each point
σ2 (array) – Variance at each point
stdzr (Standardizer) – An instance of Standardizer, converted internally to Standardizer
stdzd (bool, default False) – Whether the supplied values are on the standardized scale instead of the natural scale
Examples
Create a LogNormal random variable, as indicated by its Standardizer:
>>> from gumbi import uparray, Standardizer
>>> import numpy as np
>>> stdzr = Standardizer(m = {'μ': -5.30, 'σ': 0.582}, log_vars=['m'])
>>> upa = uparray('m', np.arange(1,5)/10, np.arange(1,5)/100, stdzr)
>>> upa
m['μ', 'σ2']: [(0.1, 0.01) (0.2, 0.02) (0.3, 0.03) (0.4, 0.04)]
>>> stdzr.transforms['m']
[<ufunc 'log'>, <ufunc 'exp'>]
Mean and variance of the parameter in standardized space:
>>> upa.z
m_z['μ', 'σ2']: [(5.15019743, 0.02952256) (6.34117197, 0.05904512) (7.03784742, 0.08856768) (7.53214651, 0.11809024)]
Verify round-trip transformation:
>>> upa.stdzr.unstdz(upa.name, upa.z.μ, upa.z.σ2)
(array([0.1, 0.2, 0.3, 0.4]), array([0.01, 0.02, 0.03, 0.04]))
Create a uparray from already-standardized values and verify round-trip transformation:
>>> uparray('m', np.arange(-2,3), np.arange(1,6)/10, stdzr, stdzd=True).z
m_z['μ', 'σ2']: [(-2., 0.1) (-1., 0.2) ( 0., 0.3) ( 1., 0.4) ( 2., 0.5)]
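For LogNormal parameters, uparray follows the scipy.stats convention of parameterizing a lognormal random variable by its natural-space scale (the exponential of the log-space mean, which is the median rather than the mean) and its log-space spread. scipy's shape parameter s is the log-space standard deviation, whereas uparray stores the log-space variance σ2. Thus, a LogNormal uparray defined as m['μ', 'σ2']: (0.1, 0.01) represents exp(Normal(log(0.1), 0.01)), where 0.01 is the variance.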
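To make this parameterization concrete, the sketch below computes the natural-space median, mean, and variance implied by the descriptors (0.1, 0.01), using the standard closed-form moments of a LogNormal variable. It is an illustration in plain Python with hypothetical variable names, not code taken from gumbi.

```python
import math

# Hypothetical descriptors of a single LogNormal element.
mu = 0.1       # scale descriptor: exp of the log-space mean
sigma2 = 0.01  # shape descriptor: the log-space variance

# Parameters of the underlying Normal variable X, where Y = exp(X).
t_mu = math.log(mu)
t_sigma2 = sigma2

# Closed-form natural-space summary statistics of Y.
natural_median = math.exp(t_mu)
natural_mean = math.exp(t_mu + t_sigma2 / 2)
natural_var = (math.exp(t_sigma2) - 1) * math.exp(2 * t_mu + t_sigma2)

print(natural_median, natural_mean, natural_var)
```

The median equals the stored μ, while the natural-space mean is slightly larger and the natural-space variance is far smaller than the stored σ2, which is why neither descriptor should be read as a natural-space moment.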
Note that the mean is not simply the mean of each component; it gives the parameters of the LogNormal distribution that corresponds to the mean of the underlying Normal distributions in log (transformed) space.
>>> upa.μ.mean()
0.25
>>> upa.σ2.mean()
0.025
>>> upa.mean()
m['μ', 'σ2']: (0.22133638, 0.00625)
You can verify the mean and variance returned by averaging over the random variable explicitly:
>>> upa.mean().dist.mean()
2.2202914201059437e-01
>>> np.exp(upa.t.mean().dist.rvs(10000, random_state=2021).mean())
2.2133371283050837e-01
>>> upa.mean().dist.var()
3.0907071428047016e-04
>>> np.log(upa.mean().dist.rvs(10000, random_state=2021)).var()
6.304628046829242e-03
Calculate percentiles:
>>> upa.dist.ppf(0.025)
array([0.08220152, 0.1515835 , 0.21364308, 0.27028359])
>>> upa.dist.ppf(0.975)
array([0.12165225, 0.26388097, 0.42126336, 0.59197082])
Draw samples:
>>> upa.dist.rvs([3, *upa.shape], random_state=2021)
array([[0.11605116, 0.22006429, 0.27902589, 0.34041327],
       [0.10571616, 0.1810085 , 0.36491077, 0.45507622],
       [0.10106982, 0.21230397, 0.3065239 , 0.33827997]])
You can compose the variable with numpy functions, though you may get a warning if the operation is poorly defined for the distribution (as is the case for most transforms of LogNormal distributions). Transformations are applied in transformed space.
>>> (upa+1+np.tile(upa, (3,1))[2,3]).mean().t.dist.ppf(0.5)
UserWarning: Transform is poorly defined for <ufunc 'log'>; results may be unexpected.
-1.8423623672812148
- name
Name of variable.
- Type:
str
- μ
Mean at each point
- Type:
array
- σ2
Variance at each point
- Type:
array
- fields
Names of each level held in the array
- Type:
list of str
- stdzr
An instance of Standardizer created from the supplied Standardizer object
- Type:
Standardizer
- property dist: rv_continuous
Array of scipy.stats.rv_continuous() objects.
If the transformation associated with the array's parameter is log/exp, this is a lognorm distribution object with scale=self.μ and s=self.t.σ. Otherwise it is a norm distribution with loc=self.μ and scale=self.σ. See the scipy documentation on LogNormal and Normal random variables for more explanation and a list of methods.
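As an illustration of the lognorm parameterization described above, the following sketch computes quantiles directly from a scale and a log-space variance using only the standard library. The helper lognormal_ppf is hypothetical and is not part of gumbi or scipy; with scipy, the corresponding object would be scipy.stats.lognorm(s=..., scale=...).

```python
import math
from statistics import NormalDist

# Illustrative descriptors (same values as in the Examples section).
mu = [0.1, 0.2, 0.3, 0.4]          # scale = exp of the log-space mean
sigma2 = [0.01, 0.02, 0.03, 0.04]  # log-space variance; s = sqrt(sigma2)


def lognormal_ppf(p, scale, log_var):
    """Quantile at probability p of exp(Normal(log(scale), log_var))."""
    z = NormalDist().inv_cdf(p)  # standard Normal quantile
    return scale * math.exp(z * math.sqrt(log_var))


# Bounds of a central 95% interval for each element.
lower = [lognormal_ppf(0.025, m, s2) for m, s2 in zip(mu, sigma2)]
upper = [lognormal_ppf(0.975, m, s2) for m, s2 in zip(mu, sigma2)]

print(lower)
print(upper)
```

These values agree with the upa.dist.ppf(0.025) and upa.dist.ppf(0.975) outputs in the Examples. The interval is asymmetric about μ in natural space because it is symmetric in log space.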
- mean(axis=None, dtype=None, out=None, keepdims=False, **kwargs)
The natural-space distribution parameters which represent the mean of the transformed-space distributions
- sum(axis=None, dtype=None, out=None, keepdims=False, **kwargs)
Summation with uncertainty propagation
- property t: UncertainArray
Transformed values
- property z: UncertainArray
Standardized values
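The relationship between the natural, transformed (t), and standardized (z) values can be sketched by hand for a log-transformed parameter. This is an illustration in plain Python rather than gumbi's implementation; it uses the Standardizer statistics (μ = -5.30, σ = 0.582) and the values from the Examples, and all variable names are hypothetical.

```python
import math

# Statistics of the log-transformed variable, as given to the Standardizer in the Examples.
log_mu, log_sigma = -5.30, 0.582

mu = [0.1, 0.2, 0.3, 0.4]          # natural-space μ
sigma2 = [0.01, 0.02, 0.03, 0.04]  # σ2, the log-space variance for a LogNormal variable

# Natural -> transformed (log) space.
t_mu = [math.log(m) for m in mu]
t_sigma2 = list(sigma2)

# Transformed -> standardized space: shift and scale the mean, scale the variance.
z_mu = [(t - log_mu) / log_sigma for t in t_mu]
z_sigma2 = [s2 / log_sigma**2 for s2 in t_sigma2]

# Round trip: standardized -> transformed -> natural.
back_mu = [math.exp(z * log_sigma + log_mu) for z in z_mu]
back_sigma2 = [s2 * log_sigma**2 for s2 in z_sigma2]

print(z_mu)
print(z_sigma2)
```

The computed z_mu and z_sigma2 agree with the upa.z output in the Examples, and the round trip recovers the original μ and σ2.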