UncertainParameterArray

UncertainParameterArray(name, μ, σ2, stdzr)

Structured array of parameter means and variances, allowing transformation with uncertainty handling.

Methods

`UncertainParameterArray.extract`(field)
`UncertainParameterArray.mean`([axis, dtype, ...])	The natural-space distribution parameters which represent the mean of the transformed-space distributions
`UncertainParameterArray.sum`([axis, dtype, ...])	Summation with uncertainty propagation

Attributes

`UncertainParameterArray.dist`	Array of `scipy.stats.rv_continuous()` objects.
`UncertainParameterArray.t`	Transformed values
`UncertainParameterArray.z`	Standardized values

class gumbi.arrays.UncertainParameterArray(name: str, μ: ndarray, σ2: ndarray, stdzr: Standardizer, stdzd=False)

Bases: UncertainArray

Structured array of parameter means and variances, allowing transformation with uncertainty handling.

The primary role of this class is to compactly store the outputs of our regression models (e.g., gumbi.GP). We typically use these models to produce parameter predictions or estimates, but under some transformation. For example, reaction rate must clearly be strictly positive, so we fit a GP to the log of rate in order to more appropriately conform to the assumption of normality. For prediction and visualization, however, we often need to switch back and forth between natural space (\(rate\)), transformed space (\(\text{ln}\; rate\)), and standardized space (\(\left( \text{ln}\; rate - \mu_{\text{ln}\; rate} \right)/\sigma_{\text{ln}\; rate}\)), meanwhile calculating summary statistics such as means and percentiles. This class is intended to facilitate switching between those different contexts.
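
The three spaces can be illustrated without gumbi itself. This is a minimal sketch in plain numpy, not the library's implementation. The helper names are hypothetical. The standardization constants are borrowed from the Examples section below (mean -5.30 and SD 0.582 of the log-transformed variable). Following the convention described in the Notes, a log transform leaves the variance descriptor unchanged, and standardizing divides variances by the squared SD.

```python
import numpy as np

# Assumed standardization constants for ln(rate), borrowed from the Examples section
MU_LOG, SD_LOG = -5.30, 0.582

def natural_to_transformed(mu, sigma2):
    """Natural-space scale -> log space; the log-space variance descriptor is unchanged."""
    return np.log(mu), sigma2

def transformed_to_standardized(t_mu, t_sigma2):
    """Center and scale the log-space values; variances scale by 1 / SD**2."""
    return (t_mu - MU_LOG) / SD_LOG, t_sigma2 / SD_LOG**2

def standardized_to_natural(z_mu, z_sigma2):
    """Undo standardization, then exponentiate the log-space mean."""
    t_mu = z_mu * SD_LOG + MU_LOG
    t_sigma2 = z_sigma2 * SD_LOG**2
    return np.exp(t_mu), t_sigma2

mu = np.arange(1, 5) / 10        # natural-space scale descriptors
sigma2 = np.arange(1, 5) / 100   # log-space variances
z_mu, z_sigma2 = transformed_to_standardized(*natural_to_transformed(mu, sigma2))
print(z_mu)  # approx [5.1502 6.3412 7.0378 7.5321]
back_mu, back_sigma2 = standardized_to_natural(z_mu, z_sigma2)
```

The standardized values agree with the `upa.z` output shown in the Examples, and the round trip recovers the inputs.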

UncertainParameterArray, also accessible through the alias uparray, combines the functionality of ParameterArray and UncertainArray. A uparray stores the mean and variance of the variable itself as well as a Standardizer instance. This makes it simple to switch between the natural scale of the parameter and its transformed and standardized values through the t and z properties, respectively, with the accompanying variance transformed and scaled appropriately. This uncertainty is propagated under transformation, as with UncertainArray. A scipy distribution object can also be created at each point through the dist property, giving access to that object's methods such as rvs(), ppf(), pdf(), etc.

Notes

The name argument is intended to be the general name of the value held, not unique to this instance. Combining two UncertainParameterArray objects with the same name results in a new object with that name; combining two objects with different names results in a new name that reflects this combination (so 'A'+'B' becomes '(A+B)').
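
The naming rule can be sketched in a few lines of plain Python. `combine_names` is a hypothetical helper for illustration, not gumbi's code. Extending the rule to operators other than `+` is an assumption.

```python
def combine_names(a: str, b: str, op: str = "+") -> str:
    """Hypothetical helper mirroring the naming rule described above."""
    # Same general name: the combined object keeps that name
    if a == b:
        return a
    # Different names: record the combination, wrapped in parentheses
    return f"({a}{op}{b})"

print(combine_names("A", "B"))        # (A+B)
print(combine_names("rate", "rate"))  # rate
```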

The behavior of this object depends on the transformation associated with it, which is looked up by the parameter's name in its stored Standardizer instance. If this transformation is np.log(), the parameter is treated as a LogNormal variable; otherwise it’s treated as a Normal variable. This affects which distribution is returned by dist (lognorm vs. norm) and also the interpretation of μ and σ2.

For a Normal random variable, μ and σ2 are simply the parameter’s mean and variance in unstandardized space. That is, t.μ and t.σ2 are identical to μ and σ2, and z.μ and z.σ2 are the parameter’s mean and variance in standardized space.

For a LogNormal random variable Y, however, t.μ and t.σ2 are the mean and variance of a Normal variable X such that exp(X)=Y (z.μ and z.σ2 are this mean and variance in standardized space). In this case, μ and σ2 are the scale and shape descriptors of Y, so self.μ = np.exp(self.t.μ) and self.σ2 = self.t.σ2. Thus, μ and σ2 are not strictly the mean and variance of the random variable in natural space; those can be obtained from dist (e.g., dist.mean() and dist.var()).
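
The gap between the descriptors (μ, σ2) and the natural-space moments follows from the standard LogNormal formulas. Below is a plain-numpy sketch. `lognormal_moments` is a hypothetical helper, not part of gumbi. Its inputs are the descriptors that `upa.mean()` returns in the Examples section.

```python
import numpy as np

def lognormal_moments(mu, sigma2):
    """Natural-space mean and variance of Y = exp(X), X ~ Normal(ln(mu), sigma2).

    `mu` and `sigma2` are the uparray-style scale and shape descriptors.
    Hypothetical helper for illustration only.
    """
    mu = np.asarray(mu, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    t_mu = np.log(mu)                                   # t.μ: mean of the underlying Normal
    mean = np.exp(t_mu + sigma2 / 2)                    # equals mu * exp(sigma2 / 2)
    var = (np.exp(sigma2) - 1) * np.exp(2 * t_mu + sigma2)
    return mean, var

mean, var = lognormal_moments(0.22133638, 0.00625)
print(mean, var)  # approx 0.2220291 and 3.0907e-04, cf. dist.mean() and dist.var()
```

The natural-space mean always exceeds μ by the factor exp(σ2/2). For this reason μ should be read as a scale (the median), not as an average.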

This behavior is most important, and potentially most confusing, when calculating the mean(). Averaging is performed in transformed space, where the random variable exhibits a Normal distribution and the mean also exhibits a Normal distribution, allowing error propagation to be applied analytically. The μ and σ2 returned are the descriptors of the LogNormal distribution that represents the reverse transformation of this new Normal distribution. Therefore, the result is more akin to marginalizing out the given dimensions in the underlying model than a true natural-space average.
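
The averaging step can be sketched in plain numpy. Average the log-space means, propagate the variance of a mean, then map back. `lognormal_mean_descriptors` is a hypothetical helper. It assumes the averaged elements are independent, so the variance of the mean is the sum of the variances divided by n².

```python
import numpy as np

def lognormal_mean_descriptors(mu, sigma2):
    """Average LogNormal variables in log space (hypothetical helper).

    Assumes the underlying Normal variables X_i = ln(Y_i) are independent.
    """
    mu = np.asarray(mu, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    n = mu.size
    t_mean = np.log(mu).mean()   # mean of the log-space means
    t_var = sigma2.sum() / n**2  # Var((X_1 + ... + X_n) / n) for independent X_i
    # Back to uparray-style descriptors: scale = exp(log-space mean), shape = log-space variance
    return np.exp(t_mean), t_var

m, v = lognormal_mean_descriptors([0.1, 0.2, 0.3, 0.4], [0.01, 0.02, 0.03, 0.04])
print(m, v)  # approx 0.22133638 0.00625
```

Under these assumptions, the result matches the upa.mean() output shown in the Examples. The returned μ is the geometric mean of the inputs (about 0.221), not their arithmetic mean (0.25).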

See also

norm: scipy Normal random variable

lognorm: scipy LogNormal random variable

ParameterArray

UncertainArray

Standardizer

Parameters:

name (str) – Name of variable.
μ (array) – Mean at each point
σ2 (array) – Variance at each point
stdzr (Standardizer) – An instance of Standardizer, converted internally to Standardizer
stdzd (bool, default False) – Whether the supplied values are on standardized scale instead of the natural scale

Examples

Create a LogNormal random variable, as indicated by its Standardizer

>>> from gumbi import uparray, Standardizer
>>> import numpy as np
>>> stdzr = Standardizer(m = {'μ': -5.30, 'σ': 0.582}, log_vars=['m'])
>>> upa = uparray('m', np.arange(1,5)/10, np.arange(1,5)/100, stdzr)
>>> upa
m['μ', 'σ2']: [(0.1, 0.01) (0.2, 0.02) (0.3, 0.03) (0.4, 0.04)]
>>> stdzr.transforms['m']
[<ufunc 'log'>, <ufunc 'exp'>]

Mean and variance of the parameter in standardized space:

>>> upa.z
m_z['μ', 'σ2']: [(5.15019743, 0.02952256) (6.34117197, 0.05904512)
                 (7.03784742, 0.08856768) (7.53214651, 0.11809024)]

Verify round-trip transformation:

>>> upa.stdzr.unstdz(upa.name, upa.z.μ, upa.z.σ2)
(array([0.1, 0.2, 0.3, 0.4]), array([0.01, 0.02, 0.03, 0.04]))

Create a uparray from already-standardized values and verify round-trip transformation:

>>> uparray('m', np.arange(-2,3), np.arange(1,6)/10, stdzr, stdzd=True).z
m_z['μ', 'σ2']: [(-2., 0.1) (-1., 0.2) ( 0., 0.3) ( 1., 0.4) ( 2., 0.5)]

For LogNormal parameters, uparray follows the scipy.stats convention of parameterizing a lognormal random variable by its scale (the exponential of its log-space mean, which is its natural-space median) and its log-space standard deviation (stored by uparray as the variance σ2). Thus, a LogNormal uparray defined as m['μ', 'σ2']: (0.1, 0.01) represents exp(Normal(log(0.1), 0.01)), where 0.01 is the log-space variance.
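
This parameterization can be checked by sampling with plain numpy, independently of gumbi. The sketch draws from exp(Normal(log(0.1), 0.01)), where 0.01 is treated as a variance, so the sampler receives its square root as the SD. The sample size and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2021)
scale, log_var = 0.1, 0.01  # uparray-style (μ, σ2) for a LogNormal parameter

# Sample X ~ Normal(ln(scale), log_var) and exponentiate; the SD is sqrt(log_var)
samples = np.exp(rng.normal(np.log(scale), np.sqrt(log_var), size=100_000))

print(np.median(samples))     # close to 0.1: the scale is the median
print(np.log(samples).var())  # close to 0.01: σ2 is the log-space variance
print(samples.mean())         # close to 0.1 * exp(0.005), about 0.1005
```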

Note that the mean is not simply the mean of each component. Rather, it gives the parameters of the LogNormal distribution that corresponds to the mean of the underlying Normal distributions in log (transformed) space.

>>> upa.μ.mean()
0.25
>>> upa.σ2.mean()
0.025
>>> upa.mean()
m['μ', 'σ2']: (0.22133638, 0.00625)

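You can check these values against the distributions explicitly. dist.mean() and dist.var() give the natural-space mean and variance. By contrast, exponentiating the average of transformed-space samples recovers μ, and the variance of log-transformed natural-space samples approximates σ2.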

>>> upa.mean().dist.mean()
2.2202914201059437e-01
>>> np.exp(upa.t.mean().dist.rvs(10000, random_state=2021).mean())
2.2133371283050837e-01
>>> upa.mean().dist.var()
3.0907071428047016e-04
>>> np.log(upa.mean().dist.rvs(10000, random_state=2021)).var()
6.304628046829242e-03

Calculate percentiles

>>> upa.dist.ppf(0.025)
array([0.08220152, 0.1515835 , 0.21364308, 0.27028359])
>>> upa.dist.ppf(0.975)
array([0.12165225, 0.26388097, 0.42126336, 0.59197082])

Draw samples

>>> upa.dist.rvs([3, *upa.shape], random_state=2021)
array([[0.11605116, 0.22006429, 0.27902589, 0.34041327],
       [0.10571616, 0.1810085 , 0.36491077, 0.45507622],
       [0.10106982, 0.21230397, 0.3065239 , 0.33827997]])

You can compose the variable with numpy functions, though you may get a warning if the operation is poorly defined for the distribution (as most transformations of LogNormal distributions are). Transformations are applied in transformed space.

>>> (upa+1+np.tile(upa, (3,1))[2,3]).mean().t.dist.ppf(0.5)
UserWarning: Transform is poorly defined for <ufunc 'log'>; results may be unexpected.
-1.8423623672812148

name

Name of variable.

Type: str

μ

Mean at each point

Type: array

σ2

Variance at each point

Type: array

fields

Names of each level held in the array

Type: list of str

stdzr

An instance of Standardizer created from the supplied Standardizer object

Type: Standardizer

property dist: rv_continuous

Array of scipy.stats.rv_continuous() objects.

If the transformation associated with the array’s parameter is log/exp, this is a lognorm distribution object with scale=self.μ and s=self.t.σ. Otherwise it is a norm distribution with loc=self.μ and scale=self.σ. See the scipy documentation on LogNormal and Normal random variables for more explanation and a list of methods.
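
The mapping onto scipy can be sketched with a small dispatch function. `make_dist` is a hypothetical helper, not gumbi's code. It follows the parameterization described above: for a log transform, s is the log-space SD and scale is μ; otherwise, loc is μ and scale is σ. Array-valued parameters yield a single frozen distribution that evaluates elementwise.

```python
import numpy as np
from scipy import stats

def make_dist(mu, sigma2, log_transformed):
    """Hypothetical helper following the parameterization described for `dist`."""
    mu = np.asarray(mu, dtype=float)
    sd = np.sqrt(np.asarray(sigma2, dtype=float))
    if log_transformed:
        # LogNormal: scale = μ (natural-space scale), s = log-space SD
        return stats.lognorm(s=sd, scale=mu)
    # Normal: loc = μ, scale = SD
    return stats.norm(loc=mu, scale=sd)

dist = make_dist([0.1, 0.2, 0.3, 0.4], [0.01, 0.02, 0.03, 0.04], log_transformed=True)
print(dist.ppf(0.025))  # approx [0.0822 0.1516 0.2136 0.2703]
```

With the inputs from the Examples section, the percentiles agree with the upa.dist.ppf() output shown there.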

mean(axis=None, dtype=None, out=None, keepdims=False, **kwargs): The natural-space distribution parameters which represent the mean of the transformed-space distributions

sum(axis=None, dtype=None, out=None, keepdims=False, **kwargs): Summation with uncertainty propagation

property t: UncertainArray: Transformed values

property z: UncertainArray: Standardized values