DataSet
Container for tabular data, allowing simple access to standardized values and wide or tidy dataframe formats.
Methods
from_tidy() | Constructs a DataSet from a tidy-form dataframe.
from_wide() | Constructs a DataSet from a wide-form dataframe.
update_stdzr() | Updates internal Standardizer with current data, log_vars, and logit_vars.
Attributes
float_inputs | Columns of dataframe with "float64" dtype.
inputs | Columns of dataframe not contained in outputs.
Provides keyword arguments for easy instantiation of a similar object.
tidy | Tidy-form view of data.
wide | Wide-form view of data.
- class gumbi.aggregation.DataSet(data: DataFrame, outputs: list, names_column: str = 'Variable', values_column: str = 'Value', log_vars: list | None = None, logit_vars: list | None = None, stdzr: Standardizer | None = None)
Bases: object
Container for tabular data, allowing simple access to standardized values and wide or tidy dataframe formats.
A DataSet is instantiated with a wide-form dataframe, with all outputs of a given observation in a single row, but allows easy access to the corresponding tidy dataframe, with each output in a separate row (the from_tidy() class method also allows construction from tidy data). The titles of the tidy-form columns for the output names and their values are supplied at instantiation, defaulting to “Variable” and “Value”. For example, say we have an observation at position (x, y) with measurements of i, j, and k. The wide-form dataframe would have one column for each of x, y, i, j, and k, while the tidy-form dataframe would have a column for each of x and y, a “Variable” column where each row contains either “i”, “j”, or “k” as strings, and a “Value” column containing the corresponding measurement. Wide data is more space-efficient and perhaps more intuitive to construct and inspect, while tidy data more clearly distinguishes inputs and outputs. These views are accessible through the wide and tidy attributes as instances of WideData and TidyData, respectively.
As a container for WideData and TidyData, this class also provides simple access to standardized values of the data through wide.z and tidy.z, or to transformed values through wide.t and tidy.t. A Standardizer instance can be supplied as a keyword argument; otherwise one will be constructed automatically from the supplied dataframe with the supplied values of log_vars and logit_vars. Unlike standalone WideData and TidyData instances, the wide and tidy attributes of a DataSet can be altered and sliced while retaining their functionality, with a cursory integrity check. The Standardizer instance can be updated with update_stdzr(), for example following manipulation of the data or alteration of log_vars and logit_vars.
- Parameters:
data (pd.DataFrame) – A wide-form dataframe. See class method from_tidy() for instantiation from tidy data.
outputs (list) – Columns of data to be treated as outputs.
names_column (str, default 'Variable') – Name to be used in tidy view for column containing output names.
values_column (str, default 'Value') – Name to be used in tidy view for column containing output values.
log_vars (list, optional) – List of input and output variables to be treated as log-normal. Ignored if stdzr is supplied.
logit_vars (list, optional) – List of input and output variables to be treated as logit-normal. Ignored if stdzr is supplied.
stdzr (Standardizer, optional) – A Standardizer instance. If not supplied, one will be created automatically.
Examples
>>> import pandas as pd
>>> from gumbi.aggregation import DataSet
>>> df = pd.read_pickle(test_data / 'estimates_test_data.pkl')
>>> ds = DataSet.from_tidy(df, names_column='Parameter', log_vars=['Y', 'c', 'b'], logit_vars=['X', 'e'])
>>> ds
DataSet:
    wide: [66 rows x 13 columns]
    tidy: [396 rows x 9 columns]
    outputs: ['e', 'f', 'b', 'c', 'a', 'd']
    inputs: ['Code', 'Target', 'Y', 'X', 'Reaction', 'lg10_Z', 'Metric']
>>> ds.wide = ds.wide.drop(range(0,42,2))
>>> ds
DataSet:
    wide: [45 rows x 13 columns]
    tidy: [270 rows x 9 columns]
    outputs: ['e', 'f', 'b', 'c', 'a', 'd']
    inputs: ['Code', 'Target', 'Y', 'X', 'Reaction', 'lg10_Z', 'Metric']
>>> ds.tidy.z  # tidy-form dataframe with standardized values
>>> ds.wide.z  # wide-form dataframe with standardized values
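The relationship between the wide and tidy views described in the class docstring can be illustrated without gumbi. The following is a minimal, dependency-free sketch in which plain dicts stand in for dataframe rows. The helper name wide_to_tidy and the numeric values are made up for illustration, and this is not how DataSet is implemented internally. In pandas, pd.melt performs the equivalent reshaping.

```python
# Illustrative only: plain dicts stand in for dataframe rows; this is not gumbi code.

def wide_to_tidy(rows, outputs, names_column="Variable", values_column="Value"):
    """Turn one wide row per observation into one tidy row per output."""
    tidy = []
    for row in rows:
        # Every column that is not listed as an output is carried along as an input.
        inputs = {k: v for k, v in row.items() if k not in outputs}
        for name in outputs:
            tidy.append({**inputs, names_column: name, values_column: row[name]})
    return tidy


# One observation at position (x, y) with measurements i, j, and k (made-up values).
wide_rows = [{"x": 0.0, "y": 1.0, "i": 10.0, "j": 20.0, "k": 30.0}]
tidy_rows = wide_to_tidy(wide_rows, outputs=["i", "j", "k"])

for r in tidy_rows:
    print(r)
```

Each wide row expands into one tidy row per output, which matches the row counts in the example above: 66 wide rows with 6 outputs give 396 tidy rows.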
- property float_inputs
Columns of dataframe with “float64” dtype.
- classmethod from_tidy(tidy, outputs=None, names_column='Variable', values_column='Value', stdzr=None, log_vars=None, logit_vars=None)
Constructs a DataSet from a tidy-form dataframe. See DataSet for explanation of arguments.
- classmethod from_wide(wide, outputs=None, names_column='Variable', values_column='Value', stdzr=None, log_vars=None, logit_vars=None)
Constructs a DataSet from a wide-form dataframe. See DataSet for explanation of arguments.
- property inputs
Columns of dataframe not contained in outputs.
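As a rough illustration of how inputs relates to outputs, the sketch below derives both column lists from a mapping of column names to dtypes. The column names and dtypes are hypothetical, and the sketch assumes (as the name suggests, but the documentation does not state) that float_inputs is restricted to input columns.

```python
# Hypothetical column names and dtypes, loosely modeled on the example dataset.
column_dtypes = {
    "Code": "object",
    "X": "float64",
    "lg10_Z": "float64",
    "e": "float64",
    "f": "float64",
}
outputs = ["e", "f"]

# inputs: every column that is not listed as an output.
inputs = [c for c in column_dtypes if c not in outputs]

# float_inputs (assumed behavior): the subset of inputs with "float64" dtype.
float_inputs = [c for c in inputs if column_dtypes[c] == "float64"]

print(inputs)
print(float_inputs)
```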
- update_stdzr()
Updates internal Standardizer with current data, log_vars, and logit_vars.
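To see why refreshing the standardizer after slicing can matter, here is a generic sketch using only the standard library. It assumes that standardizing a log-normal variable means taking the logarithm, then subtracting the mean and dividing by the standard deviation of the transformed values. The documentation above does not spell out the computation performed by Standardizer, so the functions fit and standardize are hypothetical stand-ins, not gumbi's API.

```python
import math
import statistics


def fit(values, log=False):
    """Return (mean, stdev) of the optionally log-transformed values."""
    t = [math.log(v) for v in values] if log else list(values)
    return statistics.mean(t), statistics.stdev(t)


def standardize(values, mean, stdev, log=False):
    """Transform (optionally) and rescale values with the given statistics."""
    t = [math.log(v) for v in values] if log else list(values)
    return [(v - mean) / stdev for v in t]


full = [1.0, 10.0, 100.0, 1000.0]  # made-up, log-normal-like measurements
subset = full[2:]                  # e.g. the rows left after dropping some data

stale = fit(full, log=True)    # statistics estimated before slicing
fresh = fit(subset, log=True)  # statistics re-estimated on the current data

z_stale = standardize(subset, *stale, log=True)
z_fresh = standardize(subset, *fresh, log=True)

print(z_stale)  # not centered on zero: statistics describe the old data
print(z_fresh)  # centered on zero: statistics match the current data
```

With stale statistics the standardized subset is no longer centered on zero, which is the situation update_stdzr() is meant to address after the data or the log_vars and logit_vars lists change.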