DataSet
Container for tabular data, allowing simple access to standardized values and wide or tidy dataframe formats.
Methods
from_tidy() | Constructs a DataSet from a tidy-form dataframe. |
from_wide() | Constructs a DataSet from a wide-form dataframe. |
update_stdzr() | Updates internal Standardizer with current data, log_vars, and logit_vars. |
Attributes
float_inputs | Columns of dataframe with "float64" dtype. |
inputs | Columns of dataframe not contained in outputs. |
| Provides keyword arguments for easy instantiation of a similar DataSet. |
tidy | Tidy-form view of data |
wide | Wide-form view of data |
- class gumbi.aggregation.DataSet(data: DataFrame, outputs: list, names_column: str = 'Variable', values_column: str = 'Value', log_vars: list = None, logit_vars: list = None, isotropic_vars: list = None, stdzr: Standardizer = None)
Bases: object
Container for tabular data, allowing simple access to standardized values and wide or tidy dataframe formats.
DataSet is instantiated with a wide-form dataframe, with all outputs of a given observation in a single row, but allows easy access to the corresponding tidy dataframe, with each output in a separate row (the class method from_tidy() also allows construction from tidy data). The titles of the tidy-form columns for the output names and their values are supplied at instantiation, defaulting to “Variable” and “Value”. For example, say we have an observation at position (x, y) with measurements of i, j, and k. The wide-form dataframe would have one column for each of x, y, i, j, and k, while the tidy-form dataframe would have a column for each of x and y, a “Variable” column where each row contains either “i”, “j”, or “k” as strings, and a “Value” column containing the corresponding measurement. Wide data is more space-efficient and perhaps more intuitive to construct and inspect, while tidy data more clearly distinguishes inputs and outputs. These views are accessible through the wide and tidy attributes as instances of WideData and TidyData, respectively.
As a container for WideData and TidyData, this class also provides simple access to standardized values of the data through wide.z and tidy.z or transformed values through wide.t and tidy.t. A Standardizer instance can be supplied as a keyword argument; otherwise one will be constructed automatically from the supplied dataframe with the supplied values of log_vars and logit_vars. Unlike WideData and TidyData, the wide and tidy attributes of a DataSet can be altered and sliced while retaining their functionality, with a cursory integrity check. The Standardizer instance can be updated with update_stdzr(), for example following manipulation of the data or alteration of log_vars and logit_vars.
- Parameters:
data (pd.DataFrame) – A wide-form dataframe. See class method from_tidy() for instantiation from tidy data.
outputs (list) – Columns of data to be treated as outputs.
names_column (str, default 'Variable') – Name to be used in tidy view for column containing output names.
values_column (str, default 'Value') – Name to be used in tidy view for column containing output values.
log_vars (list, optional) – List of input and output variables to be treated as log-normal. Ignored if stdzr is supplied.
logit_vars (list, optional) – List of input and output variables to be treated as logit-normal. Ignored if stdzr is supplied.
stdzr (Standardizer, optional) – A Standardizer instance. If not supplied, one will be created automatically.
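The relationship between the wide and tidy forms described above can be illustrated with plain pandas, independently of this class. The sketch below uses the (x, y) inputs and i, j, k outputs from the description with made-up values; it only shows the shape of the two layouts and is not how DataSet builds its views internally.

```python
import pandas as pd

# Wide form: one observation per row (values are made up for illustration).
wide = pd.DataFrame({
    "x": [0.0, 1.0],
    "y": [0.5, 1.5],
    "i": [10.0, 11.0],
    "j": [20.0, 21.0],
    "k": [30.0, 31.0],
})
outputs = ["i", "j", "k"]
inputs = [col for col in wide.columns if col not in outputs]

# Tidy form: one output per row. The inputs are repeated, and the output
# names and values each get their own column.
tidy = wide.melt(
    id_vars=inputs,
    value_vars=outputs,
    var_name="Variable",
    value_name="Value",
)

print(tidy)
```

Two wide rows with three outputs each become six tidy rows, which is the same 1:6 ratio between wide and tidy row counts seen for a six-output dataset.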
Examples
>>> import pandas as pd
>>> from gumbi.aggregation import DataSet
>>> # test_data is assumed to be a pathlib.Path to the directory holding the test data
>>> df = pd.read_pickle(test_data / 'estimates_test_data.pkl')
>>> ds = DataSet.from_tidy(df, names_column='Parameter', log_vars=['Y', 'c', 'b'], logit_vars=['X', 'e'])
>>> ds
DataSet:
    wide: [66 rows x 13 columns]
    tidy: [396 rows x 9 columns]
    outputs: ['e', 'f', 'b', 'c', 'a', 'd']
    inputs: ['Code', 'Target', 'Y', 'X', 'Reaction', 'lg10_Z', 'Metric']
>>> ds.wide = ds.wide.drop(range(0,42,2))
>>> ds
DataSet:
    wide: [45 rows x 13 columns]
    tidy: [270 rows x 9 columns]
    outputs: ['e', 'f', 'b', 'c', 'a', 'd']
    inputs: ['Code', 'Target', 'Y', 'X', 'Reaction', 'lg10_Z', 'Metric']
>>> ds.tidy.z  # tidy-form dataframe with standardized values
>>> ds.wide.z  # wide-form dataframe with standardized values
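To make the difference between transformed (.t) and standardized (.z) values concrete, here is a conceptual sketch in plain NumPy and pandas. It assumes that log-normal variables are log-transformed, logit-normal variables are logit-transformed, and standardized values are the transformed values centered and scaled to unit standard deviation. These conventions are assumptions for illustration; the Standardizer class may differ in details such as how the mean and standard deviation are estimated.

```python
import numpy as np
import pandas as pd

# Made-up data; column names are hypothetical.
df = pd.DataFrame({
    "c": [1.0, 10.0, 100.0],  # strictly positive, treated as log-normal
    "e": [0.1, 0.5, 0.9],     # within (0, 1), treated as logit-normal
    "a": [-1.0, 0.0, 1.0],    # treated as normal, left untransformed
})
log_vars = ["c"]
logit_vars = ["e"]

# Transformed values: map each variable onto an unbounded scale.
t = df.copy()
t[log_vars] = np.log(df[log_vars])
t[logit_vars] = np.log(df[logit_vars] / (1 - df[logit_vars]))

# Standardized values: center and scale the transformed values.
mean, std = t.mean(), t.std()
z = (t - mean) / std

print(z)
```

Because the mean and standard deviation are computed from the data, they go stale when rows are dropped or the variable lists change, which is the situation update_stdzr() is documented to handle.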
- property float_inputs
Columns of dataframe with “float64” dtype.
- classmethod from_tidy(tidy, outputs=None, names_column='Variable', values_column='Value', stdzr=None, log_vars=None, logit_vars=None)
Constructs a DataSet from a tidy-form dataframe. See DataSet for explanation of arguments.
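Conceptually, building a DataSet from tidy data requires pivoting the tidy dataframe into wide form. The pandas-only sketch below shows one way such a pivot can be done with made-up data; treating every column other than the names and values columns as an input is an assumption of this sketch, and it is not necessarily what from_tidy() does internally.

```python
import pandas as pd

# Tidy form: one output per row (values are made up for illustration).
tidy = pd.DataFrame({
    "x": [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
    "y": [0.5, 0.5, 0.5, 1.5, 1.5, 1.5],
    "Variable": ["i", "j", "k", "i", "j", "k"],
    "Value": [10.0, 20.0, 30.0, 11.0, 21.0, 31.0],
})
names_column, values_column = "Variable", "Value"

# Assumption: all remaining columns together identify one observation.
inputs = [c for c in tidy.columns if c not in (names_column, values_column)]

# Spread the output names into columns, one row per observation.
wide = (
    tidy.pivot(index=inputs, columns=names_column, values=values_column)
    .reset_index()
    .rename_axis(columns=None)
)

print(wide)
```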
- classmethod from_wide(wide, outputs=None, names_column='Variable', values_column='Value', stdzr=None, log_vars=None, logit_vars=None)
Constructs a DataSet from a wide-form dataframe. See DataSet for explanation of arguments.
- property inputs
Columns of dataframe not contained in outputs.
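The column partitioning behind inputs and float_inputs can be sketched with plain pandas. The dataframe and column names below are hypothetical, and restricting float_inputs to input columns is an assumption based on the property name rather than something the documentation states.

```python
import pandas as pd

# Made-up wide-form data with mixed dtypes.
wide = pd.DataFrame({
    "Code": ["A", "B"],   # string label
    "x": [0.0, 1.0],      # float64
    "n": [1, 2],          # int64
    "i": [10.0, 11.0],    # float64, but declared as an output
})
outputs = ["i"]

# Inputs: every column that is not declared as an output.
inputs = [col for col in wide.columns if col not in outputs]

# Float inputs: the subset of inputs stored as float64.
float_inputs = wide[inputs].select_dtypes(include="float64").columns.tolist()

print(inputs)
print(float_inputs)
```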
- update_stdzr()
Updates internal Standardizer with current data, log_vars, and logit_vars.