Utilities

Errors and utility functions for input processing.

vivarium_inputs.utilities.scrub_gbd_conventions(data, location)[source]
Return type:

DataFrame

Parameters:
vivarium_inputs.utilities.scrub_location(data, location)[source]
Return type:

DataFrame

Parameters:
vivarium_inputs.utilities.scrub_sex(data)[source]
Return type:

DataFrame

Parameters:

data (DataFrame)

vivarium_inputs.utilities.scrub_age(data)[source]
Return type:

DataFrame

Parameters:

data (DataFrame)

vivarium_inputs.utilities.scrub_year(data)[source]
Return type:

DataFrame

Parameters:

data (DataFrame)

vivarium_inputs.utilities.scrub_affected_entity(data)[source]
Return type:

DataFrame

Parameters:

data (DataFrame)

vivarium_inputs.utilities.set_age_interval(data)[source]
Return type:

DataFrame

Parameters:

data (DataFrame)

vivarium_inputs.utilities.normalize(data, cols_to_fill, fill_value=None)[source]
Return type:

DataFrame

Parameters:
vivarium_inputs.utilities.normalize_sex(data, cols_to_fill, fill_value)[source]
Return type:

DataFrame

Parameters:

data (DataFrame)

vivarium_inputs.utilities.normalize_year(data)[source]
Return type:

DataFrame

Parameters:

data (DataFrame)

vivarium_inputs.utilities.interpolate_year(data)[source]
vivarium_inputs.utilities.normalize_age(data, cols_to_fill, fill_value)[source]
Return type:

DataFrame

Parameters:
vivarium_inputs.utilities.get_ordered_index_cols(data_columns)[source]
Parameters:

data_columns (Index | set)

vivarium_inputs.utilities.reshape(data, value_cols)[source]
Return type:

DataFrame

Parameters:
vivarium_inputs.utilities.wide_to_long(data, value_cols, var_name)[source]
Return type:

DataFrame

Parameters:
vivarium_inputs.utilities.sort_hierarchical_data(data)[source]

Reorder index labels of a hierarchical index and sort in level order.

Return type:

DataFrame

Parameters:

data (DataFrame)

vivarium_inputs.utilities.convert_affected_entity(data, column)[source]
Return type:

DataFrame

Parameters:
vivarium_inputs.utilities.compute_categorical_paf(rr_data, e, affected_entity)[source]
Return type:

DataFrame

Parameters:
vivarium_inputs.utilities.get_age_group_ids_by_restriction(entity, which_age)[source]
Return type:

tuple[float, float]

Parameters:
vivarium_inputs.utilities.filter_data_by_restrictions(data, entity, which_age, age_group_ids)[source]

Apply age/sex restrictions and filter out the data outside of the range.

Age restrictions can be applied in 4 different ways:

  • yld

  • yll

  • narrowest (inner) range of yll and yld

  • broadest (outer) range of yll and yld

Return type:

DataFrame

Parameters:
  • data (DataFrame) – DataFrame containing ‘age_group_id’ and ‘sex_id’ columns.

  • entity (RiskFactor | Cause) – Cause or RiskFactor whose age/sex restrictions are applied.

  • which_age (str) – one of 4 choices: ‘yll’, ‘yld’, ‘inner’, ‘outer’.

  • age_group_ids (list[int]) – List of possible age group ids.

Returns:

DataFrame with any data outside the age/sex restriction ranges filtered out.
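
The filtering described above can be sketched without any GBD access. This is only an illustration, not the library's implementation: plain dictionaries stand in for DataFrame rows, and the function name, the `male_only`/`female_only` flags, and the sex_id values (1 for male, 2 for female) are assumptions made for the example.

```python
# Hypothetical stand-ins: a list of dicts replaces the DataFrame, and keyword
# flags replace the entity's restrictions. None of this is the library's API.
MALE, FEMALE = 1, 2  # assumed sex_id coding


def filter_rows_by_restrictions(rows, allowed_age_ids, male_only=False, female_only=False):
    """Keep only rows whose age_group_id and sex_id fall inside the restrictions."""
    allowed_ages = set(allowed_age_ids)
    if male_only:
        allowed_sexes = {MALE}
    elif female_only:
        allowed_sexes = {FEMALE}
    else:
        allowed_sexes = {MALE, FEMALE}
    return [
        row
        for row in rows
        if row["age_group_id"] in allowed_ages and row["sex_id"] in allowed_sexes
    ]


rows = [
    {"age_group_id": 5, "sex_id": 1, "value": 0.1},
    {"age_group_id": 5, "sex_id": 2, "value": 0.2},
    {"age_group_id": 6, "sex_id": 2, "value": 0.3},
    {"age_group_id": 7, "sex_id": 2, "value": 0.4},
]

# Restrict to age groups 5 and 6 and to females only.
filtered = filter_rows_by_restrictions(rows, allowed_age_ids=[5, 6], female_only=True)
```

Here `allowed_age_ids` plays the role of the age range selected by `which_age`; how the real function derives that range from the entity is not shown.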

vivarium_inputs.utilities.clear_disability_weight_outside_restrictions(data, cause, fill_value, age_group_ids)[source]

Because sequela disability weight is not age/sex specific, we need to have a custom function to set the values outside the corresponding cause restrictions to 0 after it has been expanded over age/sex.

Return type:

DataFrame

Parameters:
vivarium_inputs.utilities.filter_to_most_detailed_causes(data)[source]

Filter a DataFrame containing cause_ids down to the rows whose cause_ids correspond to the most detailed causes.

Return type:

DataFrame

Parameters:

data (DataFrame)

vivarium_inputs.utilities.get_restriction_age_ids(start_id, end_id, age_group_ids)[source]

Given the start and end age group ids, return the list of GBD age_group_ids in between.

Return type:

list[int]

Parameters:
  • start_id (int | None)

  • end_id (int | None)

  • age_group_ids (list[int])
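
A minimal sketch of the idea, under two assumptions that the description above does not confirm: both endpoints are included in the result, and a missing (`None`) start or end yields an empty list. The function name and the id list are illustrative.

```python
def restriction_age_ids(start_id, end_id, age_group_ids):
    """Return the ids from start_id to end_id by position in age_group_ids."""
    # Assumption: a missing boundary means no age groups apply.
    if start_id is None or end_id is None:
        return []
    start = age_group_ids.index(start_id)
    end = age_group_ids.index(end_id)
    # Assumption: the end boundary is inclusive.
    return age_group_ids[start : end + 1]


# Illustrative list ordered by age rather than by numeric id, which is why
# the slice is taken by position and not by comparing id values.
ordered_ids = [2, 3, 388, 389, 238, 34, 6, 7]

ids = restriction_age_ids(388, 34, ordered_ids)
no_ids = restriction_age_ids(None, 34, ordered_ids)
```

Slicing by position matters whenever the ids are not numerically sorted by age, as in the illustrative list.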

vivarium_inputs.utilities.get_restriction_age_boundary(entity, boundary, reverse=False)[source]

Find the minimum/maximum age restriction (if both ‘yll’ and ‘yld’ restrictions exist) for a RiskFactor or Cause.

Parameters:
  • entity (RiskFactor | Cause) – RiskFactor or Cause for which to find the minimum/maximum age restriction.

  • boundary (str) – String ‘start’ or ‘end’ indicating whether to return the minimum (or, if reversed, maximum) start age restriction or the maximum (or, if reversed, minimum) end age restriction.

  • reverse (bool) – If True, return the maximum of the start age restrictions and the minimum of the end age restrictions.

Returns:

The age group id corresponding to the minimum or maximum start or end age restriction, depending on boundary, if both ‘yll’ and ‘yld’ restrictions exist. Otherwise, returns whichever restriction exists.
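
The selection rule can be sketched as follows. This is a hypothetical helper, not the library's code: it takes the ‘yll’ and ‘yld’ ids directly instead of an entity, and it compares ids numerically as a stand-in for age ordering, which is an assumption.

```python
def restriction_age_boundary(yll_id, yld_id, boundary, reverse=False):
    """Pick the outer (default) or inner (reverse=True) boundary of two restrictions."""
    if boundary not in ("start", "end"):
        raise ValueError("boundary must be 'start' or 'end'")
    # If only one restriction exists, return it unchanged.
    if yll_id is None:
        return yld_id
    if yld_id is None:
        return yll_id
    if boundary == "start":
        pick = max if reverse else min  # outer range starts earliest
    else:
        pick = min if reverse else max  # outer range ends latest
    return pick(yll_id, yld_id)


outer_start = restriction_age_boundary(4, 6, "start")
inner_start = restriction_age_boundary(4, 6, "start", reverse=True)
outer_end = restriction_age_boundary(20, 18, "end")
inner_end = restriction_age_boundary(20, 18, "end", reverse=True)
only_yld = restriction_age_boundary(None, 7, "start")
```

With the default `reverse=False` the result describes the broadest (outer) range; with `reverse=True` it describes the narrowest (inner) range.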

vivarium_inputs.utilities.get_exposure_and_restriction_ages(exposure, entity)[source]

Get the intersection of the age groups found in the exposure data and the entity’s restriction age range. Used to filter other risk data where the exposure age groups alone are not sufficient, because exposure data at the point of extraction have not yet been filtered by age restrictions.

Return type:

set

Parameters:
  • exposure (DataFrame) – Exposure data for entity.

  • entity (RiskFactor) – Entity for which to find the intersecting exposure and restriction ages.

Returns:

Set of age groups found in both the entity’s exposure data and in the entity’s age restrictions.
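
At its core this is a set intersection. The sketch below assumes the two collections of age group ids have already been extracted; the function name and the ids are illustrative only.

```python
def exposure_and_restriction_ages(exposure_age_ids, restriction_age_ids):
    """Return the age group ids present in both collections."""
    return set(exposure_age_ids) & set(restriction_age_ids)


# Exposure covers more ages than the restrictions allow, and vice versa.
exposure_ages = [2, 3, 4, 5, 6, 7]
restriction_ages = [5, 6, 7, 8]

valid_ages = exposure_and_restriction_ages(exposure_ages, restriction_ages)
```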

vivarium_inputs.utilities.split_interval(data, interval_column, split_column_prefix)[source]

Split a DataFrame with an interval index into a MultiIndex with start and end columns.

Return type:

DataFrame

Parameters:
  • data (DataFrame) – DataFrame with an interval index.

  • interval_column (str) – Name of the interval column.

  • split_column_prefix (str) – Prefix for the start and end columns.

Returns:

DataFrame with a MultiIndex containing start and end columns.
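
To illustrate the transformation without depending on pandas, the sketch below uses `(left, right)` tuples in place of interval objects and dictionaries in place of rows. The `_start` and `_end` suffixes appended to the prefix are an assumption, as is the helper name.

```python
def split_interval_rows(rows, interval_column, split_column_prefix):
    """Replace an interval entry with separate start and end entries in each row."""
    result = []
    for row in rows:
        # Copy everything except the interval itself.
        new_row = {key: val for key, val in row.items() if key != interval_column}
        left, right = row[interval_column]
        new_row[f"{split_column_prefix}_start"] = left  # assumed suffix
        new_row[f"{split_column_prefix}_end"] = right  # assumed suffix
        result.append(new_row)
    return result


rows = [
    {"age": (0, 5), "value": 1.0},
    {"age": (5, 10), "value": 2.0},
]

split = split_interval_rows(rows, interval_column="age", split_column_prefix="age")
```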

vivarium_inputs.utilities.process_kidney_dysfunction_exposure(data)[source]

Process kidney dysfunction exposure (rei ID 341) given GBD data. GBD data gives two measures and an inaccurate cat5 category. cat1, cat2, and cat3 are defined for measure 5 and cat4 for measure 18, but we will say they are all from measure 5 (this only makes a difference in validation and not within a simulation). There are cat5 values (the residual category) but they are calculated separately for each measure and so are not accurate. We will drop these values and recalculate cat5.

Return type:

DataFrame

Parameters:

data (DataFrame)
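
The recalculation step can be sketched for a single demographic group. The assumption here is that cat5, being the residual category, is recomputed as one minus the sum of the other categories; the helper name and the exposure values are made up for illustration.

```python
def recalculate_residual(exposure, residual="cat5"):
    """Drop the reported residual category and recompute it from the others."""
    non_residual = {cat: val for cat, val in exposure.items() if cat != residual}
    result = dict(non_residual)
    # Assumption: the residual is whatever exposure remains after cat1..cat4.
    result[residual] = 1.0 - sum(non_residual.values())
    return result


# Made-up values: the reported cat5 does not make the categories sum to 1.
reported = {"cat1": 0.01, "cat2": 0.02, "cat3": 0.07, "cat4": 0.10, "cat5": 0.95}

fixed = recalculate_residual(reported)
```

After the fix the five categories sum to 1, which is the property the reported cat5 values lack.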

exception vivarium_inputs.utilities.DataTypeNotImplementedError[source]

Raised when a data_type is requested that is not implemented for a particular data source.

class vivarium_inputs.utilities.DataType(measure, data_type)[source]

Class to handle data types and their corresponding differences.

Parameters:
type

Data type(s) for which to extract data and used to determine the data’s value columns.

Notes

Supported values include:

  • ‘means’ for getting mean data

  • ‘draws’ for getting draw-level data

  • None for measures that do not have meaningful value columns (e.g. age bins)

The data for the following measures do not adhere to standard data_types (i.e. they are not mean or draw-level data), so this attribute is largely irrelevant for them:

  • structure

  • theoretical_minimum_risk_life_expectancy

  • estimate

  • exposure_distribution_weights

value_columns

List of value columns corresponding to the provided data type(s).

Notes

The following measures do not adhere to standard data type-specific value_columns and so have them set manually to ‘value’:

  • structure

  • theoretical_minimum_risk_life_expectancy

  • estimate

  • exposure_distribution_weights
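
The relationship between a data type and its value columns can be sketched as a small function. This is not the DataType class itself: the function name, the `draw_<i>` naming, the default draw count, the use of ‘value’ for means, the empty list for None, and the example measure names are all assumptions made for illustration.

```python
# Measures listed above whose value columns are set manually to 'value'.
NON_STANDARD_MEASURES = {
    "structure",
    "theoretical_minimum_risk_life_expectancy",
    "estimate",
    "exposure_distribution_weights",
}


def value_columns_for(measure, data_type, n_draws=3):
    """Return the value column names implied by a measure and data type."""
    if measure in NON_STANDARD_MEASURES:
        return ["value"]
    if data_type is None:
        return []  # assumption: no meaningful value columns
    if data_type == "means":
        return ["value"]  # assumption: mean data live in one 'value' column
    if data_type == "draws":
        return [f"draw_{i}" for i in range(n_draws)]  # assumed naming scheme
    raise NotImplementedError(f"Unsupported data type: {data_type!r}")


draw_columns = value_columns_for("incidence_rate", "draws")
mean_columns = value_columns_for("incidence_rate", "means")
structure_columns = value_columns_for("structure", "draws")
```

The sketch raises the built-in NotImplementedError for unsupported values; the module's own DataTypeNotImplementedError plays that role in the library.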