Risk Exposure Distribution Models

This module contains tools for modeling risk exposure distributions: continuous, polytomous, dichotomous, and ensemble.

exception vivarium_public_health.causal_factor.distributions.MissingDataError

Custom exception for missing data.

class vivarium_public_health.causal_factor.distributions.CausalFactorDistribution(causal_factor, distribution_type, exposure_data=None)

Abstract base class for causal factor exposure distribution models.

Subclasses implement specific distribution types (e.g., continuous, polytomous, dichotomous, ensemble) for modeling causal factor exposures in a simulation.

Parameters:

causal_factor (EntityString)
distribution_type (str)
exposure_data (int | float | pd.DataFrame | None)

get_configuration(builder)

Return the configuration tree for this causal factor.

Parameters:: builder (Builder) – Access point for utilizing framework interfaces during setup.
Return type:: ConfigTree | None
Returns:: The configuration sub-tree for this causal factor.

get_exposure_data(builder)

Return exposure data (using pre-loaded data if available).

Parameters:: builder (Builder) – Access point for utilizing framework interfaces during setup.
Return type:: int | float | DataFrame
Returns:: The exposure data for this causal factor.

setup(builder)

Register the exposure PPF (percent-point function) pipeline.

Parameters:: builder (Builder) – Access point for utilizing framework interfaces during setup.
Return type:: None

abstractmethod register_exposure_ppf_pipeline(builder)

Register the exposure PPF pipeline with the simulation.

Parameters:: builder (Builder) – Access point for utilizing framework interfaces during setup.
Return type:: None
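
The relationship between setup and the abstract register_exposure_ppf_pipeline can be sketched with a plain abc base class. This is a minimal illustration, not the library's code: the class names, the dict standing in for the Builder, the pipeline name, and the assumption that setup simply delegates to the abstract method are all hypothetical.

```python
from abc import ABC, abstractmethod


class SketchDistribution(ABC):
    """Hypothetical stand-in for an abstract exposure distribution."""

    def __init__(self, causal_factor: str, distribution_type: str) -> None:
        self.causal_factor = causal_factor
        self.distribution_type = distribution_type

    def setup(self, builder: dict) -> None:
        # Assumed behavior: setup delegates registration to the subclass.
        self.register_exposure_ppf_pipeline(builder)

    @abstractmethod
    def register_exposure_ppf_pipeline(self, builder: dict) -> None:
        """Each concrete distribution registers its own PPF pipeline."""


class ConstantDistribution(SketchDistribution):
    """Toy subclass that reports the same exposure for every simulant."""

    def register_exposure_ppf_pipeline(self, builder: dict) -> None:
        # The dict stands in for the real Builder; the key is a made-up name.
        name = f"{self.causal_factor}.exposure_ppf"
        builder[name] = lambda index: [1.0 for _ in index]


registry: dict = {}
ConstantDistribution("risk_factor.example", "constant").setup(registry)
```

Because the base class declares an abstract method, instantiating SketchDistribution directly raises TypeError; only subclasses that implement the method can be constructed.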

class vivarium_public_health.causal_factor.distributions.EnsembleDistribution(causal_factor, distribution_type='ensemble')

Model risk exposure using an ensemble of weighted distributions.

Combine multiple parametric distributions (e.g., normal, log-normal, gamma) weighted by GBD-derived weights to represent complex exposure distributions.

Parameters:

causal_factor (EntityString)
distribution_type (str)

setup(builder)

Build distribution weight and parameter lookup tables.

Parameters:: builder (Builder) – Access point for utilizing framework interfaces during setup.
Return type:: None

get_distribution_definitions(builder)

Load and compute ensemble distribution definitions.

Parameters:: builder (Builder) – Access point for utilizing framework interfaces during setup.
Return type:: tuple[list[str], DataFrame, dict[str, DataFrame]]
Returns:: A tuple of (distribution names, weights DataFrame, parameter dict keyed by distribution name).
Raises:: NotImplementedError – If the glnorm distribution has non-zero weights.

register_exposure_ppf_pipeline(builder)

Register the ensemble exposure PPF pipeline.

Parameters:: builder (Builder) – Access point for utilizing framework interfaces during setup.
Return type:: None

initialize_ensemble_propensity(pop_data)

Initialize propensities for selecting child distributions in the ensemble.

Parameters:: pop_data (SimulantData) – Metadata about the simulants being initialized.
Return type:: None

exposure_ppf(index)

Calculate exposure values from propensities using the ensemble.

Parameters:: index (Index) – An index representing the simulants.
Return type:: Series
Returns:: A series of exposure values.
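
One way to picture the two propensities used by the ensemble is the sketch below: an ensemble propensity selects a child distribution by cumulative weight, and the exposure propensity is then passed through that child's percent-point function. This uses only the standard library and is an assumption about the general technique, not the library's implementation; the function names, the weights, and the child parameters are hypothetical.

```python
import math
from statistics import NormalDist


def normal_ppf(q, mean, sd):
    """Percent-point function of a normal distribution."""
    return NormalDist(mean, sd).inv_cdf(q)


def lognormal_ppf(q, mu, sigma):
    """Percent-point function of a log-normal; mu and sigma describe log(x)."""
    return math.exp(NormalDist(mu, sigma).inv_cdf(q))


def ensemble_ppf(propensity, ensemble_propensity, weights, ppfs):
    """Pick a child by cumulative weight, then evaluate its PPF."""
    names = list(weights)
    cumulative = 0.0
    for name in names:
        cumulative += weights[name]
        if ensemble_propensity < cumulative:
            return ppfs[name](propensity)
    # Guard against floating-point round-off in the weight total.
    return ppfs[names[-1]](propensity)


# Hypothetical weights and child parameters, for illustration only.
weights = {"norm": 0.25, "lnorm": 0.75}
ppfs = {
    "norm": lambda q: normal_ppf(q, 10.0, 2.0),
    "lnorm": lambda q: lognormal_ppf(q, 2.0, 0.5),
}
low = ensemble_ppf(0.5, 0.1, weights, ppfs)   # 0.1 < 0.25 selects "norm"
high = ensemble_ppf(0.5, 0.9, weights, ppfs)  # 0.9 >= 0.25 selects "lnorm"
```

Keeping the child-selection propensity separate from the exposure propensity means a simulant stays on the same child distribution while its exposure quantile is evaluated.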

class vivarium_public_health.causal_factor.distributions.ContinuousDistribution(causal_factor, distribution_type)

Model risk exposure using a continuous parametric distribution.

Support "normal" and "lognormal" distribution types. Exposure values are derived from the distribution’s percent-point function evaluated at each simulant’s propensity.

Parameters:

causal_factor (EntityString)
distribution_type (str)

setup(builder)

Compute distribution parameters and register pipelines.

Parameters:: builder (Builder) – Access point for utilizing framework interfaces during setup.

get_distribution_parameters(builder)

Compute the distribution parameters from exposure data.

Parameters:: builder (Builder) – Access point for utilizing framework interfaces during setup.
Return type:: DataFrame
Returns:: A DataFrame of distribution parameters (e.g., loc, scale).

register_exposure_ppf_pipeline(builder)

Register the continuous exposure PPF pipeline.

Parameters:: builder (Builder) – Access point for utilizing framework interfaces during setup.
Return type:: None

register_exposure_params_pipeline(builder)

Register the exposure parameters pipeline.

Parameters:: builder (Builder) – Access point for utilizing framework interfaces during setup.
Return type:: None

exposure_ppf(index)

Calculate exposure values from propensities.

Parameters:: index (Index) – An index representing the simulants.
Return type:: Series
Returns:: A series of exposure values.
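
Evaluating a percent-point function at each simulant's propensity can be sketched with the standard library alone. The function below is a hypothetical illustration, not the class's method: it takes plain lists instead of a pandas Index, and for the "lognormal" case it assumes that loc and scale describe the logarithm of the exposure, which the documentation above does not specify.

```python
import math
from statistics import NormalDist


def exposure_ppf(propensities, distribution_type, loc, scale):
    """Evaluate the percent-point function at each propensity."""
    base = NormalDist(mu=loc, sigma=scale)
    if distribution_type == "normal":
        return [base.inv_cdf(q) for q in propensities]
    if distribution_type == "lognormal":
        # Assumption: loc and scale are the mean and sd of log(exposure).
        return [math.exp(base.inv_cdf(q)) for q in propensities]
    raise NotImplementedError(f"Unsupported distribution type: {distribution_type}")


propensities = [0.025, 0.5, 0.975]
normal_exposure = exposure_ppf(propensities, "normal", loc=120.0, scale=15.0)
lognormal_exposure = exposure_ppf(propensities, "lognormal", loc=0.0, scale=1.0)
```

A propensity of 0.5 maps to the median of the distribution, and larger propensities map to larger exposures, so a simulant's rank in the population is preserved when the parameters change.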

class vivarium_public_health.causal_factor.distributions.PolytomousDistribution(causal_factor, distribution_type, exposure_data=None)

Model risk exposure as a set of ordered or unordered categories.

Assign each simulant to a category by comparing their propensity against the cumulative sum of category exposure probabilities.

Parameters:

causal_factor (EntityString)
distribution_type (str)
exposure_data (int | float | pd.DataFrame | None)

property categories: list[str]

The sorted list of exposure category names.

setup(builder)

Build the exposure parameters table and register pipelines.

Parameters:: builder (Builder) – Access point for utilizing framework interfaces during setup.
Return type:: None

get_exposure_value_columns(exposure_data)

Extract unique category names from exposure data.

Parameters:: exposure_data (int | float | DataFrame) – The exposure data, either as a scalar or a DataFrame.
Return type:: list[str] | None
Returns:: A list of category names if the data is a DataFrame, or None for scalar data.

register_exposure_ppf_pipeline(builder)

Register the polytomous exposure PPF pipeline.

Parameters:: builder (Builder) – Access point for utilizing framework interfaces during setup.
Return type:: None

register_exposure_params_pipeline(builder)

Register the exposure parameters pipeline.

Parameters:: builder (Builder) – Access point for utilizing framework interfaces during setup.
Return type:: None

build_exposure_params_table(builder)

Build the lookup table for exposure parameters.

Parameters:: builder (Builder) – Access point for utilizing framework interfaces during setup.
Return type:: LookupTable
Returns:: A lookup table for the exposure parameters.

exposure_ppf(index)

Assign each simulant a category based on their propensity.

Parameters:: index (Index) – An index representing the simulants.
Return type:: Series
Returns:: A series of category labels for each simulant.
Raises:: MissingDataError – If all exposure data sums to zero.
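
The comparison of a propensity against the cumulative sum of category probabilities can be sketched as follows. This is a standard-library illustration under stated assumptions: the function name, the dict of probabilities, and the local MissingDataError class are hypothetical stand-ins, and the real method works on pandas objects.

```python
from bisect import bisect_right
from itertools import accumulate


class MissingDataError(Exception):
    """Local stand-in for the module's MissingDataError."""


def assign_categories(propensities, exposure_probabilities):
    """Map each propensity to the first category whose cumulative sum exceeds it."""
    categories = sorted(exposure_probabilities)
    if sum(exposure_probabilities.values()) == 0:
        raise MissingDataError("All exposure data sums to zero.")
    cumulative = list(accumulate(exposure_probabilities[c] for c in categories))
    assigned = []
    for q in propensities:
        position = bisect_right(cumulative, q)
        # Clamp in case round-off leaves the final cumulative sum below q.
        assigned.append(categories[min(position, len(categories) - 1)])
    return assigned


# Cumulative sums are [0.2, 0.5, 1.0] for cat1, cat2, cat3.
labels = assign_categories([0.1, 0.35, 0.9], {"cat1": 0.2, "cat2": 0.3, "cat3": 0.5})
# labels == ["cat1", "cat2", "cat3"]
```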

class vivarium_public_health.causal_factor.distributions.DichotomousDistribution(causal_factor, distribution_type, exposure_data=None)

Model risk exposure as a two-category (exposed/unexposed) distribution.

Simulants with a propensity below the exposure probability are assigned to the exposed category; otherwise the unexposed category. Support optional rebinning of polytomous exposure data.

Parameters:

causal_factor (EntityString)
distribution_type (str)
exposure_data (int | float | pd.DataFrame | None)

property exposed: str

The name of the exposed category.

property unexposed: str

The name of the unexposed category.

rename_deprecated_categories(data)

Rename deprecated cat1/cat2 parameter values to exposed/unexposed.

If the data contains 'cat1' in its 'parameter' column, the values are replaced with the distribution’s exposed and unexposed names. A FutureWarning is emitted for non-intervention causal factors to signal that the old names will be removed in a future release.

Parameters:: data (DataFrame) – A DataFrame with a 'parameter' column.
Return type:: DataFrame
Returns:: The DataFrame with renamed parameter values (modified in place).
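
The renaming rule can be illustrated without pandas by treating each row as a dict with a "parameter" key. Everything here is a hypothetical sketch: the function name, the is_intervention flag, the warning text, and the assumption that 'cat2' maps to the unexposed name are illustrative choices, not the library's code.

```python
import warnings


def rename_deprecated(rows, exposed, unexposed, is_intervention=False):
    """Rename cat1/cat2 in place if the deprecated names are present."""
    if not any(row["parameter"] == "cat1" for row in rows):
        return rows
    if not is_intervention:
        # Assumed wording; the real message may differ.
        warnings.warn(
            "The 'cat1'/'cat2' names are deprecated and will be removed.",
            FutureWarning,
            stacklevel=2,
        )
    mapping = {"cat1": exposed, "cat2": unexposed}
    for row in rows:
        row["parameter"] = mapping.get(row["parameter"], row["parameter"])
    return rows


rows = [
    {"parameter": "cat1", "value": 0.3},
    {"parameter": "cat2", "value": 0.7},
]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    renamed = rename_deprecated(rows, "exposed", "unexposed")
```

Note that the rows are mutated and the same list is returned, mirroring the "modified in place" behavior described above.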

setup(builder)

Build the exposure table and register pipelines.

Parameters:: builder (Builder) – Access point for utilizing framework interfaces during setup.
Return type:: None

register_exposure_ppf_pipeline(builder)

Register the dichotomous exposure PPF pipeline.

Parameters:: builder (Builder) – Access point for utilizing framework interfaces during setup.
Return type:: None

register_exposure_params_pipeline(builder)

Register the exposure parameters pipeline with calibration support.

Parameters:: builder (Builder) – Access point for utilizing framework interfaces during setup.
Return type:: None

build_exposure_table(builder)

Build a lookup table for exposure data.

Parameters:: builder (Builder) – Access point for utilizing framework interfaces during setup.
Return type:: LookupTable[Series]
Returns:: A lookup table for the exposure data.
Raises:: ValueError – If any exposure values are outside the range [0, 1].

get_exposure_data(builder)

Load and optionally rebin exposure data for the risk.

Parameters:: builder (Builder) – Access point for utilizing framework interfaces during setup.
Return type:: int | float | DataFrame
Returns:: The (possibly rebinned) exposure data for the exposed category.

validate_rebin_source(builder, data)

Validate that rebinning configuration is consistent with the data.

Parameters:

builder (Builder) – Access point for utilizing framework interfaces during setup.
data (DataFrame) – The exposure data to validate against.

Return type:: None
Raises:: ValueError – If rebinning and category thresholds are both specified, if any rebin categories are not found in the data, or if all categories are in the rebin set.
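
The three failure conditions listed for validate_rebin_source can be sketched as a standalone check. The function and argument names below are hypothetical, the configuration is passed in as plain collections rather than read from a Builder, and the error messages are illustrative.

```python
def validate_rebin(rebin_exposed, category_thresholds, data_categories):
    """Raise ValueError if the rebinning setup is inconsistent with the data."""
    rebin = set(rebin_exposed)
    available = set(data_categories)
    if rebin and category_thresholds:
        raise ValueError("Rebinning and category thresholds are mutually exclusive.")
    missing = rebin - available
    if missing:
        raise ValueError(f"Rebin categories not found in the data: {sorted(missing)}")
    if rebin and rebin == available:
        raise ValueError("Rebinning every category leaves no unexposed category.")


# A consistent configuration passes silently.
validate_rebin(["cat1", "cat2"], [], ["cat1", "cat2", "cat3", "cat4"])
```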

exposure_parameter_source(index)

Return exposure probabilities from the exposure lookup table.

Parameters:: index (Index) – An index representing the simulants.
Return type:: Series
Returns:: A series of exposure probabilities.

exposure_ppf(index)

Assign each simulant to the exposed or unexposed category based on propensity.

Parameters:: index (Index) – An index representing the simulants.
Return type:: Series
Returns:: A series of exposed or unexposed category labels.
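
The dichotomous assignment rule, where a propensity below the exposure probability means exposed, can be sketched in a few lines. The function name and default labels are hypothetical, plain lists stand in for pandas Series, and the range check mirrors the [0, 1] validation described for build_exposure_table.

```python
def dichotomous_ppf(propensities, exposure_probabilities,
                    exposed="exposed", unexposed="unexposed"):
    """Label each simulant by comparing propensity to exposure probability."""
    if any(p < 0 or p > 1 for p in exposure_probabilities):
        raise ValueError("Exposure values must lie in the range [0, 1].")
    return [
        exposed if q < p else unexposed
        for q, p in zip(propensities, exposure_probabilities)
    ]


labels = dichotomous_ppf([0.05, 0.40, 0.95], [0.3, 0.3, 0.3])
# labels == ["exposed", "unexposed", "unexposed"]
```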

vivarium_public_health.causal_factor.distributions.clip(q)

Clip quantile values to avoid distribution boundary issues.

The risk distributions package uses the 99.9th and 0.001st percentiles of a log-normal distribution as the bounds of the distribution support. These bounds are tied to the GBD risk factor PAF (population attributable fraction) calculation process. Quantiles are clipped at the distribution tails so that the distribution calls do not return NaNs.

Parameters:: q (Series) – A series of quantile values to clip.
Return type:: Series
Returns:: The clipped quantile values.
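
A minimal sketch of quantile clipping is shown below. The bounds Q_LOWER and Q_UPPER are placeholder values chosen for illustration, not the constants the module uses, and a plain list stands in for the pandas Series.

```python
from statistics import NormalDist

# Hypothetical bounds; the module's actual constants may differ.
Q_LOWER, Q_UPPER = 0.001, 0.999


def clip(quantiles, lower=Q_LOWER, upper=Q_UPPER):
    """Pull quantiles at or beyond the tails back inside [lower, upper]."""
    return [min(max(q, lower), upper) for q in quantiles]


clipped = clip([0.0, 0.5, 1.0])
# A percent-point function is undefined at 0 and 1, but finite after clipping.
exposures = [NormalDist(0.0, 1.0).inv_cdf(q) for q in clipped]
```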

vivarium_public_health.causal_factor.distributions.get_risk_distribution_parameter(data)

Convert risk distribution parameter data to a usable format.

If the data is a DataFrame, set the non-value columns as the index and squeeze to a Series. Drop a "parameter" column if its only value is "continuous".

Parameters:: data (float | DataFrame) – The raw parameter data, either a scalar float or a DataFrame.
Return type:: float | Series
Returns:: The parameter as a float or a pd.Series indexed by demographic columns.
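
The conversion described above can be sketched with pandas as follows. This is an illustration rather than the library's function: the name to_parameter, the assumption that the value column is called "value", and the sample demographic columns are all hypothetical.

```python
import pandas as pd


def to_parameter(data, value_column="value"):
    """Return scalars unchanged; index a DataFrame by its non-value columns."""
    if not isinstance(data, pd.DataFrame):
        return data
    # Drop "parameter" only when it carries no information.
    if "parameter" in data.columns and set(data["parameter"]) == {"continuous"}:
        data = data.drop(columns="parameter")
    index_columns = [c for c in data.columns if c != value_column]
    # With a single remaining column, squeeze(axis=1) yields a Series.
    return data.set_index(index_columns).squeeze(axis=1)


raw = pd.DataFrame(
    {
        "sex": ["Female", "Male"],
        "age_start": [0, 0],
        "parameter": ["continuous", "continuous"],
        "value": [1.5, 2.5],
    }
)
parameter = to_parameter(raw)
```

Passing axis=1 to squeeze keeps the result a Series even when the table has a single row, which a bare squeeze() would collapse to a scalar.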