Risk Exposure Distribution Models
This module contains tools for modeling several different risk exposure distributions.
- exception vivarium_public_health.causal_factor.distributions.MissingDataError[source]
Custom exception for missing data.
- class vivarium_public_health.causal_factor.distributions.CausalFactorDistribution(causal_factor, distribution_type, exposure_data=None)[source]
Abstract base class for causal factor exposure distribution models.
Subclasses implement specific distribution types (e.g., continuous, polytomous, dichotomous, ensemble) for modeling causal factor exposures in a simulation.
- Parameters:
causal_factor (EntityString)
distribution_type (str)
- class vivarium_public_health.causal_factor.distributions.EnsembleDistribution(causal_factor, distribution_type='ensemble')[source]
Model risk exposure using an ensemble of weighted distributions.
Combine multiple parametric distributions (e.g., normal, log-normal, gamma) weighted by GBD-derived weights to represent complex exposure distributions.
- Parameters:
causal_factor (EntityString)
distribution_type (str)
- get_distribution_definitions(builder)[source]
Load and compute ensemble distribution definitions.
- Parameters:
builder (Builder) – Access point for utilizing framework interfaces during setup.
- Return type:
- Returns:
A tuple of (distribution names, weights DataFrame, parameter dict keyed by distribution name).
- Raises:
NotImplementedError – If the glnorm distribution has non-zero weights.
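A minimal sketch of how names, weights, and per-distribution parameters could be combined to draw an exposure, assuming one propensity selects the child distribution and a second is passed to that child's percent-point function. The helper names (`normal_ppf`, `lognormal_ppf`, `ensemble_exposure`), the labels `"norm"` and `"lnorm"`, and the parameter keys are illustrative assumptions, not the library's API.

```python
import math
from bisect import bisect_right
from itertools import accumulate
from statistics import NormalDist


def normal_ppf(q, mean, sd):
    """Percent-point function of a normal distribution."""
    return NormalDist(mean, sd).inv_cdf(q)


def lognormal_ppf(q, mu, sigma):
    """Percent-point function of a log-normal; mu and sigma are on the log scale."""
    return math.exp(NormalDist(mu, sigma).inv_cdf(q))


# Hypothetical registry of child distributions, keyed by name.
CHILD_PPFS = {"norm": normal_ppf, "lnorm": lognormal_ppf}


def ensemble_exposure(names, weights, params, ensemble_propensity, exposure_propensity):
    """Pick one child distribution by weight, then evaluate its PPF."""
    cumulative = list(accumulate(weights))
    # Scale by the total weight so the weights need not be pre-normalized.
    position = ensemble_propensity * cumulative[-1]
    child_index = min(bisect_right(cumulative, position), len(names) - 1)
    name = names[child_index]
    return name, CHILD_PPFS[name](exposure_propensity, **params[name])


names = ["norm", "lnorm"]
weights = [0.25, 0.75]
params = {"norm": {"mean": 10.0, "sd": 2.0}, "lnorm": {"mu": 0.0, "sigma": 1.0}}
name_a, value_a = ensemble_exposure(names, weights, params, 0.1, 0.5)
name_b, value_b = ensemble_exposure(names, weights, params, 0.9, 0.5)
```

Keeping the two propensities separate means a simulant's choice of child distribution stays fixed while its position within that distribution can be handled independently.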
- initialize_ensemble_propensity(pop_data)[source]
Initialize propensities for selecting child distributions in the ensemble.
- Parameters:
pop_data (SimulantData) – Metadata about the simulants being initialized.
- Return type:
- class vivarium_public_health.causal_factor.distributions.ContinuousDistribution(causal_factor, distribution_type)[source]
Model risk exposure using a continuous parametric distribution.
Support "normal" and "lognormal" distribution types. Exposure values are derived from the distribution’s percent-point function evaluated at each simulant’s propensity.
- Parameters:
causal_factor (EntityString)
distribution_type (str)
- setup(builder)[source]
Compute distribution parameters and register pipelines.
- Parameters:
builder – Access point for utilizing framework interfaces during setup.
- get_distribution_parameters(builder)[source]
Compute the distribution parameters from exposure data.
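The propensity-to-exposure step for a continuous distribution can be sketched as follows, assuming the parameters are already known. The function name `continuous_exposure` and the `loc`/`scale` arguments are hypothetical, and how the library derives its parameters from exposure data is not shown here.

```python
import math
from statistics import NormalDist


def continuous_exposure(propensities, distribution_type, loc, scale):
    """Evaluate a normal or lognormal PPF at each propensity."""
    base = NormalDist(loc, scale)
    if distribution_type == "normal":
        return [base.inv_cdf(p) for p in propensities]
    if distribution_type == "lognormal":
        # Here loc and scale describe the normal distribution of log(exposure).
        return [math.exp(base.inv_cdf(p)) for p in propensities]
    raise NotImplementedError(f"Unsupported distribution type: {distribution_type}")


propensities = [0.1, 0.5, 0.9]
normal_values = continuous_exposure(propensities, "normal", loc=120.0, scale=15.0)
lognormal_values = continuous_exposure(propensities, "lognormal", loc=0.0, scale=0.5)
```

Because the percent-point function is monotonic, a simulant with a higher propensity always receives a higher exposure value.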
- class vivarium_public_health.causal_factor.distributions.PolytomousDistribution(causal_factor, distribution_type, exposure_data=None)[source]
Model risk exposure as a set of ordered or unordered categories.
Assign each simulant to a category by comparing their propensity against the cumulative sum of category exposure probabilities.
- Parameters:
causal_factor (EntityString)
distribution_type (str)
- get_exposure_value_columns(exposure_data)[source]
Extract unique category names from exposure data.
- build_exposure_params_table(builder)[source]
Build the lookup table for exposure parameters.
- Parameters:
builder (Builder) – Access point for utilizing framework interfaces during setup.
- Return type:
- Returns:
A lookup table for the exposure parameters.
- exposure_ppf(index)[source]
Assign each simulant a category based on their propensity.
- Parameters:
index (Index) – An index representing the simulants.
- Return type:
Series
- Returns:
A series of category labels for each simulant.
- Raises:
MissingDataError – If all exposure data sums to zero.
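The category assignment that exposure_ppf describes can be sketched as a comparison of each propensity against cumulative category probabilities. This is an illustration under stated assumptions: `assign_categories` is a hypothetical helper, plain lists and dicts stand in for the pandas objects, and a `ValueError` stands in for `MissingDataError`.

```python
from bisect import bisect_right
from itertools import accumulate


def assign_categories(propensities, category_probabilities):
    """Map each propensity to a category via cumulative probabilities."""
    categories = list(category_probabilities)
    total = sum(category_probabilities.values())
    if total == 0:
        # Stand-in for the MissingDataError raised by the documented method.
        raise ValueError("All exposure data sums to zero.")
    cumulative = list(accumulate(category_probabilities[c] / total for c in categories))
    labels = []
    for p in propensities:
        # Select the first category whose cumulative probability exceeds p.
        idx = min(bisect_right(cumulative, p), len(categories) - 1)
        labels.append(categories[idx])
    return labels


probabilities = {"cat1": 0.2, "cat2": 0.3, "cat3": 0.5}
labels = assign_categories([0.05, 0.25, 0.6, 0.99], probabilities)
```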
- class vivarium_public_health.causal_factor.distributions.DichotomousDistribution(causal_factor, distribution_type, exposure_data=None)[source]
Model risk exposure as a two-category (exposed/unexposed) distribution.
Simulants with a propensity below the exposure probability are assigned to the exposed category; otherwise the unexposed category. Support optional rebinning of polytomous exposure data.
- Parameters:
causal_factor (EntityString)
distribution_type (str)
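The exposed/unexposed rule can be sketched in a few lines. `dichotomous_exposure` is a hypothetical helper that operates on plain lists rather than the library's pandas objects.

```python
def dichotomous_exposure(propensities, exposure_probabilities,
                         exposed="exposed", unexposed="unexposed"):
    """Label a simulant exposed when its propensity is below its exposure probability."""
    return [
        exposed if propensity < probability else unexposed
        for propensity, probability in zip(propensities, exposure_probabilities)
    ]


categories = dichotomous_exposure([0.1, 0.4, 0.9], [0.3, 0.3, 0.3])
```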
- rename_deprecated_categories(data)[source]
Rename deprecated cat1/cat2 parameter values to exposed/unexposed.
If the data contains 'cat1' in its 'parameter' column, the values are replaced with the distribution’s exposed and unexposed names. A FutureWarning is emitted for non-intervention causal factors to signal that the old names will be removed in a future release.
- Parameters:
data (DataFrame) – A DataFrame with a 'parameter' column.
- Return type:
DataFrame
- Returns:
The DataFrame with renamed parameter values (modified in place).
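The renaming behavior can be sketched on a plain list standing in for the 'parameter' column. `rename_deprecated`, its `is_intervention` flag, and the warning text are assumptions made for illustration. The real method operates on a DataFrame.

```python
import warnings


def rename_deprecated(parameter_values, exposed="exposed", unexposed="unexposed",
                      is_intervention=False):
    """Replace 'cat1'/'cat2' with the exposed/unexposed names, in place."""
    if "cat1" not in parameter_values:
        return parameter_values
    if not is_intervention:
        warnings.warn(
            "The 'cat1'/'cat2' names are deprecated; use the exposed/unexposed names.",
            FutureWarning,
            stacklevel=2,
        )
    mapping = {"cat1": exposed, "cat2": unexposed}
    # Slice assignment mutates the caller's list instead of rebinding a new one.
    parameter_values[:] = [mapping.get(value, value) for value in parameter_values]
    return parameter_values


renamed = rename_deprecated(["cat1", "cat2", "cat1"], is_intervention=True)
```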
- register_exposure_params_pipeline(builder)[source]
Register the exposure parameters pipeline with calibration support.
- build_exposure_table(builder)[source]
Build a lookup table for exposure data.
- Parameters:
builder (Builder) – Access point for utilizing framework interfaces during setup.
- Return type:
LookupTable[Series]
- Returns:
A lookup table for the exposure data.
- Raises:
ValueError – If any exposure values are outside the range [0, 1].
- validate_rebin_source(builder, data)[source]
Validate that rebinning configuration is consistent with the data.
- Parameters:
builder – Access point for utilizing framework interfaces during setup.
data (DataFrame) – The exposure data to validate against.
- Raises:
ValueError – If rebinning and category thresholds are both specified, if any rebin categories are not found in the data, or if all categories are in the rebin set.
- Return type:
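The three error conditions listed for validate_rebin_source can be sketched as set checks. `validate_rebin` and its arguments are hypothetical. How the library reads the rebinning configuration from the builder is not shown here.

```python
def validate_rebin(rebin_categories, category_thresholds, data_categories):
    """Raise ValueError when a rebinning configuration conflicts with the data."""
    rebin_categories = set(rebin_categories)
    data_categories = set(data_categories)
    if rebin_categories and category_thresholds:
        raise ValueError("Rebinning and category thresholds cannot both be specified.")
    missing = rebin_categories - data_categories
    if missing:
        raise ValueError(f"Rebin categories not found in the data: {sorted(missing)}")
    if rebin_categories and rebin_categories == data_categories:
        raise ValueError("Rebinning every category leaves no unexposed category.")


# A consistent configuration passes silently.
validate_rebin({"cat1", "cat2"}, [], ["cat1", "cat2", "cat3"])
```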
- vivarium_public_health.causal_factor.distributions.clip(q)[source]
Clip quantile values to avoid distribution boundary issues.
The risk distributions package uses the 99.9th and 0.001st percentiles of a log-normal distribution as the bounds of the distribution support, a convention tied to the GBD risk factor population attributable fraction (PAF) calculation process. Clip the distribution tails so that distribution calls do not return NaNs.
- Parameters:
q (Series) – A series of quantile values to clip.
- Return type:
Series
- Returns:
The clipped quantile values.
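Clipping itself is a simple bound on each quantile, as sketched below. The `lower` and `upper` defaults are placeholder values chosen for illustration and are not the constants the library uses. `clip_quantiles` is a hypothetical name.

```python
def clip_quantiles(quantiles, lower=0.001, upper=0.999):
    """Bound each quantile to [lower, upper] to stay inside the distribution support."""
    return [min(max(q, lower), upper) for q in quantiles]


clipped = clip_quantiles([0.0, 0.0005, 0.5, 0.9999, 1.0])
```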
- vivarium_public_health.causal_factor.distributions.get_risk_distribution_parameter(data)[source]
Convert risk distribution parameter data to a usable format.
If the data is a DataFrame, set the non-value columns as the index and squeeze to a Series. Drop a "parameter" column if its only value is "continuous".