Distributions

Distribution components translate a simulant’s propensity into an exposure value. All distributions inherit from CausalFactorDistribution and implement an exposure_ppf method that evaluates the percent-point function at the simulant’s propensity. The PPF is the inverse of a cumulative distribution function: given a propensity q (a number between 0 and 1), it returns the exposure value x such that exactly a fraction q of the population has an exposure at or below x. In practical terms, each simulant’s propensity selects a point on the exposure distribution, and the PPF converts that point into a concrete exposure value (e.g., a blood-pressure reading or a category label). The distribution type is selected automatically from the risk’s configuration or artifact data.
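As a minimal sketch of the propensity-to-exposure idea, the snippet below evaluates a PPF with Python's standard-library statistics.NormalDist. The mean and standard deviation are made-up illustrative values, and the free function exposure_ppf only mirrors the method name used above; it is not the component's actual implementation.

```python
from statistics import NormalDist

# Hypothetical exposure distribution (illustrative values only):
# systolic blood pressure with mean 120 and standard deviation 15.
exposure_dist = NormalDist(mu=120.0, sigma=15.0)


def exposure_ppf(propensity: float) -> float:
    """Return the exposure x such that a fraction `propensity` of the
    population has an exposure at or below x (the inverse CDF)."""
    return exposure_dist.inv_cdf(propensity)


median_exposure = exposure_ppf(0.5)    # a propensity of 0.5 selects the median
high_exposure = exposure_ppf(0.975)    # a high propensity selects the upper tail

# Round trip: applying the CDF to the PPF output recovers the propensity.
recovered_propensity = exposure_dist.cdf(high_exposure)
```

A simulant with a higher propensity always lands at a higher exposure, which is what lets one fixed propensity be reused across different exposure distributions.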

| Distribution | Exposure Type | Description |
|---|---|---|
| DichotomousDistribution | Categorical (2) | Assigns simulants to “exposed” or “unexposed” based on a single probability threshold. Supports rebinning from polytomous data. |
| PolytomousDistribution | Categorical (N) | Assigns simulants to one of N ordered or unordered categories using cumulative exposure probabilities. |
| ContinuousDistribution | Continuous | Models exposure with a normal or lognormal parametric distribution. |
| EnsembleDistribution | Continuous | Combines multiple weighted parametric distributions to capture complex exposure shapes. |

Dichotomous Distribution

DichotomousDistribution models exposure as two mutually exclusive categories. When determining a simulant’s exposure, the component compares the simulant’s propensity to the exposure probability, which acts as the threshold. If the propensity falls below the threshold, the simulant is assigned to the “exposed” category; otherwise, it is assigned to “unexposed”.
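The threshold comparison can be sketched in a few lines. This is a plain-Python illustration of the rule described above, not the component's code; the function name dichotomous_exposure and the example probability are hypothetical.

```python
def dichotomous_exposure(propensity: float, exposure_probability: float) -> str:
    """Assign "exposed" when the propensity falls below the exposure
    probability, and "unexposed" otherwise."""
    return "exposed" if propensity < exposure_probability else "unexposed"


# With a hypothetical exposure probability of 0.25, a quarter of
# uniformly distributed propensities fall below the threshold.
low_propensity_result = dichotomous_exposure(0.10, 0.25)
high_propensity_result = dichotomous_exposure(0.80, 0.25)
```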

When the underlying risk data is polytomous but the model needs a dichotomous representation, the rebinned_exposed configuration collapses selected categories into a single “exposed” group. See Rebinning and Category Thresholds.
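One way to picture the collapse is to sum the probabilities of the selected categories. The sketch below assumes that is what rebinning amounts to; the category names, the probabilities, and the helper rebin_to_dichotomous are hypothetical, and only the rebinned_exposed name comes from the text above.

```python
def rebin_to_dichotomous(category_probabilities, rebinned_exposed):
    """Collapse polytomous probabilities into one "exposed" probability
    by summing the probabilities of the selected categories."""
    exposed = sum(category_probabilities[c] for c in rebinned_exposed)
    return {"exposed": exposed, "unexposed": 1.0 - exposed}


# Hypothetical polytomous exposure data for one demographic group.
polytomous = {"cat1": 0.1, "cat2": 0.2, "cat3": 0.3, "cat4": 0.4}

# Treat cat1 and cat2 together as the single "exposed" group.
rebinned = rebin_to_dichotomous(polytomous, rebinned_exposed=["cat1", "cat2"])
```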

Polytomous Distribution

PolytomousDistribution handles ordered and unordered categorical risks with N categories. Exposure probabilities for each category are loaded from the artifact, pivoted into a wide-format lookup table, and their cumulative sum is compared against each simulant’s propensity to select a category.

Because categories are sorted before the cumulative sum is computed, results are reproducible and consistent with the common random number framework.
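The cumulative-sum selection can be sketched with the standard library as follows. The category names and probabilities are hypothetical, the function polytomous_exposure is illustrative, and how a propensity that lands exactly on a boundary is resolved is an assumption of this sketch.

```python
from bisect import bisect_right
from itertools import accumulate


def polytomous_exposure(propensity, category_probabilities):
    """Select a category by comparing the propensity against the
    cumulative sum of the sorted categories' probabilities."""
    # Sorting first keeps the category order, and so the result, stable.
    categories = sorted(category_probabilities)
    cumulative = list(accumulate(category_probabilities[c] for c in categories))
    # Index of the first cumulative value strictly greater than the propensity.
    index = bisect_right(cumulative, propensity)
    # Guard against a propensity of exactly 1.0 or floating-point shortfall.
    return categories[min(index, len(categories) - 1)]


# Hypothetical probabilities; the cumulative sums are 0.2, 0.5, and 1.0.
probabilities = {"cat2": 0.3, "cat1": 0.2, "cat3": 0.5}

selected_low = polytomous_exposure(0.10, probabilities)
selected_mid = polytomous_exposure(0.35, probabilities)
selected_high = polytomous_exposure(0.90, probabilities)
```

Note that plain string sorting would place a name like cat10 before cat2, so this sketch only uses single-digit category names.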

Continuous Distribution

ContinuousDistribution supports normal and lognormal distribution types from the risk_distributions package. The component works in four steps, the first three during setup and the last whenever exposure is determined:

  1. Loads mean exposure and standard deviation data from the artifact.

  2. Computes the distribution’s native parameters (e.g., μ and σ for lognormal) via risk_distributions.Normal.get_parameters or risk_distributions.LogNormal.get_parameters.

  3. Builds a lookup table of those parameters, keyed by demographic bins.

  4. When determining exposure, looks up the parameters for each simulant and passes the simulant’s propensity through the distribution’s PPF to obtain a concrete exposure value (e.g., a systolic blood-pressure reading).

Propensity values are clipped to the range [0.0011, 0.998] before evaluation to avoid numerical issues at the distribution tails.
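The flow from mean and standard deviation to a clipped PPF evaluation can be sketched for the lognormal case. This uses textbook moment matching and the standard library rather than risk_distributions, so the parameterization is an assumption and may differ from what get_parameters returns; the helper names and input values are hypothetical. Only the clipping bounds are taken from the text above.

```python
import math
from statistics import NormalDist

# Clipping bounds quoted in the text above.
Q_LOWER, Q_UPPER = 0.0011, 0.998


def lognormal_parameters(mean: float, sd: float):
    """Moment-match a lognormal: return (mu, sigma) of the underlying
    normal so that the lognormal has the given mean and SD."""
    sigma_squared = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma_squared / 2.0
    return mu, math.sqrt(sigma_squared)


def lognormal_exposure(propensity: float, mean: float, sd: float) -> float:
    """Clip the propensity, then evaluate the lognormal PPF."""
    mu, sigma = lognormal_parameters(mean, sd)
    clipped = min(max(propensity, Q_LOWER), Q_UPPER)
    # The lognormal PPF is exp() of the underlying normal's PPF.
    return math.exp(NormalDist(mu, sigma).inv_cdf(clipped))


# Hypothetical mean and SD for one demographic bin.
mu, sigma = lognormal_parameters(mean=120.0, sd=15.0)
typical = lognormal_exposure(0.5, mean=120.0, sd=15.0)
extreme = lognormal_exposure(1.0, mean=120.0, sd=15.0)  # clipped to 0.998
```

Without the clip, a propensity of exactly 0 or 1 would have no finite PPF value; with it, the most extreme simulants share the exposure at the bound.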

Ensemble Distribution

EnsembleDistribution models exposure using a weighted combination of several parametric distributions (for example, normal, log-normal, gamma, and others supported by the risk_distributions package). The component:

  1. Loads distribution weights and exposure data from the artifact.

  2. Computes per-distribution parameters via risk_distributions.EnsembleDistribution.get_parameters.

  3. At initialization, draws a second propensity per simulant (ensemble_propensity) that selects which child distribution to use.

  4. When determining exposure, passes both the simulant’s propensity (the quantile) and its ensemble propensity (the distribution selection) to risk_distributions.EnsembleDistribution.ppf to produce an exposure value.

This approach captures complex, potentially multi-modal exposure shapes that no single parametric family can represent.
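The two-propensity idea can be illustrated with a simplified stand-in. The sketch below is not the risk_distributions implementation: the child distributions, their parameters, the weights, and the function ensemble_exposure are all hypothetical, and selecting a child by cumulative weight is an assumption about how the ensemble propensity is used.

```python
import math
from bisect import bisect_right
from itertools import accumulate
from statistics import NormalDist

# Hypothetical child distributions, each reduced to a PPF callable.
children = {
    "normal": lambda q: NormalDist(120.0, 15.0).inv_cdf(q),
    "lognormal": lambda q: math.exp(NormalDist(4.78, 0.12).inv_cdf(q)),
}

# Hypothetical ensemble weights; they sum to 1.
weights = {"normal": 0.3, "lognormal": 0.7}


def ensemble_exposure(propensity: float, ensemble_propensity: float) -> float:
    """Pick a child distribution with the ensemble propensity, then
    evaluate that child's PPF at the exposure propensity."""
    names = sorted(weights)  # fixed order: ["lognormal", "normal"]
    cumulative = list(accumulate(weights[n] for n in names))
    index = min(bisect_right(cumulative, ensemble_propensity), len(names) - 1)
    return children[names[index]](propensity)


# Same exposure propensity, different child selected by the second draw.
from_lognormal = ensemble_exposure(0.5, ensemble_propensity=0.1)
from_normal = ensemble_exposure(0.5, ensemble_propensity=0.9)
```

Because the ensemble propensity is fixed per simulant, each simulant in this sketch stays on the same child distribution, and only its position within that child is governed by the exposure propensity.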

See Also