Distributions
Distribution components translate a simulant’s
propensity into an exposure value. All distributions
inherit from
CausalFactorDistribution
and implement an exposure_ppf method that evaluates the
percent-point function at the simulant’s propensity.
The PPF is the inverse of a cumulative distribution function: given a
propensity q (a number between 0 and 1), it returns the exposure value x
such that exactly a fraction q of the population has an exposure at or below
x. In practical terms, each simulant’s propensity selects a point on the
exposure distribution, and the PPF converts that point into a concrete exposure
value (e.g., a blood-pressure reading or a category label). The distribution
type is selected automatically from the risk’s configuration or artifact data.
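
As an illustration of the propensity-to-exposure mapping described above, here is a minimal sketch using only the Python standard library. The distribution parameters (a normal distribution with mean 120 and standard deviation 15) and the function name `exposure_from_propensity` are hypothetical and are not part of the library's API.

```python
from statistics import NormalDist

# Hypothetical exposure distribution: systolic blood pressure in mmHg.
sbp = NormalDist(mu=120.0, sigma=15.0)


def exposure_from_propensity(propensity: float) -> float:
    """Evaluate the PPF (inverse CDF) at a simulant's propensity."""
    return sbp.inv_cdf(propensity)


# A propensity of 0.5 selects the median of the exposure distribution.
median = exposure_from_propensity(0.5)

# A propensity of 0.975 selects a value in the upper tail.
high = exposure_from_propensity(0.975)

# Round trip: the CDF evaluated at PPF(q) recovers q.
recovered = sbp.cdf(high)
```

Because the PPF is monotonic, a simulant with a higher propensity always receives an exposure at least as large as a simulant with a lower one drawn from the same distribution.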
| Distribution | Exposure Type | Description |
|---|---|---|
| DichotomousDistribution | Categorical (2) | Assigns simulants to “exposed” or “unexposed” based on a single probability threshold. Supports rebinning from polytomous data. |
| PolytomousDistribution | Categorical (N) | Assigns simulants to one of N ordered or unordered categories using cumulative exposure probabilities. |
| ContinuousDistribution | Continuous | Models exposure with a normal or lognormal distribution. |
| EnsembleDistribution | Continuous | Combines multiple weighted parametric distributions to capture complex exposure shapes. |
Dichotomous Distribution
DichotomousDistribution
models exposure as two mutually exclusive categories. When determining a
simulant’s exposure, the component compares the simulant’s
propensity to the exposure probability. If the propensity
falls below that probability, the simulant is assigned to the “exposed” category;
otherwise, it is assigned to “unexposed”.
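
The comparison described above can be sketched in a few lines. This is a simplified, per-simulant illustration; the function name `dichotomous_exposure` is hypothetical, and the actual component presumably operates on whole populations at once.

```python
def dichotomous_exposure(propensity: float, exposure_probability: float) -> str:
    """Assign a category by comparing propensity to the exposure probability."""
    if propensity < exposure_probability:
        return "exposed"
    return "unexposed"


# With an exposure probability of 0.3, roughly 30% of uniformly
# distributed propensities fall below the threshold.
low_propensity = dichotomous_exposure(0.10, exposure_probability=0.3)
high_propensity = dichotomous_exposure(0.75, exposure_probability=0.3)
```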
When the underlying risk data is polytomous but the model needs a
dichotomous representation, the rebinned_exposed configuration collapses
selected categories into a single “exposed” group. See
Rebinning and Category Thresholds.
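
One plausible reading of rebinning is that the exposure probabilities of the selected categories are summed to give a single “exposed” probability. The sketch below assumes that reading; the category names, probabilities, and the helper `rebin_exposure` are hypothetical.

```python
from typing import Dict, List


def rebin_exposure(
    exposure_by_category: Dict[str, float], rebinned_exposed: List[str]
) -> float:
    """Collapse the selected categories into one 'exposed' probability."""
    return sum(exposure_by_category[category] for category in rebinned_exposed)


# Hypothetical polytomous exposure data for a four-category risk.
polytomous_exposure_data = {"cat1": 0.1, "cat2": 0.2, "cat3": 0.3, "cat4": 0.4}

# Treat cat1 and cat2 together as "exposed"; everything else is "unexposed".
exposed_probability = rebin_exposure(polytomous_exposure_data, ["cat1", "cat2"])
unexposed_probability = 1.0 - exposed_probability
```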
Polytomous Distribution
PolytomousDistribution
handles ordered and unordered categorical risks with N categories. Exposure
probabilities for each category are loaded from the artifact, pivoted into a
wide-format lookup table, and their cumulative sum is compared against each
simulant’s propensity to select a category.
Because categories are sorted before the cumulative sum is computed, results are reproducible and consistent with the common random number framework.
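
The cumulative-sum selection can be sketched as follows. This is an illustration for a single simulant using the standard library, not the component's lookup-table implementation; the function name and the category labels are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Dict


def polytomous_exposure(propensity: float, exposure_by_category: Dict[str, float]) -> str:
    """Select the first category whose cumulative probability exceeds the propensity."""
    # Sort the category names so the cumulative sum is built in a stable order.
    categories = sorted(exposure_by_category)
    cumulative = list(accumulate(exposure_by_category[c] for c in categories))
    index = bisect_right(cumulative, propensity)
    # Guard against a propensity of exactly 1.0 or rounding in the final sum.
    return categories[min(index, len(categories) - 1)]


# Hypothetical exposure probabilities, deliberately given out of order.
probabilities = {"cat3": 0.5, "cat1": 0.2, "cat2": 0.3}

first = polytomous_exposure(0.10, probabilities)
second = polytomous_exposure(0.35, probabilities)
third = polytomous_exposure(0.90, probabilities)
```

Sorting first means the same propensity maps to the same category regardless of the order in which the data were loaded.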
Continuous Distribution
ContinuousDistribution
supports normal and lognormal distribution types from the
risk_distributions package. During setup, the component:
Loads mean exposure and standard deviation data from the artifact.
Computes the distribution’s native parameters (e.g., μ and σ for log-normal) via risk_distributions.Normal.get_parameters or risk_distributions.LogNormal.get_parameters.
Builds a lookup table of those parameters, keyed by demographic bins.
When determining exposure, looks up the parameters for each simulant and passes the simulant’s propensity through the distribution’s PPF to obtain a concrete exposure value (e.g., a systolic blood-pressure reading).
Propensity values are clipped to the range [0.0011, 0.998] before evaluation to avoid numerical issues at the distribution tails.
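
The steps above can be illustrated for the log-normal case with the standard library alone. The parameter conversion below is the textbook moment-matching formula and is an assumption here; it is not taken from risk_distributions, whose get_parameters methods may differ in detail. The function names and the example mean and standard deviation are hypothetical.

```python
import math
from statistics import NormalDist

# Clipping bounds quoted in the text.
Q_LOW, Q_HIGH = 0.0011, 0.998


def lognormal_parameters(mean: float, sd: float):
    """Convert a mean and standard deviation into log-normal mu and sigma."""
    sigma_squared = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma_squared / 2.0
    return mu, math.sqrt(sigma_squared)


def lognormal_exposure(propensity: float, mean: float, sd: float) -> float:
    """Clip the propensity, then evaluate the log-normal PPF."""
    q = min(max(propensity, Q_LOW), Q_HIGH)
    mu, sigma = lognormal_parameters(mean, sd)
    z = NormalDist().inv_cdf(q)  # standard normal quantile
    return math.exp(mu + sigma * z)


# A propensity of 0.0 would be undefined for the PPF; clipping makes it
# behave like the lower bound instead.
at_zero = lognormal_exposure(0.0, mean=120.0, sd=15.0)
at_lower_bound = lognormal_exposure(Q_LOW, mean=120.0, sd=15.0)
```

Without clipping, a propensity of exactly 0 or 1 would correspond to an exposure of 0 or infinity for a log-normal distribution, which is one way the tail problems mentioned above can arise.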
Ensemble Distribution
EnsembleDistribution
models exposure using a weighted combination of several parametric
distributions (for example, normal, log-normal, gamma, and others supported
by the risk_distributions package). The component:
Loads distribution weights and exposure data from the artifact.
Computes per-distribution parameters via risk_distributions.EnsembleDistribution.get_parameters.
At initialization, draws a second propensity per simulant (ensemble_propensity) that selects which child distribution to use.
When determining exposure, the risk_distributions.EnsembleDistribution.ppf method uses both the simulant’s propensity (quantile) and ensemble propensity (distribution selection) to produce an exposure value.
This approach captures complex, potentially multi-modal exposure shapes that no single parametric family can represent.
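
A simplified sketch of the two-propensity idea follows. It assumes the ensemble propensity is compared against the cumulative distribution weights to pick one child, whose PPF is then evaluated at the ordinary propensity; whether risk_distributions.EnsembleDistribution.ppf works exactly this way is an assumption. The two children, their weights, and all function names are hypothetical.

```python
import math
from bisect import bisect_right
from itertools import accumulate
from statistics import NormalDist


def normal_ppf(q: float, mean: float, sd: float) -> float:
    return NormalDist(mean, sd).inv_cdf(q)


def lognormal_ppf(q: float, mean: float, sd: float) -> float:
    # Textbook moment matching, used here only for illustration.
    sigma_squared = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma_squared / 2.0
    return math.exp(mu + math.sqrt(sigma_squared) * NormalDist().inv_cdf(q))


# Hypothetical child distributions as (name, weight, ppf); weights sum to 1.
CHILDREN = [("normal", 0.4, normal_ppf), ("lognormal", 0.6, lognormal_ppf)]


def ensemble_exposure(propensity: float, ensemble_propensity: float, mean: float, sd: float):
    """Pick a child with ensemble_propensity, then evaluate its PPF at propensity."""
    cumulative = list(accumulate(weight for _, weight, _ in CHILDREN))
    index = min(bisect_right(cumulative, ensemble_propensity), len(CHILDREN) - 1)
    name, _, ppf = CHILDREN[index]
    return name, ppf(propensity, mean, sd)


child_a, exposure_a = ensemble_exposure(0.5, ensemble_propensity=0.1, mean=120.0, sd=15.0)
child_b, exposure_b = ensemble_exposure(0.5, ensemble_propensity=0.7, mean=120.0, sd=15.0)
```

Two simulants with the same propensity can therefore receive different exposures if their ensemble propensities select different child distributions.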