Risk Effects

vivarium.public_health provides components for modeling how risk factor exposures modify disease outcomes. This tutorial covers risk effects - the components that translate exposure into changes in disease rates.

For how simulants are assigned exposure values, see the Risk Exposure tutorial.

Overview 

A risk effect modifies a target rate based on each simulant's exposure to a risk. A target is identified by an entity type, entity name, and measure in dotted form: {entity_type}.{entity_name}.{measure} (e.g., cause.lung_cancer.incidence_rate).
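As a minimal sketch of the dotted target format, the snippet below splits a target string into its three parts. `Target` and `parse_target` are hypothetical helpers written for illustration; they are not part of vivarium.public_health.

```python
from typing import NamedTuple


class Target(NamedTuple):
    """Hypothetical container for the three parts of a target string."""

    entity_type: str
    entity_name: str
    measure: str


def parse_target(target: str) -> Target:
    """Split '{entity_type}.{entity_name}.{measure}' into its parts."""
    parts = target.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"Expected 'type.name.measure', got {target!r}")
    return Target(*parts)


target = parse_target("cause.lung_cancer.incidence_rate")
print(target.entity_type, target.entity_name, target.measure)
```

A string with too few or too many segments raises a ValueError in this sketch, which makes a mistyped target fail early.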

There are two effect components:

RiskEffect - multiplies a target rate (e.g., disease incidence) by a relative risk for exposed simulants.
NonLogLinearRiskEffect - a variant where relative risk is parameterized by exposure level, using piecewise linear interpolation.

A typical risk model pairs a Risk with one or more RiskEffect components to modify disease rates. The Risk determines who is exposed, and the RiskEffect determines how much that exposure changes the outcome.

Common Setup 

Every code example in this tutorial uses the imports and helpers shown below. To run any example as a standalone script, include all of them at the top:

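# numpy and pandas are used by the verification code in later examples.
import numpy as np
import pandas as pd

from vivarium.engine import InteractiveContext
# NonLogLinearRiskEffect is assumed to be exported alongside Risk and RiskEffect.
from vivarium.public_health.risks import NonLogLinearRiskEffect, Risk, RiskEffect
from vivarium.public_health.disease import SI, SIS
from vivarium.public_health.population import BasePopulation
from vivarium.public_health._example_data import BASE_PLUGINS, make_base_config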

# BASE_PLUGINS overrides the data plugin to use ExampleArtifactManager,
# which serves example data from memory instead of requiring a real HDF file.
base_plugins = BASE_PLUGINS

# make_base_config() returns a configuration with sensible defaults for
# time range, step size, and randomness key columns.
config = make_base_config()

Artifact Data Format 

This section documents the key names and column layouts that the risk effect components expect. Each key can also be overridden through the data_sources configuration pattern without rebuilding the artifact (see Data sources).

Data keys 

The table below lists every data key used by the risk effect components.

Key	Index columns	Value columns	Used by	Configurable?
`risk_factor.{name}.relative_risk`	age, sex, year, parameter, affected_entity, affected_measure	`value` (relative risk per category)	`RiskEffect`	Yes - `risk_effect.{name}_on_{target}.data_sources.relative_risk`
`risk_factor.{name}.population_attributable_fraction`	age, sex, year, affected_entity, affected_measure	`value` (fraction)	`RiskEffect`	Yes - `risk_effect.{name}_on_{target}.data_sources.population_attributable_fraction`

Artifact data shapes 

The examples below use the data builders from the _example_data module; a production artifact has the same column layout but with real GBD values.

from vivarium.public_health._example_data import (
    risk_relative_risk_dichotomous,
    risk_paf,
)

# risk_factor.{name}.relative_risk - RR per exposure category and target.
rr = risk_relative_risk_dichotomous(2.0, "test_cause", "incidence_rate")
print(rr.query("year_start == 1990 and parameter == 'exposed'").head(2).to_string(index=False))

 age_start  age_end    sex  year_start  year_end parameter  value affected_entity affected_measure
       0.0 0.019178   Male        1990      1991   exposed    2.0      test_cause   incidence_rate
       0.0 0.019178 Female        1990      1991   exposed    2.0      test_cause   incidence_rate

# risk_factor.{name}.population_attributable_fraction - PAF per target.
paf = risk_paf(0.3, "test_cause", "incidence_rate")
print(paf.query("year_start == 1990").head(2).to_string(index=False))

 age_start  age_end    sex  year_start  year_end affected_entity affected_measure  value
       0.0 0.019178   Male        1990      1991      test_cause   incidence_rate    0.3
       0.0 0.019178 Female        1990      1991      test_cause   incidence_rate    0.3

Data sources 

Risk effect components support a data_sources configuration pattern that lets you override individual data keys without rebuilding the artifact. You can override any key with:

Scalar (int or float) - broadcast a constant value to all simulants.
DataFrame - use the DataFrame directly.
Callable - call the function at setup time to produce the data.
Artifact key (string) - load a different key from the artifact.
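The four override types listed above amount to a dispatch on the type of the configured value. The sketch below illustrates that idea only; `resolve_data_source` and `fake_artifact` are hypothetical names, the zero-argument callable signature is an assumption, and the real components may resolve data sources differently.

```python
from numbers import Real
from typing import Any, Callable


def resolve_data_source(source: Any, load_artifact_key: Callable[[str], Any]) -> Any:
    """Hypothetical resolver mirroring the four override types listed above."""
    if isinstance(source, Real) and not isinstance(source, bool):
        return source                      # scalar: same value for every simulant
    if isinstance(source, str):
        return load_artifact_key(source)   # artifact key: load from the artifact
    if callable(source):
        return source()                    # callable: invoked once at setup time
    return source                          # anything else: assumed to be a table


# A stand-in artifact; a real artifact would be served by the data plugin.
fake_artifact = {"risk_factor.smoking.relative_risk": "table loaded from artifact"}

resolved = {
    "scalar": resolve_data_source(2.0, fake_artifact.__getitem__),
    "artifact key": resolve_data_source(
        "risk_factor.smoking.relative_risk", fake_artifact.__getitem__
    ),
    "callable": resolve_data_source(lambda: 1.5, fake_artifact.__getitem__),
}
print(resolved)
```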

RiskEffect declares:

# Default configuration (loaded from the artifact):
risk_effect.{name}_on_{target}:
  data_sources:
    relative_risk: "risk_factor.{name}.relative_risk"
    population_attributable_fraction: "risk_factor.{name}.population_attributable_fraction"

Any of these can be overridden in the simulation configuration, for example with scalars:

configuration:
  risk_effect.{name}_on_{target_entity}.{target_name}.{target_measure}:
    data_sources:
      relative_risk: 2.0  # scalar, DataFrame, callable, or artifact key
      population_attributable_fraction: 0  # scalar or artifact key

Note

In configuration keys, {target} expands to the dotted form {target_entity}.{target_name}.{target_measure} (e.g., risk_effect.smoking_on_cause.lung_cancer.incidence_rate).
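When building override dictionaries in Python, the expanded key can be assembled from the risk name and the dotted target. The helper below is a hypothetical convenience for illustration, not a library function; the key pattern itself is the one described in this note.

```python
def risk_effect_config_key(risk_name: str, target: str) -> str:
    """Build 'risk_effect.{name}_on_{target}' with the dotted target expanded."""
    return f"risk_effect.{risk_name}_on_{target}"


key = risk_effect_config_key("smoking", "cause.lung_cancer.incidence_rate")
print(key)

# The same key used in an override dictionary, in the shape passed to config.update().
override = {
    key: {
        "data_sources": {
            "relative_risk": 2.0,
            "population_attributable_fraction": 0,
        },
    },
}
```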

For dichotomous risks, all data sources can be overridden this way. For continuous risks, some keys (e.g., tmred and relative_risk_scalar) are loaded directly from the artifact and cannot be overridden via configuration - see the NonLogLinearRiskEffect section for the setup=False workaround.

RiskEffect 

A RiskEffect modifies disease dynamics based on exposure. In the standard pattern, exposed simulants have a higher incidence rate (multiplied by the relative risk) than unexposed simulants.
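The arithmetic of the standard pattern can be sketched without running a simulation. The numbers below are illustrative only and do not come from the example artifact; the sketch also ignores the PAF adjustment discussed later.

```python
# Illustrative values; not taken from the example data.
baseline_rate = 0.01   # incidence rate for an unexposed simulant
relative_risk = 5.0    # relative risk applied to exposed simulants

exposures = ["exposed", "unexposed", "exposed", "unexposed"]

# Exposed simulants get baseline * RR; unexposed simulants keep the baseline.
rates = [
    baseline_rate * (relative_risk if exposure == "exposed" else 1.0)
    for exposure in exposures
]

# The ratio of an exposed rate to an unexposed rate recovers the relative risk.
rate_ratio = rates[0] / rates[1]
print(rates, rate_ratio)
```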

Observing the effect 

The following example demonstrates that exposed simulants become infected at a higher rate than unexposed simulants:

config = make_base_config()
config.update(
    {
        "population": {"population_size": 10_000},
        "mortality": {"data_sources": {"all_cause_mortality_rate": 0}},
        "risk_factor.test_risk": {
            "distribution_type": "dichotomous",
            "data_sources": {"exposure": 0.5},
        },
        "risk_effect.test_risk_on_cause.test_cause.incidence_rate": {
            "data_sources": {
                "relative_risk": 5.0,
                "population_attributable_fraction": 0,
            },
        },
    },
    layer="override",
)

sim = InteractiveContext(
    components=[
        BasePopulation(),
        Risk("risk_factor.test_risk"),
        RiskEffect("risk_factor.test_risk", "cause.test_cause.incidence_rate"),
        SI("test_cause"),
    ],
    configuration=config,
    plugin_configuration=base_plugins,
)

# Step forward to allow infections to occur.
for _ in range(3):
    sim.step()

pop = sim.get_population(["test_cause", "test_risk.exposure"])
exposed = pop[pop["test_risk.exposure"] == "exposed"]
unexposed = pop[pop["test_risk.exposure"] == "unexposed"]

exposed_infection_rate = (exposed["test_cause"] == "test_cause").sum() / len(exposed)
unexposed_infection_rate = (unexposed["test_cause"] == "test_cause").sum() / len(
    unexposed
)

# With RR=5, the ratio of infection rates should be approximately 5.
ratio = exposed_infection_rate / unexposed_infection_rate
print(f"Rate ratio near 5: {np.isclose(ratio, 5, rtol=0.15)}")

Rate ratio near 5: True

Multiple risk effects 

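A single Risk can have effects on multiple targets, and multiple risks can target the same disease. Each RiskEffect multiplies the target rate independently, so a simulant exposed to several risks has its rate multiplied by the product of their relative risks: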

config = make_base_config()
config.update(
    {
        "population": {"population_size": 20_000},
        "mortality": {"data_sources": {"all_cause_mortality_rate": 0}},
        # First risk: smoking
        "risk_factor.smoking": {
            "distribution_type": "dichotomous",
            "data_sources": {"exposure": 0.3},
        },
        "risk_effect.smoking_on_cause.test_cause.incidence_rate": {
            "data_sources": {
                "relative_risk": 3.0,
                "population_attributable_fraction": 0,
            },
        },
        # Second risk: air pollution
        "risk_factor.air_pollution": {
            "distribution_type": "dichotomous",
            "data_sources": {"exposure": 0.7},
        },
        "risk_effect.air_pollution_on_cause.test_cause.incidence_rate": {
            "data_sources": {
                "relative_risk": 2.0,
                "population_attributable_fraction": 0,
            },
        },
    },
    layer="override",
)

sim = InteractiveContext(
    components=[
        BasePopulation(),
        Risk("risk_factor.smoking"),
        Risk("risk_factor.air_pollution"),
        RiskEffect("risk_factor.smoking", "cause.test_cause.incidence_rate"),
        RiskEffect("risk_factor.air_pollution", "cause.test_cause.incidence_rate"),
        SI("test_cause"),
    ],
    configuration=config,
    plugin_configuration=base_plugins,
)

for _ in range(3):
    sim.step()

pop = sim.get_population(
    ["test_cause", "smoking.exposure", "air_pollution.exposure"]
)
# Compute infection rates by exposure group.
both_exposed = pop[
    (pop["smoking.exposure"] == "exposed")
    & (pop["air_pollution.exposure"] == "exposed")
]
smoking_only = pop[
    (pop["smoking.exposure"] == "exposed")
    & (pop["air_pollution.exposure"] == "unexposed")
]
pollution_only = pop[
    (pop["smoking.exposure"] == "unexposed")
    & (pop["air_pollution.exposure"] == "exposed")
]
neither_exposed = pop[
    (pop["smoking.exposure"] == "unexposed")
    & (pop["air_pollution.exposure"] == "unexposed")
]
both_rate = (both_exposed["test_cause"] == "test_cause").sum() / len(both_exposed)
smoking_only_rate = (smoking_only["test_cause"] == "test_cause").sum() / len(
    smoking_only
)
pollution_only_rate = (pollution_only["test_cause"] == "test_cause").sum() / len(
    pollution_only
)
neither_rate = (neither_exposed["test_cause"] == "test_cause").sum() / len(
    neither_exposed
)

# Combined RR is 3*2=6, so both-exposed vs neither should have ratio near 6.
both_ratio = both_rate / neither_rate
print(f"Both-exposed ratio near 6: {np.isclose(both_ratio, 6, rtol=0.2)}")
# Smoking-only RR is 3, so ratio should be near 3.
smoking_ratio = smoking_only_rate / neither_rate
print(f"Smoking-only ratio near 3: {np.isclose(smoking_ratio, 3, rtol=0.1)}")
# Air-pollution-only RR is 2, so ratio should be near 2.
pollution_ratio = pollution_only_rate / neither_rate
print(f"Pollution-only ratio near 2: {np.isclose(pollution_ratio, 2, rtol=0.1)}")

Both-exposed ratio near 6: True
Smoking-only ratio near 3: True
Pollution-only ratio near 2: True
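The expected ratios checked above follow from multiplying the relative risks of whichever risks a simulant is exposed to. A minimal sketch of that arithmetic, using the same RR values as the example (the dictionary layout is just for illustration):

```python
from itertools import product
from math import prod

relative_risks = {"smoking": 3.0, "air_pollution": 2.0}

# Combined multiplier for each (smoking_exposed, pollution_exposed) group.
combined = {}
for smoking_exposed, pollution_exposed in product([False, True], repeat=2):
    exposed = {"smoking": smoking_exposed, "air_pollution": pollution_exposed}
    # Multiply the RRs of the risks this group is exposed to; an empty
    # product is 1, so the unexposed group keeps the baseline rate.
    combined[(smoking_exposed, pollution_exposed)] = prod(
        rr for risk, rr in relative_risks.items() if exposed[risk]
    )

for group, multiplier in combined.items():
    print(group, multiplier)
```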

Population Attributable Fraction 

The population attributable fraction (PAF) rescales the baseline rate so that the population-average rate, after relative risks are applied to exposed simulants, matches the input data. The baseline rate is multiplied by (1 - PAF): when PAF is 0, the baseline rate is used as-is; when PAF is greater than 0, the baseline is reduced.

Without PAF correction, adding a risk with RR > 1 inflates the population-average rate above the input data. With the correct PAF, the reduced baseline compensates for that inflation.

# Run two simulations: one with PAF=0 and one with PAF=0.3.
# The PAF simulation should have a lower overall infection rate
# because the baseline rate is multiplied by (1 - PAF).

def run_paf_sim(paf_value):
    config = make_base_config()
    config.update(
        {
            "population": {"population_size": 10_000},
            "mortality": {"data_sources": {"all_cause_mortality_rate": 0}},
            "risk_factor.test_risk": {
                "distribution_type": "dichotomous",
                "data_sources": {"exposure": 0.5},
            },
            "risk_effect.test_risk_on_cause.test_cause.incidence_rate": {
                "data_sources": {
                    "relative_risk": 2.0,
                    "population_attributable_fraction": paf_value,
                },
            },
        },
        layer="override",
    )
    sim = InteractiveContext(
        components=[
            BasePopulation(),
            Risk("risk_factor.test_risk"),
            RiskEffect("risk_factor.test_risk", "cause.test_cause.incidence_rate"),
            SI("test_cause"),
        ],
        configuration=config,
        plugin_configuration=base_plugins,
    )
    for _ in range(3):
        sim.step()
    pop = sim.get_population(["test_cause"])
    return (pop["test_cause"] == "test_cause").sum() / len(pop)

rate_no_paf = run_paf_sim(0.0)
rate_with_paf = run_paf_sim(0.3)

# PAF > 0 reduces the baseline, so the overall population infection rate
# should be lower than without PAF.
print(f"PAF reduces population rate: {rate_with_paf < rate_no_paf}")

# The ratio of rates should be approximately (1 - PAF) = 0.7.
ratio = rate_with_paf / rate_no_paf
print(f"Rate ratio near (1 - PAF): {np.isclose(ratio, 0.7, atol=0.05)}")

PAF reduces population rate: True
Rate ratio near (1 - PAF): True
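The "correct" PAF for a dichotomous risk can be worked out by hand. The sketch below uses the standard epidemiological definition, PAF = (mean RR - 1) / mean RR, and assumes the baseline is scaled by (1 - PAF) as described above; `input_rate` is an illustrative value. Under this definition, an exposure of 0.5 with RR = 2.0 gives a PAF of about 0.333, so the 0.3 used in the example above is an illustrative value rather than the exactly compensating one.

```python
exposure_prevalence = 0.5   # fraction of simulants exposed
relative_risk = 2.0         # RR for exposed simulants
input_rate = 0.01           # illustrative population-level rate from input data

# Population-average relative risk: exposed get RR, unexposed get 1.
mean_rr = exposure_prevalence * relative_risk + (1 - exposure_prevalence)

# PAF for a dichotomous exposure.
paf = (mean_rr - 1) / mean_rr

# Scale the baseline down, then average the risk-adjusted rates.
baseline_rate = input_rate * (1 - paf)
population_rate = baseline_rate * mean_rr

print(f"PAF = {paf:.3f}")
print(f"population rate = {population_rate:.4f}, input rate = {input_rate:.4f}")
```

With the compensating PAF, the population-average rate equals the input rate, which is the consistency property the PAF is meant to provide.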

NonLogLinearRiskEffect 

A NonLogLinearRiskEffect models the relationship between a continuous exposure and a target rate using piecewise linear interpolation. Unlike RiskEffect (which applies a single RR to exposed simulants), this component assigns each simulant an individually interpolated RR based on their actual exposure level.

The relative risk data must contain a numeric parameter column with exposure thresholds (typically 1000 values spanning the plausible range) and corresponding value entries. The component also requires TMRED (Theoretical Minimum-Risk Exposure Distribution) data, which defines the exposure level at which relative risk equals 1.
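Piecewise linear interpolation over exposure thresholds can be sketched in a few lines. The function below is an illustration of the technique, not the component's implementation; the thresholds and RR values are made up, and clamping to the end points outside the threshold range is an assumption.

```python
from bisect import bisect_right


def interpolate_rr(exposure, thresholds, rr_values):
    """Piecewise linear interpolation of RR, clamped to the end points."""
    if exposure <= thresholds[0]:
        return rr_values[0]
    if exposure >= thresholds[-1]:
        return rr_values[-1]
    hi = bisect_right(thresholds, exposure)  # first threshold above exposure
    lo = hi - 1
    weight = (exposure - thresholds[lo]) / (thresholds[hi] - thresholds[lo])
    return rr_values[lo] + weight * (rr_values[hi] - rr_values[lo])


# Illustrative thresholds (sorted ascending) and their relative risks.
thresholds = [1.0, 3.0, 5.0, 9.0]
rr_values = [1.0, 1.5, 3.0, 5.0]

for exposure in (0.0, 4.0, 7.0, 10.0):
    print(exposure, interpolate_rr(exposure, thresholds, rr_values))
```

For arrays of exposures, NumPy's np.interp performs the same calculation, including the end-point clamping.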

Building relative risk data 

The RR data for NonLogLinearRiskEffect has a numeric parameter column (exposure thresholds) rather than categorical labels:

# Build RR data: 1000 exposure thresholds from 1 to 9.
# RR increases linearly from 1.0 (at exposure=1) to 5.0 (at exposure=9).
from vivarium.public_health._example_data import risk_relative_risk_continuous

rr_data = risk_relative_risk_continuous(
    exposure_min=1, exposure_max=9, rr_min=1.0, rr_max=5.0
)

print(f"RR data shape: {rr_data.shape}")
print(f"Parameter range: {rr_data['parameter'].min():.1f} to {rr_data['parameter'].max():.1f}")
print(f"RR range: {rr_data['value'].min():.1f} to {rr_data['value'].max():.1f}")

RR data shape: (1000, 6)
Parameter range: 1.0 to 9.0
RR range: 1.0 to 5.0

Running with continuous exposure 

NonLogLinearRiskEffect requires a continuous exposure pipeline. Because it interpolates over numeric exposure values, the example below defines ContinuousExposureRisk, a Risk subclass that produces numeric exposures rather than categorical ones. The simulation is created with setup=False so that RR and TMRED data can be written to the artifact before the components are set up.

Note

ContinuousExposureRisk is a minimal demo shortcut that bypasses the parent Risk machinery (propensity, distribution lookup, framework randomness). Production continuous risks should use Risk directly with a continuous distribution_type.

Note

sim._data is an internal API. The setup=False pattern followed by sim._data.write() is specific to interactive and tutorial contexts where data must be injected before component setup.

class ContinuousExposureRisk(Risk):
    """A Risk that assigns random numeric exposures between 1 and 9."""

    def setup(self, builder):
        self.distribution_type = None
        col = f"{self.causal_factor.name}_exposure_for_non_loglinear_riskeffect"
        self._col = col
        builder.value.register_attribute_producer(
            f"{self.causal_factor.name}.exposure", source=self._get_exposure
        )
        builder.population.register_initializer(
            initializer=self._init_exposure, columns=col
        )

    def _init_exposure(self, pop_data):
        rng = np.random.default_rng(12345)
        values = rng.uniform(1, 9, size=len(pop_data.index))
        self.population_view.initialize(
            pd.Series(values, index=pop_data.index, name=self._col)
        )

    def _get_exposure(self, index):
        return self.population_view.get(index, self._col)

    def on_time_step_prepare(self, event):
        pass

# Build RR data using the helper from _example_data.
from vivarium.public_health._example_data import risk_relative_risk_continuous
rr_data = risk_relative_risk_continuous(
    exposure_min=1, exposure_max=9, rr_min=1.0, rr_max=5.0
)

# TMRED defines the exposure level where RR = 1.
# With min=max=1, the minimum-risk exposure level (TMREL) is exactly 1.0.
tmred = {"distribution": "uniform", "min": 1, "max": 1, "inverted": False}

config = make_base_config()
config.update(
    {
        "population": {"population_size": 10_000},
        "mortality": {"data_sources": {"all_cause_mortality_rate": 0}},
    },
    layer="override",
)

risk = ContinuousExposureRisk("risk_factor.test_risk")
effect = NonLogLinearRiskEffect(risk.name, "cause.test_cause.incidence_rate")

sim = InteractiveContext(
    components=[BasePopulation(), risk, effect, SI("test_cause")],
    configuration=config,
    plugin_configuration=base_plugins,
    setup=False,
)

# Write NonLogLinearRiskEffect data to the artifact before setup.
sim._data.write("risk_factor.test_risk.relative_risk", rr_data)
sim._data.write("risk_factor.test_risk.tmred", tmred)
sim._data.write("risk_factor.test_risk.population_attributable_fraction", 0)
sim._data.write("cause.test_cause.incidence_rate", 0.5)

sim.setup()

for _ in range(3):
    sim.step()

# The cached exposure column used by NonLogLinearRiskEffect.
# This name is derived from the risk factor name by the component internally.
exposure_col = "test_risk_exposure_for_non_loglinear_riskeffect"
pop = sim.get_population(["test_cause", exposure_col])

# Verify that the infection rate rises with exposure: split simulants into
# exposure quartiles and check that the rate increases from each to the next.
quartiles = pd.qcut(pop[exposure_col], 4, labels=["Q1", "Q2", "Q3", "Q4"])
rates_by_quartile = pop.groupby(quartiles, observed=True).apply(
    lambda g: (g["test_cause"] == "test_cause").sum() / len(g)
)
print(f"Monotonically increasing: {all(rates_by_quartile.diff().dropna() > 0)}")

Monotonically increasing: True

Configuration Summary 

Component	Key configuration options	Artifact data required
`RiskEffect`	`risk_effect.{name}_on_{target}.data_sources.relative_risk`, `risk_effect.{name}_on_{target}.data_sources.population_attributable_fraction`	`risk_factor.{name}.relative_risk`, `risk_factor.{name}.population_attributable_fraction`
`NonLogLinearRiskEffect`	`non_log_linear_risk_effect.{name}_on_{target}.data_sources.relative_risk`	`risk_factor.{name}.relative_risk`, `risk_factor.{name}.tmred`, `risk_factor.{name}.population_attributable_fraction`

Note

For more advanced use cases - including polytomous risks, coverage gaps, alternative risk factors, and parameterized effect distributions - see the Non-standard Risk Exposure and Effect Models tutorial.