Risk Effects

vivarium_public_health provides components for modeling how risk factor exposures modify disease outcomes. This tutorial covers risk effects - the components that translate exposure into changes in disease rates.

For how simulants are assigned exposure values, see the Risk Exposure tutorial.

Overview

A risk effect modifies a target rate based on each simulant's exposure to a risk factor. A target is identified by an entity type, entity name, and measure in dotted form: {entity_type}.{entity_name}.{measure} (e.g., cause.lung_cancer.incidence_rate).

There are two effect components:

  • RiskEffect - multiplies a target rate (e.g., disease incidence) by a relative risk for exposed simulants.

  • NonLogLinearRiskEffect - a variant where relative risk is parameterized by exposure level, using piecewise linear interpolation.

A typical risk model pairs a Risk with one or more RiskEffect components to modify disease rates. The Risk determines who is exposed, and the RiskEffect determines how much that exposure changes the outcome.

Common Setup

Every code example in this tutorial uses the imports and helpers shown below. To run any example as a standalone script, include all of them at the top:

import numpy as np
import pandas as pd
from vivarium import InteractiveContext
from vivarium_public_health.risks import NonLogLinearRiskEffect, Risk, RiskEffect
from vivarium_public_health.disease import SI, SIS
from vivarium_public_health.population import BasePopulation
from vivarium_public_health._example_data import BASE_PLUGINS, make_base_config

# BASE_PLUGINS overrides the data plugin to use ExampleArtifactManager,
# which serves example data from memory instead of requiring a real HDF file.
base_plugins = BASE_PLUGINS

# make_base_config() returns a configuration with sensible defaults for
# time range, step size, and randomness key columns.
config = make_base_config()

Artifact Data Format

This section documents the key names and column layouts that the risk effect components expect. Risk effect components support the data_sources configuration pattern, which lets you override individual keys with a scalar, DataFrame, or callable without rebuilding the artifact (see Data sources).

Data keys

The table below lists every data key used by the risk effect components.

Key | Index columns | Value columns | Used by | Configurable?
--- | --- | --- | --- | ---
risk_factor.{name}.relative_risk | age, sex, year, parameter, affected_entity, affected_measure | value (relative risk per category) | RiskEffect | Yes - risk_effect.{name}_on_{target}.data_sources.relative_risk
risk_factor.{name}.population_attributable_fraction | age, sex, year, affected_entity, affected_measure | value (fraction) | RiskEffect | Yes - risk_effect.{name}_on_{target}.data_sources.population_attributable_fraction

Artifact data shapes

The examples below use the data builders from the _example_data module; a production artifact has the same column layout but with real GBD values.

from vivarium_public_health._example_data import (
    risk_relative_risk_dichotomous,
    risk_paf,
)

# risk_factor.{name}.relative_risk - RR per exposure category and target.
rr = risk_relative_risk_dichotomous(2.0, "test_cause", "incidence_rate")
print(rr.query("year_start == 1990 and parameter == 'exposed'").head(2).to_string(index=False))
 age_start  age_end    sex  year_start  year_end parameter  value affected_entity affected_measure
       0.0 0.019178   Male        1990      1991   exposed    2.0      test_cause   incidence_rate
       0.0 0.019178 Female        1990      1991   exposed    2.0      test_cause   incidence_rate
# risk_factor.{name}.population_attributable_fraction - PAF per target.
paf = risk_paf(0.3, "test_cause", "incidence_rate")
print(paf.query("year_start == 1990").head(2).to_string(index=False))
 age_start  age_end    sex  year_start  year_end affected_entity affected_measure  value
       0.0 0.019178   Male        1990      1991      test_cause   incidence_rate    0.3
       0.0 0.019178 Female        1990      1991      test_cause   incidence_rate    0.3

Data sources

Risk effect components support a data_sources configuration pattern that lets you override individual data keys without rebuilding the artifact. You can override any key with:

  • Scalar (int or float) - broadcast a constant value to all simulants.

  • DataFrame - use the DataFrame directly.

  • Callable - call the function at setup time to produce the data.

  • Artifact key (string) - load a different key from the artifact.
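
The four override types can be pictured as a small dispatch on the type of the configured value. The sketch below is illustrative only: resolve_data_source and fake_artifact are hypothetical names, a plain dict stands in for the artifact, and this is not the library's actual resolution code.

```python
def resolve_data_source(source, artifact):
    """Hypothetical dispatcher illustrating the four override types.

    This is NOT the library's implementation; it only mirrors the
    behaviour described in the list above.
    """
    if isinstance(source, str):
        # Artifact key: look the data up in the (stand-in) artifact.
        return artifact[source]
    if callable(source):
        # Callable: invoke it once to produce the data.
        return source()
    # Scalar or DataFrame: use the value as given.
    return source


# A plain dict stands in for the artifact in this sketch.
fake_artifact = {"risk_factor.smoking.relative_risk": 1.8}

resolved = {
    "scalar": resolve_data_source(2.0, fake_artifact),
    "callable": resolve_data_source(lambda: 2.5, fake_artifact),
    "artifact_key": resolve_data_source(
        "risk_factor.smoking.relative_risk", fake_artifact
    ),
}
print(resolved)
```

A DataFrame would pass through the final branch unchanged, just like a scalar.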

RiskEffect declares:

# Default configuration (loaded from the artifact):
risk_effect.{name}_on_{target}:
  data_sources:
    relative_risk: "risk_factor.{name}.relative_risk"
    population_attributable_fraction: "risk_factor.{name}.population_attributable_fraction"

Either data source can be overridden in the simulation configuration:

configuration:
  risk_effect.{name}_on_{target_entity}.{target_name}.{target_measure}:
    data_sources:
      relative_risk: 2.0  # scalar, DataFrame, callable, or artifact key
      population_attributable_fraction: 0  # scalar or artifact key

Note

In configuration keys, {target} expands to the dotted form {target_entity}.{target_name}.{target_measure} (e.g., risk_effect.smoking_on_cause.lung_cancer.incidence_rate).
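
As a quick illustration of that expansion, the sketch below builds the configuration key from a risk name and a dotted target string. The helper risk_effect_config_key is hypothetical and exists only for this example; it is not part of vivarium_public_health.

```python
def risk_effect_config_key(risk_name, target):
    """Hypothetical helper: build the configuration key for a RiskEffect."""
    # A target has exactly three dotted parts: type, name, and measure.
    entity_type, entity_name, measure = target.split(".")
    return f"risk_effect.{risk_name}_on_{entity_type}.{entity_name}.{measure}"


key = risk_effect_config_key("smoking", "cause.lung_cancer.incidence_rate")
print(key)  # risk_effect.smoking_on_cause.lung_cancer.incidence_rate

# The key can then be used in an override dictionary like the ones
# passed to config.update() elsewhere in this tutorial.
override = {
    key: {
        "data_sources": {
            "relative_risk": 2.0,
            "population_attributable_fraction": 0,
        }
    }
}
```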

For dichotomous risks, all data sources can be overridden this way. For continuous risks, some keys (e.g., tmred and relative_risk_scalar) are loaded directly from the artifact and cannot be overridden via configuration - see the NonLogLinearRiskEffect section for the setup=False workaround.

RiskEffect

A RiskEffect modifies disease dynamics based on exposure. In the standard pattern, exposed simulants have a higher incidence rate (multiplied by the relative risk) than unexposed simulants.

Observing the effect

The following example demonstrates that exposed simulants become infected at a higher rate than unexposed simulants:

config = make_base_config()
config.update(
    {
        "population": {"population_size": 10_000},
        "mortality": {"data_sources": {"all_cause_mortality_rate": 0}},
        "risk_factor.test_risk": {
            "distribution_type": "dichotomous",
            "data_sources": {"exposure": 0.5},
        },
        "risk_effect.test_risk_on_cause.test_cause.incidence_rate": {
            "data_sources": {
                "relative_risk": 5.0,
                "population_attributable_fraction": 0,
            },
        },
    },
    layer="override",
)

sim = InteractiveContext(
    components=[
        BasePopulation(),
        Risk("risk_factor.test_risk"),
        RiskEffect("risk_factor.test_risk", "cause.test_cause.incidence_rate"),
        SI("test_cause"),
    ],
    configuration=config,
    plugin_configuration=base_plugins,
)

# Step forward to allow infections to occur.
for _ in range(3):
    sim.step()

pop = sim.get_population(["test_cause", "test_risk.exposure"])
exposed = pop[pop["test_risk.exposure"] == "exposed"]
unexposed = pop[pop["test_risk.exposure"] == "unexposed"]

exposed_infection_rate = (exposed["test_cause"] == "test_cause").sum() / len(exposed)
unexposed_infection_rate = (unexposed["test_cause"] == "test_cause").sum() / len(
    unexposed
)

# With RR=5, the ratio of infection rates should be approximately 5.
ratio = exposed_infection_rate / unexposed_infection_rate
print(f"Rate ratio near 5: {np.isclose(ratio, 5, rtol=0.15)}")
Rate ratio near 5: True

Multiple risk effects

A single Risk can have effects on multiple targets, and multiple risks can target the same disease. Each RiskEffect multiplies the target rate independently:

config = make_base_config()
config.update(
    {
        "population": {"population_size": 20_000},
        "mortality": {"data_sources": {"all_cause_mortality_rate": 0}},
        # First risk: smoking
        "risk_factor.smoking": {
            "distribution_type": "dichotomous",
            "data_sources": {"exposure": 0.3},
        },
        "risk_effect.smoking_on_cause.test_cause.incidence_rate": {
            "data_sources": {
                "relative_risk": 3.0,
                "population_attributable_fraction": 0,
            },
        },
        # Second risk: air pollution
        "risk_factor.air_pollution": {
            "distribution_type": "dichotomous",
            "data_sources": {"exposure": 0.7},
        },
        "risk_effect.air_pollution_on_cause.test_cause.incidence_rate": {
            "data_sources": {
                "relative_risk": 2.0,
                "population_attributable_fraction": 0,
            },
        },
    },
    layer="override",
)

sim = InteractiveContext(
    components=[
        BasePopulation(),
        Risk("risk_factor.smoking"),
        Risk("risk_factor.air_pollution"),
        RiskEffect("risk_factor.smoking", "cause.test_cause.incidence_rate"),
        RiskEffect("risk_factor.air_pollution", "cause.test_cause.incidence_rate"),
        SI("test_cause"),
    ],
    configuration=config,
    plugin_configuration=base_plugins,
)

for _ in range(3):
    sim.step()

pop = sim.get_population(
    ["test_cause", "smoking.exposure", "air_pollution.exposure"]
)
# Compute infection rates by exposure group.
both_exposed = pop[
    (pop["smoking.exposure"] == "exposed")
    & (pop["air_pollution.exposure"] == "exposed")
]
smoking_only = pop[
    (pop["smoking.exposure"] == "exposed")
    & (pop["air_pollution.exposure"] == "unexposed")
]
pollution_only = pop[
    (pop["smoking.exposure"] == "unexposed")
    & (pop["air_pollution.exposure"] == "exposed")
]
neither_exposed = pop[
    (pop["smoking.exposure"] == "unexposed")
    & (pop["air_pollution.exposure"] == "unexposed")
]
both_rate = (both_exposed["test_cause"] == "test_cause").sum() / len(both_exposed)
smoking_only_rate = (smoking_only["test_cause"] == "test_cause").sum() / len(
    smoking_only
)
pollution_only_rate = (pollution_only["test_cause"] == "test_cause").sum() / len(
    pollution_only
)
neither_rate = (neither_exposed["test_cause"] == "test_cause").sum() / len(
    neither_exposed
)

# Combined RR is 3*2=6, so both-exposed vs neither should have ratio near 6.
both_ratio = both_rate / neither_rate
print(f"Both-exposed ratio near 6: {np.isclose(both_ratio, 6, rtol=0.2)}")
# Smoking-only RR is 3, so ratio should be near 3.
smoking_ratio = smoking_only_rate / neither_rate
print(f"Smoking-only ratio near 3: {np.isclose(smoking_ratio, 3, rtol=0.1)}")
# Air-pollution-only RR is 2, so ratio should be near 2.
pollution_ratio = pollution_only_rate / neither_rate
print(f"Pollution-only ratio near 2: {np.isclose(pollution_ratio, 2, rtol=0.1)}")
Both-exposed ratio near 6: True
Smoking-only ratio near 3: True
Pollution-only ratio near 2: True
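
The expected ratios checked above follow from plain arithmetic. The sketch below does not use the simulation at all; base_rate is an arbitrary assumed value, and effective_rate is a hypothetical helper that multiplies the baseline by the relative risk of each risk a simulant is exposed to.

```python
base_rate = 0.01  # assumed baseline incidence rate, for illustration only
relative_risks = {"smoking": 3.0, "air_pollution": 2.0}


def effective_rate(exposed_to):
    """Multiply the baseline by the RR of every risk in `exposed_to`."""
    rate = base_rate
    for risk in exposed_to:
        rate *= relative_risks[risk]
    return rate


groups = {
    "neither": [],
    "smoking_only": ["smoking"],
    "pollution_only": ["air_pollution"],
    "both": ["smoking", "air_pollution"],
}

# Ratio of each group's rate to the rate of the unexposed group.
ratios = {
    name: effective_rate(risks) / effective_rate([])
    for name, risks in groups.items()
}
print(ratios)
```

Because each effect is a separate multiplier, the both-exposed ratio is the product of the two individual relative risks.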

Population Attributable Fraction

The population attributable fraction (PAF) adjusts the baseline rate so that the population-level rate (after applying relative risks to exposed simulants) matches the original data. When PAF is 0, the baseline rate is used as-is; when PAF is greater than 0, the baseline is reduced so that the population-average rate remains consistent with the input data.

Without PAF correction, adding a risk with RR > 1 inflates the population-average rate above the input data. With the correct PAF, the baseline is scaled down to compensate.
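
A small worked calculation may make the compensation concrete. The sketch below uses the standard PAF formula for a dichotomous risk, PAF = (mean RR - 1) / mean RR, which is an assumption brought in for illustration rather than something taken from the component's code. The prevalence, relative risk, and input rate are arbitrary example values.

```python
p_exposed = 0.5      # assumed exposure prevalence
relative_risk = 2.0  # assumed RR for exposed simulants
input_rate = 0.01    # assumed population-level rate from the input data

# Population-average relative risk: unexposed simulants have RR = 1.
mean_rr = (1 - p_exposed) * 1.0 + p_exposed * relative_risk

# PAF consistent with this prevalence and RR.
paf = (mean_rr - 1) / mean_rr

# The baseline (unexposed) rate is the input rate scaled by (1 - PAF).
unexposed_rate = input_rate * (1 - paf)
exposed_rate = unexposed_rate * relative_risk

# Averaging over the population recovers the input rate.
population_rate = (1 - p_exposed) * unexposed_rate + p_exposed * exposed_rate

print(f"PAF = {paf:.3f}")
print(f"population rate = {population_rate:.4f}")
```

Under these assumptions the consistent PAF is about 0.333. The simulation example in this section uses 0.3 simply to show the (1 - PAF) scaling, not to match the exposure and RR exactly.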

# Run two simulations: one with PAF=0 and one with PAF=0.3.
# The PAF simulation should have a lower overall infection rate
# because the baseline rate is multiplied by (1 - PAF).

def run_paf_sim(paf_value):
    config = make_base_config()
    config.update(
        {
            "population": {"population_size": 10_000},
            "mortality": {"data_sources": {"all_cause_mortality_rate": 0}},
            "risk_factor.test_risk": {
                "distribution_type": "dichotomous",
                "data_sources": {"exposure": 0.5},
            },
            "risk_effect.test_risk_on_cause.test_cause.incidence_rate": {
                "data_sources": {
                    "relative_risk": 2.0,
                    "population_attributable_fraction": paf_value,
                },
            },
        },
        layer="override",
    )
    sim = InteractiveContext(
        components=[
            BasePopulation(),
            Risk("risk_factor.test_risk"),
            RiskEffect("risk_factor.test_risk", "cause.test_cause.incidence_rate"),
            SI("test_cause"),
        ],
        configuration=config,
        plugin_configuration=base_plugins,
    )
    for _ in range(3):
        sim.step()
    pop = sim.get_population(["test_cause"])
    return (pop["test_cause"] == "test_cause").sum() / len(pop)

rate_no_paf = run_paf_sim(0.0)
rate_with_paf = run_paf_sim(0.3)

# PAF > 0 reduces the baseline, so the overall population infection rate
# should be lower than without PAF.
print(f"PAF reduces population rate: {rate_with_paf < rate_no_paf}")

# The ratio of rates should be approximately (1 - PAF) = 0.7.
ratio = rate_with_paf / rate_no_paf
print(f"Rate ratio near (1 - PAF): {np.isclose(ratio, 0.7, atol=0.05)}")
PAF reduces population rate: True
Rate ratio near (1 - PAF): True

NonLogLinearRiskEffect

A NonLogLinearRiskEffect models the relationship between a continuous exposure and a target rate using piecewise linear interpolation. Unlike RiskEffect (which applies a single RR to exposed simulants), this component assigns each simulant an individually interpolated RR based on their actual exposure level.

The relative risk data must contain a numeric parameter column with exposure thresholds (typically 1000 values spanning the plausible range) and corresponding value entries. The component also requires TMRED (Theoretical Minimum-Risk Exposure Distribution) data, which defines the exposure level at which relative risk equals 1.
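
The interpolation idea can be sketched with NumPy alone. The thresholds and RR values below are made-up illustration data, and dividing by the RR at the TMRED exposure level is an assumption about how "RR equals 1 at that level" could be enforced; the component's internal steps may differ.

```python
import numpy as np

# Illustration data: 1000 exposure thresholds with linearly increasing RR.
thresholds = np.linspace(1.0, 9.0, 1000)
rr_values = np.linspace(1.0, 5.0, 1000)

# Exposure values for a handful of hypothetical simulants.
exposures = np.array([1.0, 3.0, 5.0, 9.0])

# Piecewise linear interpolation of RR at each simulant's exposure.
rr = np.interp(exposures, thresholds, rr_values)

# Assumed normalization: rescale so that RR is 1 at the minimum-risk level.
tmrel = 1.0
rr_at_tmrel = np.interp(tmrel, thresholds, rr_values)
rr_normalized = rr / rr_at_tmrel

print(np.round(rr_normalized, 3))
```

Each simulant gets its own RR from the curve, which is the key difference from the single exposed/unexposed multiplier used by RiskEffect.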

Building relative risk data

The RR data for NonLogLinearRiskEffect has a numeric parameter column (exposure thresholds) rather than categorical labels:

# Build RR data: 1000 exposure thresholds from 1 to 9.
# RR increases linearly from 1.0 (at exposure=1) to 5.0 (at exposure=9).
from vivarium_public_health._example_data import risk_relative_risk_continuous

rr_data = risk_relative_risk_continuous(
    exposure_min=1, exposure_max=9, rr_min=1.0, rr_max=5.0
)

print(f"RR data shape: {rr_data.shape}")
print(f"Parameter range: {rr_data['parameter'].min():.1f} to {rr_data['parameter'].max():.1f}")
print(f"RR range: {rr_data['value'].min():.1f} to {rr_data['value'].max():.1f}")
RR data shape: (1000, 6)
Parameter range: 1.0 to 9.0
RR range: 1.0 to 5.0

Running with continuous exposure

NonLogLinearRiskEffect requires a continuous exposure pipeline. Because it uses piecewise linear interpolation over numeric exposure values, we need a Risk subclass that produces numeric exposures rather than categorical ones. The simulation is created with setup=False so that RR and TMRED data can be written to the artifact before components initialize:

Note

ContinuousExposureRisk is a minimal demo shortcut that bypasses the parent Risk machinery (propensity, distribution lookup, framework randomness). Production continuous risks should use Risk directly with a continuous distribution_type.

Note

sim._data is an internal API. The setup=False pattern followed by sim._data.write() is specific to interactive and tutorial contexts where data must be injected before component setup.

class ContinuousExposureRisk(Risk):
    """A Risk that assigns random numeric exposures between 1 and 9."""

    def setup(self, builder):
        self.distribution_type = None
        col = f"{self.causal_factor.name}_exposure_for_non_loglinear_riskeffect"
        self._col = col
        builder.value.register_attribute_producer(
            f"{self.causal_factor.name}.exposure", source=self._get_exposure
        )
        builder.population.register_initializer(
            initializer=self._init_exposure, columns=col
        )

    def _init_exposure(self, pop_data):
        rng = np.random.default_rng(12345)
        values = rng.uniform(1, 9, size=len(pop_data.index))
        self.population_view.initialize(
            pd.Series(values, index=pop_data.index, name=self._col)
        )

    def _get_exposure(self, index):
        return self.population_view.get(index, self._col)

    def on_time_step_prepare(self, event):
        pass

# Build RR data using the helper from _example_data.
from vivarium_public_health._example_data import risk_relative_risk_continuous
rr_data = risk_relative_risk_continuous(
    exposure_min=1, exposure_max=9, rr_min=1.0, rr_max=5.0
)

# TMRED defines the exposure level where RR = 1.
# With min=max=1, the theoretical minimum-risk exposure level (TMREL)
# is exactly 1.0.
tmred = {"distribution": "uniform", "min": 1, "max": 1, "inverted": False}

config = make_base_config()
config.update(
    {
        "population": {"population_size": 10_000},
        "mortality": {"data_sources": {"all_cause_mortality_rate": 0}},
    },
    layer="override",
)

risk = ContinuousExposureRisk("risk_factor.test_risk")
effect = NonLogLinearRiskEffect(risk.name, "cause.test_cause.incidence_rate")

sim = InteractiveContext(
    components=[BasePopulation(), risk, effect, SI("test_cause")],
    configuration=config,
    plugin_configuration=base_plugins,
    setup=False,
)

# Write NonLogLinearRiskEffect data to the artifact before setup.
sim._data.write("risk_factor.test_risk.relative_risk", rr_data)
sim._data.write("risk_factor.test_risk.tmred", tmred)
sim._data.write("risk_factor.test_risk.population_attributable_fraction", 0)
sim._data.write("cause.test_cause.incidence_rate", 0.5)

sim.setup()

for _ in range(3):
    sim.step()

# The cached exposure column used by NonLogLinearRiskEffect.
# This name is derived from the risk factor name by the component internally.
exposure_col = "test_risk_exposure_for_non_loglinear_riskeffect"
pop = sim.get_population(["test_cause", exposure_col])

# Verify RR is monotonically increasing: split into quartiles and check
# that infection rate increases with each quartile.
quartiles = pd.qcut(pop[exposure_col], 4, labels=["Q1", "Q2", "Q3", "Q4"])
rates_by_quartile = pop.groupby(quartiles, observed=True).apply(
    lambda g: (g["test_cause"] == "test_cause").sum() / len(g)
)
print(f"Monotonically increasing: {all(rates_by_quartile.diff().dropna() > 0)}")
Monotonically increasing: True

Configuration Summary

Component | Key configuration options | Artifact data required
--- | --- | ---
RiskEffect | risk_effect.{name}_on_{target}.data_sources.relative_risk, risk_effect.{name}_on_{target}.data_sources.population_attributable_fraction | risk_factor.{name}.relative_risk, risk_factor.{name}.population_attributable_fraction
NonLogLinearRiskEffect | non_log_linear_risk_effect.{name}_on_{target}.data_sources.relative_risk | risk_factor.{name}.relative_risk, risk_factor.{name}.tmred, risk_factor.{name}.population_attributable_fraction

Note

For more advanced use cases - including polytomous risks, coverage gaps, alternative risk factors, and parameterized effect distributions - see the Non-standard Risk Exposure and Effect Models tutorial.