============
Risk Effects
============

:mod:`vivarium_public_health` provides components for modeling how risk factor
exposures modify disease outcomes. This tutorial covers **risk effects** -
the components that translate exposure into changes in disease rates.

For how simulants are assigned exposure values, see the :doc:`risk` tutorial.

.. contents::
   :local:
   :depth: 2

.. testsetup:: *

   import numpy as np
   import pandas as pd
   from vivarium import InteractiveContext
   from vivarium_public_health.risks import Risk, RiskEffect, NonLogLinearRiskEffect
   from vivarium_public_health.disease import *
   from vivarium_public_health.population import BasePopulation
   from vivarium_public_health._example_data import *
   base_plugins = BASE_PLUGINS


Overview
--------

A risk effect modifies a *target* rate based on exposure values of that risk.
A *target* is identified by an entity type, entity name, and measure in dotted
form: ``{entity_type}.{entity_name}.{measure}`` (e.g.,
``cause.lung_cancer.incidence_rate``).

There are two effect components:

- :class:`~vivarium_public_health.risks.effect.RiskEffect` - multiplies a
  target rate (e.g., disease incidence) by a relative risk for exposed
  simulants.
- :class:`~vivarium_public_health.risks.effect.NonLogLinearRiskEffect` -
  a variant where relative risk is parameterized by exposure level, using
  piecewise linear interpolation.

A typical risk model pairs a :class:`~vivarium_public_health.risks.base_risk.Risk`
with one or more :class:`~vivarium_public_health.risks.effect.RiskEffect`
components to modify disease rates. The
:class:`~vivarium_public_health.risks.base_risk.Risk` determines *who* is
exposed, and the :class:`~vivarium_public_health.risks.effect.RiskEffect`
determines *how much* that exposure changes the outcome.


Common Setup
------------

Every code example in this tutorial uses imports and helpers shown below.
To run any example in a standalone script, include all of these at the top:

.. testcode::

   from vivarium import InteractiveContext
   from vivarium_public_health.risks import Risk, RiskEffect
   from vivarium_public_health.disease import SI, SIS
   from vivarium_public_health.population import BasePopulation
   from vivarium_public_health._example_data import BASE_PLUGINS, make_base_config

   # BASE_PLUGINS overrides the data plugin to use ExampleArtifactManager,
   # which serves example data from memory instead of requiring a real HDF file.
   base_plugins = BASE_PLUGINS

   # make_base_config() returns a configuration with sensible defaults for
   # time range, step size, and randomness key columns.
   config = make_base_config()


Artifact Data Format
--------------------

This section documents the **key name** and **column layout** that the
risk effect components expect. Risk effect components support the
``data_sources`` configuration pattern that lets you override individual
keys with a scalar, DataFrame, or callable without rebuilding the artifact
(see `Data sources`_).


Data keys
^^^^^^^^^

The table below lists every data key used by the risk effect components.

.. list-table::
   :header-rows: 1

   * - Key
     - Index columns
     - Value columns
     - Used by
     - Configurable?
   * - ``risk_factor.{name}.relative_risk``
     - age, sex, year, parameter, affected_entity, affected_measure
     - ``value`` (relative risk per category)
     - :class:`~vivarium_public_health.risks.effect.RiskEffect`
     - Yes - ``risk_effect.{name}_on_{target}.data_sources.relative_risk``
   * - ``risk_factor.{name}.population_attributable_fraction``
     - age, sex, year, affected_entity, affected_measure
     - ``value`` (fraction)
     - :class:`~vivarium_public_health.risks.effect.RiskEffect`
     - Yes - ``risk_effect.{name}_on_{target}.data_sources.population_attributable_fraction``


Artifact data shapes
^^^^^^^^^^^^^^^^^^^^

The examples below use the data builders from the
:mod:`~vivarium_public_health._example_data` module; a production artifact
has the same column layout but with real GBD values.

.. testcode::

   from vivarium_public_health._example_data import (
       risk_relative_risk_dichotomous,
       risk_paf,
   )

   # risk_factor.{name}.relative_risk - RR per exposure category and target.
   rr = risk_relative_risk_dichotomous(2.0, "test_cause", "incidence_rate")
   print(rr.query("year_start == 1990 and parameter == 'exposed'").head(2).to_string(index=False))

.. testoutput::
   :options: +NORMALIZE_WHITESPACE

    age_start  age_end    sex  year_start  year_end parameter  value affected_entity affected_measure
          0.0 0.019178   Male        1990      1991   exposed    2.0      test_cause   incidence_rate
          0.0 0.019178 Female        1990      1991   exposed    2.0      test_cause   incidence_rate

.. testcode::

   # risk_factor.{name}.population_attributable_fraction - PAF per target.
   paf = risk_paf(0.3, "test_cause", "incidence_rate")
   print(paf.query("year_start == 1990").head(2).to_string(index=False))

.. testoutput::
   :options: +NORMALIZE_WHITESPACE

    age_start  age_end    sex  year_start  year_end affected_entity affected_measure  value
          0.0 0.019178   Male        1990      1991      test_cause   incidence_rate    0.3
          0.0 0.019178 Female        1990      1991      test_cause   incidence_rate    0.3


Data sources
^^^^^^^^^^^^

Risk effect components support a ``data_sources`` configuration pattern that
lets you override individual data keys without rebuilding the artifact. You
can override any key with:

- **Scalar** (int or float) - broadcast a constant value to all simulants.
- **DataFrame** - use the DataFrame directly.
- **Callable** - call the function at setup time to produce the data.
- **Artifact key** (string) - load a different key from the artifact.

:class:`~vivarium_public_health.risks.effect.RiskEffect` declares:

.. code-block:: yaml

   # Default configuration (loaded from the artifact):
   risk_effect.{name}_on_{target}:
     data_sources:
       relative_risk: "risk_factor.{name}.relative_risk"
       population_attributable_fraction: "risk_factor.{name}.population_attributable_fraction"

Any of these can be overridden with scalars in the simulation configuration:

.. code-block:: yaml

   configuration:
     risk_effect.{name}_on_{target_entity}.{target_name}.{target_measure}:
       data_sources:
         relative_risk: 2.0  # scalar, DataFrame, callable, or artifact key
         population_attributable_fraction: 0  # scalar or artifact key

.. note::

   In configuration keys, ``{target}`` expands to the dotted form
   ``{target_entity}.{target_name}.{target_measure}`` (e.g.,
   ``risk_effect.smoking_on_cause.lung_cancer.incidence_rate``).

For dichotomous risks, all data sources can be overridden this way. For
continuous risks, some keys (e.g., ``tmred`` and ``relative_risk_scalar``)
are loaded directly from the artifact and cannot be overridden via
configuration - see the `NonLogLinearRiskEffect`_ section for the
``setup=False`` workaround.


RiskEffect
----------

A :class:`~vivarium_public_health.risks.effect.RiskEffect` modifies disease
dynamics based on exposure. In the standard pattern, exposed simulants have
a higher incidence rate (multiplied by the relative risk) than unexposed
simulants.


Observing the effect
^^^^^^^^^^^^^^^^^^^^

The following example demonstrates that exposed simulants become infected
at a higher rate than unexposed simulants:

.. testcode::

   config = make_base_config()
   config.update(
       {
           "population": {"population_size": 10_000},
           "mortality": {"data_sources": {"all_cause_mortality_rate": 0}},
           "risk_factor.test_risk": {
               "distribution_type": "dichotomous",
               "data_sources": {"exposure": 0.5},
           },
           "risk_effect.test_risk_on_cause.test_cause.incidence_rate": {
               "data_sources": {
                   "relative_risk": 5.0,
                   "population_attributable_fraction": 0,
               },
           },
       },
       layer="override",
   )

   sim = InteractiveContext(
       components=[
           BasePopulation(),
           Risk("risk_factor.test_risk"),
           RiskEffect("risk_factor.test_risk", "cause.test_cause.incidence_rate"),
           SI("test_cause"),
       ],
       configuration=config,
       plugin_configuration=base_plugins,
   )

   # Step forward to allow infections to occur.
   for _ in range(3):
       sim.step()

   pop = sim.get_population(["test_cause", "test_risk.exposure"])
   exposed = pop[pop["test_risk.exposure"] == "exposed"]
   unexposed = pop[pop["test_risk.exposure"] == "unexposed"]

   exposed_infection_rate = (exposed["test_cause"] == "test_cause").sum() / len(exposed)
   unexposed_infection_rate = (unexposed["test_cause"] == "test_cause").sum() / len(
       unexposed
   )

   # With RR=5, the ratio of infection rates should be approximately 5.
   ratio = exposed_infection_rate / unexposed_infection_rate
   print(f"Rate ratio near 5: {np.isclose(ratio, 5, rtol=0.15)}")

.. testoutput::

   Rate ratio near 5: True


Multiple risk effects
^^^^^^^^^^^^^^^^^^^^^

A single :class:`~vivarium_public_health.risks.base_risk.Risk` can have
effects on multiple targets, and multiple risks can target the same disease.
Each :class:`~vivarium_public_health.risks.effect.RiskEffect` multiplies the
target rate independently:

.. testcode::

   config = make_base_config()
   config.update(
       {
           "population": {"population_size": 20_000},
           "mortality": {"data_sources": {"all_cause_mortality_rate": 0}},
           # First risk: smoking
           "risk_factor.smoking": {
               "distribution_type": "dichotomous",
               "data_sources": {"exposure": 0.3},
           },
           "risk_effect.smoking_on_cause.test_cause.incidence_rate": {
               "data_sources": {
                   "relative_risk": 3.0,
                   "population_attributable_fraction": 0,
               },
           },
           # Second risk: air pollution
           "risk_factor.air_pollution": {
               "distribution_type": "dichotomous",
               "data_sources": {"exposure": 0.7},
           },
           "risk_effect.air_pollution_on_cause.test_cause.incidence_rate": {
               "data_sources": {
                   "relative_risk": 2.0,
                   "population_attributable_fraction": 0,
               },
           },
       },
       layer="override",
   )

   sim = InteractiveContext(
       components=[
           BasePopulation(),
           Risk("risk_factor.smoking"),
           Risk("risk_factor.air_pollution"),
           RiskEffect("risk_factor.smoking", "cause.test_cause.incidence_rate"),
           RiskEffect("risk_factor.air_pollution", "cause.test_cause.incidence_rate"),
           SI("test_cause"),
       ],
       configuration=config,
       plugin_configuration=base_plugins,
   )

   for _ in range(3):
       sim.step()

   pop = sim.get_population(
       ["test_cause", "smoking.exposure", "air_pollution.exposure"]
   )
   # Compute infection rates by exposure group.
   both_exposed = pop[
       (pop["smoking.exposure"] == "exposed")
       & (pop["air_pollution.exposure"] == "exposed")
   ]
   smoking_only = pop[
       (pop["smoking.exposure"] == "exposed")
       & (pop["air_pollution.exposure"] == "unexposed")
   ]
   pollution_only = pop[
       (pop["smoking.exposure"] == "unexposed")
       & (pop["air_pollution.exposure"] == "exposed")
   ]
   neither_exposed = pop[
       (pop["smoking.exposure"] == "unexposed")
       & (pop["air_pollution.exposure"] == "unexposed")
   ]
   both_rate = (both_exposed["test_cause"] == "test_cause").sum() / len(both_exposed)
   smoking_only_rate = (smoking_only["test_cause"] == "test_cause").sum() / len(
       smoking_only
   )
   pollution_only_rate = (pollution_only["test_cause"] == "test_cause").sum() / len(
       pollution_only
   )
   neither_rate = (neither_exposed["test_cause"] == "test_cause").sum() / len(
       neither_exposed
   )

   # Combined RR is 3*2=6, so both-exposed vs neither should have ratio near 6.
   both_ratio = both_rate / neither_rate
   print(f"Both-exposed ratio near 6: {np.isclose(both_ratio, 6, rtol=0.2)}")
   # Smoking-only RR is 3, so ratio should be near 3.
   smoking_ratio = smoking_only_rate / neither_rate
   print(f"Smoking-only ratio near 3: {np.isclose(smoking_ratio, 3, rtol=0.1)}")
   # Air-pollution-only RR is 2, so ratio should be near 2.
   pollution_ratio = pollution_only_rate / neither_rate
   print(f"Pollution-only ratio near 2: {np.isclose(pollution_ratio, 2, rtol=0.1)}")

.. testoutput::

   Both-exposed ratio near 6: True
   Smoking-only ratio near 3: True
   Pollution-only ratio near 2: True


Population Attributable Fraction
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The **population attributable fraction** (PAF) adjusts the baseline rate so
that the population-level rate (after applying relative risks to exposed
simulants) matches the original data. When PAF is 0, the baseline rate is
used as-is; when PAF is greater than 0, the baseline is reduced so that the
population-average rate remains consistent with the input data.

Without PAF correction, adding a risk with RR > 1 inflates the
population-average rate above the input data. With the correct PAF, the
baseline is scaled down to compensate.

.. testcode::

   # Run two simulations: one with PAF=0 and one with PAF=0.3.
   # The PAF simulation should have a lower overall infection rate
   # because the baseline rate is multiplied by (1 - PAF).

   def run_paf_sim(paf_value):
       config = make_base_config()
       config.update(
           {
               "population": {"population_size": 10_000},
               "mortality": {"data_sources": {"all_cause_mortality_rate": 0}},
               "risk_factor.test_risk": {
                   "distribution_type": "dichotomous",
                   "data_sources": {"exposure": 0.5},
               },
               "risk_effect.test_risk_on_cause.test_cause.incidence_rate": {
                   "data_sources": {
                       "relative_risk": 2.0,
                       "population_attributable_fraction": paf_value,
                   },
               },
           },
           layer="override",
       )
       sim = InteractiveContext(
           components=[
               BasePopulation(),
               Risk("risk_factor.test_risk"),
               RiskEffect("risk_factor.test_risk", "cause.test_cause.incidence_rate"),
               SI("test_cause"),
           ],
           configuration=config,
           plugin_configuration=base_plugins,
       )
       for _ in range(3):
           sim.step()
       pop = sim.get_population(["test_cause"])
       return (pop["test_cause"] == "test_cause").sum() / len(pop)

   rate_no_paf = run_paf_sim(0.0)
   rate_with_paf = run_paf_sim(0.3)

   # PAF > 0 reduces the baseline, so the overall population infection rate
   # should be lower than without PAF.
   print(f"PAF reduces population rate: {rate_with_paf < rate_no_paf}")

   # The ratio of rates should be approximately (1 - PAF) = 0.7.
   ratio = rate_with_paf / rate_no_paf
   print(f"Rate ratio near (1 - PAF): {np.isclose(ratio, 0.7, atol=0.05)}")

.. testoutput::

   PAF reduces population rate: True
   Rate ratio near (1 - PAF): True


NonLogLinearRiskEffect
----------------------

A :class:`~vivarium_public_health.risks.effect.NonLogLinearRiskEffect` models
the relationship between a **continuous** exposure and a target rate using
piecewise linear interpolation. Unlike ``RiskEffect`` (which applies a single
RR to exposed simulants), this component assigns each simulant an
individually interpolated RR based on their actual exposure level.

The relative risk data must contain a numeric ``parameter`` column with
exposure thresholds (typically 1000 values spanning the plausible range)
and corresponding ``value`` entries. The component also requires TMRED
(Theoretical Minimum-Risk Exposure Distribution) data, which defines the
exposure level at which relative risk equals 1.


Building relative risk data
^^^^^^^^^^^^^^^^^^^^^^^^^^^

The RR data for ``NonLogLinearRiskEffect`` has a numeric ``parameter``
column (exposure thresholds) rather than categorical labels:

.. testcode::

   # Build RR data: 1000 exposure thresholds from 1 to 9.
   # RR increases linearly from 1.0 (at exposure=1) to 5.0 (at exposure=9).
   from vivarium_public_health._example_data import risk_relative_risk_continuous

   rr_data = risk_relative_risk_continuous(
       exposure_min=1, exposure_max=9, rr_min=1.0, rr_max=5.0
   )

   print(f"RR data shape: {rr_data.shape}")
   print(f"Parameter range: {rr_data['parameter'].min():.1f} to {rr_data['parameter'].max():.1f}")
   print(f"RR range: {rr_data['value'].min():.1f} to {rr_data['value'].max():.1f}")

.. testoutput::

   RR data shape: (1000, 6)
   Parameter range: 1.0 to 9.0
   RR range: 1.0 to 5.0


Running with continuous exposure
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``NonLogLinearRiskEffect`` requires a continuous exposure pipeline.
Because it uses piecewise linear interpolation over numeric exposure
values, we need a ``Risk`` subclass that produces numeric exposures
rather than categorical ones. The simulation is created with
``setup=False`` so that RR and TMRED data can be written to the artifact
before components initialize:

.. note::

   ``ContinuousExposureRisk`` is a minimal demo shortcut that bypasses the
   parent ``Risk`` machinery (propensity, distribution lookup, framework
   randomness). Production continuous risks should use ``Risk`` directly
   with a continuous ``distribution_type``.

.. note::

   ``sim._data`` is an internal API. The ``setup=False`` pattern followed
   by ``sim._data.write()`` is specific to interactive and tutorial
   contexts where data must be injected before component setup.

.. testcode::

   class ContinuousExposureRisk(Risk):
       """A Risk that assigns random numeric exposures between 1 and 9."""

       def setup(self, builder):
           self.distribution_type = None
           col = f"{self.causal_factor.name}_exposure_for_non_loglinear_riskeffect"
           self._col = col
           builder.value.register_attribute_producer(
               f"{self.causal_factor.name}.exposure", source=self._get_exposure
           )
           builder.population.register_initializer(
               initializer=self._init_exposure, columns=col
           )

       def _init_exposure(self, pop_data):
           rng = np.random.default_rng(12345)
           values = rng.uniform(1, 9, size=len(pop_data.index))
           self.population_view.initialize(
               pd.Series(values, index=pop_data.index, name=self._col)
           )

       def _get_exposure(self, index):
           return self.population_view.get(index, self._col)

       def on_time_step_prepare(self, event):
           pass

   # Build RR data using the helper from _example_data.
   from vivarium_public_health._example_data import risk_relative_risk_continuous
   rr_data = risk_relative_risk_continuous(
       exposure_min=1, exposure_max=9, rr_min=1.0, rr_max=5.0
   )

   # TMRED defines the exposure level where RR = 1.
   # With min=max=1, the TMREL is exactly 1.0.
   tmred = {"distribution": "uniform", "min": 1, "max": 1, "inverted": False}

   config = make_base_config()
   config.update(
       {
           "population": {"population_size": 10_000},
           "mortality": {"data_sources": {"all_cause_mortality_rate": 0}},
       },
       layer="override",
   )

   risk = ContinuousExposureRisk("risk_factor.test_risk")
   effect = NonLogLinearRiskEffect(risk.name, "cause.test_cause.incidence_rate")

   sim = InteractiveContext(
       components=[BasePopulation(), risk, effect, SI("test_cause")],
       configuration=config,
       plugin_configuration=base_plugins,
       setup=False,
   )

   # Write NonLogLinearRiskEffect data to the artifact before setup.
   sim._data.write("risk_factor.test_risk.relative_risk", rr_data)
   sim._data.write("risk_factor.test_risk.tmred", tmred)
   sim._data.write("risk_factor.test_risk.population_attributable_fraction", 0)
   sim._data.write("cause.test_cause.incidence_rate", 0.5)

   sim.setup()

   for _ in range(3):
       sim.step()

   # The cached exposure column used by NonLogLinearRiskEffect.
   # This name is derived from the risk factor name by the component internally.
   exposure_col = "test_risk_exposure_for_non_loglinear_riskeffect"
   pop = sim.get_population(["test_cause", exposure_col])

   # Verify RR is monotonically increasing: split into quartiles and check
   # that infection rate increases with each quartile.
   quartiles = pd.qcut(pop[exposure_col], 4, labels=["Q1", "Q2", "Q3", "Q4"])
   rates_by_quartile = pop.groupby(quartiles, observed=True).apply(
       lambda g: (g["test_cause"] == "test_cause").sum() / len(g)
   )
   print(f"Monotonically increasing: {all(rates_by_quartile.diff().dropna() > 0)}")

.. testoutput::

   Monotonically increasing: True


Configuration Summary
---------------------

.. list-table::
   :header-rows: 1

   * - Component
     - Key configuration options
     - Artifact data required
   * - ``RiskEffect``
     - ``risk_effect.{name}_on_{target}.data_sources.relative_risk``,
       ``risk_effect.{name}_on_{target}.data_sources.population_attributable_fraction``
     - ``risk_factor.{name}.relative_risk``,
       ``risk_factor.{name}.population_attributable_fraction``
   * - ``NonLogLinearRiskEffect``
     - ``non_log_linear_risk_effect.{name}_on_{target}.data_sources.relative_risk``
     - ``risk_factor.{name}.relative_risk``,
       ``risk_factor.{name}.tmred``,
       ``risk_factor.{name}.population_attributable_fraction``

.. note::

   For more advanced use cases - including polytomous risks, coverage gaps,
   alternative risk factors, and parameterized effect distributions - see
   the :doc:`non_standard_risk` tutorial.