Population Structures and Fertility
vivarium.public_health provides several components for creating and
managing simulated populations. This tutorial demonstrates the minimal
configuration required for each approach.
Overview
There are two categories of population components:
Initial population components create the starting set of simulants when the simulation begins:
BasePopulation- the standard component that samples simulants from demographic data.ScaledPopulation- a variant that rescales the demographic data before sampling.
Fertility components add new simulants during the simulation:
FertilityDeterministic- adds a fixed number of births per year.FertilityCrudeBirthRate- adds births based on a population-level crude birth rate without accounting for the age or sex composition of the population.FertilityAgeSpecificRates- adds births at the individual level based on age-specific fertility rates.
Note
BasePopulation
includes three sub-components:
Mortality,
AgeOutSimulants, and Disability. You do not need to add these.
Common Setup
Population components load their data through the data_sources
configuration pattern (see Data sources). The Expected Data Layout
section shows the key names and column layouts for every data key so that
you know exactly what format your data should have.
Every code example in this tutorial uses two helpers imported from
vivarium.public_health._example_data:
from vivarium.public_health._example_data import BASE_PLUGINS, make_base_config
# BASE_PLUGINS configures the data plugin to serve example data from memory.
# Pass it as plugin_configuration to InteractiveContext.
base_plugins = BASE_PLUGINS
# make_base_config() returns a configuration with sensible defaults for
# time range, step size, and randomness key columns.
config = make_base_config()
Expected Data Layout
This section documents the key name and column layout that each
population component expects. Some components also support a
data_sources configuration pattern that lets you override individual
keys with a scalar, DataFrame, or callable (see Data sources).
Data keys
The table below lists every data key used by the population components.
All of them can be overridden in the data_sources section of the
configuration (see Data sources); the data key shown is simply the
default.
Key |
Index columns |
Value columns |
Used by |
Configuration override |
|---|---|---|---|---|
|
age, sex, year, location |
|
|
|
|
(scalar) |
A string (e.g. |
|
|
|
age, sex, year |
|
|
|
|
age |
|
|
|
|
year, sex, |
|
|
|
|
age, sex, year, |
|
|
Data sources
Some components support a data_sources configuration pattern that lets
you override individual data keys. This is especially useful during
development or for simple tutorial examples like the ones in this page.
Components that support it declare their data needs in
configuration_defaults; by default each key points to the
corresponding data key string. You can override any of them with:
Scalar (int or float) - broadcast a constant value to all simulants.
DataFrame - use the DataFrame directly.
Callable - call the function at setup time to produce the data.
Data key (string) - load a different key from the data plugin.
For example, Mortality declares
three configurable data sources:
# Default configuration (loads from the data plugin):
mortality:
data_sources:
all_cause_mortality_rate: "cause.all_causes.cause_specific_mortality_rate"
life_expectancy: "population.theoretical_minimum_risk_life_expectancy"
unmodeled_cause_specific_mortality_rate: <internal method>
Note
The unmodeled_cause_specific_mortality_rate default is shown as
<internal method> because it is a bound Python method that cannot be
expressed in YAML.
Any of these can be overridden in the simulation configuration:
# Override with a scalar - no data key lookup needed:
configuration:
mortality:
data_sources:
all_cause_mortality_rate: 0.01
life_expectancy: 80.0
The component sections below show the first few rows of the data each component expects, so you can see the concrete layout.
BasePopulation
BasePopulation is the standard way
to create an initial population. It loads a population structure and samples
simulants whose age, sex, and location distributions match the source data.
Data consumed by BasePopulation
BasePopulation and its sub-components use the following data. The examples
below show the expected column layout. The data builders come from
_example_data.
from vivarium.public_health._example_data import (
population_structure,
theoretical_minimum_risk_life_expectancy,
)
# population.structure - population counts per demographic cell.
pop_structure = population_structure()
print(pop_structure.query("year_start == 1990").head(6).to_string(index=False))
age_start age_end sex year_start year_end location value
0.000000 0.019178 Male 1990 1991 Kenya 1.917808
0.000000 0.019178 Female 1990 1991 Kenya 1.917808
0.019178 0.076712 Male 1990 1991 Kenya 5.753425
0.019178 0.076712 Female 1990 1991 Kenya 5.753425
0.076712 1.000000 Male 1990 1991 Kenya 92.328767
0.076712 1.000000 Female 1990 1991 Kenya 92.328767
# population.location - a scalar string identifying the simulated location.
# In the example data this is the string "Kenya".
# population.theoretical_minimum_risk_life_expectancy - remaining life
# expectancy by age, used by the Mortality sub-component to compute years
# of life lost. Indexed only by age (no sex, year, or location).
tmrle = theoretical_minimum_risk_life_expectancy()
print(tmrle.head(5).to_string(index=False))
age_start age_end value
0.0 1.0 98.0
1.0 2.0 98.0
2.0 3.0 98.0
3.0 4.0 98.0
4.0 5.0 98.0
Overriding the default data
By default, every data source loads from the artifact (configured via
BASE_PLUGINS in these examples). You can bypass the artifact and supply
data directly through data_sources - pass a DataFrame, callable, or
literal string:
import pandas as pd
from vivarium.engine import InteractiveContext
from vivarium.public_health.population import BasePopulation
from vivarium.public_health._example_data import population_structure
# Build population structure data (same layout as the data key).
pop_data = population_structure()
config = make_base_config()
config.update(
{
"population": {
"population_size": 5_000,
"data_sources": {
"population_structure": pop_data,
"location": "Kenya",
},
},
"mortality": {"data_sources": {"all_cause_mortality_rate": 0}},
},
layer="override",
)
sim = InteractiveContext(
components=[BasePopulation()],
configuration=config,
plugin_configuration=base_plugins,
)
pop = sim.get_population(["age", "sex", "location"])
assert len(pop) == 5_000
assert (pop["location"] == "Kenya").all()
print(f"Population: {len(pop)}, location: {pop['location'].iloc[0]}")
Population: 5000, location: Kenya
Default configuration
The absolute minimum is a population_size. Everything else has sensible
defaults (ages 0-125, both sexes, no age-out):
from vivarium.engine import InteractiveContext
from vivarium.public_health.population import BasePopulation
config = make_base_config()
config.update(
{
"population": {
"population_size": 10_000,
},
# Override mortality to zero so simulants don't die during
# this demonstration.
"mortality": {"data_sources": {"all_cause_mortality_rate": 0}},
},
layer="override",
)
sim = InteractiveContext(
components=[BasePopulation()],
configuration=config,
plugin_configuration=base_plugins,
)
pop = sim.get_population(["age", "sex", "location"])
assert len(pop) == 10_000
assert pop["age"].min() >= 0
assert pop["age"].max() <= 125
assert set(pop["sex"].unique()) == {"Male", "Female"}
print(f"Population: {len(pop)}")
Population: 10000
Custom age range
Use initialization_age_min and initialization_age_max to restrict the
age range of the initial population. This is the most common customization:
config = make_base_config()
config.update(
{
"population": {
"population_size": 10_000,
"initialization_age_min": 0,
"initialization_age_max": 5,
},
"mortality": {"data_sources": {"all_cause_mortality_rate": 0}},
},
layer="override",
)
sim = InteractiveContext(
components=[BasePopulation()],
configuration=config,
plugin_configuration=base_plugins,
)
pop = sim.get_population(["age"])
assert pop["age"].min() >= 0
assert pop["age"].max() < 5
print(f"All ages in [0, 5): {pop['age'].min() >= 0 and pop['age'].max() < 5}")
All ages in [0, 5): True
Single-age initialization (newborns)
When initialization_age_min equals initialization_age_max, all
simulants start at the same age. This can be used with fertility
components to represent a cohort of newborns:
config = make_base_config()
config.update(
{
"population": {
"population_size": 1_000,
"initialization_age_min": 0,
"initialization_age_max": 0,
},
"mortality": {"data_sources": {"all_cause_mortality_rate": 0}},
},
layer="override",
)
sim = InteractiveContext(
components=[BasePopulation()],
configuration=config,
plugin_configuration=base_plugins,
)
pop = sim.get_population(["age"])
# All simulants are newborns; ages are smoothed within the first time step.
assert pop["age"].max() < 1.0
print(f"All simulants under 1 year old: {pop['age'].max() < 1.0}")
All simulants under 1 year old: True
Filtering by sex
The include_sex option restricts the population to a single sex. Valid
values are "Male", "Female", or "Both" (the default):
config = make_base_config()
config.update(
{
"population": {
"population_size": 10_000,
"include_sex": "Female",
},
"mortality": {"data_sources": {"all_cause_mortality_rate": 0}},
},
layer="override",
)
sim = InteractiveContext(
components=[BasePopulation()],
configuration=config,
plugin_configuration=base_plugins,
)
pop = sim.get_population(["sex"])
assert (pop["sex"] == "Female").all()
print(f"All Female: {len(pop)}")
All Female: 10000
Aging out of a simulation
Setting untracking_age causes simulants to be removed from the tracked
population once they reach that age (see the
vivarium population concepts
documentation for more on untracking). This is useful when a model only
cares about a specific age window. The is_aged_out column is populated
by the AgeOutSimulants sub-component when untracking_age is set:
config = make_base_config()
config.update(
{
"population": {
"population_size": 10_000,
"initialization_age_min": 4,
"initialization_age_max": 4,
"untracking_age": 5,
},
"time": {"step_size": 100},
"mortality": {"data_sources": {"all_cause_mortality_rate": 0}},
},
layer="override",
)
sim = InteractiveContext(
components=[BasePopulation()],
configuration=config,
plugin_configuration=base_plugins,
)
# All 4-year-olds at the start
print(f"Tracked: {len(sim.get_population(['age']))}")
Tracked: 10000
# After taking 6 steps of 100 days (~1.6 years), everyone has aged past 5
sim.take_steps(number_of_steps=6)
pop = sim.get_population(["is_aged_out", "exit_time"], include_untracked=True)
print(f"Aged out: {pop['is_aged_out'].sum()}")
Aged out: 10000
Configuration summary for BasePopulation
Key |
Default |
Description |
|---|---|---|
|
10000 |
Number of simulants to create. |
|
internal method (loads |
Population structure data. Accepts a DataFrame, callable, or data key. |
|
internal method (loads |
Location string. Accepts a scalar string, callable, or data key. |
|
0 |
Minimum age (years) for the initial population. |
|
125 |
Maximum age (years) for the initial population. |
|
|
|
|
|
Age at which simulants are removed from the tracked population.
|
|
|
All-cause mortality rate. Accepts a scalar, DataFrame, callable, or data key. |
|
|
Remaining life expectancy by age. Accepts a scalar, DataFrame, callable, or data key. |
|
internal method |
CSMR for unmodeled causes. Accepts a scalar, DataFrame, callable, or data key. |
ScaledPopulation
ScaledPopulation works like
BasePopulation but multiplies
the population structure by a scaling factor before sampling. This is useful
when simulants represent a subset of the real population (for example, only
the population eligible for an intervention).
The scaling factor can be either a pandas.DataFrame with the same
demographic index as the population structure, or a string data key that
resolves to such a DataFrame.
ScaledPopulation uses the same data sources as BasePopulation (see
Data consumed by BasePopulation) plus a user-supplied scaling
factor. The scaling factor can be passed as a pandas.DataFrame
directly or as a string data key:
import numpy as np
import pandas as pd
from vivarium.engine import InteractiveContext
from vivarium.public_health.population import ScaledPopulation
from vivarium.public_health._example_data import population_structure
config = make_base_config()
config.update(
{
"population": {
"population_size": 100_000,
"include_sex": "Both",
},
"mortality": {"data_sources": {"all_cause_mortality_rate": 0}},
},
layer="override",
)
# Build a scaling factor DataFrame - same demographic index as
# population.structure, with a ``value`` per cell. Each cell's
# population count is multiplied by its scaling value.
scalar_data = (
population_structure()
.query("year_start == 1990")
.drop(columns=["location"])
.copy()
)
scalar_data["value"] = np.linspace(0.5, 2.0, len(scalar_data))
print(scalar_data.head(6).to_string(index=False))
age_start age_end sex year_start year_end value
0.000000 0.019178 Male 1990 1991 0.500000
0.000000 0.019178 Female 1990 1991 0.533333
0.019178 0.076712 Male 1990 1991 0.566667
0.019178 0.076712 Female 1990 1991 0.600000
0.076712 1.000000 Male 1990 1991 0.633333
0.076712 1.000000 Female 1990 1991 0.666667
# Pass the DataFrame directly.
sim = InteractiveContext(
components=[ScaledPopulation(scalar_data)],
configuration=config,
plugin_configuration=base_plugins,
)
pop = sim.get_population(["age", "sex"])
assert len(pop) == 100_000
assert set(pop["sex"].unique()) == {"Male", "Female"}
print(f"Population: {len(pop)}, sexes: {sorted(pop['sex'].unique())}")
Population: 100000, sexes: ['Female', 'Male']
Fertility Components
Fertility components add new simulants during the simulation to model births.
They are paired with a population component such as
BasePopulation.
Note
All three fertility components create newborns with age_start=0 and
age_end=0, meaning new simulants enter the simulation as newborns.
FertilityDeterministic
FertilityDeterministic adds a
fixed number of new simulants each year. This is the simplest fertility model
and does not require any external data.
from vivarium.engine import InteractiveContext
from vivarium.public_health.population import BasePopulation, FertilityDeterministic
config = make_base_config()
config.update(
{
"population": {
"population_size": 1_000,
"initialization_age_min": 0,
"initialization_age_max": 100,
},
"time": {"step_size": 10},
"mortality": {"data_sources": {"all_cause_mortality_rate": 0}},
"fertility": {"number_of_new_simulants_each_year": 500},
},
layer="override",
)
sim = InteractiveContext(
components=[BasePopulation(), FertilityDeterministic()],
configuration=config,
plugin_configuration=base_plugins,
)
sim.take_steps(number_of_steps=10)
pop = sim.get_population(["age"])
# Population grew from 1000 by ~500 * (100/365) ≈ 137 new simulants.
assert len(pop) > 1_000
print(f"Population grew: {len(pop) > 1_000}")
Population grew: True
FertilityCrudeBirthRate
FertilityCrudeBirthRate models
births at the population level using a crude birth rate - the number of live
births per unit of population, regardless of age or sex structure. Because it
does not consider the demographic composition of the population, the number of
births depends only on the total population size and the overall birth rate.
This contrasts with
FertilityAgeSpecificRates, which
models births at the individual level using rates that vary by age.
It requires initialization_age_min to be 0.
The live_births_by_sex data should contain a row for each
year × sex combination:
from vivarium.public_health._example_data import live_births_by_sex
# covariate.live_births_by_sex.estimate - each row gives the number of
# live births for a year × sex combination.
print(live_births_by_sex().head(6).to_string(index=False))
year_start year_end sex parameter value
1990 1991 Female mean_value 500.0
1990 1991 Male mean_value 500.0
1991 1992 Female mean_value 500.0
1991 1992 Male mean_value 500.0
1992 1993 Female mean_value 500.0
1992 1993 Male mean_value 500.0
Both data sources can be supplied via configuration:
from vivarium.engine import InteractiveContext
from vivarium.public_health.population import BasePopulation, FertilityCrudeBirthRate
from vivarium.public_health._example_data import population_structure, live_births_by_sex
config = make_base_config()
config.update(
{
"population": {
"population_size": 10_000,
"initialization_age_min": 0,
"initialization_age_max": 125,
"data_sources": {
"population_structure": population_structure(),
"location": "Kenya",
},
},
"time": {"step_size": 10},
"mortality": {"data_sources": {"all_cause_mortality_rate": 0}},
"fertility": {
"data_sources": {
"population_structure": population_structure(),
"live_births_by_sex": live_births_by_sex(),
},
},
},
layer="override",
)
sim = InteractiveContext(
components=[BasePopulation(), FertilityCrudeBirthRate()],
configuration=config,
plugin_configuration=base_plugins,
)
sim.take_steps(number_of_steps=10)
pop = sim.get_population(["age"])
assert len(pop) > 10_000
print(f"Population grew: {len(pop) > 10_000}")
Population grew: True
Important
FertilityCrudeBirthRate requires initialization_age_min to be 0.
It will raise a ValueError if this is not the case.
FertilityAgeSpecificRates
FertilityAgeSpecificRates models
fertility at the individual level. Each living female simulant who has not
given birth in the last nine months has a chance of giving birth determined
by age-specific fertility rates. Newborns are linked to their parent via a
parent_id column.
The expected data shape is one row per age × year × sex × parameter combination:
from vivarium.public_health._example_data import age_specific_fertility_rate
# covariate.age_specific_fertility_rate.estimate - each row gives a
# fertility rate for an age × year × sex × parameter cell.
asfr_data = age_specific_fertility_rate(rate=0.05)
print(asfr_data.head(6).to_string(index=False))
year_start year_end age_start age_end sex parameter value
1990 1991 0.0 0.019178 Female mean_value 0.05
1990 1991 0.0 0.019178 Female lower_value 0.05
1990 1991 0.0 0.019178 Female upper_value 0.05
1990 1991 0.0 0.019178 Male mean_value 0.05
1990 1991 0.0 0.019178 Male lower_value 0.05
1990 1991 0.0 0.019178 Male upper_value 0.05
The tutorial example below supplies a constant rate directly:
from vivarium.engine import InteractiveContext
from vivarium.public_health.population import BasePopulation, FertilityAgeSpecificRates
config = make_base_config()
config.update(
{
"population": {
"population_size": 1_000,
"initialization_age_min": 0,
"initialization_age_max": 125,
},
"time": {"step_size": 10},
"mortality": {"data_sources": {"all_cause_mortality_rate": 0}},
# Override the fertility rate via data_sources configuration.
"fertility_age_specific_rates": {
"data_sources": {
"age_specific_fertility_rate": 0.05,
},
},
},
layer="override",
)
sim = InteractiveContext(
components=[BasePopulation(), FertilityAgeSpecificRates()],
configuration=config,
plugin_configuration=base_plugins,
)
sim.take_steps(number_of_steps=100)
pop = sim.get_population(["age", "parent_id", "last_birth_time"])
# Newborns have a parent_id pointing to their mother
newborns = pop[pop["parent_id"] >= 0]
assert len(newborns) > 0
print(f"Births occurred: {len(newborns) > 0}")
Births occurred: True
Fertility configuration summary
Component |
Configuration key |
Default data keys |
Notes |
|---|---|---|---|
|
|
None |
Simplest model; fixed birth count. Pure configuration. |
|
|
|
Requires |
|
|
|
Supports |