Exploring a Simulation in an Interactive Setting

In other tutorials [1] [2] we’ve walked through how to build components for simulations. We’ve also shown how to run those simulations from the command line and in an interactive setting.

In this tutorial, we’ll focus on exploring simulations in an interactive setting. The only prerequisite is that you’ve set up your programming environment (see the getting started section). We’ll look at how to examine the population state table, how to print and interpret the simulation configuration, and how to get values from the value pipeline system.

We’ll work through all this with a few case studies using the simulations built in the other tutorials.

What Are We Looking At?

Simulations are complicated things. It’s beyond the scope of this tutorial to discuss what they are, how they work, and when they make sense as models of the world. Luckily, once you have one in hand, you can start figuring out the answers to many of those questions yourself.

In the case studies that follow, we’ll start simply. We’ll set up our simulations in an interactive environment and examine various aspects of the simulation state at the beginning of the simulation. Next, we’ll run them for a while and see how that state changes over time. Once we have a handle on examining different aspects of the simulation, we’ll take a step back to talk about what we should expect from the simulation and look at some examples of how to test those expectations. Finally, we’ll set up a comparison across two simulations to examine how changing configuration parameters alters what happens in a simulation.

Case Study #1: Population Epidemiology

In this case study, we’re going to put together and examine an individual-based epidemiology model from a bunch of pre-constructed parts. We’ll start out rather mechanically, just showing how to set up and run a simulation and pull out interesting data. As we go on, we’ll talk about what sort of results we should expect from the structure of the model and how we can verify those expectations.

Getting Things Set Up

Before we can start exploring properties of the simulation, we need to get our hands on a simulation context. This is the object we’ll use to examine and run our simulation model. You can check out our tutorial on setting up a simulation to see the tools that vivarium provides for building your own simulation context objects. For this tutorial on exploring simulations, however, we’ve provided a convenience function to get you started. In a Jupyter notebook or Python interpreter, run the following:

from vivarium.examples.disease_model import get_disease_model_simulation

sim = get_disease_model_simulation()

The sim object returned here is our simulation context. With it, we’re ready to begin examining various aspects of the simulation state.

Checking Out the Configuration

One of the things we might want to look at is the simulation configuration. Typically, a model specification encodes some configuration information, but leaves many things set to defaults. We can see what’s in the configuration by simply printing it.

print(sim.configuration)

randomness:
    key_columns:
        model_override: ['entrance_time', 'age']
    map_size:
        component_configs: 1000000
    random_seed:
        component_configs: 0
    additional_seed:
        component_configs: None
    rate_conversion_type:
        component_configs: linear
time:
    start:
        year:
            model_override: 2022
        month:
            model_override: 1
        day:
            model_override: 1
    end:
        year:
            model_override: 2026
        month:
            model_override: 12
        day:
            model_override: 31
    step_size:
        model_override: 0.5
    standard_step_size:
        component_configs: None
population:
    population_size:
        model_override: 100000
    age_start:
        model_override: 0
    age_end:
        model_override: 5
mortality:
    mortality_rate:
        model_override: 0.0114
    life_expectancy:
        model_override: 88.9
    data_sources:
        mortality_rate:
            component_configs: 0.01
lower_respiratory_infections:
    incidence_rate:
        model_override: 0.871
    remission_rate:
        model_override: 45.1
    excess_mortality_rate:
        model_override: 0.634
child_wasting:
    proportion_exposed:
        model_override: 0.0914
effect_of_child_wasting_on_infected_with_lower_respiratory_infections.incidence_rate:
    relative_risk:
        model_override: 4.63
sqlns:
    effect_size:
        model_override: 0.18
interpolation:
    order:
        component_configs: 0
    validate:
        component_configs: True
    extrapolate:
        component_configs: True
stratification:
    default:
        component_configs: []
    deaths:
        exclude:
            component_configs: []
        include:
            component_configs: []
    ylls:
        exclude:
            component_configs: []
        include:
            component_configs: []
disease_state.susceptible_to_lower_respiratory_infections:
    data_sources:
        initialization_weights:
            component_configs: 1.0
        excess_mortality_rate:
            component_configs: 0.0
disease_state.infected_with_lower_respiratory_infections:
    data_sources:
        initialization_weights:
            component_configs: 0.0
        excess_mortality_rate:
            component_configs: 0.0

What do we see here? The configuration is hierarchical. There is a set of top-level keys that define named subsets of configuration data, and we can access just one of those subsets if we like.

print(sim.configuration.randomness)

key_columns:
    model_override: ['entrance_time', 'age']
map_size:
    component_configs: 1000000
random_seed:
    component_configs: 0
additional_seed:
    component_configs: None
rate_conversion_type:
    component_configs: linear

This subset of configuration data contains more keys. All of the keys in our example here (key_columns, map_size, random_seed, additional_seed, and rate_conversion_type) point directly to values. We can access these values from the simulation as well.

print(sim.configuration.randomness.key_columns)
print(sim.configuration.randomness.map_size)
print(sim.configuration.randomness.random_seed)
print(sim.configuration.randomness.additional_seed)
print(sim.configuration.randomness.rate_conversion_type)

['entrance_time', 'age']
1000000
0
None
linear
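To make the idea of a hierarchical configuration with attribute-style access concrete, here is a minimal sketch built on plain nested dictionaries. The NestedConfig class is hypothetical and is not how vivarium or layered_config_tree implement their configuration tree; it only shows how sub-trees and leaf values can both be reached with dotted attribute access, mirroring sim.configuration.randomness.map_size.

```python
from typing import Any


class NestedConfig:
    """Hypothetical stand-in for a configuration tree (not the real vivarium API).

    Wraps a nested dict so that sub-trees and leaf values are reachable as attributes.
    """

    def __init__(self, data: dict[str, Any]) -> None:
        self._data = data

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails, i.e. for configuration keys.
        try:
            value = self._data[name]
        except KeyError:
            raise AttributeError(name) from None
        # Sub-dictionaries become nested nodes; anything else is a leaf value.
        return NestedConfig(value) if isinstance(value, dict) else value

    def keys(self) -> list[str]:
        return list(self._data)


# Values copied from the printed configuration above.
config = NestedConfig(
    {
        "randomness": {
            "key_columns": ["entrance_time", "age"],
            "map_size": 1_000_000,
            "random_seed": 0,
            "additional_seed": None,
            "rate_conversion_type": "linear",
        },
        "population": {"population_size": 100_000, "age_start": 0, "age_end": 5},
    }
)

print(config.keys())             # top-level subsets: ['randomness', 'population']
print(config.randomness.keys())  # keys inside one subset
print(config.randomness.map_size)  # a leaf value: 1000000
```

The real configuration tree also tracks which layer set each value, which this sketch deliberately leaves out.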

We can read these values, but we can no longer modify the configuration, since the simulation has already been set up.

from layered_config_tree import ConfigurationError

try:
    sim.configuration.randomness.update({'random_seed': 5})
except ConfigurationError:
    print("Can't update configuration after setup")

Can't update configuration after setup
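The "lock after setup" behavior can be illustrated with a small freezable container. This is a hypothetical sketch: FreezableConfig and ConfigLockedError are illustrative names, not part of layered_config_tree or vivarium. One plausible motivation for the pattern is that components read their configuration during setup, so changing values afterwards could leave them working from inconsistent settings.

```python
class ConfigLockedError(Exception):
    """Hypothetical error raised when configuration is changed after freezing."""


class FreezableConfig:
    """Hypothetical container that accepts updates only until it is frozen."""

    def __init__(self, values: dict) -> None:
        self._values = dict(values)
        self._frozen = False

    def update(self, changes: dict) -> None:
        if self._frozen:
            raise ConfigLockedError("Can't update configuration after setup")
        self._values.update(changes)

    def freeze(self) -> None:
        # In this sketch, freezing stands in for "the simulation has been set up".
        self._frozen = True

    def __getitem__(self, key: str):
        return self._values[key]


randomness = FreezableConfig({"random_seed": 0})
randomness.update({"random_seed": 5})  # allowed: we are still configuring
randomness.freeze()                    # pretend setup has finished
try:
    randomness.update({"random_seed": 7})
except ConfigLockedError as err:
    print(err)                         # Can't update configuration after setup
print(randomness["random_seed"])       # the value set before freezing, 5
```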

If we look again at the randomness configuration, we can see that there is one more layer of keys beneath each setting.

key_columns:
    model_override: ['entrance_time', 'age']
map_size:
    component_configs: 1000000
random_seed:
    component_configs: 0
additional_seed:
    component_configs: None
rate_conversion_type:
    component_configs: linear

This last layer reflects a priority level in the way simulation configuration is managed. The component_configs under map_size, random_seed, additional_seed, and rate_conversion_type tells us that the value was set by a simulation component’s configuration_defaults. The model_override under key_columns tells us that the value was set by a model specification file. If you’re trying to debug issues, you may want more information than this. For that, evaluate repr(sim.configuration) (the equivalent of evaluating sim.configuration in a Jupyter notebook or IPython cell), which shows where each configuration value was set and at what priority level. You can read more about how the configuration works in the configuration concept section.
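The idea of priority layers can be sketched as a value that remembers what each layer set and resolves to the highest-priority layer that has a value. The layer names below come from the printed configuration; their ordering (model_override outranking component_configs) is an assumption based on the description above, and LayeredValue is a hypothetical class, not the layered_config_tree implementation, which supports more layers than these two.

```python
# Assumed ordering, lowest to highest priority, using the layer names printed above.
LAYERS = ["component_configs", "model_override"]


class LayeredValue:
    """Hypothetical holder for one setting's value in each configuration layer."""

    def __init__(self) -> None:
        self._by_layer: dict[str, object] = {}

    def set(self, layer: str, value: object) -> None:
        if layer not in LAYERS:
            raise ValueError(f"unknown layer: {layer}")
        self._by_layer[layer] = value

    def resolve(self) -> tuple[str, object]:
        # Walk from highest to lowest priority; the first layer with a value wins.
        for layer in reversed(LAYERS):
            if layer in self._by_layer:
                return layer, self._by_layer[layer]
        raise KeyError("no value set in any layer")


key_columns = LayeredValue()
key_columns.set("component_configs", ["entrance_time"])      # hypothetical component default
key_columns.set("model_override", ["entrance_time", "age"])  # value from the model specification
print(key_columns.resolve())  # ('model_override', ['entrance_time', 'age'])

map_size = LayeredValue()
map_size.set("component_configs", 1_000_000)  # only a component default was given
print(map_size.resolve())     # ('component_configs', 1000000)
```

Reporting the winning layer alongside the value is what makes this kind of structure useful for debugging: you can see not just what a setting is, but who set it.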

Looking at the Simulation Population

Another useful thing to inspect at the beginning of the simulation is the starting population.

pop = sim.get_population(
    [
        "age",
        "is_alive",
        "entrance_time",
        "lower_respiratory_infections",
        "child_wasting_propensity",
    ]
)
print(pop.head())

     age  is_alive       entrance_time                 lower_respiratory_infections  child_wasting_propensity
1.707662      True 2021-12-31 12:00:00  susceptible_to_lower_respiratory_infections                  0.579157
2.731665      True 2021-12-31 12:00:00  susceptible_to_lower_respiratory_infections                  0.280783
0.511246      True 2021-12-31 12:00:00  susceptible_to_lower_respiratory_infections                  0.332681
2.898714      True 2021-12-31 12:00:00  susceptible_to_lower_respiratory_infections                  0.505482
1.381896      True 2021-12-31 12:00:00  susceptible_to_lower_respiratory_infections                  0.017806

This gives you a pandas.DataFrame representing your starting population. You can use it to check all sorts of characteristics of individuals or of the population as a whole. For example, here are summary statistics and value counts for several population attributes:

count    100000.000000
mean          2.490328
std           1.441763
min           0.000011
25%           1.242100
50%           2.487453
75%           3.730555
max           4.999957
Name: age, dtype: float64
is_alive
True    100000
Name: count, dtype: int64
count    1.000000e+05
mean     5.007716e-01
std      2.885722e-01
min      5.485898e-07
25%      2.504689e-01
50%      5.000226e-01
75%      7.516029e-01
max      9.999966e-01
Name: child_wasting_propensity, dtype: float64
lower_respiratory_infections
susceptible_to_lower_respiratory_infections    100000
Name: count, dtype: int64
entrance_time
2021-12-31 12:00:00    100000
Name: count, dtype: int64
sex
Female    50011
Male      49989
Name: count, dtype: int64
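The summaries above look like the output of pandas’ Series.describe (for numeric columns) and Series.value_counts (for categorical ones); the exact calls aren’t shown here. The sketch below demonstrates those calls on a small synthetic DataFrame with made-up values so it runs without vivarium. With a real simulation, you would use the pop returned by sim.get_population instead. It ends with two simple checks derived from the configuration (age_start of 0, age_end of 5, and everyone alive at the start).

```python
import pandas as pd

# Synthetic stand-in for the population table; the values are invented for illustration.
pop = pd.DataFrame(
    {
        "age": [1.7, 2.7, 0.5, 2.9, 1.4],
        "is_alive": [True, True, True, True, True],
        "lower_respiratory_infections": ["susceptible_to_lower_respiratory_infections"] * 5,
        "child_wasting_propensity": [0.58, 0.28, 0.33, 0.51, 0.02],
    }
)

print(pop["age"].describe())                 # count, mean, std, min, quartiles, max
print(pop["child_wasting_propensity"].describe())
print(pop["is_alive"].value_counts())        # how many simulants are alive
print(pop["lower_respiratory_infections"].value_counts())

# Expectations from the configuration: ages lie in [age_start, age_end) = [0, 5),
# and every simulant starts out alive.
assert pop["age"].between(0, 5, inclusive="left").all()
assert pop["is_alive"].all()
```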

Understanding the Simulation Data

Our model starts with a population whose ages are uniformly distributed between 0 and 5 years and whose sexes are split roughly evenly. They march through time in discrete steps of half a day (the step_size of 0.5 in the time configuration; we’ll vary this later). On each step, for each person, the simulation asks and answers several questions: Did they die? Did they get sick? If they were sick, did they recover? Are they exposed to any risks? At the end, we’ll examine how many people died and compare that with a theoretical life expectancy. Later, we’ll consider two simulations that differ only by the presence of a new intervention and examine how effective that intervention is.
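To make the per-step questions concrete, here is a toy, stdlib-only sketch of a discrete-time loop using the rates from the printed configuration. It is not vivarium’s implementation. Treating the rates as annual and converting them to per-step probabilities as rate × step length in years is an assumption, loosely modeled on the "linear" rate_conversion_type. The sketch also ignores age, sex, and risk exposure (child wasting), so it answers only the death, illness, and recovery questions.

```python
import random

# Rates from the printed configuration, assumed here to be per year.
MORTALITY_RATE = 0.0114
INCIDENCE_RATE = 0.871
REMISSION_RATE = 45.1
EXCESS_MORTALITY_RATE = 0.634
STEP_YEARS = 0.5 / 365  # a step_size of 0.5 days, expressed in years


def rate_to_probability(rate: float, dt: float) -> float:
    """Assumed linear conversion, capped at 1 so the result is a valid probability."""
    return min(rate * dt, 1.0)


def step(people: list[dict], rng: random.Random) -> None:
    """Advance every living person by one time step."""
    for person in people:
        if not person["alive"]:
            continue
        infected = person["state"] == "infected"
        # Did they die? Infected people face background plus excess mortality.
        death_rate = MORTALITY_RATE + (EXCESS_MORTALITY_RATE if infected else 0.0)
        if rng.random() < rate_to_probability(death_rate, STEP_YEARS):
            person["alive"] = False
            continue
        # If they were sick, did they recover? Otherwise, did they get sick?
        if infected:
            if rng.random() < rate_to_probability(REMISSION_RATE, STEP_YEARS):
                person["state"] = "susceptible"
        elif rng.random() < rate_to_probability(INCIDENCE_RATE, STEP_YEARS):
            person["state"] = "infected"


rng = random.Random(0)
people = [{"alive": True, "state": "susceptible"} for _ in range(1_000)]
for _ in range(730):  # 730 half-day steps is about one year
    step(people, rng)

alive = sum(p["alive"] for p in people)
infected = sum(p["alive"] and p["state"] == "infected" for p in people)
print(f"alive: {alive}, currently infected: {infected}")
```

Because the remission rate is much larger than the incidence rate, you should expect only a small fraction of living people to be infected at any moment, which is the kind of expectation the later sections test against the real model.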

Todo

Show how to understand the starting population from both the configuration and the population state table. Show how to understand the simulation time and how the clock progresses based on configuration parameters.
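As a starting point for understanding the clock, the time settings in the printed configuration are enough to work out how many steps the simulation takes. This sketch assumes the clock advances one step_size at a time (in days) and keeps running while the current time is before the end date. It also shows that the entrance_time seen in the population table is exactly one step before the configured start date, which suggests the initial population is created one step before the clock starts.

```python
import math
from datetime import datetime, timedelta

# Values copied from the printed configuration (time.start, time.end, time.step_size).
start = datetime(2022, 1, 1)
end = datetime(2026, 12, 31)
step = timedelta(days=0.5)

# Assumption: the clock steps forward while current time < end.
n_steps = math.ceil((end - start) / step)
print(n_steps)       # 3650 (1825 days in half-day steps)

# One step before the start date matches the entrance_time in the population table.
print(start - step)  # 2021-12-31 12:00:00
```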

Case Study #2: Boids

Todo

Everything