.. _exploration_tutorial:

================================================
Exploring a Simulation in an Interactive Setting
================================================

In other tutorials :ref:`[1] ` :ref:`[2] ` we've walked through how to build
components for simulations. We've also shown how to run those simulations
from the :ref:`command line ` and in an :ref:`interactive setting `.

In this tutorial we'll focus on exploring simulations in an interactive
setting. The only prerequisite is that you've set up your programming
environment (see :ref:`the getting started section `). We'll look at how to
examine the :term:`population state table `, how to print and interpret the
simulation :term:`configuration `, and how to get values from the
:term:`value pipeline ` system.

We'll work through all this with a few case studies using the simulations
built in the other tutorials.

.. contents::
    :depth: 2
    :local:
    :backlinks: none

What Are We Looking At?
-----------------------

Simulations are complicated things. It's beyond the scope of this tutorial
to discuss what they are, how they work, and when they make sense as models
of the world. Luckily, once you have one in hand, you can start figuring out
the answers to many of those questions yourself.

In the case studies that follow, we'll start simply. We'll get our
simulations :ref:`setup ` in an interactive environment. We'll then examine
various aspects of the simulation state at the beginning of the simulation.
Next, we'll run them for a while and see how that state changes over time.

After we have a handle on examining different aspects of the simulation,
we'll take a step back to talk about what our expectations should be about
how the simulation should work and look at some examples of how to test
those expectations.

Finally, we'll set up a comparison across two simulations to examine how
changing our :term:`configuration parameters ` alters what happens in a
simulation.

Case Study #1: Population Epidemiology
--------------------------------------

In this case study, we're going to put together and examine an
individual-based epidemiology model from a bunch of pre-constructed parts.
We'll start out rather mechanically, just showing how to set up and run a
simulation and pull out interesting data. As we go on, we'll talk about what
sort of results we should expect from the structure of the model and how we
can verify those expectations.

Getting Things Set Up
+++++++++++++++++++++

Before we can start exploring properties of the simulation, we need to get
our hands on a simulation :class:`context `. This is the object we'll use to
examine and run our simulation model.

You can check out our tutorial on :ref:`setting up a simulation ` to see the
tools that ``vivarium`` provides for building your own simulation context
objects. For this tutorial on exploring simulations, however, we've provided
a convenience function to get you started.

In a Jupyter notebook or Python interpreter, you can run the following:

.. testcode::

    from vivarium.examples.disease_model import get_disease_model_simulation

    sim = get_disease_model_simulation()

The ``sim`` object returned here is our simulation context. With it, we're
ready to begin examining various aspects of the simulation state.

Checking Out the Configuration
++++++++++++++++++++++++++++++

One of the things we might want to look at is the simulation
:term:`configuration `. Typically, a :term:`model specification ` encodes
some configuration information, but leaves many things set to defaults. We
can see what's in the configuration by simply printing it.

.. testsetup:: configuration

    from vivarium.examples.disease_model import get_disease_model_simulation

    sim = get_disease_model_simulation()
    del sim.configuration['input_data']
    del sim.configuration['stratification']['excluded_categories']

.. testcode:: configuration

    print(sim.configuration)

.. testoutput:: configuration

    randomness:
        key_columns:
            model_override: ['entrance_time', 'age']
        map_size:
            component_configs: 1000000
        random_seed:
            component_configs: 0
        additional_seed:
            component_configs: None
        rate_conversion_type:
            component_configs: linear
    time:
        start:
            year:
                model_override: 2022
            month:
                model_override: 1
            day:
                model_override: 1
        end:
            year:
                model_override: 2026
            month:
                model_override: 12
            day:
                model_override: 31
        step_size:
            model_override: 0.5
        standard_step_size:
            component_configs: None
    population:
        population_size:
            model_override: 100000
        age_start:
            model_override: 0
        age_end:
            model_override: 5
    mortality:
        mortality_rate:
            model_override: 0.0114
        life_expectancy:
            model_override: 88.9
        data_sources:
            mortality_rate:
                component_configs: 0.01
    lower_respiratory_infections:
        incidence_rate:
            model_override: 0.871
        remission_rate:
            model_override: 45.1
        excess_mortality_rate:
            model_override: 0.634
    child_wasting:
        proportion_exposed:
            model_override: 0.0914
    effect_of_child_wasting_on_infected_with_lower_respiratory_infections.incidence_rate:
        relative_risk:
            model_override: 4.63
    sqlns:
        effect_size:
            model_override: 0.18
    interpolation:
        order:
            component_configs: 0
        validate:
            component_configs: True
        extrapolate:
            component_configs: True
    stratification:
        default:
            component_configs: []
        deaths:
            exclude:
                component_configs: []
            include:
                component_configs: []
        ylls:
            exclude:
                component_configs: []
            include:
                component_configs: []
    disease_state.susceptible_to_lower_respiratory_infections:
        data_sources:
            initialization_weights:
                component_configs: 1.0
            excess_mortality_rate:
                component_configs: 0.0
    disease_state.infected_with_lower_respiratory_infections:
        data_sources:
            initialization_weights:
                component_configs: 0.0
            excess_mortality_rate:
                component_configs: 0.0

What do we see here? The configuration is *hierarchical*. There is a set of
top-level *keys* that define named subsets of configuration data. We can
access just those subsets if we like.

.. testcode::

    print(sim.configuration.randomness)

.. testoutput::

    key_columns:
        model_override: ['entrance_time', 'age']
    map_size:
        component_configs: 1000000
    random_seed:
        component_configs: 0
    additional_seed:
        component_configs: None
    rate_conversion_type:
        component_configs: linear

This subset of configuration data contains more keys. All of the keys in our
example here (``key_columns``, ``map_size``, ``random_seed``,
``additional_seed``, and ``rate_conversion_type``) point directly to values.
We can access these values from the simulation as well.

.. testcode::

    print(sim.configuration.randomness.key_columns)
    print(sim.configuration.randomness.map_size)
    print(sim.configuration.randomness.random_seed)
    print(sim.configuration.randomness.additional_seed)
    print(sim.configuration.randomness.rate_conversion_type)

.. testoutput::

    ['entrance_time', 'age']
    1000000
    0
    None
    linear

However, we can no longer modify the configuration, since the simulation has
already been set up.

.. testcode::

    from layered_config_tree import ConfigurationError

    # Updating a value after setup raises a ConfigurationError.
    try:
        sim.configuration.randomness.update({'random_seed': 5})
    except ConfigurationError:
        print("Can't update configuration after setup")

.. testoutput::

    Can't update configuration after setup

If we look again at the randomness configuration, it appears that there
should be one more layer of keys.

.. code-block:: python

    key_columns:
        model_override: ['entrance_time', 'age']
    map_size:
        component_configs: 1000000
    random_seed:
        component_configs: 0
    additional_seed:
        component_configs: None
    rate_conversion_type:
        component_configs: linear

This last layer reflects a priority level in the way simulation
configuration is managed. The ``component_configs`` under ``map_size``,
``random_seed``, ``additional_seed``, and ``rate_conversion_type`` tells us
that those values were set by a simulation component's
``configuration_defaults``. The ``model_override`` under ``key_columns``
tells us that a model specification file set the value.

If you're trying to debug issues, you may want more information than this.
You can also type ``repr(sim.configuration)`` (this is the equivalent of
evaluating ``sim.configuration`` in a Jupyter notebook or IPython cell).
This will give you considerable information about where each configuration
value was set and at what priority level.

You can read more about how the configuration works in the
:ref:`configuration concept section `.

Looking at the Simulation Population
++++++++++++++++++++++++++++++++++++

Another interesting thing to examine at the beginning of the simulation is
your starting population.

.. code-block:: python

    pop = sim.get_population(
        [
            "age",
            "is_alive",
            "entrance_time",
            "lower_respiratory_infections",
            "child_wasting_propensity",
        ]
    )
    print(pop.head())

::

            age  is_alive        entrance_time                 lower_respiratory_infections  child_wasting_propensity
    0  1.707662      True  2021-12-31 12:00:00  susceptible_to_lower_respiratory_infections                  0.579157
    1  2.731665      True  2021-12-31 12:00:00  susceptible_to_lower_respiratory_infections                  0.280783
    2  0.511246      True  2021-12-31 12:00:00  susceptible_to_lower_respiratory_infections                  0.332681
    3  2.898714      True  2021-12-31 12:00:00  susceptible_to_lower_respiratory_infections                  0.505482
    4  1.381896      True  2021-12-31 12:00:00  susceptible_to_lower_respiratory_infections                  0.017806

This gives you a ``pandas.DataFrame`` representing your starting population.
You can use it to check all sorts of characteristics about individuals or
the population as a whole.

.. testcode::
    :hide:

    pop = sim.get_population(
        [
            "age",
            "sex",
            "is_alive",
            "entrance_time",
            "lower_respiratory_infections",
            "child_wasting_propensity",
        ]
    )
    pop = pop.reindex(sorted(pop.columns), axis=1)
    print(pop.age.describe())
    print(pop.is_alive.value_counts())
    print(pop.child_wasting_propensity.describe())
    print(pop.lower_respiratory_infections.value_counts())
    print(pop.entrance_time.value_counts())
    print(pop.sex.value_counts())

.. testoutput::

    count    100000.000000
    mean          2.490328
    std           1.441763
    min           0.000011
    25%           1.242100
    50%           2.487453
    75%           3.730555
    max           4.999957
    Name: age, dtype: float64
    is_alive
    True    100000
    Name: count, dtype: int64
    count    1.000000e+05
    mean     5.007716e-01
    std      2.885722e-01
    min      5.485898e-07
    25%      2.504689e-01
    50%      5.000226e-01
    75%      7.516029e-01
    max      9.999966e-01
    Name: child_wasting_propensity, dtype: float64
    lower_respiratory_infections
    susceptible_to_lower_respiratory_infections    100000
    Name: count, dtype: int64
    entrance_time
    2021-12-31 12:00:00    100000
    Name: count, dtype: int64
    sex
    Female    50011
    Male      49989
    Name: count, dtype: int64

Understanding the Simulation Data
+++++++++++++++++++++++++++++++++

Our model starts with a bunch of people with uniformly distributed ages and
sexes. They march through time in discrete steps of half a day (the
``step_size`` of ``0.5`` shown in the configuration; we'll vary this later).
On each step, for each person, the simulation will ask and answer several
questions: Did they die? Did they get sick? If they were sick, did they
recover? Are they exposed to any risks?

At the end we'll examine how many people died and compare that with a
theoretical life expectancy. Later, we'll consider two simulations that
differ only by the presence of a new intervention and examine how effective
that intervention is.

.. todo::

    Show how to understand the starting population from both the
    configuration and the population state table. Show how to understand the
    simulation time and how the clock progresses based on configuration
    parameters.

Case Study #2: Boids
--------------------

.. todo::

    Everything