Population Management

Since Vivarium is an agent-based simulation framework, managing a group of simulants and their attributes is a critical task. Fundamentally, to run a simulation we need to be able to create new simulants, update their state attributes, and facilitate access to their state so that components in the simulation can do interesting things based on it. The tooling to support working with our simulant population is called the population management system.

The Population State Table

The core representation of simulants and their state information in Vivarium is a dynamically generated pandas.DataFrame known as the population state table (or just “state table”). Under this representation, rows represent simulants while columns correspond to attributes like age, sex or systolic blood pressure. These columns represent one of several important resources within Vivarium that other components can draw on. Each of the actions we need to be able to take corresponds to a manipulation of this state table. The addition of new simulants is the creation of rows, the creation of new attributes is the creation of columns, and the reading and updating of state is reading and updating the dataframe itself.
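The three manipulations described above can be sketched with plain pandas. This is only an illustration of the underlying dataframe operations, not of Vivarium's API: in a real simulation the state table is managed by the framework, and the column values here are made up.

```python
import pandas as pd

# A toy state table: one row per simulant, one column per attribute.
state_table = pd.DataFrame(
    {"age": [34.0, 61.5, 12.25], "sex": ["Female", "Male", "Female"]}
)

# Adding new simulants is the creation of rows.
new_simulants = pd.DataFrame(
    {"age": [0.0, 0.0], "sex": ["Male", "Female"]},
    index=[3, 4],
)
state_table = pd.concat([state_table, new_simulants])

# Adding a new attribute is the creation of a column.
state_table["systolic_blood_pressure"] = [118.0, 141.0, 102.0, 70.0, 72.0]

# Reading state is reading the dataframe...
adult_ages = state_table.loc[state_table["age"] >= 18, "age"]

# ...and updating state is updating the dataframe.
state_table["age"] = state_table["age"] + 1.0
```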

<<TODO: image of state table w/ expanding rows and columns>>

Attributes

Attributes are the fundamental characteristics of a population and are represented by columns in the population state table. They are a particular type of value, produced on demand by attribute pipelines. When a component requires the state table (or some subset of it), each attribute requested is calculated via its corresponding attribute pipeline and returned in tabular form. For example, when a component requests the entire population’s age, the “age” attribute pipeline calculates the age of all simulants and returns a pandas.Series of age values.
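As a rough mental model, an attribute pipeline can be thought of as a function that is only evaluated when someone asks for the attribute. The sketch below uses plain pandas; `source_data`, `age_pipeline`, `pipelines`, and `get_attribute` are hypothetical names chosen for illustration and are not part of Vivarium.

```python
import pandas as pd

# Hypothetical source data the toy pipeline draws on.
source_data = pd.DataFrame({"birth_year": [1990, 1963, 2012]})


def age_pipeline(
    index: pd.Index,
    data: pd.DataFrame = source_data,
    current_year: int = 2024,  # assumed "now" for this illustration
) -> pd.Series:
    """Toy 'age' attribute pipeline: ages are computed only when requested."""
    return (current_year - data.loc[index, "birth_year"]).rename("age")


# Toy registry mapping attribute names to the pipelines that produce them.
pipelines = {"age": age_pipeline}


def get_attribute(name: str, index: pd.Index, registry: dict = pipelines) -> pd.Series:
    """Look up the pipeline for an attribute and evaluate it for the given simulants."""
    return registry[name](index)


# Requesting the whole population's age returns a pandas.Series of age values.
ages = get_attribute("age", source_data.index)

# Requesting a subset only computes values for those simulants.
subset_ages = get_attribute("age", source_data.index[:2])
```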

Note

The population system is distinct from the values system, although the two are intimately related. While the values system is responsible for populating the columns of the state table with attributes, the population system is responsible for managing and providing access to said state table.

Population Views

As mentioned above, columns in the state table are dynamically generated via attribute pipelines as needed. The population manager holds this logic and tightly controls read and write access to it through a structure it provides known as a “population view”. A population view itself provides access to a subset of columns and rows from the state table as well as any private columns created by the component the view is attached to. Through a view, components can read, update, or, under the right circumstances, create new state in the state table.

Views are created for components in a simulation by specifying the component needing it and an optional query string to the population manager interface. All attributes are then viewable, and the query string filters the simulants returned. As noted above, views also have read and write access to all private columns created by the component they are attached to. This is how one might update the source data for attributes, e.g. updating all simulants’ ages on every time step.

There are several methods on a population view that facilitate working with the state table, including ones to get the population index and attributes. There are also two methods for writing to private columns:

initialize() is used during simulant creation (both initial population and new cohorts) to write initial values for private columns.
update() is used during the simulation to modify existing private column data. It takes the column name(s) and a modifier function that receives the current values and returns the updated values.
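The division of labor between the two write methods can be sketched with a toy class. `ToyPopulationView` is a hypothetical stand-in written for this illustration; the real methods' signatures are not shown in this section, so the argument shapes below are assumptions based only on the descriptions above.

```python
from typing import Callable, List, Optional, Union

import pandas as pd


class ToyPopulationView:
    """Hypothetical stand-in for a population view; not Vivarium's class."""

    def __init__(self) -> None:
        # Private columns owned by the component this view is attached to.
        self.private_columns: Optional[pd.DataFrame] = None

    def initialize(self, initial_values: pd.DataFrame) -> None:
        """Write initial private column values for newly created simulants."""
        if self.private_columns is None:
            self.private_columns = initial_values.copy()
        else:
            self.private_columns = pd.concat([self.private_columns, initial_values])

    def update(self, columns: Union[str, List[str]], modifier: Callable) -> None:
        """Replace existing values with whatever the modifier returns."""
        if self.private_columns is None:
            raise ValueError("Nothing to update; call initialize() first.")
        current = self.private_columns[columns]
        self.private_columns[columns] = modifier(current)


view = ToyPopulationView()

# Initial population, then a later cohort of new simulants.
view.initialize(pd.DataFrame({"age": [30.0, 45.5]}, index=[0, 1]))
view.initialize(pd.DataFrame({"age": [0.0]}, index=[2]))

# During the simulation: the modifier receives current values, returns new ones.
view.update("age", lambda age: age + 0.5)
```

In this sketch, initialize() only ever adds values for simulants that do not have any yet, while update() only ever transforms values that already exist.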

Filtering Simulants

There are two types of filtering that can be applied when using a population view to get attributes or private columns.

First, a query argument can be passed to any of the population view’s get(), get_frame(), or get_filtered_index() methods to filter the simulants returned for that specific call.

Second, if any components have registered an untracking query, untracked simulants will be automatically filtered out. An optional include_untracked argument, which defaults to False, can be used to bypass the untracked filtering if desired.

Note

Combining Queries: All types of queries are combined using the logical AND operator. Be sure to set up your query strings accordingly.
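One thing worth keeping in mind when queries are joined with AND is operator precedence. The sketch below uses pandas.DataFrame.query on a made-up population. `combine_queries` is a hypothetical helper, and whether Vivarium parenthesizes each query before combining is not stated here, so the helper wraps each one explicitly.

```python
import pandas as pd

population = pd.DataFrame(
    {
        "age": [3.0, 40.0, 70.0, 80.0],
        "sex": ["Male", "Female", "Female", "Male"],
    }
)


def combine_queries(*queries: str) -> str:
    """Join non-empty query strings with AND, parenthesizing each one."""
    return " and ".join(f"({query})" for query in queries if query)


age_query = "age < 5 or age > 65"
sex_query = "sex == 'Female'"

# Parenthesized: (young or old) AND female.
combined = combine_queries(age_query, sex_query)
selected = population.query(combined)

# Unparenthesized: `and` binds tighter than `or`, so this reads as
# young OR (old AND female), which selects a different set of simulants.
naive = population.query(f"{age_query} and {sex_query}")
```

A query string that contains an `or` is the main case where writing it with its own parentheses avoids surprises.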

Untracking Simulants

As mentioned above, there is a Vivarium concept of untracking simulants. Untracking a simulant allows for automatic filtering of those simulants from population views so that components can ignore them. This is useful to reduce computational overhead when simulants are no longer relevant to the simulation, e.g. deceased individuals or those who have aged beyond the scope of interest. A component can register a tracked query via vivarium.framework.population.interface.PopulationInterface.register_tracked_query().

Note

Tracked Queries and Including Untracked Simulants: When a component wants to register a query to be used for filtering out untracked simulants, it registers the tracked query, i.e. the query that defines which simulants should be kept. This can perhaps be a bit confusing since we then decide to include or exclude untracked simulants when using population views. Despite this potential source of confusion, we feel it’s more intuitive to think about the query in terms of who to keep and then the population view call in terms of who to exclude.

For example, if a component wants to untrack simulants who have died, it would register is_alive == True as a tracked query, which tells Vivarium to keep simulants who are alive (and, conversely, to filter out those who are not). Then, when using a population view to get data, we can decide whether to include untracked simulants (i.e. deceased ones).
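The keep-versus-exclude framing can be sketched as follows. `toy_get_frame` is a hypothetical function that only mimics the described behavior with pandas; it is not the population view's own method, and its signature is an assumption.

```python
from typing import List

import pandas as pd

population = pd.DataFrame(
    {"age": [25.0, 81.0, 47.0], "is_alive": [True, False, True]}
)

# The tracked query describes who to keep.
tracked_queries = ["is_alive == True"]


def toy_get_frame(
    population: pd.DataFrame,
    tracked_queries: List[str],
    query: str = "",
    include_untracked: bool = False,
) -> pd.DataFrame:
    """Apply the per-call query and, unless bypassed, every tracked query."""
    parts = [query] if query else []
    if not include_untracked:
        parts.extend(tracked_queries)
    if not parts:
        return population
    return population.query(" and ".join(f"({part})" for part in parts))


# Default: untracked (deceased) simulants are filtered out.
tracked_only = toy_get_frame(population, tracked_queries)

# Opting in returns everyone, including untracked simulants.
everyone = toy_get_frame(population, tracked_queries, include_untracked=True)

# A per-call query is combined with the tracked query.
older_tracked = toy_get_frame(population, tracked_queries, query="age > 40")
```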

Private Columns

We have mentioned private columns a few times now, but what exactly are they and how do they differ from attributes (which can be thought of as “public” columns in the state table)? To start, keep in mind that attributes are produced by attribute pipelines and all pipelines - attribute or otherwise - require a source of data to operate on. One of the things that an attribute pipeline’s source can be is a column of data. All such attribute pipeline source columns are maintained in a pandas.DataFrame attached to the population manager, but are only accessible by population views attached to the component that created the source data in the first place. These columns are thus referred to as private columns.

Creating and Updating Private Columns

To create a private column to be used as a source for an attribute pipeline, a component must register initializer methods during its setup. Any columns that are created and passed to the population view’s initialize() method within these methods will be automatically registered as private columns for that component. The corresponding attribute pipelines will be registered automatically as well.

To update private column data over the course of a simulation, a component can use the population view’s update() method, passing the column name(s) and a modifier function.

Note

Private columns vs attributes: The distinction between private columns and attributes can be confusing. It’s important to remember that attributes are dynamically calculated as needed (via attribute pipelines) and are readable by all components (via population views). Private columns, on the other hand, are stored (rather than computed) data held in the population manager that are only readable and writable by the component that created them and serve as the source for their corresponding attributes.

Private column data can be updated as needed by the owning component. These updates are then reflected in the attributes calculated from them the next time they are requested. For example, a component that creates an “age” private column (and thus an “age” attribute) instantiates the starting ages for all simulants at the start of the simulation. At each time step, the component can then update the private column by incrementing all ages by the duration of the time step. The next time any component requests the “age” attribute, the updated ages will be returned since the source data was updated.
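The age example can be traced in a few lines of pandas. Everything here is a simplified stand-in: `private_columns` plays the role of the owning component's stored data, `age_attribute` plays the role of the attribute pipeline, and the step size is an assumed value.

```python
import pandas as pd

# Private column data owned by a hypothetical age component.
private_columns = pd.DataFrame({"age": [30.0, 45.5, 0.0]})


def age_attribute(index: pd.Index, source: pd.DataFrame) -> pd.Series:
    """Toy attribute pipeline: reads the current private column on each call."""
    return source.loc[index, "age"].copy()


step_size_years = 0.25  # assumed time step duration, in years

# A request before the time step sees the starting ages.
before = age_attribute(private_columns.index, private_columns)

# On the time step, the owning component advances its source data.
private_columns["age"] = private_columns["age"] + step_size_years

# The next request reflects the update without any extra bookkeeping.
after = age_attribute(private_columns.index, private_columns)
```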

Creating Attributes

There are two ways to create attributes. The first, as described above, is to have a component register an initializer method during its setup phase which creates a private column. This private column will act as the source of data for its corresponding attribute pipeline, which is automatically registered as well. For example, if a component creates an “age” private column, an “age” attribute pipeline will be automatically registered and so the “age” attribute will be available for use by all components.

Not all attributes use a private column as their source, however. A component can also register an attribute pipeline explicitly during its setup phase by calling the values manager interface’s register_attribute_producer() or register_rate_producer() methods.

Creating Simulants

The population view pattern also underlies the creation of simulants, the only difference being that when simulants are being initialized for the first time, it is acceptable to create columns in the state table via initialize() that don’t already exist.

The Simulant Creator Function

Simulants are introduced to the simulation using a function that takes the number of new simulants as its parameter. This function, known as the simulant creator, is provided by the population manager interface and is used by the simulation entrypoint to initialize the population. It can also be used by components that want to introduce new simulants over the course of a simulation, such as a fertility component that models births. This means there are two distinct execution states in which simulants can be created: the population initialization state during the setup phase, and the main event loop.

The simulant creator function first adds rows to the state table. It then loops through a set of functions that have been registered to it as population initializers via register_initializer(), passing in the index of the newly created simulants. These functions generally proceed by using population views to dictate the state of the newly created simulants they are responsible for. Running these initializers is the only time creating columns in the state table is acceptable.
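The two-step flow described above (add rows, then run each registered initializer on the new index) can be sketched with a toy manager. `ToyPopulationManager` and the two initializer functions are hypothetical and written only for this illustration. In particular, the toy initializers return values for the manager to write, whereas the text describes real initializers writing through population views.

```python
from typing import Callable, List

import pandas as pd


class ToyPopulationManager:
    """Hypothetical stand-in for the population manager; not Vivarium's class."""

    def __init__(self) -> None:
        self.state = pd.DataFrame()
        self.initializers: List[Callable[[pd.Index], pd.DataFrame]] = []

    def register_initializer(self, initializer: Callable[[pd.Index], pd.DataFrame]) -> None:
        self.initializers.append(initializer)

    def simulant_creator(self, count: int) -> pd.Index:
        start = len(self.state)
        new_index = pd.RangeIndex(start, start + count)
        # Step 1: add empty rows for the new simulants.
        self.state = self.state.reindex(pd.RangeIndex(start + count))
        # Step 2: each initializer supplies state for the new simulants only.
        for initializer in self.initializers:
            values = initializer(new_index)
            for column in values.columns:
                self.state.loc[new_index, column] = values[column]
        return new_index


def initialize_age(index: pd.Index) -> pd.DataFrame:
    # Every simulant enters at age 0 in this toy example.
    return pd.DataFrame({"age": 0.0}, index=index)


def initialize_blood_pressure(index: pd.Index) -> pd.DataFrame:
    # Arbitrary placeholder value, chosen only for illustration.
    return pd.DataFrame({"systolic_blood_pressure": 70.0}, index=index)


manager = ToyPopulationManager()
manager.register_initializer(initialize_age)
manager.register_initializer(initialize_blood_pressure)

# Population initialization: columns are created here for the first time.
initial_population = manager.simulant_creator(3)

# Pretend some simulation time passes for the existing simulants.
manager.state["age"] = manager.state["age"] + 30.0

# Main event loop: e.g. a fertility component adds two newborns.
newborns = manager.simulant_creator(2)
```

Note that the same initializers run in both execution states; only the index they receive differs.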