Random Numbers in vivarium
This module contains classes and functions supporting common random numbers.
Vivarium has some peculiar needs around randomness. We need to be totally consistent between branches in a comparison. For example, if a simulant gets hit by a truck in the base case, it must be hit by that same truck in the counter-factual at exactly the same moment, unless the counter-factual explicitly deals with traffic accidents. That means the system can’t rely on standard global randomness sources, because small changes to the number of bits consumed or to the order in which randomness-consuming operations occur will cause the branches to diverge.
The current approach is to generate hash-based seeds where the key is the simulation time, the simulant’s id, the draw number and a unique id for the decision point which needs the randomness. These seeds are then used to generate numpy.random.RandomState objects that can be used to create pseudo-random numbers in a repeatable manner.
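The seeding scheme described above can be illustrated with a minimal sketch. This is not the module's implementation: the helper names `make_seed` and `draw_uniform` are hypothetical, the hash function is an arbitrary choice, and the standard library's `random.Random` stands in for `numpy.random.RandomState` to keep the example dependency-free.

```python
import hashlib
import random


def make_seed(decision_point, sim_time, simulant_id, draw):
    """Derive a 32-bit seed from the four parts of the key (hypothetical helper)."""
    key = f"{decision_point}_{sim_time}_{simulant_id}_{draw}"
    digest = hashlib.sha1(key.encode("utf8")).hexdigest()
    return int(digest, 16) % (2**32)


def draw_uniform(decision_point, sim_time, simulant_id, draw):
    """Return one repeatable number in [0, 1) for the given key."""
    # random.Random is a stand-in for numpy.random.RandomState in this sketch.
    seed = make_seed(decision_point, sim_time, simulant_id, draw)
    return random.Random(seed).random()


# The same key gives the same number in both branches of a comparison,
# no matter how many other random numbers were consumed in between.
base = draw_uniform("hit_by_truck", "2005-07-02", 42, 0)
_ = draw_uniform("gets_disease", "2005-07-02", 42, 0)  # unrelated decision
counterfactual = draw_uniform("hit_by_truck", "2005-07-02", 42, 0)
```

Because the number depends only on the key, adding or removing an unrelated decision in one branch does not shift the draws used by any other decision.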
- vivarium.framework.randomness.RESIDUAL_CHOICE
A probability placeholder to be used in an un-normalized array of weights to absorb leftover weight so that the array sums to unity. For example:
[0.2, 0.2, RESIDUAL_CHOICE] => [0.2, 0.2, 0.6]
Note
Currently this object is only used in the choice function of this module.
- Type
For more information, see the Common Random Numbers concept note.
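As a rough illustration of how a residual placeholder can absorb leftover weight, consider the sketch below. The sentinel `RESIDUAL` and the helper `fill_residual` are hypothetical stand-ins, not the module's own objects, and the sketch raises `ValueError` where the module would raise its own exception type.

```python
import math

# Stand-in sentinel; the real module exposes its own RESIDUAL_CHOICE object.
RESIDUAL = object()


def fill_residual(weights):
    """Replace a single RESIDUAL entry with 1 minus the sum of the others."""
    if sum(w is RESIDUAL for w in weights) > 1:
        raise ValueError("More than one residual placeholder in the weights.")
    known = sum(w for w in weights if w is not RESIDUAL)
    if known > 1:
        raise ValueError("Explicit weights already sum to more than 1.")
    return [1 - known if w is RESIDUAL else w for w in weights]


filled = fill_residual([0.2, 0.2, RESIDUAL])
# The placeholder now holds the leftover weight (0.6 up to float rounding).
assert math.isclose(filled[2], 0.6)
```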
- exception vivarium.framework.randomness.RandomnessError[source]
Raised for inconsistencies in random number and choice generation.
- class vivarium.framework.randomness.IndexMap(map_size=1000000)[source]
A key-index mapping with a vectorized hash and vectorized lookups.
- TEN_DIGIT_MODULUS = 10000000000
- update(new_keys)[source]
Adds the new keys to the mapping.
- Parameters
new_keys (Union[Index, MultiIndex]) – The new index to hash.
- Return type
None
- hash_(keys, salt=0)[source]
Hashes the index into an integer index in the range [0, self.stride]
- Parameters
keys (Union[Index, MultiIndex]) – The new index to hash.
salt (int) – An integer used to perturb the hash in a deterministic way. Useful in dealing with collisions.
- Returns
A pandas series indexed by the given keys and whose values take on integers in the range [0, self.stride]. Duplicates may appear and should be dealt with by the calling code.
- Return type
- convert_to_ten_digit_int(column)[source]
Converts a column of datetimes, integers, or floats into a column of 10 digit integers.
- Parameters
column (Series) – A series of datetimes, integers, or floats.
- Returns
A series of ten digit integers based on the input data.
- Return type
- Raises
RandomnessError – If the column contains data that is neither datetime-like nor numeric.
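To make the role of the salt concrete, here is a small, non-vectorized sketch of a key-index mapping that re-hashes colliding keys with an incremented salt. It is only an illustration of the idea: `hash_key` and `build_index_map` are hypothetical names, the hash function is an arbitrary choice, and the real class works on pandas indexes rather than Python lists.

```python
import hashlib


def hash_key(key, salt, map_size):
    """Hash a key into [0, map_size); changing the salt perturbs the result."""
    digest = hashlib.sha1(f"{key}|{salt}".encode("utf8")).hexdigest()
    return int(digest, 16) % map_size


def build_index_map(keys, map_size=1_000_000):
    """Assign every key a distinct slot, bumping the salt on collisions."""
    unique_keys = list(dict.fromkeys(keys))  # drop repeats, keep order
    if len(unique_keys) > map_size:
        raise ValueError("More keys than available slots.")
    mapping, used = {}, set()
    for key in unique_keys:
        salt = 0
        slot = hash_key(key, salt, map_size)
        while slot in used:  # collision: perturb deterministically and retry
            salt += 1
            slot = hash_key(key, salt, map_size)
        mapping[key] = slot
        used.add(slot)
    return mapping


# Keys here are (entrance_time, simulant number) pairs, purely for illustration.
index_map = build_index_map(
    [("2005-07-02", 0), ("2005-07-02", 1), ("2005-07-03", 0)]
)
```

Since both the hash and the salt sequence are deterministic, registering the same keys in the same order always yields the same mapping.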
- vivarium.framework.randomness.random(key, index, index_map=None)[source]
Produces an indexed set of uniformly distributed random numbers.
The index passed in typically corresponds to a subset of rows in a pandas.DataFrame for which a probabilistic draw needs to be made.
- Parameters
key (str) – A string used to create a seed for the random number generation.
index (Union[Index, MultiIndex]) – The index used for the returned series.
index_map (Optional[IndexMap]) – A mapping between the provided index (which may contain ints, floats, datetimes or any arbitrary combination of them) and an integer index into the random number array.
- Returns
A series of random numbers indexed by the provided index.
- Return type
- vivarium.framework.randomness.choice(key, index, choices, p=None, index_map=None)[source]
Decides between a weighted or unweighted set of choices.
Given a set of choices with or without corresponding weights, returns an indexed set of decisions from those choices. This is simply a vectorized way to make decisions with some book-keeping.
- Parameters
key (str) – A string used to create a seed for the random number generation.
index (Union[Index, MultiIndex]) – An index whose length is the number of random draws made and which indexes the returned pandas.Series.
choices (Union[List, Tuple, ndarray, Series]) – A set of options to choose from.
p (Optional[Union[List, Tuple, ndarray, Series]]) – The relative weights of the choices. Can be either a 1-d array of the same length as choices or a 2-d array with len(index) rows and len(choices) columns. In the 1-d case, the same set of weights are used to decide among the choices for every item in the index. In the 2-d case, each row in p contains a separate set of weights for every item in the index.
index_map (Optional[IndexMap]) – A mapping between the provided index (which may contain ints, floats, datetimes or any arbitrary combination of them) and an integer index into the random number array.
- Returns
An indexed set of decisions from among the available choices.
- Return type
- Raises
RandomnessError – If any row in p contains RESIDUAL_CHOICE and the remaining weights in the row are not normalized or any row of p contains more than one reference to RESIDUAL_CHOICE.
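One way to turn a uniform draw into a weighted decision is to compare it against the cumulative weights. The sketch below shows that technique for a single draw and the 1-d weight case; it is an assumption about the general approach, not a copy of this function's internals, and `choose` is a hypothetical helper.

```python
import bisect
import itertools


def choose(draw, choices, p=None):
    """Map one uniform draw in [0, 1) onto a choice using cumulative weights."""
    if p is None:
        p = [1] * len(choices)  # unweighted: every choice is equally likely
    total = sum(p)
    # Normalize the relative weights and accumulate them into bin edges.
    cumulative = list(itertools.accumulate(w / total for w in p))
    position = bisect.bisect_right(cumulative, draw)
    # Guard against float rounding leaving the last edge slightly below 1.
    return choices[min(position, len(choices) - 1)]


# Three draws, one per item in a hypothetical index, all using the same weights.
decisions = [choose(d, ["a", "b", "c"], p=[0.2, 0.2, 0.6]) for d in (0.1, 0.3, 0.95)]
```

With per-row weights (the 2-d case), the same comparison would be made against each row's own cumulative weights.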
- vivarium.framework.randomness.filter_for_probability(key, population, probability, index_map=None)[source]
Decide an event outcome for each individual in a population from probabilities.
Given a population or its index and an array of associated probabilities for some event to happen, we create and return the sub-population for whom the event occurred.
- Parameters
key (str) – A string used to create a seed for the random number generation.
population (Union[DataFrame, Series, Index, MultiIndex]) – A view on the simulants for which we are determining the outcome of an event.
probability (Union[List, Tuple, ndarray, Series]) – A 1d list of probabilities of the event under consideration occurring which corresponds (i.e. len(population) == len(probability)) to the population array passed in.
index_map (Optional[IndexMap]) – A mapping between the provided index (which may contain ints, floats, datetimes or any arbitrary combination of them) and an integer index into the random number array.
- Returns
The sub-population of the simulants for whom the event occurred. The return type will be the same as type(population)
- Return type
pandas.core.generic.PandasObject
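Conceptually, filtering by probability amounts to keeping each simulant whose uniform draw falls below its probability. The sketch below illustrates that on plain lists; `filter_sketch` is a hypothetical helper, and seeding `random.Random` from a string is a simplification of the keyed seeding this module uses.

```python
import random


def filter_sketch(key, population, probability):
    """Keep the simulants whose repeatable draw is below their probability."""
    kept = []
    for simulant, prob in zip(population, probability):
        # One repeatable draw per (key, simulant) pair.
        draw = random.Random(f"{key}_{simulant}").random()
        if draw < prob:
            kept.append(simulant)
    return kept


population = [0, 1, 2, 3]
nobody = filter_sketch("gets_disease", population, [0.0] * 4)
everybody = filter_sketch("gets_disease", population, [1.0] * 4)
```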
- class vivarium.framework.randomness.RandomnessStream(key, clock, seed, index_map=None, manager=None, for_initialization=False)[source]
A stream for producing common random numbers.
RandomnessStream objects provide an interface to Vivarium’s common random number generation. They provide a number of methods for doing common simulation tasks that require random numbers like making decisions among a number of choices.
- Parameters
- key
The name of the randomness stream.
- clock
A way to get the current simulation time.
- seed
An extra number used to seed the random number generation.
Notes
Should not be constructed by client code.
Simulation components get RandomnessStream objects by requesting them from the builder provided to them during the setup phase, e.g.:
class VivariumComponent:
    def setup(self, builder):
        self.randomness_stream = builder.randomness.get_stream('stream_name')
- copy_with_additional_key(key)[source]
Creates a copy of this stream with a permutation of its random seed.
- Parameters
key (Any) – The additional key to describe the new stream with.
- Returns
A new RandomnessStream with a combined key.
- Return type
- property name
- get_draw(index, additional_key=None)[source]
Get an indexed set of numbers uniformly drawn from the unit interval.
- Parameters
index (Union[Index, MultiIndex]) – An index whose length is the number of random draws made and which indexes the returned pandas.Series.
additional_key (Optional[Any]) – Any additional information used to seed random number generation.
- Returns
A series of random numbers indexed by the provided pandas.Index.
- Return type
- filter_for_rate(population, rate, additional_key=None)[source]
Decide an event outcome for each individual from rates.
Given a population or its index and an array of associated rates for some event to happen, we create and return the sub-population for whom the event occurred.
- Parameters
population (Union[DataFrame, Series, Index, MultiIndex]) – A view on the simulants for which we are determining the outcome of an event.
rate (Union[List, Tuple, ndarray, Series]) – A 1d list of rates of the event under consideration occurring which corresponds (i.e. len(population) == len(rate)) to the population view passed in. The rates must be scaled to the simulation time-step size either manually or as a post-processing step in a rate pipeline.
additional_key (Optional[Any]) – Any additional information used to create the seed.
- Returns
The sub-population of the simulants for whom the event occurred. The return type will be the same as type(population)
- Return type
pandas.core.generic.PandasObject
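The requirement that rates be scaled to the time step can be illustrated as follows. Both helpers are hypothetical, and the exponential conversion from rate to probability is a common modelling choice assumed here for illustration, not something this page states about the method.

```python
import math


def scale_rate(annual_rate, step_days, days_per_year=365.25):
    """Scale a per-year rate to the length of one time step (hypothetical helper)."""
    return annual_rate * step_days / days_per_year


def rate_to_probability(rate):
    """Probability of at least one event in a step, assuming a constant rate."""
    return 1 - math.exp(-rate)


# An annual rate of 0.1 applied over a 30-day step.
step_rate = scale_rate(0.1, step_days=30)
step_probability = rate_to_probability(step_rate)
```

The resulting probabilities could then be compared against uniform draws in the same way as in a probability-based filter.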
- filter_for_probability(population, probability, additional_key=None)[source]
Decide an outcome for each individual from probabilities.
Given a population or its index and an array of associated probabilities for some event to happen, we create and return the sub-population for whom the event occurred.
- Parameters
population (Union[DataFrame, Series, Index, MultiIndex]) – A view on the simulants for which we are determining the outcome of an event.
probability (Union[List, Tuple, ndarray, Series]) – A 1d list of probabilities of the event under consideration occurring which corresponds (i.e. len(population) == len(probability)) to the population view passed in.
additional_key (Optional[Any]) – Any additional information used to create the seed.
- Returns
The sub-population of the simulants for whom the event occurred. The return type will be the same as type(population)
- Return type
pandas.core.generic.PandasObject
- choice(index, choices, p=None, additional_key=None)[source]
Decides between a weighted or unweighted set of choices.
Given a set of choices with or without corresponding weights, returns an indexed set of decisions from those choices. This is simply a vectorized way to make decisions with some book-keeping.
- Parameters
index (Union[Index, MultiIndex]) – An index whose length is the number of random draws made and which indexes the returned pandas.Series.
choices (Union[List, Tuple, ndarray, Series]) – A set of options to choose from.
p (Optional[Union[List, Tuple, ndarray, Series]]) – The relative weights of the choices. Can be either a 1-d array of the same length as choices or a 2-d array with len(index) rows and len(choices) columns. In the 1-d case, the same set of weights are used to decide among the choices for every item in the index. In the 2-d case, each row in p contains a separate set of weights for every item in the index.
additional_key (Optional[Any]) – Any additional information used to seed random number generation.
- Returns
An indexed set of decisions from among the available choices.
- Return type
- Raises
RandomnessError – If any row in p contains RESIDUAL_CHOICE and the remaining weights in the row are not normalized or any row of p contains more than one reference to RESIDUAL_CHOICE.
- class vivarium.framework.randomness.RandomnessManager[source]
Access point for common random number generation.
- configuration_defaults = {'randomness': {'additional_seed': None, 'key_columns': ['entrance_time'], 'map_size': 1000000, 'random_seed': 0}}
- property name
- get_randomness_stream(decision_point, for_initialization=False)[source]
Provides a new source of random numbers for the given decision point.
- Parameters
decision_point (str) – A unique identifier for a stream of random numbers. Typically represents a decision that needs to be made each time step like ‘moves_left’ or ‘gets_disease’.
for_initialization (bool) – A flag indicating whether this stream is used to generate key initialization information that will be used to identify simulants in the Common Random Number framework. These streams cannot be copied and should only be used to generate the state table columns specified in builder.configuration.randomness.key_columns.
- Raises
RandomnessError – If another location in the simulation has already created a randomness stream with the same identifier.
- Return type
- get_seed(decision_point)[source]
Get a randomly generated seed for use with external randomness tools.
- Parameters
decision_point (str) – A unique identifier for a stream of random numbers. Typically represents a decision that needs to be made each time step like ‘moves_left’ or ‘gets_disease’.
- Returns
A seed for random number generation that is linked to Vivarium’s common random number framework.
- Return type
- register_simulants(simulants)[source]
Adds new simulants to the randomness mapping.
- Parameters
simulants (DataFrame) – A table with state data representing the new simulants. Each simulant should pass through this function exactly once.
- Raises
RandomnessError – If the provided table does not contain all key columns specified in the configuration.
- class vivarium.framework.randomness.RandomnessInterface(manager)[source]
- Parameters
manager (RandomnessManager) –
- get_stream(decision_point, for_initialization=False)[source]
Provides a new source of random numbers for the given decision point.
vivarium provides a framework for Common Random Numbers which allows for variance reduction when modeling counter-factual scenarios. Users interested in causal analysis and comparisons between simulation scenarios should be careful to use randomness streams provided by the framework wherever randomness is employed.
- Parameters
decision_point (str) – A unique identifier for a stream of random numbers. Typically represents a decision that needs to be made each time step like ‘moves_left’ or ‘gets_disease’.
for_initialization (bool) – A flag indicating whether this stream is used to generate key initialization information that will be used to identify simulants in the Common Random Number framework. These streams cannot be copied and should only be used to generate the state table columns specified in builder.configuration.randomness.key_columns.
- Returns
An entry point into the Common Random Number generation framework. The stream provides vectorized access to random numbers and a few other utilities.
- Return type
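A component might request a stream once during setup and reuse it for every decision of that kind. The sketch below assumes only the two calls documented on this page, builder.randomness.get_stream and RandomnessStream.filter_for_probability; the component class, its method names, and the 'hit_by_truck' decision point are hypothetical.

```python
class TruckAccidents:
    """Hypothetical component; only the two randomness calls come from this page."""

    def setup(self, builder):
        # One stream per decision point, requested once during setup.
        self.randomness = builder.randomness.get_stream("hit_by_truck")

    def hit_simulants(self, index, probability):
        # Returns the subset of `index` for which the event occurred.
        return self.randomness.filter_for_probability(index, probability)
```

Requesting the stream under a unique decision point name keeps this decision's draws independent of every other decision in the simulation.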
- get_seed(decision_point)[source]
Get a randomly generated seed for use with external randomness tools.
- Parameters
decision_point (str) – A unique identifier for a stream of random numbers. Typically represents a decision that needs to be made each time step like ‘moves_left’ or ‘gets_disease’.
- Returns
A seed for random number generation that is linked to Vivarium’s common random number framework.
- Return type
- register_simulants(simulants)[source]
Registers simulants with the Common Random Number Framework.
- Parameters
simulants (DataFrame) – A section of the state table with new simulants and at least the columns specified in builder.configuration.randomness.key_columns. This function should be called as soon as the key columns are generated.