Random Numbers in vivarium

This module contains classes and functions supporting common random numbers.

Vivarium has some peculiar needs around randomness. Results must be totally consistent between branches in a comparison. For example, if a simulant gets hit by a truck in the base case, it must be hit by that same truck in the counter-factual at exactly the same moment, unless the counter-factual explicitly deals with traffic accidents. That means the system can’t rely on standard global randomness sources, because small changes to the number of bits consumed or to the order in which randomness-consuming operations occur will cause the branches to diverge. The current approach is to generate hash-based seeds where the key is the simulation time, the simulant’s id, the draw number, and a unique id for the decision point that needs the randomness. These seeds are then used to create numpy.random.RandomState objects, which generate pseudo-random numbers in a repeatable manner.
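The hash-based seeding idea can be illustrated with a minimal sketch. The helper names (seed_from_key, draw) and the use of SHA-1 are assumptions made for illustration only; they are not taken from the module's implementation.

```python
import hashlib

import numpy as np


def seed_from_key(*parts) -> int:
    """Hash the key parts into a 32-bit integer seed (hypothetical helper)."""
    key = "_".join(str(part) for part in parts)
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    # RandomState only accepts seeds in the range [0, 2**32 - 1].
    return int(digest, 16) % (2**32)


def draw(decision_point, sim_time, simulant_id, draw_number) -> float:
    """Return one uniform draw that depends only on the key parts."""
    seed = seed_from_key(decision_point, sim_time, simulant_id, draw_number)
    return np.random.RandomState(seed).uniform()


# The same key gives the same number in both branches of a comparison,
# no matter how many other random numbers were consumed in between.
base_case = draw("hit_by_truck", "2020-01-01", 42, 0)
counter_factual = draw("hit_by_truck", "2020-01-01", 42, 0)
```

Because the draw depends only on the key, adding or removing an unrelated decision point in one branch does not shift the numbers seen by the others.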

vivarium.framework.randomness.RESIDUAL_CHOICE

A probability placeholder to be used in an un-normalized array of weights to absorb leftover weight so that the array sums to unity. For example:

[0.2, 0.2, RESIDUAL_CHOICE] => [0.2, 0.2, 0.6]

Note

Currently this object is only used in the choice function of this module.

Type:object

For more information, see the Common Random Numbers concept note.
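The placeholder behaviour can be sketched as follows. RESIDUAL and fill_residual are hypothetical stand-ins written for this example, and the error checks mirror the conditions documented for the choice function rather than the actual implementation.

```python
import math

RESIDUAL = object()  # hypothetical stand-in for RESIDUAL_CHOICE


def fill_residual(weights):
    """Replace a single residual placeholder with the leftover weight."""
    positions = [i for i, w in enumerate(weights) if w is RESIDUAL]
    if len(positions) > 1:
        raise ValueError("Only one residual placeholder is allowed.")
    filled = list(weights)
    if positions:
        known = sum(w for w in weights if w is not RESIDUAL)
        if known > 1:
            raise ValueError("The remaining weights already exceed 1.")
        filled[positions[0]] = 1 - known
    return filled


weights = fill_residual([0.2, 0.2, RESIDUAL])
```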

exception vivarium.framework.randomness.RandomnessError[source]

Exception raised for inconsistencies in random number and choice generation.

class vivarium.framework.randomness.IndexMap(map_size=1000000)[source]

A key-index mapping with a simple vectorized hash and vectorized lookups.

TEN_DIGIT_MODULUS = 10000000000
update(new_keys)[source]

Adds the new keys to the mapping.

Parameters:new_keys (Index) – The new index to hash.
hash_(keys, salt=0)[source]

Hashes the given index into an integer index in the range [0, self.stride]

Parameters:
  • keys (Index) – The new index to hash.
  • salt (int) – An integer used to perturb the hash in a deterministic way. Useful in dealing with collisions.
Returns:

A pandas series indexed by the given keys and whose values take on integers in the range [0, self.stride]. Duplicates may appear and should be dealt with by the calling code.

Return type:

pd.Series
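One way a salt can resolve collisions is sketched below in plain Python. The functions salted_hash and build_mapping are hypothetical, use SHA-256 rather than the class's vectorized hash, and handle one key at a time for clarity.

```python
import hashlib


def salted_hash(key, salt: int, map_size: int) -> int:
    """Hash a key and salt into a slot in the range [0, map_size)."""
    digest = hashlib.sha256(f"{key}|{salt}".encode("utf-8")).hexdigest()
    return int(digest, 16) % map_size


def build_mapping(keys, map_size: int) -> dict:
    """Assign each key a unique slot, bumping the salt on collisions."""
    keys = list(keys)
    if len(keys) > map_size:
        raise ValueError("More keys than available slots.")
    mapping, used = {}, set()
    for key in keys:
        salt = 0
        slot = salted_hash(key, salt, map_size)
        while slot in used:  # collision: perturb the hash deterministically
            salt += 1
            slot = salted_hash(key, salt, map_size)
        mapping[key] = slot
        used.add(slot)
    return mapping


mapping = build_mapping(range(50), map_size=64)
```

Since the salt sequence is deterministic, rebuilding the mapping from the same keys in the same order reproduces the same slots.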

convert_to_ten_digit_int(column)[source]

Converts a column of datetimes, integers, or floats into a column of 10 digit integers.

Parameters:column (Series) – A series of datetimes, integers, or floats.
Returns:A series of ten digit integers based on the input data.
Return type:pd.Series
Raises:RandomnessError – If the column contains data that is neither datetime-like nor numeric.
static digit(m, n)[source]

Returns the nth digit of each number in m.

Return type:Union[int, Series]
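A digit-extraction helper of this kind might look like the sketch below. It assumes digits are counted from the least significant end starting at zero, which the documentation does not state; nth_digit is a hypothetical name.

```python
import pandas as pd


def nth_digit(m, n: int):
    """Return digit n of m, counting from the right with n = 0 as the ones place."""
    return (m // 10**n) % 10


values = pd.Series([1234, 56789])
ones = nth_digit(values, 0)      # works element-wise on a Series
hundreds = nth_digit(values, 2)
single = nth_digit(1234, 3)      # and on a plain integer
```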
static clip_to_seconds(m)[source]

Clips UTC datetime in nanoseconds to seconds.

Return type:Union[int, Series]
spread(m)[source]

Spreads out integer values to give smaller values more weight.

Return type:Union[int, Series]
shift(m)[source]

Shifts floats so that the first 10 decimal digits are significant.

Return type:Union[int, Series]
vivarium.framework.randomness.random(key, index, index_map=None)[source]

Produces an indexed pandas.Series of uniformly distributed random numbers.

The index passed in typically corresponds to a subset of rows in a pandas.DataFrame for which a probabilistic draw needs to be made.

Parameters:
  • key (str) – A string used to create a seed for the random number generation.
  • index (Index) – The index used for the returned series.
  • index_map (Optional[IndexMap]) – A mapping between the provided index (which may contain ints, floats, datetimes or any arbitrary combination of them) and an integer index into the random number array.
Returns:

A series of random numbers indexed by the provided index.

Return type:

pd.Series
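The sketch below shows one way an index can select values from a seeded random array so that each simulant receives the same number regardless of which other simulants are in the request. sketch_random is a hypothetical function that assumes a non-negative integer index and no index_map; it is not the module's implementation.

```python
import hashlib

import numpy as np
import pandas as pd


def sketch_random(key: str, index: pd.Index) -> pd.Series:
    """Return uniform draws for the index, seeded only by the key."""
    seed = int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16) % (2**32)
    sample_size = int(index.max()) + 1
    raw_draws = np.random.RandomState(seed).random_sample(sample_size)
    # Each index value picks its own position in the random array.
    return pd.Series(raw_draws[index.to_numpy()], index=index)


everyone = sketch_random("gets_disease_2020-01-01", pd.Index([0, 1, 2, 3, 4]))
subset = sketch_random("gets_disease_2020-01-01", pd.Index([1, 3]))
```

An index_map generalizes this by first translating arbitrary key values into integer positions.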

vivarium.framework.randomness.get_hash(key)[source]

Gets a hash of the provided key.

Parameters:key (str) – A string used to create a seed for the random number generator.
Returns:A hash of the provided key.
Return type:int
vivarium.framework.randomness.choice(key, index, choices, p=None, index_map=None)[source]

Decides between a weighted or unweighted set of choices.

Given a set of choices with or without corresponding weights, returns an indexed set of decisions from those choices. This is simply a vectorized way to make decisions with some book-keeping.

Parameters:
  • key (str) – A string used to create a seed for the random number generation.
  • index (pandas.Index) – An index whose length is the number of random draws made and which indexes the returned pandas.Series.
  • choices (Union[List[~T], Tuple, ndarray, Series]) – A set of options to choose from.
  • p (Union[List[~T], Tuple, ndarray, Series, None]) – The relative weights of the choices. Can be either a 1-d array of the same length as choices or a 2-d array with len(index) rows and len(choices) columns. In the 1-d case, the same set of weights are used to decide among the choices for every item in the index. In the 2-d case, each row in p contains a separate set of weights for every item in the index.
  • index_map (Optional[IndexMap]) – A mapping between the provided index (which may contain ints, floats, datetimes or any arbitrary combination of them) and an integer index into the random number array.
Returns:

An indexed set of decisions from among the available choices.

Return type:

pd.Series

Raises:

RandomnessError – If any row in p contains RESIDUAL_CHOICE and the remaining weights in the row are not normalized or any row of p contains more than one reference to RESIDUAL_CHOICE.
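A vectorized weighted choice over 1-d or 2-d weights can be sketched with cumulative sums. sketch_choice is a hypothetical function that takes precomputed uniform draws instead of a key, and it omits RESIDUAL_CHOICE handling.

```python
import numpy as np
import pandas as pd


def sketch_choice(draws: pd.Series, choices, p=None) -> pd.Series:
    """Pick one choice per draw using per-row cumulative weights."""
    choices = np.asarray(choices)
    if p is None:
        p = np.ones(len(choices))  # unweighted: every choice equally likely
    p = np.asarray(p, dtype=float)
    if p.ndim == 1:
        # The same weights apply to every item in the index.
        p = np.broadcast_to(p, (len(draws), len(choices)))
    p = p / p.sum(axis=1, keepdims=True)
    bins = np.cumsum(p, axis=1)
    # Count how many bin edges each draw has passed.
    picked = (draws.to_numpy()[:, None] >= bins).sum(axis=1)
    picked = np.minimum(picked, len(choices) - 1)  # guard against rounding
    return pd.Series(choices[picked], index=draws.index)


draws = pd.Series([0.1, 0.3, 0.95], index=[10, 11, 12])
shared = sketch_choice(draws, ["a", "b", "c"], p=[0.2, 0.2, 0.6])
per_row = sketch_choice(draws, ["a", "b", "c"],
                        p=[[1, 0, 0], [0, 1, 0], [0, 0, 1]])
```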

vivarium.framework.randomness.filter_for_probability(key, population, probability, index_map=None)[source]

Decide an event outcome for each individual in a population from probabilities.

Given a population or its index and an array of associated probabilities for some event to happen, we create and return the sub-population for whom the event occurred.

Parameters:
  • key (str) – A string used to create a seed for the random number generation.
  • population (Union[DataFrame, Series, Index]) – A view on the simulants for which we are determining the outcome of an event.
  • probability (Union[List[~T], Tuple, ndarray, Series]) – A 1d list of probabilities of the event under consideration occurring which corresponds (i.e. len(population) == len(probability)) to the population array passed in.
  • index_map (Optional[IndexMap]) – A mapping between the provided index (which may contain ints, floats, datetimes or any arbitrary combination of them) and an integer index into the random number array.
Returns:

The sub-population of the simulants for whom the event occurred. The return type will be the same as type(population)

Return type:

pd.core.generic.PandasObject
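The filtering step amounts to comparing a uniform draw against each probability. In this sketch, sketch_filter is a hypothetical function that receives the draws directly rather than generating them from a key.

```python
import numpy as np
import pandas as pd


def sketch_filter(draws: pd.Series, population, probability):
    """Keep the members of the population whose draw falls below their probability."""
    index = population if isinstance(population, pd.Index) else population.index
    mask = draws.loc[index].to_numpy() < np.asarray(probability)
    return population[mask]  # same type as the population passed in


population = pd.DataFrame({"age": [30, 40, 50]}, index=[0, 1, 2])
draws = pd.Series([0.05, 0.5, 0.2], index=[0, 1, 2])
affected = sketch_filter(draws, population, [0.1, 0.1, 0.9])
affected_index = sketch_filter(draws, population.index, [0.1, 0.1, 0.9])
```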

class vivarium.framework.randomness.RandomnessStream(key, clock, seed, index_map=None, manager=None, for_initialization=False)[source]

A stream for producing common random numbers.

RandomnessStream objects provide an interface to Vivarium’s common random number generation. They provide a number of methods for doing common simulation tasks that require random numbers like making decisions among a number of choices.

key

The name of the randomness stream.

clock

A way to get the current simulation time.

seed

An extra number used to seed the random number generation.

Notes

Should not be constructed by client code.

Simulation components get RandomnessStream objects by requesting them from the builder provided to them during the setup phase. For example:

class CeamComponent:
    def setup(self, builder):
        self.randomness_stream = builder.randomness.get_stream('stream_name')

See also

engine.Builder

name
get_draw(index, additional_key=None)[source]

Get an indexed sequence of floats pulled from a uniform distribution over [0.0, 1.0)

Parameters:
  • index (Index) – An index whose length is the number of random draws made and which indexes the returned pandas.Series.
  • additional_key (Optional[Any]) – Any additional information used to seed random number generation.
Returns:

A series of random numbers indexed by the provided pandas.Index.

Return type:

pd.Series

get_seed(additional_key=None)[source]

Get a randomly generated seed for use with external randomness tools.

Parameters:additional_key (Optional[Any]) – Any additional information used to create the seed.
Returns:A seed for a random number generation that is linked to Vivarium’s common random number framework.
Return type:int
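A seed of this kind can be handed to an external generator. In the sketch below the literal seed is a hypothetical stand-in for the value get_seed would return, and the normal distribution is an arbitrary example.

```python
import numpy as np

seed = 12345  # hypothetical stand-in for the integer returned by get_seed()

# Two generators built from the same seed reproduce the same samples.
first = np.random.RandomState(seed).normal(loc=0.0, scale=1.0, size=3)
second = np.random.RandomState(seed).normal(loc=0.0, scale=1.0, size=3)
```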
filter_for_rate(population, rate, additional_key=None)[source]

Decide an event outcome for each individual in a population from rates.

Given a population or its index and an array of associated rates for some event to happen, we create and return the sub-population for whom the event occurred.

Parameters:
  • population (Union[DataFrame, Series, Index]) – A view on the simulants for which we are determining the outcome of an event.
  • rate (Union[List[~T], Tuple, ndarray, Series]) – A 1d list of rates of the event under consideration occurring which corresponds (i.e. len(population) == len(rate)) to the population view passed in. The rates must be scaled to the simulation time-step size either manually or as a post-processing step in a rate pipeline.
  • additional_key (Optional[Any]) – Any additional information used to create the seed.
Returns:

The index of the simulants for whom the event occurred.

Return type:

Index

See also

framework.values()
Value/rate pipeline management module.
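Filtering on rates requires turning each time-step-scaled rate into a probability. The sketch below assumes the common exponential conversion 1 - exp(-rate); the function name and the monthly step size are illustrative, not confirmed by this page.

```python
import numpy as np


def rate_to_probability(rate):
    """Convert time-step-scaled rates to probabilities (assumed exponential form)."""
    return 1 - np.exp(-np.asarray(rate, dtype=float))


annual_rate = np.array([0.1, 0.5])
step_size_years = 1 / 12                      # hypothetical monthly time step
step_rate = annual_rate * step_size_years     # scale rates to the time step
probability = rate_to_probability(step_rate)
```

The resulting probabilities can then be compared against uniform draws in the same way as in filter_for_probability.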
filter_for_probability(population, probability, additional_key=None)[source]

Decide an event outcome for each individual in a population from probabilities.

Given a population or its index and an array of associated probabilities for some event to happen, we create and return the sub-population for whom the event occurred.

Parameters:
  • population (Union[DataFrame, Series, Index]) – A view on the simulants for which we are determining the outcome of an event.
  • probability (Union[List[~T], Tuple, ndarray, Series]) – A 1d list of probabilities of the event under consideration occurring which corresponds (i.e. len(population) == len(probability)) to the population view passed in.
  • additional_key (Optional[Any]) – Any additional information used to create the seed.
Returns:

The sub-population of the simulants for whom the event occurred. The return type will be the same as type(population)

Return type:

Index

choice(index, choices, p=None, additional_key=None)[source]

Decides between a weighted or unweighted set of choices.

Given a set of choices with or without corresponding weights, returns an indexed set of decisions from those choices. This is simply a vectorized way to make decisions with some book-keeping.

Parameters:
  • index (Index) – An index whose length is the number of random draws made and which indexes the returned pandas.Series.
  • choices (Union[List[~T], Tuple, ndarray, Series]) – A set of options to choose from.
  • p (Union[List[~T], Tuple, ndarray, Series, None]) – The relative weights of the choices. Can be either a 1-d array of the same length as choices or a 2-d array with len(index) rows and len(choices) columns. In the 1-d case, the same set of weights are used to decide among the choices for every item in the index. In the 2-d case, each row in p contains a separate set of weights for every item in the index.
  • additional_key (Optional[Any]) – Any additional information used to seed random number generation.
Returns:

An indexed set of decisions from among the available choices.

Return type:

pd.Series

Raises:

RandomnessError – If any row in p contains RESIDUAL_CHOICE and the remaining weights in the row are not normalized or any row of p contains more than one reference to RESIDUAL_CHOICE.

class vivarium.framework.randomness.RandomnessManager[source]

Access point for common random number generation.

configuration_defaults = {'randomness': {'additional_seed': None, 'key_columns': ['entrance_time'], 'map_size': 1000000, 'random_seed': 0}}
name
setup(builder)[source]
get_randomness_stream(decision_point, for_initialization=False)[source]

Provides a new source of random numbers for the given decision point.

Parameters:
  • decision_point (str) – A unique identifier for a stream of random numbers. Typically represents a decision that needs to be made each time step like ‘moves_left’ or ‘gets_disease’.
  • for_initialization (bool) – A flag indicating whether this stream is used to generate key initialization information that will be used to identify simulants in the Common Random Number framework. These streams cannot be copied and should only be used to generate the state table columns specified in builder.configuration.randomness.key_columns.
Raises:

RandomnessError – If another location in the simulation has already created a randomness stream with the same identifier.

Return type:

RandomnessStream
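The uniqueness rule for decision points can be sketched with a small registry. SketchRandomnessError and SketchManager are hypothetical stand-ins; a real manager returns a RandomnessStream rather than the identifier.

```python
class SketchRandomnessError(Exception):
    """Hypothetical stand-in for RandomnessError."""


class SketchManager:
    """Tracks which decision points have already requested a stream."""

    def __init__(self):
        self._decision_points = set()

    def get_randomness_stream(self, decision_point: str) -> str:
        if decision_point in self._decision_points:
            raise SketchRandomnessError(
                f"A stream for '{decision_point}' already exists."
            )
        self._decision_points.add(decision_point)
        return decision_point  # placeholder for a stream object


manager = SketchManager()
stream = manager.get_randomness_stream("moves_left")
```

Requesting "moves_left" a second time from the same manager raises the error, which keeps two components from silently sharing one sequence of draws.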

register_simulants(simulants)[source]

Adds new simulants to the randomness mapping.

Parameters:simulants (DataFrame) – A table with state data representing the new simulants. Each simulant should pass through this function exactly once.
Raises:RandomnessError – If the provided table does not contain all key columns specified in the configuration.
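The key-column check can be sketched as below. select_key_columns is a hypothetical helper, ValueError stands in for RandomnessError, and the column list matches the 'entrance_time' default shown in configuration_defaults.

```python
import pandas as pd


def select_key_columns(simulants: pd.DataFrame, key_columns) -> pd.DataFrame:
    """Return the key columns, failing if any of them is missing."""
    missing = [column for column in key_columns if column not in simulants.columns]
    if missing:
        raise ValueError(f"Missing key columns: {missing}")
    return simulants[list(key_columns)]


new_simulants = pd.DataFrame({
    "entrance_time": pd.to_datetime(["2020-01-01", "2020-01-02"]),
    "age": [1, 2],
})
keys = select_key_columns(new_simulants, ["entrance_time"])
```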
class vivarium.framework.randomness.RandomnessInterface(manager)[source]
get_stream(decision_point, for_initialization=False)[source]

Provides a new source of random numbers for the given decision point.

vivarium provides a framework for Common Random Numbers which allows for variance reduction when modeling counter-factual scenarios. Users interested in causal analysis and comparisons between simulation scenarios should be careful to use randomness streams provided by the framework wherever randomness is employed.

Parameters:
  • decision_point (str) – A unique identifier for a stream of random numbers. Typically represents a decision that needs to be made each time step like ‘moves_left’ or ‘gets_disease’.
  • for_initialization (bool) – A flag indicating whether this stream is used to generate key initialization information that will be used to identify simulants in the Common Random Number framework. These streams cannot be copied and should only be used to generate the state table columns specified in builder.configuration.randomness.key_columns.
Returns:

An entry point into the Common Random Number generation framework. The stream provides vectorized access to random numbers and a few other utilities.

Return type:

RandomnessStream

register_simulants(simulants)[source]

Registers simulants with the Common Random Number Framework.

Parameters:simulants (DataFrame) – A section of the state table with new simulants and at least the columns specified in builder.configuration.randomness.key_columns. This function should be called as soon as the key columns are generated.