Population Data Transformations

Provides tools for handling raw demographic data and transforming it into distributions suitable for sampling.

vivarium_public_health.population.data_transformations.add_age_midpoint(data)

Add an age column as the midpoint of age_start and age_end.

Parameters:

data (DataFrame) – A DataFrame with age_start and age_end columns.

Return type:

DataFrame

Returns:

The input DataFrame with an age column added in place.
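The midpoint arithmetic can be illustrated without the library. The sketch below is a plain-Python stand-in, assuming a list of dicts in place of a DataFrame; `add_age_midpoint_sketch` is a hypothetical name, not the library function.

```python
def add_age_midpoint_sketch(rows):
    """Set row["age"] to the midpoint of age_start and age_end.

    Rows are modified in place and the same list is returned, mirroring the
    "added in place" behaviour described above.
    """
    for row in rows:
        row["age"] = (row["age_start"] + row["age_end"]) / 2
    return rows


bins = [
    {"age_start": 0.0, "age_end": 5.0},
    {"age_start": 5.0, "age_end": 10.0},
]
result = add_age_midpoint_sketch(bins)
print([row["age"] for row in result])  # [2.5, 7.5]
```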

vivarium_public_health.population.data_transformations.assign_demographic_proportions(population_data, include_sex)

Calculate conditional probabilities on the provided population data for sampling.

Parameters:
  • population_data (DataFrame) – Table with columns ‘age’, ‘sex’, ‘year’, ‘location’, and ‘value’

  • include_sex (str) – ‘Female’, ‘Male’, or ‘Both’. Sexes to include in the distribution.

Return type:

DataFrame

Returns:

Table with columns:

  • ‘age’ : Midpoint of the age group

  • ‘age_start’ : Lower bound of the age group

  • ‘age_end’ : Upper bound of the age group

  • ‘sex’ : ‘Male’ or ‘Female’

  • ‘location’ : Location

  • ‘year’ : Year

  • ‘population’ : Total population estimate

  • ‘P(sex, location | age, year)’ : Conditional probability of sex and location given age and year

  • ‘P(sex, location, age | year)’ : Conditional probability of sex, location, and age given year

  • ‘P(age | year, sex, location)’ : Conditional probability of age given year, sex, and location
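Each conditional probability above can be read as a row's population divided by the total population of the group named after the "|". The following is a minimal sketch of that reading, not the library's implementation: it uses plain dicts instead of a DataFrame, takes a ready-made `population` field, and `conditional_proportions` is a hypothetical helper.

```python
from collections import defaultdict

P_SEX_LOC = "P(sex, location | age, year)"
P_JOINT = "P(sex, location, age | year)"
P_AGE = "P(age | year, sex, location)"


def conditional_proportions(rows):
    """Return copies of rows with the three conditional probabilities attached."""
    # Denominators: total population of each conditioning group.
    year_total = defaultdict(float)
    age_year_total = defaultdict(float)
    stratum_total = defaultdict(float)
    for r in rows:
        year_total[r["year"]] += r["population"]
        age_year_total[(r["age"], r["year"])] += r["population"]
        stratum_total[(r["year"], r["sex"], r["location"])] += r["population"]

    result = []
    for r in rows:
        pop = r["population"]
        result.append({
            **r,
            P_SEX_LOC: pop / age_year_total[(r["age"], r["year"])],
            P_JOINT: pop / year_total[r["year"]],
            P_AGE: pop / stratum_total[(r["year"], r["sex"], r["location"])],
        })
    return result


rows = [
    {"age": 2.5, "sex": "Male", "year": 2020, "location": "A", "population": 100.0},
    {"age": 2.5, "sex": "Female", "year": 2020, "location": "A", "population": 100.0},
    {"age": 7.5, "sex": "Male", "year": 2020, "location": "A", "population": 150.0},
    {"age": 7.5, "sex": "Female", "year": 2020, "location": "A", "population": 50.0},
]
proportions = conditional_proportions(rows)
```

In this toy table the first row (males in the 0 to 5 bin) holds 100 of 400 people in 2020, 100 of 200 people in its age group, and 100 of 250 males, so its three probabilities are 0.25, 0.5, and 0.4. The sketch does not guard against groups with zero total population.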

vivarium_public_health.population.data_transformations.rescale_binned_proportions(pop_data, age_start, age_end)

Reshape the distribution so that the outermost bin edges fall on age_start and age_end.

Parameters:
  • pop_data (DataFrame) – Table with columns ‘age’, ‘age_start’, ‘age_end’, ‘sex’, ‘year’, ‘location’, ‘population’, ‘P(sex, location, age| year)’, ‘P(sex, location | age, year)’, ‘P(age | year, sex, location)’

  • age_start (float) – The starting age for the rescaled bins.

  • age_end (float) – The terminal age for the rescaled bins.

Return type:

DataFrame

Returns:

Table with the same columns as pop_data where all bins outside the range (age_start, age_end) have been discarded. If age_start and age_end don’t fall cleanly on age boundaries, the bins in which they lie are clipped and the ‘population’, ‘P(sex, location, age| year)’, and ‘P(age | year, sex, location)’ values are rescaled to reflect their smaller representation.
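The clipping step can be sketched for a single bin as follows. This is an illustration under a stated assumption, namely that people are spread uniformly across each bin so a clipped bin keeps a share of its population equal to the share of its width that survives; the source does not say which rule the library applies. `clip_bin` is a hypothetical helper, and the probability columns would need the same scaling.

```python
def clip_bin(bin_start, bin_end, population, age_start, age_end):
    """Clip one age bin to [age_start, age_end] and scale its population.

    Returns None when the bin lies entirely outside the requested range.
    Assumes a uniform spread of people within the bin (sketch assumption).
    """
    lo = max(bin_start, age_start)
    hi = min(bin_end, age_end)
    if hi <= lo:
        return None  # no overlap: the bin is discarded
    fraction = (hi - lo) / (bin_end - bin_start)
    return {"age_start": lo, "age_end": hi, "population": population * fraction}


# Three five-year bins, rescaled to the age range 2 to 10.
bins = [(0, 5, 1000.0), (5, 10, 800.0), (10, 15, 600.0)]
clipped = [clip_bin(s, e, p, age_start=2, age_end=10) for s, e, p in bins]
clipped = [b for b in clipped if b is not None]
```

Here the first bin is clipped to ages 2 to 5 and keeps three fifths of its population, the second is unchanged, and the third is dropped.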

class vivarium_public_health.population.data_transformations.AgeValues(current, young, old)
current

Alias for field number 0

old

Alias for field number 2

young

Alias for field number 1

class vivarium_public_health.population.data_transformations.EndpointValues(left, right)
left

Alias for field number 0

right

Alias for field number 1
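The "Alias for field number N" descriptions are what Python generates for named tuple fields, so both classes can be treated as named tuples. The sketch below defines local stand-ins with the documented field order rather than importing from the library; the values are made up and say nothing about how the library fills these fields.

```python
from collections import namedtuple

# Local stand-ins mirroring the documented field order (not the library classes).
AgeValues = namedtuple("AgeValues", ["current", "young", "old"])
EndpointValues = namedtuple("EndpointValues", ["left", "right"])

ages = AgeValues(current=7.5, young=2.5, old=12.5)
endpoints = EndpointValues(left=5.0, right=10.0)

# Fields are reachable by name or by the documented position.
print(ages.young, ages[1])  # 2.5 2.5

# Named tuples also unpack like ordinary tuples.
left, right = endpoints
```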

vivarium_public_health.population.data_transformations.smooth_ages(simulants, population_data, randomness)

Distribute simulants among ages within their assigned age bins.

Parameters:
  • simulants (DataFrame) – Table with columns ‘age’, ‘sex’, and ‘location’

  • population_data (DataFrame) – Table with columns ‘age’, ‘sex’, ‘year’, ‘location’, ‘population’, ‘P(sex, location, age| year)’, ‘P(sex, location | age, year)’, ‘P(age | year, sex, location)’

  • randomness (RandomnessStream) – Source of random number generation within the vivarium common random number framework.

Return type:

DataFrame

Returns:

Table with the same columns as simulants, with ages smoothed out within the age bins.
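To show what "smoothing" means here, the sketch below replaces each simulant's bin-midpoint age with a draw from its bin. It is the simplest possible variant and rests on assumptions: the draw is uniform, the standard-library `random.Random` stands in for the `RandomnessStream`, and a hypothetical `bin_edges` mapping stands in for `population_data`. The library's actual within-bin distribution is not described above and may differ.

```python
import random


def smooth_ages_uniform(simulants, bin_edges, seed=0):
    """Return copies of simulants with ages drawn uniformly from their bins.

    bin_edges maps a bin-midpoint age to its (age_start, age_end) pair.
    """
    rng = random.Random(seed)  # stand-in for the simulation's randomness source
    smoothed = []
    for sim in simulants:
        start, end = bin_edges[sim["age"]]
        new_age = start + rng.random() * (end - start)
        smoothed.append({**sim, "age": new_age})
    return smoothed


bin_edges = {2.5: (0.0, 5.0), 7.5: (5.0, 10.0)}
simulants = [
    {"age": 2.5, "sex": "Female", "location": "A"},
    {"age": 7.5, "sex": "Male", "location": "A"},
    {"age": 7.5, "sex": "Female", "location": "A"},
]
smoothed = smooth_ages_uniform(simulants, bin_edges, seed=42)
```

Every simulant stays inside the bin it was assigned to; only the position within the bin changes.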

vivarium_public_health.population.data_transformations.get_cause_deleted_mortality_rate(all_cause_mortality_rate, list_of_csmrs)

Compute the cause-deleted mortality rate by subtracting individual CSMRs.

Parameters:
  • all_cause_mortality_rate (DataFrame) – DataFrame with standard age/sex/year index columns and a value column.

  • list_of_csmrs (list[DataFrame | None]) – List of DataFrames each containing a value column representing a cause-specific mortality rate. None entries are skipped.

Return type:

DataFrame

Returns:

DataFrame with the same index columns and a death_due_to_other_causes column containing the residual mortality rate after subtracting all provided cause-specific rates.
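The subtraction can be sketched with plain dicts keyed by the index columns. This is an illustration rather than the library code: `cause_deleted_rate` is a hypothetical helper, the rates are invented, and every cause-specific table is assumed to share the keys of the all-cause table.

```python
def cause_deleted_rate(all_cause, csmrs):
    """Subtract each cause-specific rate from the all-cause rate, key by key.

    all_cause and each CSMR map an (age_start, sex, year) key to a rate.
    None entries in csmrs are skipped.
    """
    residual = dict(all_cause)  # copy so the input is left untouched
    for csmr in csmrs:
        if csmr is None:
            continue
        for key, rate in csmr.items():
            residual[key] -= rate
    return residual


acmr = {(0, "Male", 2020): 0.010, (0, "Female", 2020): 0.008}
csmr_a = {(0, "Male", 2020): 0.002, (0, "Female", 2020): 0.001}
csmr_b = {(0, "Male", 2020): 0.003, (0, "Female", 2020): 0.002}

# The None entry is ignored, as described for list_of_csmrs.
other_causes = cause_deleted_rate(acmr, [csmr_a, None, csmr_b])
```

With these made-up rates, both strata are left with a residual mortality rate of about 0.005.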