Population Data Transformations

Provide tools for handling raw demographic data and transforming it into different distributions for sampling.

vivarium_public_health.population.data_transformations.assign_demographic_proportions(population_data, include_sex)

Calculate conditional probabilities from the provided population data for use in sampling.

Parameters:
  • population_data (DataFrame) – Table with columns ‘age’, ‘sex’, ‘year’, ‘location’, and ‘value’

  • include_sex (str) – ‘Female’, ‘Male’, or ‘Both’. Sexes to include in the distribution.

Return type:

DataFrame

Returns:

Table with columns:

  • ‘age’ – Midpoint of the age group

  • ‘age_start’ – Lower bound of the age group

  • ‘age_end’ – Upper bound of the age group

  • ‘sex’ – ‘Male’ or ‘Female’

  • ‘location’ – Location

  • ‘year’ – Year

  • ‘population’ – Total population estimate

  • ‘P(sex, location | age, year)’ – Conditional probability of sex and location given age and year

  • ‘P(sex, location, age | year)’ – Conditional probability of sex, location, and age given year

  • ‘P(age | year, sex, location)’ – Conditional probability of age given year, sex, and location
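A minimal pandas sketch of how the three conditional probabilities in the Returns list can be computed from the ‘value’ column: each one is a row's value divided by the total of the rows that share the conditioning variables. The function name demographic_proportions_sketch is hypothetical, and zeroing out the excluded sex when include_sex is not ‘Both’ is an assumption about how the filter is applied; this is not the library's implementation.

```python
import pandas as pd


def demographic_proportions_sketch(population_data: pd.DataFrame, include_sex: str) -> pd.DataFrame:
    """Hypothetical sketch: add conditional-probability columns to population data."""
    df = population_data.copy()
    if include_sex != "Both":
        # Assumption: the excluded sex keeps its rows but gets zero weight.
        df.loc[df["sex"] != include_sex, "value"] = 0.0

    def share_within(group_cols: list) -> pd.Series:
        # Each row's value divided by its group total; empty groups (0/0 -> NaN) become 0.
        totals = df.groupby(group_cols)["value"].transform("sum")
        return (df["value"] / totals).fillna(0.0)

    df["P(sex, location | age, year)"] = share_within(["age", "year"])
    df["P(sex, location, age | year)"] = share_within(["year"])
    df["P(age | year, sex, location)"] = share_within(["year", "sex", "location"])
    return df.rename(columns={"value": "population"})
```

Using groupby(...).transform("sum") keeps the result aligned row by row with the input, so each probability column can be assigned directly.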

vivarium_public_health.population.data_transformations.rescale_binned_proportions(pop_data, age_start, age_end)

Reshape the distribution so that bin edges fall exactly on age_start and age_end.

Parameters:
  • pop_data (DataFrame) – Table with columns ‘age’, ‘age_start’, ‘age_end’, ‘sex’, ‘year’, ‘location’, ‘population’, ‘P(sex, location, age| year)’, ‘P(sex, location | age, year)’, ‘P(age | year, sex, location)’

  • age_start (float) – The starting age for the rescaled bins.

  • age_end (float) – The terminal age for the rescaled bins.

Return type:

DataFrame

Returns:

Table with the same columns as pop_data where all bins outside the range (age_start, age_end) have been discarded. If age_start and age_end don’t fall cleanly on age boundaries, the bins in which they lie are clipped and the ‘population’, ‘P(sex, location, age| year)’, and ‘P(age | year, sex, location)’ values are rescaled to reflect their smaller representation.
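The clipping rule can be illustrated with a simplified overlap calculation: each bin's scaled columns are multiplied by the fraction of the bin that lies inside [age_start, age_end), and bins with no overlap are dropped. This is a hypothetical sketch (rescale_binned_sketch is not a library function). It leaves the ‘age’ midpoint and ‘P(sex, location | age, year)’ untouched, and the real function may handle bin edges differently.

```python
import pandas as pd

# Column names copied from the Returns description above.
SCALED_COLUMNS = ("population", "P(sex, location, age| year)", "P(age | year, sex, location)")


def rescale_binned_sketch(
    pop_data: pd.DataFrame, age_start: float, age_end: float, columns_to_scale=SCALED_COLUMNS
) -> pd.DataFrame:
    df = pop_data.copy()
    # Portion of each bin that lies inside the target window.
    clipped_start = df["age_start"].clip(lower=age_start)
    clipped_end = df["age_end"].clip(upper=age_end)
    overlap = (clipped_end - clipped_start).clip(lower=0.0)
    fraction = overlap / (df["age_end"] - df["age_start"])

    cols = [c for c in columns_to_scale if c in df.columns]
    df[cols] = df[cols].mul(fraction, axis=0)  # shrink partially covered bins
    df["age_start"] = clipped_start
    df["age_end"] = clipped_end
    # Discard bins that lie entirely outside the window.
    return df[overlap > 0].reset_index(drop=True)
```

For bins [0, 5), [5, 10), [10, 15) holding 100 people each, a window of [2, 12) keeps three bins holding 60, 100, and 40.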

class vivarium_public_health.population.data_transformations.AgeValues(current, young, old)
current

Alias for field number 0

young

Alias for field number 1

old

Alias for field number 2

class vivarium_public_health.population.data_transformations.EndpointValues(left, right)
left

Alias for field number 0

right

Alias for field number 1
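“Alias for field number N” is the docstring that collections.namedtuple generates for each field, so AgeValues and EndpointValues behave like ordinary named tuples whose fields can be read by name or by position. An equivalent definition, written as a sketch with illustrative values rather than copied from the library source:

```python
from collections import namedtuple

# Field order matches the documented field numbers.
AgeValues = namedtuple("AgeValues", ["current", "young", "old"])
EndpointValues = namedtuple("EndpointValues", ["left", "right"])

ages = AgeValues(current=7.5, young=2.5, old=12.5)  # illustrative values only
edges = EndpointValues(left=5.0, right=10.0)

print(ages.young, ages[1])  # same field, by name and by position
print(edges.left, edges[0])
```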

vivarium_public_health.population.data_transformations.smooth_ages(simulants, population_data, randomness)

Distribute simulants among ages within their assigned age bins.

Parameters:
  • simulants (DataFrame) – Table with columns ‘age’, ‘sex’, and ‘location’

  • population_data (DataFrame) – Table with columns ‘age’, ‘sex’, ‘year’, ‘location’, ‘population’, ‘P(sex, location, age| year)’, ‘P(sex, location | age, year)’, ‘P(age | year, sex, location)’

  • randomness (RandomnessStream) – Source of random number generation within the vivarium common random number framework.

Return type:

DataFrame

Returns:

Table with the same columns as simulants, with ages spread out within their assigned age bins.
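For intuition, the sketch below moves each simulant to a point between the age_start and age_end of the bin whose midpoint matches its current ‘age’. It swaps in a plain uniform draw and a numpy Generator in place of the vivarium RandomnessStream, and assumes the age bins are identical across sex and location. The library's own within-bin distribution may differ, and smooth_ages_uniform is a hypothetical name.

```python
import numpy as np
import pandas as pd


def smooth_ages_uniform(
    simulants: pd.DataFrame, population_data: pd.DataFrame, rng: np.random.Generator
) -> pd.DataFrame:
    # Assumption: every (sex, location) group shares the same age bins.
    bins = population_data[["age", "age_start", "age_end"]].drop_duplicates("age")
    # A left merge keeps the simulants' row order; restore their original index.
    edges = simulants[["age"]].merge(bins, on="age", how="left")
    edges.index = simulants.index
    draws = rng.uniform(size=len(simulants))  # values in [0, 1)
    smoothed = simulants.copy()
    smoothed["age"] = edges["age_start"] + draws * (edges["age_end"] - edges["age_start"])
    return smoothed
```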

vivarium_public_health.population.data_transformations.get_cause_deleted_mortality_rate(all_cause_mortality_rate, list_of_csmrs)

Compute the cause-deleted mortality rate by subtracting each provided cause-specific mortality rate (CSMR) from the all-cause mortality rate.

Parameters:
  • all_cause_mortality_rate (DataFrame) – DataFrame with standard age/sex/year index columns and a value column.

  • list_of_csmrs (list[DataFrame | None]) – List of DataFrames each containing a value column representing a cause-specific mortality rate. None entries are skipped.

Return type:

DataFrame

Returns:

DataFrame with the same index columns and a death_due_to_other_causes column containing the residual mortality rate after subtracting all provided cause-specific rates.
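Because the result is the all-cause rate minus the sum of the provided cause-specific rates, the computation can be sketched as index-aligned subtraction in pandas. The helper name cause_deleted_sketch is hypothetical, and treating every column other than ‘value’ as an index column is an assumption about the “standard age/sex/year index columns”.

```python
import pandas as pd


def cause_deleted_sketch(all_cause_mortality_rate: pd.DataFrame, list_of_csmrs: list) -> pd.DataFrame:
    # Assumption: every non-'value' column identifies a row (age, sex, year, ...).
    index_cols = [c for c in all_cause_mortality_rate.columns if c != "value"]
    residual = all_cause_mortality_rate.set_index(index_cols)["value"]
    for csmr in list_of_csmrs:
        if csmr is None:
            continue  # None entries are skipped, as documented
        # Subtraction aligns on the index columns, not on row position.
        residual = residual - csmr.set_index(index_cols)["value"]
    return residual.rename("death_due_to_other_causes").reset_index()
```

Under this alignment, a row whose index is missing from any CSMR table comes out as NaN; how the library treats such mismatches is not stated here.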

vivarium_public_health.population.data_transformations.load_population_structure(builder)

Load population structure data from the artifact and add derived columns.

Parameters:

builder (Builder) – Access point for utilizing framework interfaces during setup.

Return type:

DataFrame

Returns:

DataFrame with all columns from the raw data plus age and location.
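The derived columns can be illustrated without a Builder. Assuming ‘age’ is the bin midpoint (as in the Returns description of assign_demographic_proportions) and ‘location’ is a single label for the simulated location, a hypothetical helper would look like this; how the raw data is actually loaded from the artifact is not shown.

```python
import pandas as pd


def add_derived_columns(raw: pd.DataFrame, location: str) -> pd.DataFrame:
    data = raw.copy()
    # Assumption: 'age' is the midpoint of [age_start, age_end).
    data["age"] = (data["age_start"] + data["age_end"]) / 2
    # Assumption: one location label applies to every row.
    data["location"] = location
    return data
```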

vivarium_public_health.population.data_transformations.get_live_births_per_year(builder)

Compute the simulated number of live births per year.

Combines population structure data with live birth covariate data to produce a per-year birth rate scaled to the simulation’s initial population size. Handles time-dependent vs. fixed birth rates and population fractions according to the fertility configuration, and extends the series to cover the simulation end year if needed.

Parameters:

builder (Builder) – Access point for utilizing framework interfaces during setup.

Return type:

Series

Returns:

A pandas.Series indexed by year containing the expected number of new simulant births per year.
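The scaling step can be sketched with plain Series in place of the Builder. The sketch assumes that “scaled to the simulation’s initial population size” means multiplying observed births by the ratio of simulated to observed population, that each fertility flag selects either the full yearly series or the start-year value, and that later years missing from the data are filled forward. All names (live_births_sketch, time_dependent_births, time_dependent_population) are hypothetical.

```python
import pandas as pd


def live_births_sketch(
    population_by_year: pd.Series,
    births_by_year: pd.Series,
    initial_population_size: int,
    start_year: int,
    end_year: int,
    time_dependent_births: bool = True,
    time_dependent_population: bool = True,
) -> pd.Series:
    # Fixed quantities collapse to their start-year value (a scalar).
    births = births_by_year if time_dependent_births else births_by_year.at[start_year]
    population = population_by_year if time_dependent_population else population_by_year.at[start_year]

    # Assumed scaling: simulated births = observed births * (simulated pop / observed pop).
    rate = initial_population_size / population * births

    years = range(start_year, end_year + 1)
    if not isinstance(rate, pd.Series):
        return pd.Series(rate, index=years)  # constant rate for every year
    # Keep only simulation years and carry the last observed value forward.
    return rate.reindex(years).ffill()
```

With 1,000,000 observed people, 10,000 observed births, and a simulated population of 100,000, the sketch yields 1,000 simulated births per year.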

vivarium_public_health.population.data_transformations.rescale_final_age_bin(builder, population_data)

Clip and rescale the final age bin to match initialization_age_max.

When population.initialization_age_max is configured and falls within an existing age bin, that bin is truncated at initialization_age_max and its value is scaled proportionally to reflect the reduced width.

Parameters:
  • builder (Builder) – Access point for utilizing framework interfaces during setup.

  • population_data (DataFrame) – DataFrame with columns age_start, age_end, and value.

Return type:

DataFrame

Returns:

A copy of population_data with the final age bin adjusted to end at initialization_age_max and its value rescaled accordingly. Returned unchanged if initialization_age_max is not set.
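The clip-and-rescale rule can be sketched with the configuration value passed in directly instead of read from the Builder. The name rescale_final_bin_sketch is hypothetical, and dropping bins that start at or above initialization_age_max is an assumption; the text above only states that the bin containing the cutoff is truncated and its value rescaled.

```python
from typing import Optional

import pandas as pd


def rescale_final_bin_sketch(
    population_data: pd.DataFrame, initialization_age_max: Optional[float] = None
) -> pd.DataFrame:
    if initialization_age_max is None:
        return population_data  # unchanged when the option is not set
    # Assumption: bins starting at or beyond the cutoff are removed.
    df = population_data[population_data["age_start"] < initialization_age_max].copy()
    cut = df["age_end"] > initialization_age_max  # the bin the cutoff falls inside
    width = df.loc[cut, "age_end"] - df.loc[cut, "age_start"]
    kept = initialization_age_max - df.loc[cut, "age_start"]
    df.loc[cut, "value"] = df.loc[cut, "value"] * kept / width  # proportional rescale
    df.loc[cut, "age_end"] = initialization_age_max
    return df
```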

vivarium_public_health.population.data_transformations.validate_crude_birth_rate_data(builder, data_year_max)

Validate that birth rate data covers the simulation time period.

Parameters:
  • builder (Builder) – Access point for utilizing framework interfaces during setup.

  • data_year_max (int) – The latest year covered by the available birth rate data.

Raises:

ValueError – If the simulation end year exceeds data_year_max and extrapolation is not enabled.

Return type:

None
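The documented check reduces to a single comparison. In this sketch the simulation end year and the extrapolation setting are passed as plain arguments with hypothetical names rather than read from the Builder configuration, and the error message wording is invented.

```python
def validate_birth_rate_coverage(sim_end_year: int, data_year_max: int, extrapolate: bool) -> None:
    # Data that stops before the simulation ends is only acceptable when extrapolation is on.
    if sim_end_year > data_year_max and not extrapolate:
        raise ValueError(
            f"Birth rate data ends in {data_year_max}, but the simulation runs to "
            f"{sim_end_year}; enable extrapolation or shorten the simulation."
        )
```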