Run a population model

You may want to split the population into multiple cohorts, each with different parameters. The cohort module offers tools to do so. This tutorial shows how to work with this module.

Purpose of the cohort module

Consider the model from the Limits of Universes tutorial in which half of the population is born in 1995 and dies at the age of 80 and the other half is born in 1996 and dies at 78. In that tutorial, we saw that we cannot use two different Universes in the same Model to simulate these cohorts.

Alternatively, we could simply create and run two models.

from miscore import Model, Universe
from miscore.processes import Birth, OC

birth1995 = Birth(year=1995)
oc1995 = OC(age=80)
universe1995 = Universe(name="universe", processes=[birth1995, oc1995])
model1995 = Model(universes=[universe1995])
result1995 = model1995.run(n=1000, seed=123)

birth1996 = Birth(year=1996)
oc1996 = OC(age=78)
universe1996 = Universe(name="universe", processes=[birth1996, oc1996])
model1996 = Model(universes=[universe1996])
result1996 = model1996.run(n=1000, seed=456)

We now have two Result objects, which is inconvenient, especially when working with many cohorts. For reproducibility, we also need to set a different seed for each run. Furthermore, because the cohorts can be run independently of each other, the work could be distributed over multiple CPU cores.

The cohort module takes care of all these problems: it aggregates the results, accepts a single seed, and manages the multiprocessing.
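To illustrate the single-seed idea, here is a minimal sketch of how one base seed could be turned into a distinct, reproducible seed per cohort. It does not show how the cohort module works internally: derive_seeds is a hypothetical helper, and hashing the cohort name together with the base seed is an assumption made purely for illustration.

```python
import hashlib


def derive_seeds(base_seed, cohort_names):
    """Map each cohort name to a reproducible seed derived from base_seed.

    Hypothetical helper for illustration; not part of miscore.
    """
    seeds = {}
    for name in cohort_names:
        # Hash the base seed together with the cohort name so that every
        # cohort gets its own seed, while the result stays reproducible.
        digest = hashlib.sha256(f"{base_seed}:{name}".encode("utf-8")).digest()
        # Interpret the first 8 bytes as an unsigned integer seed.
        seeds[name] = int.from_bytes(digest[:8], "big")
    return seeds


seeds = derive_seeds(123, ["cohort1995", "cohort1996"])
```

With an approach like this, the user only has to remember one seed, and rerunning with the same base seed and cohort names yields the same per-cohort seeds.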

Minimal working example

Now let’s build the same model using the cohort module. We create multiple Cohort objects, each containing one or more universes. The cohorts are then added to a PopulationModel. Finally, we pass run() a dictionary n that maps each cohort name to its size.

from miscore import Universe
from miscore.processes import Birth, OC
from miscore.tools.cohort import Cohort, PopulationModel

birth1995 = Birth(year=1995)
oc1995 = OC(age=80)
universe1995 = Universe(name="universe", processes=[birth1995, oc1995])
cohort1995 = Cohort(name="cohort1995", universes=[universe1995])

birth1996 = Birth(year=1996)
oc1996 = OC(age=78)
universe1996 = Universe(name="universe", processes=[birth1996, oc1996])
cohort1996 = Cohort(name="cohort1996", universes=[universe1996])

model = PopulationModel(cohorts=[cohort1995, cohort1996])

result = model.run(
    n={
        "cohort1995": 1000,
        "cohort1996": 1000
    },
    seed=123
)

Working with more cohorts

When working with a large number of cohorts, it is better to generate the cohorts automatically. So, let’s write a function create_cohort() that creates the cohorts for us. The cohorts_input variable contains the parameters that vary across cohorts and can be extended to contain many more entries. In practice, you might want to read this data from e.g. a CSV file.

from miscore import Universe
from miscore.processes import Birth, OC
from miscore.tools.cohort import Cohort, PopulationModel


def create_cohort(name, birth_year, oc_age):
    birth = Birth(year=birth_year)
    oc = OC(age=oc_age)
    universe = Universe(name="universe", processes=[birth, oc])
    return Cohort(name=name, universes=[universe])


cohorts_input = [
    ("cohort1995", 1995, 80, 1000),
    ("cohort1996", 1996, 81, 1000)
]

cohorts = [create_cohort(*x[:3]) for x in cohorts_input]

model = PopulationModel(cohorts=cohorts)

result = model.run(
    n={x[0]: x[3] for x in cohorts_input},
    seed=123
)

Note

Here, x[:3] selects the first three elements of x. The * symbol then ‘unpacks’ these three elements and passes them separately to create_cohort(). Thus, create_cohort(*x[:3]) is equivalent to create_cohort(x[0], x[1], x[2]).
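As mentioned above, the cohort parameters could also come from a CSV file. The following is a minimal sketch using only Python’s standard library. The column names (name, birth_year, oc_age, n) and the helper read_cohorts_input() are assumptions made for illustration, and the CSV content is held in a string here so that the example is self-contained.

```python
import csv
import io

# Hypothetical CSV content; in practice this would be a file on disk
# opened with open("cohorts.csv", newline="").
csv_text = """name,birth_year,oc_age,n
cohort1995,1995,80,1000
cohort1996,1996,81,1000
"""


def read_cohorts_input(handle):
    """Return one (name, birth_year, oc_age, n) tuple per CSV row."""
    reader = csv.DictReader(handle)
    return [
        # csv yields strings, so the numeric columns are converted to int.
        (row["name"], int(row["birth_year"]), int(row["oc_age"]), int(row["n"]))
        for row in reader
    ]


cohorts_input = read_cohorts_input(io.StringIO(csv_text))

# Same dictionary comprehension as in the example above: cohort name -> size.
n = {x[0]: x[3] for x in cohorts_input}
```

The resulting cohorts_input has the same structure as the hard-coded list, so it can be passed to create_cohort() and run() in the same way.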

Screening

It’s also possible to apply screening in a PopulationModel. You can do so by adding more than one Universe to each cohort. In the following example, we add two universes to each Cohort: one without screening and one with screening.

from miscore import processes, Universe
from miscore.processes import Birth, EC, EC_screening, OC
from miscore.tools.cohort import Cohort, PopulationModel


def create_cohort(name, birth_year, oc_age):
    birth = Birth(year=birth_year)
    oc = OC(age=oc_age)

    ec = EC.from_data(processes.ec.data.us)
    ec_screening = EC_screening.from_data(processes.ec_screening.data.example)

    no_screening = Universe(name="no_screening", processes=[birth, oc, ec])
    screening = Universe(name="screening", processes=[birth, oc, ec, ec_screening])

    return Cohort(name=name, universes=[no_screening, screening])


cohorts_input = [
    ("cohort1995", 1995, 80, 1000),
    ("cohort1996", 1996, 81, 1000)
]

cohorts = [create_cohort(*x[:3]) for x in cohorts_input]

model = PopulationModel(cohorts=cohorts)

result = model.run(
    n={x[0]: x[3] for x in cohorts_input},
    seed=123
)

Multiprocessing

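The run() method also has a cores argument (see Multiprocessing). You can therefore pass a number of cores, such as cores=4 in the example below, or specify cores="all". If you’re using the cohort module to construct and run models with different cohorts, this is most likely the most efficient way to run your simulation. Note that the code that starts the run is placed under if __name__ == "__main__":, which Python’s multiprocessing requires on platforms that start worker processes by spawning a fresh interpreter (such as Windows and macOS).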

from miscore import processes, Universe
from miscore.processes import Birth, EC, EC_screening, OC
from miscore.tools.cohort import Cohort, PopulationModel


def create_cohort(name, birth_year, oc_age):
    birth = Birth(year=birth_year)
    oc = OC(age=oc_age)

    ec = EC.from_data(processes.ec.data.us)
    ec_screening = EC_screening.from_data(processes.ec_screening.data.example)

    no_screening = Universe(name="no_screening", processes=[birth, oc, ec])
    screening = Universe(name="screening", processes=[birth, oc, ec, ec_screening])

    return Cohort(name=name, universes=[no_screening, screening])


if __name__ == "__main__":
    cohorts_input = [
        ("cohort1995", 1995, 80, 1000),
        ("cohort1996", 1996, 81, 1000)
    ]

    cohorts = [create_cohort(*x[:3]) for x in cohorts_input]

    model = PopulationModel(cohorts=cohorts)

    result = model.run(
        n={x[0]: x[3] for x in cohorts_input},
        seed=123,
        cores=4
    )

Processing results

The Result returned by run() differs slightly from what is returned by the equivalent method of Model. All DataFrame objects in Result (i.e. properties, events, durations, individual and snapshots) have an extra index level: the cohort name.
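To give an idea of how such an extra index can be used, here is a minimal pandas sketch on a hand-made table. The table is only a stand-in for one of the Result DataFrames: the level names ("cohort", "universe"), the column name "number" and all values are assumptions made for illustration and may differ from what miscore actually returns.

```python
import pandas as pd

# Hypothetical stand-in for a Result table with the cohort name as an
# extra index level; names and values are made up for illustration.
index = pd.MultiIndex.from_tuples(
    [
        ("cohort1995", "no_screening"),
        ("cohort1995", "screening"),
        ("cohort1996", "no_screening"),
        ("cohort1996", "screening"),
    ],
    names=["cohort", "universe"],
)
events = pd.DataFrame({"number": [10, 7, 12, 8]}, index=index)

# Select the rows of a single cohort; the cohort level is dropped.
events_1995 = events.xs("cohort1995", level="cohort")

# Sum over the cohorts to obtain population-level totals per universe.
totals = events.groupby(level="universe").sum()
```

Selecting with xs() gives a table that looks like the output of a single Model, while grouping on the remaining levels aggregates the cohorts into one population.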

Note

Sometimes it makes more sense to use several Model objects instead of a PopulationModel, as shown in the first example of this tutorial. The main difference is whether your Result objects are merged or kept separate. Choose whichever is most convenient in your case. For example, when your aim is to compare different parameter sets (for instance after a calibration) rather than different cohorts, it might be more convenient to store the Result objects as separate files.