Take snapshots

This tutorial shows how to use the snapshot arguments of the run() method of Model. Snapshots can be used, for example, to calculate prevalence at certain ages or in certain years.

A simple disease model

Consider the following simple process and model. The Disease process draws a random age from a uniform distribution at which the disease starts. At this age, the disease_start event is logged. After a random duration (also drawn from a uniform distribution), the individual dies from the disease. In addition to the Disease process, the model includes the OC process.

from miscore import Model, Process, Universe
from miscore.processes import OC


class Disease(Process):
    event_tags = [
        "disease_start",
        "disease_death"
    ]

    def __init__(self, start_age, dwell, name="disease"):
        self.start_age = start_age
        self.dwell = dwell
        self.name = name

        self.callbacks = {
            "__start__": self.schedule,
            "disease_start": self.start
        }

    def properties(self, rng, n, properties):
        # Draw a start age and a disease duration for each of the n individuals
        return {
            "disease_start_age": rng.uniform(*self.start_age, n),
            "disease_dwell": rng.uniform(*self.dwell, n)
        }

    def schedule(self, individual):
        # Queue the start of the disease at the drawn age
        individual.add_to_queue_age(
            individual.properties("disease_start_age"), "disease_start"
        )

    def start(self, individual):
        # Log the start and queue death from the disease after the drawn duration
        individual.log_event("disease_start")
        individual.add_to_queue(
            individual.properties("disease_dwell"), "disease_death",
            terminate=True
        )


oc = OC(life_table=[(0, 50), (.3, 80), (1, 100)])

disease = Disease(start_age=(40, 80), dwell=(1, 10))

universe = Universe(name="universe", processes=[oc, disease])

model = Model(universes=[universe])

result = model.run(
    n=1000,
    seed=123,
    event_ages=range(100)
)

Binary snapshots

It might be interesting to know how many people have the disease at a specific age. This is what the snapshot functionality of MISCore can be used for.

Let’s add a "disease" entry in the memory when the disease starts. This enables us to determine at any time whether the individual has the disease or not. Next, let’s define a function returning a set with an element "disease" if the individual has the disease (i.e. if the memory contains a "disease" entry). In the run() method of Model we can now pass our snapshot function. Note that it’s possible to pass multiple snapshot functions. We should also pass the tags that we wish to log (only "disease" in this case). Finally, we should pass a list with the ages at which the snapshot functions should be called.

from miscore import Model, Process, Universe
from miscore.processes import OC


class Disease(Process):
    event_tags = [
        "disease_start",
        "disease_death"
    ]

    def __init__(self, start_age, dwell, name="disease"):
        self.start_age = start_age
        self.dwell = dwell
        self.name = name

        self.callbacks = {
            "__start__": self.schedule,
            "disease_start": self.start
        }

    def properties(self, rng, n, properties):
        return {
            "disease_start_age": rng.uniform(*self.start_age, n),
            "disease_dwell": rng.uniform(*self.dwell, n)
        }

    def schedule(self, individual):
        age = individual.properties("disease_start_age")
        individual.add_to_queue_age(age, "disease_start")

    def start(self, individual):
        individual.log_event("disease_start")
        # Remember that the individual has the disease, for use in snapshots
        individual.memory["disease"] = True
        individual.add_to_queue(
            individual.properties("disease_dwell"), "disease_death",
            terminate=True
        )


def take_snapshot(individual):
    # Return the tag "disease" only for individuals that have the disease
    if "disease" in individual.memory:
        return {"disease"}


oc = OC(life_table=[(0, 50), (.3, 80), (1, 100)])

disease = Disease(start_age=(40, 80), dwell=(1, 10))

universe = Universe(name="universe", processes=[oc, disease])

model = Model(universes=[universe])

result = model.run(
    n=1000,
    seed=123,
    event_ages=range(100),
    snapshot_tags=["disease"],
    snapshot_functions=[take_snapshot],
    snapshot_ages=[40, 50, 60, 70, 80, 90]
)

result.snapshots_ages.to_csv("snapshots_ages.csv")

The above method allows for binary snapshots only: every individual gets a 1 (diseased) or a 0 (not diseased). This is useful to measure the prevalence of a disease. For measuring the multiplicity of adenomas or risk factors, it may be useful to take integer snapshots.
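To make the counting concrete, here is a minimal pure-Python sketch of how binary snapshot results could be tallied at a single age. It does not use MISCore and is not its implementation: the SimpleNamespace objects stand in for individuals, and the count_tags helper is a hypothetical name introduced only for this illustration.

```python
from collections import Counter
from types import SimpleNamespace


def take_snapshot(individual):
    # Same shape as the tutorial's snapshot function: a set of tags, or None.
    if "disease" in individual.memory:
        return {"disease"}


def count_tags(individuals, snapshot_function):
    # Hypothetical helper: every returned tag counts once per individual.
    counts = Counter()
    for individual in individuals:
        tags = snapshot_function(individual) or set()
        counts.update(tags)
    return counts


# Stand-ins for the individuals alive at one snapshot age (not MISCore objects).
population = [
    SimpleNamespace(memory={"disease": True}),
    SimpleNamespace(memory={}),
    SimpleNamespace(memory={"disease": True}),
    SimpleNamespace(memory={}),
]

counts = count_tags(population, take_snapshot)
prevalence = counts["disease"] / len(population)
```

In this toy population two of the four individuals carry the tag, so the count is 2 and the prevalence among these four is 0.5.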

Warning

Snapshot functions should not add, remove or modify events in the simulation queue, nor modify Individual.memory. When multiple snapshots precede the next event in the queue, they are all taken at once. Changes that a snapshot function makes to the memory or the simulation queue are therefore not incorporated in these snapshots.
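One way to guard against accidental writes is to check a snapshot function against a stand-in individual before using it in a model. The sketch below is plain Python, not part of MISCore; leaves_memory_untouched and bad_snapshot are hypothetical names, and the SimpleNamespace object only mimics the memory attribute.

```python
import copy
from types import SimpleNamespace


def take_snapshot(individual):
    # Read-only: looks at the memory but never writes to it.
    if "disease" in individual.memory:
        return {"disease"}


def bad_snapshot(individual):
    # Anti-pattern: setdefault() silently inserts a key into the memory.
    if individual.memory.setdefault("disease", False):
        return {"disease"}


def leaves_memory_untouched(snapshot_function, individual):
    # Hypothetical check: compare the memory before and after the call.
    before = copy.deepcopy(individual.memory)
    snapshot_function(individual)
    return individual.memory == before


# Stand-ins for healthy individuals (not MISCore Individual objects).
ok = leaves_memory_untouched(take_snapshot, SimpleNamespace(memory={}))
not_ok = leaves_memory_untouched(bad_snapshot, SimpleNamespace(memory={}))
```

Here ok is True and not_ok is False, because setdefault() adds a "disease" entry even for an individual without the disease.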

Note

The snapshot functions are called for every individual in every universe at all ages specified (unless the simulation has been terminated). Thus, using many functions or many ages can significantly decrease performance. If performance is important, try to minimize the number of functions and ages. Also, try to write efficient snapshot functions.

Note

In this simple scenario, we might be able to determine the number of people with the disease at any age from the events. In fact, when carefully logging the right events, this is probably always possible. However, with more complex processes this can get really complicated. Snapshots will probably be easier to implement, understand and maintain.
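For comparison, the event-based alternative could look roughly like the sketch below. It assumes that a start age and an end age of the disease episode have already been extracted for each individual from the logged events; the episodes list and the diseased_at helper are hypothetical and the numbers are made up for illustration.

```python
def diseased_at(age, episodes):
    # An individual counts as diseased from the start age (inclusive)
    # until the end age (exclusive).
    return sum(1 for start, end in episodes if start <= age < end)


# Hypothetical (disease start age, death age) pairs, one per diseased individual.
episodes = [(45.2, 51.0), (58.7, 63.1), (49.9, 50.5), (72.4, 80.3)]

counts = {age: diseased_at(age, episodes) for age in (40, 50, 60, 70, 80)}
```

Even in this small case the bookkeeping (pairing each start with the right end, deciding how to treat boundaries) lives outside the process definition, which is the maintenance cost the note refers to.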

Integer snapshots

Integer snapshots are useful if, for example, we modify our simple Disease process such that a person gets the disease only once they have n_risk_factors risk factors. For that, the process draws n_risk_factors random ages from a uniform distribution at which these risk factors “start”. Once the individual has all risk factors, they are diseased and die after a random duration. The memory item "disease_risk_factors" tracks the number of risk factors an individual has. Compared with the previous script, the constructor, properties() and schedule() change, and a risk_factor_onset callback is added:

from miscore import Model, Process, Universe
from miscore.processes import OC


class Disease(Process):
    event_tags = [
        "disease_start",
        "disease_death"
    ]

    def __init__(self, risk_factor_age, n_risk_factors, dwell, name="disease"):
        self.risk_factor_age = risk_factor_age
        self.n_risk_factors = n_risk_factors
        self.dwell = dwell
        self.name = name

        self.callbacks = {
            "__start__": self.schedule,
            "disease_risk_factor_onset": self.risk_factor_onset
        }

    def properties(self, rng, n, properties):
        # Draw n_risk_factors onset ages and one duration per individual
        return {
            "disease_risk_factors_age": rng.uniform(*self.risk_factor_age,
                                                    (n, self.n_risk_factors)),
            "disease_dwell": rng.uniform(*self.dwell, n)
        }

    def schedule(self, individual):
        # Queue one onset event per risk factor
        for age in individual.properties("disease_risk_factors_age"):
            individual.add_to_queue_age(age, "disease_risk_factor_onset")

    def risk_factor_onset(self, individual):
        # Count the risk factors the individual has so far
        if "disease_risk_factors" not in individual.memory:
            individual.memory["disease_risk_factors"] = 1
        else:
            individual.memory["disease_risk_factors"] += 1
        # Checked after every onset, so that n_risk_factors=1 also works
        if individual.memory["disease_risk_factors"] >= self.n_risk_factors:
            self.start(individual)

    def start(self, individual):
        individual.log_event("disease_start")
        individual.memory["disease"] = True
        individual.add_to_queue(
            individual.properties("disease_dwell"), "disease_death",
            terminate=True
        )

We may be interested in the total number of risk factors in the population at a specific age. An individual can then no longer be represented by a 0 or 1, as they may have up to n_risk_factors risk factors. Therefore our new snapshot function take_snapshot_risk_factors returns a dictionary with a snapshot tag as key and the number of risk factors as value. Representing each individual by an integer allows us to measure the multiplicity.

def take_snapshot_disease(individual):
    if "disease" in individual.memory:
        return {"disease"}


def take_snapshot_risk_factors(individual):
    if "disease_risk_factors" in individual.memory:
        return {"disease_risk_factors": individual.memory["disease_risk_factors"]}


oc = OC(life_table=[(0, 50), (.3, 80), (1, 100)])

disease = Disease(risk_factor_age=(40, 80), n_risk_factors=3, dwell=(1, 10))

universe = Universe(name="universe", processes=[oc, disease])

model = Model(universes=[universe])

result = model.run(
    n=1000,
    seed=123,
    event_ages=range(100),
    snapshot_tags=["disease", "disease_risk_factors"],
    snapshot_functions=[take_snapshot_disease, take_snapshot_risk_factors],
    snapshot_ages=[40, 50, 60, 70, 80, 90]
)

result.snapshots_ages.to_csv("snapshots_ages.csv")
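The difference with binary snapshots is that the returned values are added up over individuals. The sketch below illustrates that idea in plain Python; it is an assumption about the aggregation, not MISCore's code, and sum_tags and the SimpleNamespace stand-ins are hypothetical.

```python
from collections import Counter
from types import SimpleNamespace


def take_snapshot_risk_factors(individual):
    # Same function as in the tutorial: tag -> number of risk factors.
    if "disease_risk_factors" in individual.memory:
        return {"disease_risk_factors": individual.memory["disease_risk_factors"]}


def sum_tags(individuals, snapshot_function):
    # Hypothetical helper: add up the integer value of every returned tag.
    totals = Counter()
    for individual in individuals:
        result = snapshot_function(individual) or {}
        for tag, value in result.items():
            totals[tag] += value
    return totals


# Stand-ins for the individuals alive at one snapshot age.
population = [
    SimpleNamespace(memory={"disease_risk_factors": 2}),
    SimpleNamespace(memory={}),
    SimpleNamespace(memory={"disease_risk_factors": 3, "disease": True}),
    SimpleNamespace(memory={"disease_risk_factors": 1}),
]

totals = sum_tags(population, take_snapshot_risk_factors)
mean_per_individual = totals["disease_risk_factors"] / len(population)
```

The four stand-in individuals carry 2 + 0 + 3 + 1 = 6 risk factors in total, or 1.5 on average, which is the kind of multiplicity a binary snapshot cannot express.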

Warning

The tags returned by different snapshot functions must never overlap, because overlapping values are overwritten. Whenever multiple snapshot functions are given, a warning is issued to remind the user of this. Any set of snapshot functions can be rewritten as a single function, which suppresses the warning. In our example, the following snapshot function can replace our two snapshot functions:

def take_snapshot(individual):
    res = {}
    if "disease" in individual.memory:
        res["disease"] = 1
    if "disease_risk_factors" in individual.memory:
        res["disease_risk_factors"] = individual.memory["disease_risk_factors"]
    return res
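A quick way to convince yourself that the combined function is a drop-in replacement is to compare it with the two separate functions on a few stand-in individuals. This is a plain-Python sketch; as_dict and merged are hypothetical helpers, and treating a tag in a set as the value 1 follows the binary convention described earlier.

```python
from types import SimpleNamespace


def take_snapshot_disease(individual):
    if "disease" in individual.memory:
        return {"disease"}


def take_snapshot_risk_factors(individual):
    if "disease_risk_factors" in individual.memory:
        return {"disease_risk_factors": individual.memory["disease_risk_factors"]}


def take_snapshot(individual):
    res = {}
    if "disease" in individual.memory:
        res["disease"] = 1
    if "disease_risk_factors" in individual.memory:
        res["disease_risk_factors"] = individual.memory["disease_risk_factors"]
    return res


def as_dict(result):
    # Hypothetical helper: normalise None, a set of tags or a dict to a dict.
    if result is None:
        return {}
    if isinstance(result, dict):
        return dict(result)
    return {tag: 1 for tag in result}


def merged(individual):
    # Results of the two separate functions, combined into one dict.
    out = {}
    for function in (take_snapshot_disease, take_snapshot_risk_factors):
        out.update(as_dict(function(individual)))
    return out


# Stand-ins for individuals in different states.
samples = [
    SimpleNamespace(memory={}),
    SimpleNamespace(memory={"disease_risk_factors": 2}),
    SimpleNamespace(memory={"disease_risk_factors": 3, "disease": True}),
]

all_equal = all(merged(person) == take_snapshot(person) for person in samples)
```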

Process result

The Result object obtained by running this simulation has a snapshots_ages attribute, which is a DataFrame. The table below shows what this DataFrame looks like in this case. Note that ages 40 and 90 are not included, as the disease and risk factors were not present in the population at these ages.

universe  stratum  tag                   age   number
universe  0        disease               50.0      13
universe  0        disease               60.0      60
universe  0        disease               70.0     158
universe  0        disease               80.0     247
universe  0        disease_risk_factors  50.0     755
universe  0        disease_risk_factors  60.0    1225
universe  0        disease_risk_factors  70.0    1254
universe  0        disease_risk_factors  80.0     741
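As a rough illustration of post-processing, the sketch below reshapes these rows into a tag-by-age lookup using only the standard library. The numbers are copied from the table above; the CSV layout (one header row with the five column names) is an assumption about what the exported file looks like, and N_SIMULATED simply repeats the n=1000 passed to run().

```python
import csv
import io

N_SIMULATED = 1000  # the n passed to model.run() in the tutorial

# Rows copied from the table above; the CSV layout itself is assumed.
csv_text = """universe,stratum,tag,age,number
universe,0,disease,50.0,13
universe,0,disease,60.0,60
universe,0,disease,70.0,158
universe,0,disease,80.0,247
universe,0,disease_risk_factors,50.0,755
universe,0,disease_risk_factors,60.0,1225
universe,0,disease_risk_factors,70.0,1254
universe,0,disease_risk_factors,80.0,741
"""

# Build {tag: {age: number}} so values can be looked up by tag and age.
pivot = {}
for row in csv.DictReader(io.StringIO(csv_text)):
    pivot.setdefault(row["tag"], {})[float(row["age"])] = int(row["number"])

# Share of all simulated individuals (not only survivors) with the disease.
share_diseased = {
    age: number / N_SIMULATED for age, number in pivot["disease"].items()
}
```

Note that dividing by the number of simulated individuals is not the same as prevalence among those still alive; for that, the number of survivors at each age would be needed as the denominator.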

Snapshot_years

It is also possible to take snapshots at specific years instead of ages. This may be convenient when you run a PopulationModel and you are interested in the prevalence of disease in certain years. In that case you can use the argument snapshot_years to specify the years in which snapshots are taken.

We add a Birth process with a birth table to our example, and specify several years in which a snapshot is taken:

# Birth must be imported along with OC, for example:
# from miscore.processes import Birth, OC  (import path assumed)


def take_snapshot_disease(individual):
    if "disease" in individual.memory:
        return {"disease"}


def take_snapshot_risk_factors(individual):
    if "disease_risk_factors" in individual.memory:
        return {"disease_risk_factors": individual.memory["disease_risk_factors"]}


birth = Birth(birth_table=[(0, 1945), (1, 1975)])

oc = OC(life_table=[(0, 50), (.3, 80), (1, 100)])

disease = Disease(risk_factor_age=(40, 80), n_risk_factors=3, dwell=(1, 10))

# The Birth process has to be part of the universe to take effect
universe = Universe(name="universe", processes=[birth, oc, disease])

model = Model(universes=[universe])

result = model.run(
    n=1000,
    seed=123,
    event_ages=range(100),
    snapshot_tags=["disease", "disease_risk_factors"],
    snapshot_functions=[take_snapshot_disease, take_snapshot_risk_factors],
    snapshot_years=[1995, 2005, 2015, 2025, 2035]
)

result.snapshots_years.to_csv("snapshots_years.csv")

The Result object now has a snapshots_years attribute that contains the output of the snapshots.

It is also possible to specify both snapshot_ages and snapshot_years. The snapshots will be taken at these ages and at these years. The Result object then contains both snapshots_ages and snapshots_years attributes.
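The relation between the two kinds of snapshot can be sketched with simple arithmetic: a snapshot in a given year reaches each individual at the age equal to that year minus their birth year. The example below assumes a snapshot year marks a single point in calendar time; the three birth years are hypothetical values within the 1945 to 1975 range of the birth table, and age_at_year is a helper introduced only for this illustration.

```python
def age_at_year(birth_year, year):
    # Age of an individual at a given point in calendar time.
    return year - birth_year


snapshot_years = [1995, 2005, 2015, 2025, 2035]

# Hypothetical birth years within the range of the example birth table.
birth_years = [1945.0, 1960.5, 1975.0]

ages = {
    birth_year: [age_at_year(birth_year, year) for year in snapshot_years]
    for birth_year in birth_years
}
```

Someone born in 1945 is 50 at the 1995 snapshot, while someone born in 1975 is only 20, so a year snapshot mixes individuals of different ages, whereas an age snapshot mixes individuals from different calendar years.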