Using stratum¶
With population models, we can stratify our results by cohort to observe outcomes specific to a
particular sex or birth cohort. However, sometimes we may be interested in splitting up our results
by properties that are generated as part of our simulation. For this purpose, the
Stratum process can be employed. This tutorial provides a working
example of using Stratum to split up results by properties generated
during simulation, and also demonstrates how we can go further by employing dynamic stratification,
which allows individuals to change their stratum during their simulated lifetime.
A basic example¶
To execute the examples used in this tutorial, download the script 1_stratified_model.py.
For the sake of the tutorial, we will employ an example process which simulates Smurf lifetimes.
We assume that their total lifetime is Weibull distributed with shape 2 and scale 15, for an
average life expectancy of about 13 years. However, at year 5 some of our smurfs meet an early end
by being eaten by a cat. Smurfs may either be born clumsy, in which case they have a 0.5 probability
of being eaten, or regular, in which case they only have a 0.1 probability of being eaten.
For more information on creating processes such as these, see the tutorial Create a process.
This tutorial also assumes knowing how to build and run a model, as demonstrated in
Run a model. It will also be useful to have studied how to interpret
model output, as shown in Model output.
1from miscore import Model, Process, Universe
2from miscore.processes import Stratum
3from miscore.utils import WeibullDistribution
4
5
6class Clumsy(Process):
7 # Process to divide smurfs into clumsy or regular
8 def __init__(self, prob_clumsy: float = 0.5, name: str = "clumsy"):
9 self.prob_clumsy = prob_clumsy
10 self.name = name
11
12 def properties(self, rng, n, properties):
13 clumsy_category = rng.choice(
14 ["clumsy", "regular"],
15 size=n,
16 p=[self.prob_clumsy, 1 - self.prob_clumsy]
17 )
18 return {
19 "clumsy_category": clumsy_category
20 }
21
22
23class Smurf(Process):
24 event_tags = [
25 "eaten_early",
26 "smurf_death"
27 ]
28
29 duration_tags = [
30 "smurf_life"
31 ]
32
33 # Process to simulate smurf lifetimes
34 def __init__(self, survival_shape: float = 2., survival_scale: float = 15,
35 name: str = "smurfs"):
36 self.survival = WeibullDistribution(scale=survival_scale, shape=survival_shape)
37 self.name = name
38
39 self.callbacks = {
40 "__start__": self.schedule_survival,
41 "eaten_early": self.eaten_early,
42 "__end__": self.log_durations
43 }
44
45 def properties(self, rng, n, properties):
46 survival_r = rng.random(n)
47 eaten_early_r = rng.random(n)
48 return {
49 "survival_r": survival_r,
50 "eaten_early_r": eaten_early_r
51 }
52
53 def schedule_survival(self, individual):
54 individual.add_to_queue_age(5., "eaten_early")
55 natural_mortality_age = self.survival(individual.properties("survival_r"))
56 individual.add_to_queue_age(natural_mortality_age, message="smurf_death",
57 terminate=True)
58
59 def eaten_early(self, individual):
60 threshold = 0.1
61 if individual.properties("clumsy_category") == "clumsy":
62 threshold = 0.5
63 if individual.properties("eaten_early_r") < threshold:
64 individual.log_event("eaten_early")
65 individual.add_to_queue(
66 delta=0.,
67 message="smurf_death",
68 terminate=True
69 )
70
71 def log_durations(self, individual, message):
72 individual.log_duration(
73 tag="smurf_life",
74 start_age=0,
75 end_age=individual.age
76 )
Using the Stratum process¶
Given what we know about our smurf population, we may be interested in splitting up our results between
the clumsy and regular smurfs. To this end, we may create an instance of the Stratum
process that places our smurfs in separate strata when writing out our results. We do so by specifying a list of
conditions for each Stratum we want to generate. By default,
all individuals are placed in stratum 0. Each list of conditions reflect the properties that place an
individual in new stratums from stratum 1 onwards. The first list of conditions passed to stratum places individuals in
stratum 1, the second list places individuals in stratum 2, and so onwards for any number of lists that may
be passed to stratum.
Each condition is a tuple of the name of the property by which we wish to stratify, and a string of the expression we wish to evaluate on
that property. If multiple conditions are required for a stratum, multiple tuples may be entered into the list.
A valid call of Stratum will follow the form demonstrated below.
Stratum(
[("property_name_1", ">", 10)], # Singular condition for stratum 1
[("property_name_1", "<", 5), ("property_name_2", "!=", "blue")] # Two conditions for placing in stratum 2
)
Make sure that no overlap exists between the conditions supplied to Stratum, since this will
challenge the interpretability of the acquired results. Continuing, we define our strata, and
perform a model run to observe the results by stratum.
79clumsy = Clumsy()
80
81smurf = Smurf()
82
83stratum = Stratum(
84 [("clumsy_category", "==", "clumsy")]
85)
86
87universe = Universe(name="smurfs", processes=[clumsy, smurf, stratum])
88
89model = Model(
90 universes=[universe]
91)
92
93result = model.run(n=1000, seed=0)
94
95print(result.events)
96print(result.durations)
We see in our result.events that the stratum 1, which corresponds to our clumsy smurfs, has a much higher
count of being eaten than the stratum 0, which should contain all our regular smurfs.
universe |
stratum |
tag |
year |
age |
number |
|---|---|---|---|---|---|
smurfs |
0 |
eaten_early |
0 |
0 |
51 |
smurfs |
0 |
smurf_death |
0 |
0 |
512 |
smurfs |
1 |
eaten_early |
0 |
0 |
209 |
smurfs |
1 |
smurf_death |
0 |
0 |
488 |
We can also study the average lifetime of our smurfs in the result.durations DataFrame, where
we find clumsy smurfs from our first stratum to have a lower number of total lifeyears lived.
universe |
stratum |
tag |
year |
age |
number |
|---|---|---|---|---|---|
smurfs |
0 |
smurf_life |
0 |
0 |
6198.715778 |
smurfs |
1 |
smurf_life |
0 |
0 |
4542.202767 |
Note
For each example in this tutorial, strata are given in integer values 0 and 1, which is
how the strata are stored internally. To make your events, durations and snapshots
DataFrames more intuitive, you may also pass stratum_names as an argument to Model.run().
The argument must be a sequence of str values, where each entry denotes the name of the
corresponding stratum, such that the integer value of the stratum is the index of the string in the
sequence of stratum names. In this most recent example, we may pass stratum_names=["regular", "clumsy"] to obtain:
universe |
stratum |
tag |
year |
age |
number |
|---|---|---|---|---|---|
smurfs |
regular |
smurf_life |
0 |
0 |
6198.715778 |
smurfs |
clumsy |
smurf_life |
0 |
0 |
4542.202767 |
Dynamic stratification¶
The Stratum process assumes that we want to stratify our results by information given in the
properties dictionary. This allows us to split up results by the simulated characteristics of
our individuals, but does not allow any interaction between the stratum an individual is in, and the
events that take place during simulation. It may be interesting to stratify results by outcomes that
occur in the simulated events, which we generally do not know from the properties. For example,
we may wish to study the outcomes by whether someone has shown up for at least one screening event,
or we may have a risk factor that changes over time by which we may want to split up our results.
To stratify our results dynamically, we may use Individual.hop_stratum() in our Process. This
function notes a change in stratum for the individual at the age at which the call to the function occurs.
From this age onwards, any events and durations are written to the new stratum. Any events and durations
that occurred before this age are still written to the old stratum.
Note
Note that the Individual.hop_stratum() call must be integrated in the process definition
to be used, as opposed to the Stratum process, which can be defined after-the-fact. We
therefore encourage that any use of hop_stratum includes an option in the process __init__
for turning the stratum changes on and off. Dynamic stratification is expected to be a slight
computational burden, and of course sometimes we may wish to have aggregated, rather than
stratified results.
Note
Memory is allocated to store a maximum of 256 hops per individual.
To demonstrate this function, we define a screening process for identifying clumsy smurfs, where
we implement a change in stratum for anyone with a correct screening result. The screening is
performed at year 4., and has a 0.5 probability of correctly identifying whether a smurf is clumsy
or regular. We place any smurf with a correct screening result into a new stratum with the
Individual.hop_stratum() method.
After identification, a clumsy smurf has a 0.6 probability of being protected from their clumsiness,
at which point we remove their eaten_early event from the queue, saving them from an early demise.
99class Smurf_Screen(Process):
100 event_tags = ["smurf_screen"]
101
102 def __init__(self, test_sens: float = 0.5, cure_prob: float = 0.6):
103 self.test_sens = test_sens
104 self.cure_prob = cure_prob
105
106 self.callbacks = {
107 "__start__": self.schedule_screen,
108 "smurf_screen": self.smurf_screen
109 }
110
111 def properties(self, rng, n, properties):
112 return {
113 "test_r": rng.random(n),
114 "cure_r": rng.random(n)
115 }
116
117 def schedule_screen(self, individual):
118 individual.add_to_queue_age(4, "smurf_screen")
119
120 def smurf_screen(self, individual):
121 individual.log_event("smurf_screen")
122 if individual.properties("test_r") < self.test_sens:
123 individual.hop_stratum(1)
124 if individual.properties("clumsy_category"):
125 if individual.properties("cure_r") < self.cure_prob:
126 individual.remove_from_queue(0)
We now have a process which performs dynamic stratification, such that any smurf with an accurate
screening is placed in a new stratum from the moment of the screen onwards. However, to correctly run the model,
we need to inform the model.run() method that we will be performing such a change in stratum. Since
the model is blind to the workings of any process, it cannot pre-allocate space in our results for additional
strata, unless we explicitly tell it that we will be adding strata during simulation. By supplying the
argument n_stratum_hops=1, the model knows it should allocate for at least 2 strata, since we will be
adding another on top of the default stratum of 0.
We run our model with screening, and split up our result.durations by the period before and after
our screening to study the difference in outcomes between our correctly screened group and the group
with false results.
129smurf_screen = Smurf_Screen()
130
131screening_universe = Universe(name="screened_smurfs", processes=[clumsy, smurf, smurf_screen])
132
133model = Model(universes=[screening_universe])
134
135result = model.run(n=1000, duration_years=[4], n_stratum_hops=1, seed=0)
136
137print(result.events)
138print(result.durations)
Looking at our result.events, we see that every smurf is still in stratum 0 at the time of the
smurf_screen event, after which the group of identified smurfs is split up. We see that the
stratum 1 group has a much lower rate of being eaten early than the stratum 0 group, demonstrating
the positive outcome the screening has on the clumsy smurfs among stratum 1.
universe |
stratum |
tag |
year |
age |
number |
|---|---|---|---|---|---|
screened_smurfs |
0 |
eaten_early |
0 |
0 |
135 |
screened_smurfs |
0 |
smurf_death |
0 |
0 |
504 |
screened_smurfs |
0 |
smurf_screen |
0 |
0 |
926 |
screened_smurfs |
1 |
eaten_early |
0 |
0 |
59 |
screened_smurfs |
1 |
smurf_death |
0 |
0 |
496 |
The result.durations table shows that before year 4.0, every smurf is still allocated to stratum 0.
After year 4.0, we observe that stratum 1 has a much higher cumulative number of lifeyears, even
though the number of smurfs in each stratum should be roughly equal, demonstrating the improved outcomes
for smurfs with a correct screening.
universe |
stratum |
tag |
year |
age |
number |
|---|---|---|---|---|---|
screened_smurfs |
0 |
smurf_life |
0.0 |
0 |
3904.970480 |
screened_smurfs |
0 |
smurf_life |
4.0 |
0 |
3096.596602 |
screened_smurfs |
1 |
smurf_life |
4.0 |
0 |
4352.426803 |
Preserving stratum from the first universe¶
We often perform analyses comparing outcomes between universes, where one universe may select individuals for a screening, and the other has a different screening process, or no screening at all. To compare the outcomes of screening participants particularly, we may use dynamic stratification to put individuals into a stratum depending on their outcomes in one universe and then preserve this stratum in the other universes. For example, we may study the outcomes of screening eligibles in a universe in which there is no screening at all, even if screening eligibility is only revealed during the simulation process. Using MISCore, we can either preserve the dynamic stratification of the first universe or we can preserve the final stratum of each individual.
By preserving the dynamic stratification, the moments at which the simulated individuals hop across the strata
in the first universe is stored, and applied in the other universes.
For our smurfs, we may do so by running the same analysis as before, and
then running a non-screening universe in which we preserve the stratum hops from the screening universe. We do so
by setting preserve_stratum="dynamic" in model.run(). Note that we can only preserve the stratum hops
from the first universe in the universes argument, as otherwise the model does not have the correct stratum
at hand when noting results. We therefore run screened_smurfs as our first universe, since this is the universe
in which our initial dynamic stratification occurs.
140no_screening_universe = Universe(name="no_screening", processes=[clumsy, smurf])
141
142model = Model([screening_universe, no_screening_universe])
143
144result = model.run(n=1000, duration_years=[4], n_stratum_hops=1,
145 preserve_stratum="dynamic", seed=0)
146
147print(result.events)
148print(result.durations)
Studying our result.durations dataframe, we now see the explicit loss of lifeyears of the
correctly screened group in the non-screening universe. Up to year 4.0, the two strata both exist
unperturbed and have an equal number of lifeyears in each universe. However, after year 4.0 the
group with a true screening result have a higher number of lifeyears lived in the screening universe
than they do in the non-screening universe. The group with a false screening result (stratum 0)
have no change to their lifeyears lived between universes.
universe |
stratum |
tag |
year |
age |
number |
|---|---|---|---|---|---|
screened_smurfs |
0 |
smurf_life |
0.0 |
0 |
3904.970480 |
screened_smurfs |
0 |
smurf_life |
4.0 |
0 |
3096.596602 |
screened_smurfs |
1 |
smurf_life |
4.0 |
0 |
4352.426803 |
no_screening |
0 |
smurf_life |
0.0 |
0 |
3904.970480 |
no_screening |
0 |
smurf_life |
4.0 |
0 |
3096.596602 |
no_screening |
1 |
smurf_life |
4.0 |
0 |
3739.351464 |
Alternatively, we can also preserve the final stratum of the first universe for all individuals.
This way, individuals do not hop across strata in the second universe anymore. Instead,
individuals are born in the stratum that they died in during the first universe and do not change stratum during
their lives. To do so, we set preserve_stratum="final" in model.run(). Again, we run
screened_smurfs as our first universe as we can only preserve the final stratum of the first universe in the
universes argument.
150result = model.run(n=1000, duration_years=[4], n_stratum_hops=1,
151 preserve_stratum="final", seed=0)
152
153print(result.events)
154print(result.durations)
In our result.durations dataframe, we now see 1984 years in stratum 1 from year
0 to 4. This means that some of our smurfs were born in stratum 1. For that reason, we see
that the life years in statum 0 from year 0 to 4 has decreased by 1984 compared to preserving
the dynamic stratification. After year 4, the smurfs are in the same stratum as they were
under preserving dynamic stratifcation, so the number has not changed. Preserving the final
stratum can be used to compute the overdiagnosis of a screening programme (it is currently
implemented in the BC process).
universe |
stratum |
tag |
year |
age |
number |
|---|---|---|---|---|---|
screened_smurfs |
0 |
smurf_life |
0.0 |
0 |
3904.970480 |
screened_smurfs |
0 |
smurf_life |
4.0 |
0 |
3096.596602 |
screened_smurfs |
1 |
smurf_life |
4.0 |
0 |
4352.426803 |
no_screening |
0 |
smurf_life |
0.0 |
0 |
1920.970480 |
no_screening |
0 |
smurf_life |
4.0 |
0 |
3096.596602 |
no_screening |
1 |
smurf_life |
0.0 |
0 |
1984.000000 |
no_screening |
1 |
smurf_life |
4.0 |
0 |
3739.351464 |
Using dynamic stratification to isolate prevalent cases¶
Calibration efforts are often focused on populations which have been filtered in some way. For example, clinical trials likely include individuals with a particular age profile, but more importantly, includes individuals who are not in the clinical phase of the disease of interest at baseline. When we calibrate to such trials, we want to apply a similar filtering to make our simulation representative of the situation which generated the data.
Whether someone is in a particular state at a particular year is not something known beforehand to the population. By applying dynamic stratification, we can step in at the relevant timepoint to separate our results into the results of interest (all individuals not in a clinical phase) and the results of the individuals that we want to filter out. Consider below an example of how to do so in a simple lung cancer screening process.
1from miscore import Process
2
3
4class lc_screening_filtered(Process):
5 event_tags = ["screen"]
6
7 def __init__(self, ct_sensitivity=0.80, start_age=50.0, end_age=80.0, interval=1.0):
8 self.ct_sensitivity = ct_sensitivity
9 self.start_age = start_age
10 self.end_age = end_age
11 self.interval = interval
12
13 self.callbacks = {
14 "__start__": self.schedule_screen,
15 "screen": self.lc_screen
16 }
17
18 def properties(self, rng, n, properties):
19 return {
20 "test_r": rng.random(n)
21 }
22
23 def schedule_screen(self, individual):
24 individual.add_to_queue_age(self.start_age, "screen")
25
26 def lc_screen(self, individual):
27 if ("lc_clinical" in individual.memory):
28 if (individual.age == self.start_age):
29 individual.hop_stratum(1)
30 return
31
32 individual.log_event("lc_screen")
33 if individual.age <= (self.end_age - self.interval):
34 individual.add_to_queue(1.0, "screen")
35
36 if individual.properties("test_r") < self.ct_sensitivity:
37 if "lc_state" in individual.memory:
38 individual.add_to_queue(0., "lc_clinical", stage="IA", screen_detected=True)
Note that individuals with clinical lung cancer never participate in screening, since their screening
visit always ends in a return statement before any transition to screen-detected lung cancer occurs.
However, only individuals at baseline (those who match the start_age) are placed in a different stratum.
This ensures that individuals who get a clinical lung cancer in the intermittent period (i.e. those who
would already be included by their health status at baseline) are still accounted for in our analysis.
From the age at baseline onwards, all results for individuals with prevalent cases are written to a
different stratum. One could also terminate the simulation for these individuals by adding a mortality event together
with the hop_stratum call. However, we recommend leaving them in at least initially to check that your
stratification happens correctly.
Note
This approach requires adjusting your disease or your disease screening process. You may also
construct a secondary process which performs these checks just before the screening event,
but special care must be taken that the sequence of events is performed correctly, such that
no events of those with prevalent cases are written to the default stratum (use the priority argument
for the add_to_queue method to schedule events at the same time but have them processed first).