Using stratum

With population models, we can stratify our results by cohort to observe outcomes specific to a particular sex or birth cohort. However, sometimes we may be interested in splitting up our results by properties that are generated as part of our simulation. For this purpose, the Stratum process can be employed. This tutorial provides a working example of using Stratum to split up results by properties generated during simulation, and also demonstrates how we can go further by employing dynamic stratification, which allows individuals to change their stratum during their simulated lifetime.

A basic example

To execute the examples used in this tutorial, download the script 1_stratified_model.py.

For the sake of the tutorial, we will employ an example process which simulates Smurf lifetimes. We assume that their total lifetime is Weibull distributed with shape 2 and scale 15, for an average life expectancy of about 13 years. However, at year 5 some of our smurfs meet an early end by being eaten by a cat. Smurfs may either be born clumsy, in which case they have a 0.5 probability of being eaten, or regular, in which case they only have a 0.1 probability of being eaten. For more information on creating processes such as these, see the tutorial Create a process. This tutorial also assumes knowing how to build and run a model, as demonstrated in Run a model. It will also be useful to have studied how to interpret model output, as shown in Model output.

 1from miscore import Model, Process, Universe
 2from miscore.processes import Stratum
 3from miscore.utils import WeibullDistribution
 4
 5
 6class Clumsy(Process):
 7    # Process to divide smurfs into clumsy or regular
 8    def __init__(self, prob_clumsy: float = 0.5, name: str = "clumsy"):
 9        self.prob_clumsy = prob_clumsy
10        self.name = name
11
12    def properties(self, rng, n, properties):
13        clumsy_category = rng.choice(
14            ["clumsy", "regular"],
15            size=n,
16            p=[self.prob_clumsy, 1 - self.prob_clumsy]
17        )
18        return {
19            "clumsy_category": clumsy_category
20        }
21
22
23class Smurf(Process):
24    event_tags = [
25        "eaten_early",
26        "smurf_death"
27    ]
28
29    duration_tags = [
30        "smurf_life"
31    ]
32
33    # Process to simulate smurf lifetimes
34    def __init__(self, survival_shape: float = 2., survival_scale: float = 15,
35                 name: str = "smurfs"):
36        self.survival = WeibullDistribution(scale=survival_scale, shape=survival_shape)
37        self.name = name
38
39        self.callbacks = {
40            "__start__": self.schedule_survival,
41            "eaten_early": self.eaten_early,
42            "__end__": self.log_durations
43        }
44
45    def properties(self, rng, n, properties):
46        survival_r = rng.random(n)
47        eaten_early_r = rng.random(n)
48        return {
49            "survival_r": survival_r,
50            "eaten_early_r": eaten_early_r
51        }
52
53    def schedule_survival(self, individual):
54        individual.add_to_queue_age(5., "eaten_early")
55        natural_mortality_age = self.survival(individual.properties("survival_r"))
56        individual.add_to_queue_age(natural_mortality_age, message="smurf_death",
57                                    terminate=True)
58
59    def eaten_early(self, individual):
60        threshold = 0.1
61        if individual.properties("clumsy_category") == "clumsy":
62            threshold = 0.5
63        if individual.properties("eaten_early_r") < threshold:
64            individual.log_event("eaten_early")
65            individual.add_to_queue(
66                delta=0.,
67                message="smurf_death",
68                terminate=True
69            )
70
71    def log_durations(self, individual, message):
72        individual.log_duration(
73            tag="smurf_life",
74            start_age=0,
75            end_age=individual.age
76        )

Using the Stratum process

Given what we know about our smurf population, we may be interested in splitting up our results between the clumsy and regular smurfs. To this end, we may create an instance of the Stratum process that places our smurfs in separate strata when writing out our results. We do so by specifying a list of conditions for each Stratum we want to generate. By default, all individuals are placed in stratum 0. Each list of conditions reflect the properties that place an individual in new stratums from stratum 1 onwards. The first list of conditions passed to stratum places individuals in stratum 1, the second list places individuals in stratum 2, and so onwards for any number of lists that may be passed to stratum. Each condition is a tuple of the name of the property by which we wish to stratify, and a string of the expression we wish to evaluate on that property. If multiple conditions are required for a stratum, multiple tuples may be entered into the list. A valid call of Stratum will follow the form demonstrated below.

Stratum(
    [("property_name_1", ">", 10)], # Singular condition for stratum 1
    [("property_name_1", "<", 5), ("property_name_2", "!=", "blue")] # Two conditions for placing in stratum 2
)

Make sure that no overlap exists between the conditions supplied to Stratum, since this will challenge the interpretability of the acquired results. Continuing, we define our strata, and perform a model run to observe the results by stratum.

79clumsy = Clumsy()
80
81smurf = Smurf()
82
83stratum = Stratum(
84    [("clumsy_category", "==", "clumsy")]
85)
86
87universe = Universe(name="smurfs", processes=[clumsy, smurf, stratum])
88
89model = Model(
90    universes=[universe]
91)
92
93result = model.run(n=1000, seed=0)
94
95print(result.events)
96print(result.durations)

We see in our result.events that the stratum 1, which corresponds to our clumsy smurfs, has a much higher count of being eaten than the stratum 0, which should contain all our regular smurfs.

universe

stratum

tag

year

age

number

smurfs

0

eaten_early

0

0

51

smurfs

0

smurf_death

0

0

512

smurfs

1

eaten_early

0

0

209

smurfs

1

smurf_death

0

0

488

We can also study the average lifetime of our smurfs in the result.durations DataFrame, where we find clumsy smurfs from our first stratum to have a lower number of total lifeyears lived.

universe

stratum

tag

year

age

number

smurfs

0

smurf_life

0

0

6198.715778

smurfs

1

smurf_life

0

0

4542.202767

Note

For each example in this tutorial, strata are given in integer values 0 and 1, which is how the strata are stored internally. To make your events, durations and snapshots DataFrames more intuitive, you may also pass stratum_names as an argument to Model.run(). The argument must be a sequence of str values, where each entry denotes the name of the corresponding stratum, such that the integer value of the stratum is the index of the string in the sequence of stratum names. In this most recent example, we may pass stratum_names=["regular", "clumsy"] to obtain:

universe

stratum

tag

year

age

number

smurfs

regular

smurf_life

0

0

6198.715778

smurfs

clumsy

smurf_life

0

0

4542.202767

Dynamic stratification

The Stratum process assumes that we want to stratify our results by information given in the properties dictionary. This allows us to split up results by the simulated characteristics of our individuals, but does not allow any interaction between the stratum an individual is in, and the events that take place during simulation. It may be interesting to stratify results by outcomes that occur in the simulated events, which we generally do not know from the properties. For example, we may wish to study the outcomes by whether someone has shown up for at least one screening event, or we may have a risk factor that changes over time by which we may want to split up our results.

To stratify our results dynamically, we may use Individual.hop_stratum() in our Process. This function notes a change in stratum for the individual at the age at which the call to the function occurs. From this age onwards, any events and durations are written to the new stratum. Any events and durations that occurred before this age are still written to the old stratum.

Note

Note that the Individual.hop_stratum() call must be integrated in the process definition to be used, as opposed to the Stratum process, which can be defined after-the-fact. We therefore encourage that any use of hop_stratum includes an option in the process __init__ for turning the stratum changes on and off. Dynamic stratification is expected to be a slight computational burden, and of course sometimes we may wish to have aggregated, rather than stratified results.

Note

Memory is allocated to store a maximum of 256 hops per individual.

To demonstrate this function, we define a screening process for identifying clumsy smurfs, where we implement a change in stratum for anyone with a correct screening result. The screening is performed at year 4., and has a 0.5 probability of correctly identifying whether a smurf is clumsy or regular. We place any smurf with a correct screening result into a new stratum with the Individual.hop_stratum() method.

After identification, a clumsy smurf has a 0.6 probability of being protected from their clumsiness, at which point we remove their eaten_early event from the queue, saving them from an early demise.

 99class Smurf_Screen(Process):
100    event_tags = ["smurf_screen"]
101
102    def __init__(self, test_sens: float = 0.5, cure_prob: float = 0.6):
103        self.test_sens = test_sens
104        self.cure_prob = cure_prob
105
106        self.callbacks = {
107            "__start__": self.schedule_screen,
108            "smurf_screen": self.smurf_screen
109        }
110
111    def properties(self, rng, n, properties):
112        return {
113            "test_r": rng.random(n),
114            "cure_r": rng.random(n)
115        }
116
117    def schedule_screen(self, individual):
118        individual.add_to_queue_age(4, "smurf_screen")
119
120    def smurf_screen(self, individual):
121        individual.log_event("smurf_screen")
122        if individual.properties("test_r") < self.test_sens:
123            individual.hop_stratum(1)
124            if individual.properties("clumsy_category"):
125                if individual.properties("cure_r") < self.cure_prob:
126                    individual.remove_from_queue(0)

We now have a process which performs dynamic stratification, such that any smurf with an accurate screening is placed in a new stratum from the moment of the screen onwards. However, to correctly run the model, we need to inform the model.run() method that we will be performing such a change in stratum. Since the model is blind to the workings of any process, it cannot pre-allocate space in our results for additional strata, unless we explicitly tell it that we will be adding strata during simulation. By supplying the argument n_stratum_hops=1, the model knows it should allocate for at least 2 strata, since we will be adding another on top of the default stratum of 0.

We run our model with screening, and split up our result.durations by the period before and after our screening to study the difference in outcomes between our correctly screened group and the group with false results.

129smurf_screen = Smurf_Screen()
130
131screening_universe = Universe(name="screened_smurfs", processes=[clumsy, smurf, smurf_screen])
132
133model = Model(universes=[screening_universe])
134
135result = model.run(n=1000, duration_years=[4], n_stratum_hops=1, seed=0)
136
137print(result.events)
138print(result.durations)

Looking at our result.events, we see that every smurf is still in stratum 0 at the time of the smurf_screen event, after which the group of identified smurfs is split up. We see that the stratum 1 group has a much lower rate of being eaten early than the stratum 0 group, demonstrating the positive outcome the screening has on the clumsy smurfs among stratum 1.

universe

stratum

tag

year

age

number

screened_smurfs

0

eaten_early

0

0

135

screened_smurfs

0

smurf_death

0

0

504

screened_smurfs

0

smurf_screen

0

0

926

screened_smurfs

1

eaten_early

0

0

59

screened_smurfs

1

smurf_death

0

0

496

The result.durations table shows that before year 4.0, every smurf is still allocated to stratum 0. After year 4.0, we observe that stratum 1 has a much higher cumulative number of lifeyears, even though the number of smurfs in each stratum should be roughly equal, demonstrating the improved outcomes for smurfs with a correct screening.

universe

stratum

tag

year

age

number

screened_smurfs

0

smurf_life

0.0

0

3904.970480

screened_smurfs

0

smurf_life

4.0

0

3096.596602

screened_smurfs

1

smurf_life

4.0

0

4352.426803

Preserving stratum from the first universe

We often perform analyses comparing outcomes between universes, where one universe may select individuals for a screening, and the other has a different screening process, or no screening at all. To compare the outcomes of screening participants particularly, we may use dynamic stratification to put individuals into a stratum depending on their outcomes in one universe and then preserve this stratum in the other universes. For example, we may study the outcomes of screening eligibles in a universe in which there is no screening at all, even if screening eligibility is only revealed during the simulation process. Using MISCore, we can either preserve the dynamic stratification of the first universe or we can preserve the final stratum of each individual.

By preserving the dynamic stratification, the moments at which the simulated individuals hop across the strata in the first universe is stored, and applied in the other universes. For our smurfs, we may do so by running the same analysis as before, and then running a non-screening universe in which we preserve the stratum hops from the screening universe. We do so by setting preserve_stratum="dynamic" in model.run(). Note that we can only preserve the stratum hops from the first universe in the universes argument, as otherwise the model does not have the correct stratum at hand when noting results. We therefore run screened_smurfs as our first universe, since this is the universe in which our initial dynamic stratification occurs.

140no_screening_universe = Universe(name="no_screening", processes=[clumsy, smurf])
141
142model = Model([screening_universe, no_screening_universe])
143
144result = model.run(n=1000, duration_years=[4], n_stratum_hops=1,
145                   preserve_stratum="dynamic", seed=0)
146
147print(result.events)
148print(result.durations)

Studying our result.durations dataframe, we now see the explicit loss of lifeyears of the correctly screened group in the non-screening universe. Up to year 4.0, the two strata both exist unperturbed and have an equal number of lifeyears in each universe. However, after year 4.0 the group with a true screening result have a higher number of lifeyears lived in the screening universe than they do in the non-screening universe. The group with a false screening result (stratum 0) have no change to their lifeyears lived between universes.

universe

stratum

tag

year

age

number

screened_smurfs

0

smurf_life

0.0

0

3904.970480

screened_smurfs

0

smurf_life

4.0

0

3096.596602

screened_smurfs

1

smurf_life

4.0

0

4352.426803

no_screening

0

smurf_life

0.0

0

3904.970480

no_screening

0

smurf_life

4.0

0

3096.596602

no_screening

1

smurf_life

4.0

0

3739.351464

Alternatively, we can also preserve the final stratum of the first universe for all individuals. This way, individuals do not hop across strata in the second universe anymore. Instead, individuals are born in the stratum that they died in during the first universe and do not change stratum during their lives. To do so, we set preserve_stratum="final" in model.run(). Again, we run screened_smurfs as our first universe as we can only preserve the final stratum of the first universe in the universes argument.

150result = model.run(n=1000, duration_years=[4], n_stratum_hops=1,
151                   preserve_stratum="final", seed=0)
152
153print(result.events)
154print(result.durations)

In our result.durations dataframe, we now see 1984 years in stratum 1 from year 0 to 4. This means that some of our smurfs were born in stratum 1. For that reason, we see that the life years in statum 0 from year 0 to 4 has decreased by 1984 compared to preserving the dynamic stratification. After year 4, the smurfs are in the same stratum as they were under preserving dynamic stratifcation, so the number has not changed. Preserving the final stratum can be used to compute the overdiagnosis of a screening programme (it is currently implemented in the BC process).

universe

stratum

tag

year

age

number

screened_smurfs

0

smurf_life

0.0

0

3904.970480

screened_smurfs

0

smurf_life

4.0

0

3096.596602

screened_smurfs

1

smurf_life

4.0

0

4352.426803

no_screening

0

smurf_life

0.0

0

1920.970480

no_screening

0

smurf_life

4.0

0

3096.596602

no_screening

1

smurf_life

0.0

0

1984.000000

no_screening

1

smurf_life

4.0

0

3739.351464

Using dynamic stratification to isolate prevalent cases

Calibration efforts are often focused on populations which have been filtered in some way. For example, clinical trials likely include individuals with a particular age profile, but more importantly, includes individuals who are not in the clinical phase of the disease of interest at baseline. When we calibrate to such trials, we want to apply a similar filtering to make our simulation representative of the situation which generated the data.

Whether someone is in a particular state at a particular year is not something known beforehand to the population. By applying dynamic stratification, we can step in at the relevant timepoint to separate our results into the results of interest (all individuals not in a clinical phase) and the results of the individuals that we want to filter out. Consider below an example of how to do so in a simple lung cancer screening process.

 1from miscore import Process
 2
 3
 4class lc_screening_filtered(Process):
 5    event_tags = ["screen"]
 6
 7    def __init__(self, ct_sensitivity=0.80, start_age=50.0, end_age=80.0, interval=1.0):
 8        self.ct_sensitivity = ct_sensitivity
 9        self.start_age = start_age
10        self.end_age = end_age
11        self.interval = interval
12
13        self.callbacks = {
14            "__start__": self.schedule_screen,
15            "screen": self.lc_screen
16        }
17
18    def properties(self, rng, n, properties):
19        return {
20            "test_r": rng.random(n)
21        }
22
23    def schedule_screen(self, individual):
24        individual.add_to_queue_age(self.start_age, "screen")
25
26    def lc_screen(self, individual):
27        if ("lc_clinical" in individual.memory):
28            if (individual.age == self.start_age):
29                individual.hop_stratum(1)
30            return
31
32        individual.log_event("lc_screen")
33        if individual.age <= (self.end_age - self.interval):
34            individual.add_to_queue(1.0, "screen")
35
36        if individual.properties("test_r") < self.ct_sensitivity:
37            if "lc_state" in individual.memory:
38                individual.add_to_queue(0., "lc_clinical", stage="IA", screen_detected=True)

Note that individuals with clinical lung cancer never participate in screening, since their screening visit always ends in a return statement before any transition to screen-detected lung cancer occurs. However, only individuals at baseline (those who match the start_age) are placed in a different stratum. This ensures that individuals who get a clinical lung cancer in the intermittent period (i.e. those who would already be included by their health status at baseline) are still accounted for in our analysis. From the age at baseline onwards, all results for individuals with prevalent cases are written to a different stratum. One could also terminate the simulation for these individuals by adding a mortality event together with the hop_stratum call. However, we recommend leaving them in at least initially to check that your stratification happens correctly.

Note

This approach requires adjusting your disease or your disease screening process. You may also construct a secondary process which performs these checks just before the screening event, but special care must be taken that the sequence of events is performed correctly, such that no events of those with prevalent cases are written to the default stratum (use the priority argument for the add_to_queue method to schedule events at the same time but have them processed first).