Using stratum¶

With population models, we can stratify our results by cohort to observe outcomes specific to a particular sex or birth cohort. However, sometimes we may be interested in splitting up our results by properties that are generated as part of our simulation. For this purpose, the Stratum process can be employed. This tutorial provides a working example of using Stratum to split up results by properties generated during simulation, and also demonstrates how we can go further by employing dynamic stratification, which allows individuals to change their stratum during their simulated lifetime.

A basic example¶

To execute the examples used in this tutorial, download the script 1_stratified_model.py.

For the sake of the tutorial, we will employ an example process which simulates Smurf lifetimes. We assume that their total lifetime is Weibull distributed with shape 2 and scale 15, for an average life expectancy of about 13 years. However, at year 5 some of our smurfs meet an early end by being eaten by a cat. Smurfs may either be born clumsy, in which case they have a 0.5 probability of being eaten, or regular, in which case they only have a 0.1 probability of being eaten. For more information on creating processes such as these, see the tutorial Create a process. This tutorial also assumes knowing how to build and run a model, as demonstrated in Run a model. It will also be useful to have studied how to interpret model output, as shown in Model output.

from miscore import Model, Process, Universe
from miscore.processes import Stratum
from miscore.utils import WeibullDistribution


class Clumsy(Process):
    # Process to divide smurfs into clumsy or regular
    def __init__(self, prob_clumsy: float = 0.5, name: str = "clumsy"):
        self.prob_clumsy = prob_clumsy
        self.name = name

    def properties(self, rng, n, properties):
        clumsy_category = rng.choice(
            ["clumsy", "regular"],
            size=n,
            p=[self.prob_clumsy, 1 - self.prob_clumsy]
        )
        return {
            "clumsy_category": clumsy_category
        }


class Smurf(Process):
    event_tags = [
        "eaten_early",
        "smurf_death"
    ]

    duration_tags = [
        "smurf_life"
    ]

    # Process to simulate smurf lifetimes
    def __init__(self, survival_shape: float = 2., survival_scale: float = 15,
                 name: str = "smurfs"):
        self.survival = WeibullDistribution(scale=survival_scale, shape=survival_shape)
        self.name = name

        self.callbacks = {
            "__start__": self.schedule_survival,
            "eaten_early": self.eaten_early,
            "__end__": self.log_durations
        }

    def properties(self, rng, n, properties):
        survival_r = rng.random(n)
        eaten_early_r = rng.random(n)
        return {
            "survival_r": survival_r,
            "eaten_early_r": eaten_early_r
        }

    def schedule_survival(self, individual):
        individual.add_to_queue_age(5., "eaten_early")
        natural_mortality_age = self.survival(individual.properties("survival_r"))
        individual.add_to_queue_age(natural_mortality_age, message="smurf_death",
                                    terminate=True)

    def eaten_early(self, individual):
        threshold = 0.1
        if individual.properties("clumsy_category") == "clumsy":
            threshold = 0.5
        if individual.properties("eaten_early_r") < threshold:
            individual.log_event("eaten_early")
            individual.add_to_queue(
                delta=0.,
                message="smurf_death",
                terminate=True
            )

    def log_durations(self, individual, message):
        individual.log_duration(
            tag="smurf_life",
            start_age=0,
            end_age=individual.age
        )

Using the Stratum process¶

Given what we know about our smurf population, we may be interested in splitting up our results between the clumsy and regular smurfs. To this end, we may create an instance of the Stratum process that places our smurfs in separate strata when writing out our results. We do so by specifying a list of conditions for each Stratum we want to generate. By default, all individuals are placed in stratum 0. Each list of conditions reflect the properties that place an individual in new stratums from stratum 1 onwards. The first list of conditions passed to stratum places individuals in stratum 1, the second list places individuals in stratum 2, and so onwards for any number of lists that may be passed to stratum. Each condition is a tuple of the name of the property by which we wish to stratify, and a string of the expression we wish to evaluate on that property. If multiple conditions are required for a stratum, multiple tuples may be entered into the list. A valid call of Stratum will follow the form demonstrated below.

Stratum(
    [("property_name_1", ">", 10)], # Singular condition for stratum 1
    [("property_name_1", "<", 5), ("property_name_2", "!=", "blue")] # Two conditions for placing in stratum 2
)

Make sure that no overlap exists between the conditions supplied to Stratum, since this will challenge the interpretability of the acquired results. Continuing, we define our strata, and perform a model run to observe the results by stratum.

clumsy = Clumsy()

smurf = Smurf()

stratum = Stratum(
    [("clumsy_category", "==", "clumsy")]
)

universe = Universe(name="smurfs", processes=[clumsy, smurf, stratum])

model = Model(
    universes=[universe]
)

result = model.run(n=1000, seed=0)

print(result.events)
print(result.durations)

We see in our result.events that the stratum 1, which corresponds to our clumsy smurfs, has a much higher count of being eaten than the stratum 0, which should contain all our regular smurfs.

universe	stratum	tag	number
smurfs	0	eaten_early	51
smurfs	0	smurf_death	512
smurfs	1	eaten_early	209
smurfs	1	smurf_death	488

We can also study the average lifetime of our smurfs in the result.durations DataFrame, where we find clumsy smurfs from our first stratum to have a lower number of total lifeyears lived.

universe	stratum	tag	year	age	number
smurfs	0	smurf_life	0	0	6198.715778
smurfs	1	smurf_life	0	0	4542.202767

Note

For each example in this tutorial, strata are given in integer values 0 and 1, which is how the strata are stored internally. To make your events, durations and snapshots DataFrames more intuitive, you may also pass stratum_names as an argument to Model.run(). The argument must be a sequence of str values, where each entry denotes the name of the corresponding stratum, such that the integer value of the stratum is the index of the string in the sequence of stratum names. In this most recent example, we may pass stratum_names=["regular", "clumsy"] to obtain:

universe	stratum	tag	year	age	number
smurfs	regular	smurf_life	0	0	6198.715778
smurfs	clumsy	smurf_life	0	0	4542.202767

Dynamic stratification¶

The Stratum process assumes that we want to stratify our results by information given in the properties dictionary. This allows us to split up results by the simulated characteristics of our individuals, but does not allow any interaction between the stratum an individual is in, and the events that take place during simulation. It may be interesting to stratify results by outcomes that occur in the simulated events, which we generally do not know from the properties. For example, we may wish to study the outcomes by whether someone has shown up for at least one screening event, or we may have a risk factor that changes over time by which we may want to split up our results.

To stratify our results dynamically, we may use Individual.hop_stratum() in our Process. This function notes a change in stratum for the individual at the age at which the call to the function occurs. From this age onwards, any events and durations are written to the new stratum. Any events and durations that occurred before this age are still written to the old stratum.

Note

Note that the Individual.hop_stratum() call must be integrated in the process definition to be used, as opposed to the Stratum process, which can be defined after-the-fact. We therefore encourage that any use of hop_stratum includes an option in the process __init__ for turning the stratum changes on and off. Dynamic stratification is expected to be a slight computational burden, and of course sometimes we may wish to have aggregated, rather than stratified results.

Note

Memory is allocated to store a maximum of 256 hops per individual.

To demonstrate this function, we define a screening process for identifying clumsy smurfs, where we implement a change in stratum for anyone with a correct screening result. The screening is performed at year 4., and has a 0.5 probability of correctly identifying whether a smurf is clumsy or regular. We place any smurf with a correct screening result into a new stratum with the Individual.hop_stratum() method.

After identification, a clumsy smurf has a 0.6 probability of being protected from their clumsiness, at which point we remove their eaten_early event from the queue, saving them from an early demise.

class Smurf_Screen(Process):
    event_tags = ["smurf_screen"]

    def __init__(self, test_sens: float = 0.5, cure_prob: float = 0.6):
        self.test_sens = test_sens
        self.cure_prob = cure_prob

        self.callbacks = {
            "__start__": self.schedule_screen,
            "smurf_screen": self.smurf_screen
        }

    def properties(self, rng, n, properties):
        return {
            "test_r": rng.random(n),
            "cure_r": rng.random(n)
        }

    def schedule_screen(self, individual):
        individual.add_to_queue_age(4, "smurf_screen")

    def smurf_screen(self, individual):
        individual.log_event("smurf_screen")
        if individual.properties("test_r") < self.test_sens:
            individual.hop_stratum(1)
            if individual.properties("clumsy_category"):
                if individual.properties("cure_r") < self.cure_prob:
                    individual.remove_from_queue(0)

We now have a process which performs dynamic stratification, such that any smurf with an accurate screening is placed in a new stratum from the moment of the screen onwards. However, to correctly run the model, we need to inform the model.run() method that we will be performing such a change in stratum. Since the model is blind to the workings of any process, it cannot pre-allocate space in our results for additional strata, unless we explicitly tell it that we will be adding strata during simulation. By supplying the argument n_stratum_hops=1, the model knows it should allocate for at least 2 strata, since we will be adding another on top of the default stratum of 0.

We run our model with screening, and split up our result.durations by the period before and after our screening to study the difference in outcomes between our correctly screened group and the group with false results.

smurf_screen = Smurf_Screen()

screening_universe = Universe(name="screened_smurfs", processes=[clumsy, smurf, smurf_screen])

model = Model(universes=[screening_universe])

result = model.run(n=1000, duration_years=[4], n_stratum_hops=1, seed=0)

print(result.events)
print(result.durations)

Looking at our result.events, we see that every smurf is still in stratum 0 at the time of the smurf_screen event, after which the group of identified smurfs is split up. We see that the stratum 1 group has a much lower rate of being eaten early than the stratum 0 group, demonstrating the positive outcome the screening has on the clumsy smurfs among stratum 1.

universe	stratum	tag	number
screened_smurfs	0	eaten_early	135
screened_smurfs	0	smurf_death	504
screened_smurfs	0	smurf_screen	926
screened_smurfs	1	eaten_early	59
screened_smurfs	1	smurf_death	496

The result.durations table shows that before year 4.0, every smurf is still allocated to stratum 0. After year 4.0, we observe that stratum 1 has a much higher cumulative number of lifeyears, even though the number of smurfs in each stratum should be roughly equal, demonstrating the improved outcomes for smurfs with a correct screening.

universe	stratum	tag	year	number
screened_smurfs	0	smurf_life	0.0	3904.970480
screened_smurfs	0	smurf_life	4.0	3096.596602
screened_smurfs	1	smurf_life	4.0	4352.426803

Preserving stratum from the first universe¶

We often perform analyses comparing outcomes between universes, where one universe may select individuals for a screening, and the other has a different screening process, or no screening at all. To compare the outcomes of screening participants particularly, we may use dynamic stratification to put individuals into a stratum depending on their outcomes in one universe and then preserve this stratum in the other universes. For example, we may study the outcomes of screening eligibles in a universe in which there is no screening at all, even if screening eligibility is only revealed during the simulation process. Using MISCore, we can either preserve the dynamic stratification of the first universe or we can preserve the final stratum of each individual.

By preserving the dynamic stratification, the moments at which the simulated individuals hop across the strata in the first universe is stored, and applied in the other universes. For our smurfs, we may do so by running the same analysis as before, and then running a non-screening universe in which we preserve the stratum hops from the screening universe. We do so by setting preserve_stratum="dynamic" in model.run(). Note that we can only preserve the stratum hops from the first universe in the universes argument, as otherwise the model does not have the correct stratum at hand when noting results. We therefore run screened_smurfs as our first universe, since this is the universe in which our initial dynamic stratification occurs.

no_screening_universe = Universe(name="no_screening", processes=[clumsy, smurf])

model = Model([screening_universe, no_screening_universe])

result = model.run(n=1000, duration_years=[4], n_stratum_hops=1,
                   preserve_stratum="dynamic", seed=0)

print(result.events)
print(result.durations)

Studying our result.durations dataframe, we now see the explicit loss of lifeyears of the correctly screened group in the non-screening universe. Up to year 4.0, the two strata both exist unperturbed and have an equal number of lifeyears in each universe. However, after year 4.0 the group with a true screening result have a higher number of lifeyears lived in the screening universe than they do in the non-screening universe. The group with a false screening result (stratum 0) have no change to their lifeyears lived between universes.

universe	stratum	tag	year	number
screened_smurfs	0	smurf_life	0.0	3904.970480
screened_smurfs	0	smurf_life	4.0	3096.596602
screened_smurfs	1	smurf_life	4.0	4352.426803
no_screening	0	smurf_life	0.0	3904.970480
no_screening	0	smurf_life	4.0	3096.596602
no_screening	1	smurf_life	4.0	3739.351464

Alternatively, we can also preserve the final stratum of the first universe for all individuals. This way, individuals do not hop across strata in the second universe anymore. Instead, individuals are born in the stratum that they died in during the first universe and do not change stratum during their lives. To do so, we set preserve_stratum="final" in model.run(). Again, we run screened_smurfs as our first universe as we can only preserve the final stratum of the first universe in the universes argument.

result = model.run(n=1000, duration_years=[4], n_stratum_hops=1,
                   preserve_stratum="final", seed=0)

print(result.events)
print(result.durations)

In our result.durations dataframe, we now see 1984 years in stratum 1 from year 0 to 4. This means that some of our smurfs were born in stratum 1. For that reason, we see that the life years in statum 0 from year 0 to 4 has decreased by 1984 compared to preserving the dynamic stratification. After year 4, the smurfs are in the same stratum as they were under preserving dynamic stratifcation, so the number has not changed. Preserving the final stratum can be used to compute the overdiagnosis of a screening programme (it is currently implemented in the BC process).

universe	stratum	tag	year	number
screened_smurfs	0	smurf_life	0.0	3904.970480
screened_smurfs	0	smurf_life	4.0	3096.596602
screened_smurfs	1	smurf_life	4.0	4352.426803
no_screening	0	smurf_life	0.0	1920.970480
no_screening	0	smurf_life	4.0	3096.596602
no_screening	1	smurf_life	0.0	1984.000000
no_screening	1	smurf_life	4.0	3739.351464

Using dynamic stratification to isolate prevalent cases¶

Calibration efforts are often focused on populations which have been filtered in some way. For example, clinical trials likely include individuals with a particular age profile, but more importantly, includes individuals who are not in the clinical phase of the disease of interest at baseline. When we calibrate to such trials, we want to apply a similar filtering to make our simulation representative of the situation which generated the data.

Whether someone is in a particular state at a particular year is not something known beforehand to the population. By applying dynamic stratification, we can step in at the relevant timepoint to separate our results into the results of interest (all individuals not in a clinical phase) and the results of the individuals that we want to filter out. Consider below an example of how to do so in a simple lung cancer screening process.

from miscore import Process


class lc_screening_filtered(Process):
    event_tags = ["screen"]

    def __init__(self, ct_sensitivity=0.80, start_age=50.0, end_age=80.0, interval=1.0):
        self.ct_sensitivity = ct_sensitivity
        self.start_age = start_age
        self.end_age = end_age
        self.interval = interval

        self.callbacks = {
            "__start__": self.schedule_screen,
            "screen": self.lc_screen
        }

    def properties(self, rng, n, properties):
        return {
            "test_r": rng.random(n)
        }

    def schedule_screen(self, individual):
        individual.add_to_queue_age(self.start_age, "screen")

    def lc_screen(self, individual):
        if ("lc_clinical" in individual.memory):
            if (individual.age == self.start_age):
                individual.hop_stratum(1)
            return

        individual.log_event("lc_screen")
        if individual.age <= (self.end_age - self.interval):
            individual.add_to_queue(1.0, "screen")

        if individual.properties("test_r") < self.ct_sensitivity:
            if "lc_state" in individual.memory:
                individual.add_to_queue(0., "lc_clinical", stage="IA", screen_detected=True)

Note that individuals with clinical lung cancer never participate in screening, since their screening visit always ends in a return statement before any transition to screen-detected lung cancer occurs. However, only individuals at baseline (those who match the start_age) are placed in a different stratum. This ensures that individuals who get a clinical lung cancer in the intermittent period (i.e. those who would already be included by their health status at baseline) are still accounted for in our analysis. From the age at baseline onwards, all results for individuals with prevalent cases are written to a different stratum. One could also terminate the simulation for these individuals by adding a mortality event together with the hop_stratum call. However, we recommend leaving them in at least initially to check that your stratification happens correctly.

Note

This approach requires adjusting your disease or your disease screening process. You may also construct a secondary process which performs these checks just before the screening event, but special care must be taken that the sequence of events is performed correctly, such that no events of those with prevalent cases are written to the default stratum (use the priority argument for the add_to_queue method to schedule events at the same time but have them processed first).