Run a model

This tutorial explains how to run a built-in model. It also teaches how to get the right model outputs and the basics on how to make sure your output is reproducible. Since MISCore was initially designed to simulate cancer screening intervention strategies, this tutorial will teach you how to run the built-in endometrial cancer (screening) model.

Basic model structure

First, we introduce you to the basic MISCore structure. Basically, MISCore works with Processes, Universes and Models. Everything that happens during an individual’s life (also called events) are modelled by a Process. Examples are birth, death by other causes and everything that is related to a disease or screening.

A Universe contains multiple processes and brings the events of these processes together. For example, a Universe could contain a process for birth, other-cause mortality and colorectal cancer. A second Universe could contain a colorectal cancer screening process on top of that.

A Model is the highest structure level in MISCore and it contains one or multiple Universes. In a Model, exactly the same individuals are simulated in all Universes that are included. By considering the differences between the Universes, it is possible to analyze, for example, the effect of screening.

In the remainder of this tutorial, we will put the above structure into practice. We will show how to run a basic Model with one Universe and multiple Processes using built-in data.

Minimal working example

First, let’s create a minimal working example. We simulate a Universe in which everybody dies at the age of 80. Since we do not assume a birth year, nor a disease, this Universe requires a Process for other-cause mortality only.

We’ll need the Universe and Model classes from MISCore, so let’s import those.

1from miscore import Model, Universe

We also need the Process for other-cause mortality OC. This process is built-in and can thus be imported from miscore.processes.

2from miscore.processes import OC

Every process needs to be initialized with some parameters. One way to intialize the OC process is by simply specifying an age of death, as shown in the code below. As stated before, it’s assumed everyone will die at the age of 80.

4# OC process
5oc = OC(
6    age=80
7)

Now that we created our Process, we can create our Universe. The code below shows how a Universe is initialized: it needs a name and a list of processes. By putting oc between squared brackets, we make it a list.

 9# Universe
10universe = Universe(
11    name="universe",
12    processes=[oc]
13)

The created universe(s) should then be gathered in a Model. Again, we put our universe between squared brackets because the Model only takes a list of processes.

15# Model
16model = Model(
17    universes=[universe]
18)

Finally, let’s run the model! We can do so by calling the run() method of the Model we just created. The only argument that is required by this method is n: the number of individuals to simulate. In this example, however, we also pass verbose and set it to True. This way MISCore shows its progress.

20# Run
21result = model.run(
22    n=1000,
23    verbose=True
24)

The entire file is shown below.

 1from miscore import Model, Universe
 2from miscore.processes import OC
 3
 4# OC process
 5oc = OC(
 6    age=80
 7)
 8
 9# Universe
10universe = Universe(
11    name="universe",
12    processes=[oc]
13)
14
15# Model
16model = Model(
17    universes=[universe]
18)
19
20# Run
21result = model.run(
22    n=1000,
23    verbose=True
24)

To actually run this file:

  • PyCharm: Open the context menu by right-clicking in the file and select Run 'filename'.

  • Spyder: Click ‘Run’ or press F5.

  • Command line: Use the command python path/to/filename.py.

Alternatively, in PyCharm, you can also run (a part of) your code by selecting it and pressing alt + shift + E. This method may be useful when developing or debugging your code but is discouraged for big simulation runs.

While running the simulation, you will see a progress bar. A statement is printed when the simulation is done.

Exporting

However, after the simulation is finished we do not see any output. For that we have to do a few extra steps. MISCore outputs several items, but for now we are only interested in the events and durations:

  • Events happen at a certain point in time, e.g. the number people that die at a certain age, get colorectal cancer or undergo a screening test.

  • Durations keep track of a period of time, e.g. the total number of years that people are alive, or the number of years that the population lives with a cancer.

MISCore stores these two as a pandas DataFrame. We can write these to a CSV file by adding these two lines to our script, using the to_csv() method.

 1from miscore import Model, Universe
 2from miscore.processes import OC
 3
 4# OC process
 5oc = OC(
 6    age=80
 7)
 8
 9# Universe
10universe = Universe(
11    name="universe",
12    processes=[oc]
13)
14
15# Model
16model = Model(
17    universes=[universe]
18)
19
20# Run
21result = model.run(
22    n=1000,
23    verbose=True
24)
25
26# Exports
27result.events.to_csv("events.csv")
28result.durations.to_csv("durations.csv")

After running the script, the exported events.csv will look as follows.

universe

stratum

tag

year

age

number

universe

0

oc_death

0.0

0.0

1000

Indeed, we see that in our universe called universe, all 1000 people had an event oc_death, i.e. died of an other-cause.

Make sure to close the .csv file(s) each time before proceeding to the next steps because the csv files can’t be overwritten while they are opened. This will result in a ‘Permission denied’ error.

Logging

Note that in our previous example, all oc_death events are logged at age 0.0, while we specified age 80. By default, MISCore logs all events and durations in a single age- and year interval, from 0 to infinity, here denoted by 0. To change this, we can specify the event_ages, event_years, duration_ages and duration_years arguments to the run() method. The arguments specify the lower bounds of the intervals in which events and durations are stored. In the example below, the events and durations are logged at the ages between 0 and 100 and years between 1995 and 2095.

 1from miscore import Model, Universe
 2from miscore.processes import OC
 3
 4# OC process
 5oc = OC(
 6    age=80
 7)
 8
 9# Universe
10universe = Universe(
11    name="universe",
12    processes=[oc]
13)
14
15# Model
16model = Model(
17    universes=[universe]
18)
19
20# Run
21result = model.run(
22    n=1000,
23    event_ages=range(0, 101),
24    event_years=range(1995, 2096),
25    duration_ages=range(0, 101),
26    duration_years=range(1995, 2096),
27    verbose=True
28)
29
30# Exports
31result.events.to_csv("events.csv")
32result.durations.to_csv("durations.csv")

Indeed, after adding the lines and running the script, all oc_death events are logged at age 80 in the events.csv file. Note that only age 80 is shown because no events occurred at the other ages.

universe

stratum

tag

year

age

number

universe

0

oc_death

0.0

80.0

1000

Note

In the above example, events and durations are aggregated in groups of 1 year. It is also possible to aggregate to other age groups. The code below shows this for age groups 0-39, 40-49, 50-59, 60-69, 70-79 and 80+. Events and durations are logged at the lower limit of the corresponding age group. This is useful if, for example, calibration data only specifies the age group of a patient, rather than the exact age.

result = model.run(
    n=1000,
    event_ages=[0, 40, 50, 60, 70, 80],
    duration_ages=[0, 40, 50, 60, 70, 80],
    verbose=True
)

Note

It is possible to log a subset of the event or duration tags. This is useful if you are not interested in all tags output by your model. Specify a list in the event_tags or duration_tags parameters in the run() function, and set the verify_tags argument to False.

Multiple processes

A Universe can consist of multiple Processes. Let’s add a Birth process. For now, it’s assumed everyone is born in 1995. Note how the Birth process was imported and how the initialized process is added to the list of processes in our Universe.

 1from miscore import Model, Universe
 2from miscore.processes import Birth, OC
 3
 4# Birth process
 5birth = Birth(
 6    year=1995
 7)
 8
 9# OC process
10oc = OC(
11    age=80
12)
13
14# Universe
15universe = Universe(
16    name="universe",
17    processes=[birth, oc]
18)
19
20# Model
21model = Model(
22    universes=[universe]
23)
24
25# Run
26result = model.run(
27    n=1000,
28    event_ages=range(0, 101),
29    event_years=range(1995, 2096),
30    duration_ages=range(0, 101),
31    duration_years=range(1995, 2096),
32    verbose=True
33)
34
35# Exports
36result.events.to_csv("events.csv")
37result.durations.to_csv("durations.csv")

After running the above Model, events.csv shows that everybody died in 2075: 80 years after 1995.

universe

stratum

tag

year

age

number

universe

0

oc_death

2075.0

80.0

1000

Also, the Birth process is our first process that logs durations. If you open durations.csv, you see that 1000 lifeyears are logged per year at the corresponding age until everyone dies in 2075.

Using built-in data

Obviously, it’s not realistic that every individual in our simulation dies exactly at the age of 80. Each built-in process features a data module with variables to initialize the process with. You can find an overview of all built-in data on the Built-in data page of this documentation. Here it is shown how to import and use the built-in US 2017 life table for females from MISCore.

 1from miscore import Model, Universe
 2from miscore.processes import Birth, OC
 3from miscore.processes.oc.data import us_2017
 4
 5# Birth process
 6birth = Birth(
 7    year=1995
 8)
 9
10# OC process
11oc = OC(
12    life_table=us_2017.life_table_female
13)
14
15# Universe
16universe = Universe(
17    name="universe",
18    processes=[birth, oc]
19)
20
21# Model
22model = Model(
23    universes=[universe]
24)
25
26# Run
27result = model.run(
28    n=1000,
29    event_ages=range(0, 101),
30    event_years=range(1995, 2096),
31    duration_ages=range(0, 101),
32    duration_years=range(1995, 2096),
33    verbose=True
34)
35
36# Exports
37result.events.to_csv("events.csv")
38result.durations.to_csv("durations.csv")

After running the above, check the events.csv file to see that the oc_death events now also occur at ages other than 80. Also you see a steady decrease in lifeyears in the durations.csv file.

Reproducibility

Now run the model again and take another look at the durations and events files. You might notice that you obtain different results each time you run the model. This can be fixed by passing an integer as seed to the run() method of Model.

 1from miscore import Model, Universe
 2from miscore.processes import Birth, OC
 3from miscore.processes.oc.data import us_2017
 4
 5# Birth process
 6birth = Birth(
 7    year=1995
 8)
 9
10# OC process
11oc = OC(
12    life_table=us_2017.life_table_female
13)
14
15# Universe
16universe = Universe(
17    name="universe",
18    processes=[birth, oc]
19)
20
21# Model
22model = Model(
23    universes=[universe]
24)
25
26# Run
27result = model.run(
28    n=1000,
29    seed=123,
30    event_ages=range(0, 101),
31    event_years=range(1995, 2096),
32    duration_ages=range(0, 101),
33    duration_years=range(1995, 2096),
34    verbose=True
35)
36
37# Exports
38result.events.to_csv("events.csv")
39result.durations.to_csv("durations.csv")

Now you will get the same result if you run the model twice which is convenient when you are developing your model. The Reproducibility tutorial will elaborate on seeding your model.

Disease processes

Now we are ready to add a third type of Process to our Universe: a disease Process. MISCore includes a built-in disease process for endometrial cancer (EC). An overview of all built-in processes can be found on the Example MISCAN processes page. In this tutorial, let’s add an EC process to our Universe and Model.

Like the Birth and OC processes, EC must also be initialized with necessary parameters. Similarly, MISCore has built-in data that can be used for that. There are two ways to initialize such built-in processes.

The easiest way is to to use the from_data() method which allows us to initialize the process with one line of code. The from_data() method collects the required variables from the provided data file, in this case the US EC data. Note how we first import the EC process in line 2 and how we add ec to the list of processes of our Universe in line 24.

 1from miscore import Model, Universe
 2from miscore import processes
 3from miscore.processes import Birth, OC
 4from miscore.processes.oc.data import us_2017
 5
 6# Birth process
 7birth = Birth(
 8    year=1995
 9)
10
11# OC process
12oc = OC(
13    life_table=us_2017.life_table_female
14)
15
16# EC process
17ec = processes.EC.from_data(
18    processes.ec.data.us
19)
20
21# Universe
22universe = Universe(
23    name="universe",
24    processes=[birth, oc, ec]
25)
26
27# Model
28model = Model(
29    universes=[universe]
30)
31
32# Run
33result = model.run(
34    n=1000,
35    seed=123,
36    event_ages=range(0, 101),
37    event_years=range(1995, 2096),
38    duration_ages=range(0, 101),
39    duration_years=range(1995, 2096),
40    verbose=True
41)
42
43# Exports
44result.events.to_csv("events.csv")
45result.durations.to_csv("durations.csv")

Secondly, it is possible to specify the variables separately. The following example shows how to do this, again using the NL data for the EC process:

1ec = processes.EC(
2    name="ec",
3    hazard=processes.ec.data.us.hazard,
4    age_factors=processes.ec.data.us.age_factors,
5    dwell_hyperplasia=processes.ec.data.us.dwell_hyperplasia,
6    dwell_preclinical=processes.ec.data.us.dwell_preclinical,
7    cure=processes.ec.data.us.cure,
8    time_to_death=processes.ec.data.us.time_to_death,
9)

Though the first method is very convenient, and automatically incorporates updates when parameters are added to a process, we are less aware of the actual parameters in the process.

Now add one of these methods to your code and run it. We can see that disease specific events and durations are added to your output files. For example, events.csv now contains the tags ec_death and ec_clinical and durations.csv specifies the number of lifeyears that people had clinical EC (ec_clinical).

Modify built-in data

Up to now, we only used built-in data in our model. In PyCharm, You can easily check the built-in data file by pointing with your mouse at us in processes.ec.data.us, pressing the left-mouse button and pressing ctrl + B on your keyboard.

If the built-in data is not suitable for your model, you can modify it to your wishes. For most processes, there are two ways.

The first method is most suitable if you want to modify a few parameters. For example, currently the hazard of developing EC hyperplasia is 30. If you want to simulate a high-risk population whose risk of EC is doubled, you can change the hazard parameter to 60. Additionally, we could assume that the dwell time for hyperplasia without atypia is half the current value:

 1dwell_hyperplasia = {
 2    "without_atypia": {"shape": .75, "mean": 114.40339 / 2},
 3    "with_atypia": {"shape": .75, "mean": 7.766915},
 4}
 5
 6# EC process
 7ec = processes.EC.from_data(
 8    processes.ec.data.us,
 9    hazard=60.,
10    dwell_hyperplasia=dwell_hyperplasia
11)

This example uses the US EC data, but overwrites the hazard and dwell_hyperplasia parameters. Make sure that you use the same structure of the parameter as in the original data file, like we did for the dwell_hyperplasia parameter. You can see the structure of the parameters on the specific Example MISCAN processes page, or in the original data file.

The second method is to copy-paste the original data file to your own Python project and modify the parameters in this new file. This is most convenient if you modify (nearly) all parameters. You can download these files from the Built-in data pages. The EC process can for example be initialized as follows:

from my_project import custom_data_file
...

# EC process
ec = processes.EC.from_data(
    custom_data_file
)

Example run

The code below gives a final overview of the code we wrote up to now. It simulates 1000 females that are born in 1995 and that die according to the US life table in 2017. This population is a high-risk population for endometrial cancer: we increased the relative risk by doubling the hazard of getting hyperplasia to 60 and halving the dwell time of hyperplasia without atypia.

Note that we also cleaned the imports in this final version. Up to now, we imported the Birth and OC process and their data separately from the processes. In this version of the code, we only import processes and get the Birth and OC processes from there, like we did for the EC process before.

 1from miscore import Model, processes, Universe
 2
 3# Birth process
 4birth = processes.Birth(
 5    year=1995
 6)
 7
 8# OC process
 9oc = processes.OC(
10    life_table=processes.oc.data.us_2017.life_table_female
11)
12
13# EC process
14dwell_hyperplasia = {
15    "without_atypia": {"shape": .75, "mean": 114.40339 / 2},
16    "with_atypia": {"shape": .75, "mean": 7.766915},
17}
18ec = processes.EC.from_data(
19    processes.ec.data.us,
20    hazard=60.,
21    dwell_hyperplasia=dwell_hyperplasia
22)
23
24# Universe
25universe = Universe(
26    name="universe",
27    processes=[birth, oc, ec]
28)
29
30# Model
31model = Model(
32    universes=[universe]
33)
34
35# Run
36result = model.run(
37    n=1000,
38    seed=123,
39    event_ages=range(0, 101),
40    event_years=range(1995, 2096),
41    duration_ages=range(0, 101),
42    duration_years=range(1995, 2096),
43    verbose=True
44)
45
46# Exports
47result.events.to_csv("events.csv")
48result.durations.to_csv("durations.csv")