Reproducibility

Since microsimulation models are usually stochastic, the outcomes of MISCore simulations are subject to random variation. In the Run a model tutorial, you learned how to set a seed so that the output of your simulation stays the same across runs. This way, you apply the Common Random Numbers technique. In this tutorial, we dive a bit deeper into the concept of seeding in MISCore.
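To make the idea of common random numbers concrete, here is a toy sketch in plain Python that does not use MISCore. The function simulate and the screening effect it applies are hypothetical. Both scenarios draw from a random number generator seeded with the same value, so each simulated person receives the same random draws in both scenarios. Any difference between the scenarios is therefore caused by the intervention rather than by random noise.

```python
import random

def simulate(seed: int, n: int, screening: bool) -> list[int]:
    """Toy model, not MISCore: draw an age at death for each of n people.

    Hypothetical assumption: screening postpones every death before age 70
    by two years. All randomness comes from one RNG seeded with `seed`.
    """
    rng = random.Random(seed)
    ages = []
    for _ in range(n):
        age = rng.randint(40, 100)  # identical draw in both scenarios
        if screening and age < 70:
            age += 2
        ages.append(age)
    return ages

# Same seed for both scenarios: common random numbers.
no_screening = simulate(seed=123, n=1000, screening=False)
screening = simulate(seed=123, n=1000, screening=True)

# Per-person differences come only from the intervention (0 or 2 years).
diffs = [s - b for s, b in zip(screening, no_screening)]
print(sorted(set(diffs)))
```

Rerunning either scenario with seed 123 reproduces exactly the same list, which is the reproducibility that seeding provides.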

Use a single seed

The simplest way to seed a simulation is to use a single seed. Based on this seed, MISCore draws all required seeds. Simply pass an integer as the seed argument to the run() method of Model.

from miscore import Model, processes, Universe

# Processes that make up the model
birth = processes.Birth(
    year=1975
)

oc = processes.OC(
    life_table=processes.oc.data.us_2017.life_table_female,
)

ec = processes.EC.from_data(
    processes.ec.data.us
)

ec_screening = processes.EC_screening.from_data(
    processes.ec_screening.data.example
)

# Two universes: one without and one with screening
no_screening = Universe(
    name="no_screening",
    processes=[birth, oc, ec]
)

screening = Universe(
    name="screening",
    processes=[birth, oc, ec, ec_screening]
)

model = Model(
    universes=[no_screening, screening]
)

# Run the model with a single seed
result = model.run(
    n=10000,
    seed=123
)

print(result)

In the last line, we print the Result object. This yields the following output:

Result(n=10000, block_size=2000, version='0.27.0', released=True, seeds_properties={'birth': 4937772249435845478, 'oc': 14184741768772312494, 'ec': 4917050030189248263}, seeds_properties_tmp={'ec': 16137686710547183666, 'ec_screening': 6993609550205317205}, seeds_random={})

The Result object contains all the seeds that were used. Note, however, that the seed we passed (123) is not stored in it. Instead, the object contains three types of seeds (seeds_properties, seeds_properties_tmp and seeds_random), each a dictionary that maps names to seeds. The seed 123 was only used to draw these seeds.
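The sketch below illustrates, under stated assumptions, how one integer can determine a whole set of per-process seeds. The function draw_process_seeds is hypothetical and is not how MISCore derives its seeds; it only shows why the master seed need not be stored once the derived seeds are known.

```python
import random

def draw_process_seeds(seed: int, names: list[str]) -> dict[str, int]:
    """Hypothetical sketch, not MISCore's algorithm: derive one 64-bit seed
    per process from a single master seed."""
    master = random.Random(seed)
    return {name: master.getrandbits(64) for name in names}

seeds = draw_process_seeds(123, ["birth", "oc", "ec"])
print(seeds)  # three derived seeds, keyed by process name
```

In this sketch, the seed a process receives depends on its position in the list of names. Storing the derived seeds, as the Result object does, removes any dependence on how they were produced.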

The different types of seeds are explained further in MISCore structure. For now, it is sufficient to know that each process can have up to three different seeds.

Use multiple seeds

For full control, you can specify the seeds for individual processes and random number generators yourself. Although this is rarely needed, it is useful when you want to reproduce a run of which you only know the seeds of the individual processes. In the example above, for instance, the seed 123 is not saved in the Result object; only the seeds of the individual processes are stored.

To specify seeds_properties and seeds_properties_tmp, you need to know the names of the processes. It is best practice to pass a name to each process explicitly, so you can be sure you are using the correct names.

The names of the random number generators, to be used in seeds_random, cannot be changed. These names can be retrieved using the random_number_generators() property of the Model.

The example below shows how to name each process in the model and specify all required seeds.

from miscore import Model, processes, Universe

# Give every process an explicit name, so the seed dictionaries can refer to it
birth = processes.Birth(
    name="birth",
    year=1975
)

oc = processes.OC(
    name="oc",
    life_table=processes.oc.data.us_2017.life_table
)

ec = processes.EC.from_data(
    processes.ec.data.us,
    name="ec"
)

ec_screening = processes.EC_screening.from_data(
    processes.ec_screening.data.example,
    name="ec_screening"
)

no_screening = Universe(
    name="no_screening",
    processes=[birth, oc, ec]
)

screening = Universe(
    name="screening",
    processes=[birth, oc, ec, ec_screening]
)

model = Model(
    universes=[no_screening, screening]
)

# Pass the seeds per process instead of a single seed
result = model.run(
    n=10000,
    seeds_properties={
        "birth": 1646,
        "oc": 9321,
        "ec": 4841,
        "ec_screening": 6540
    },
    seeds_properties_tmp={
        "ec": 8010,
        "ec_screening": 1234
    },
    seeds_random={}
)

print(result)

This yields the following output:

Result(n=10000, block_size=2000, version='0.27.0', released=True, seeds_properties={'birth': 1646, 'oc': 9321, 'ec': 4841}, seeds_properties_tmp={'ec': 8010, 'ec_screening': 1234}, seeds_random={})

Note that you can also pass both specific seeds and a single seed. Any required seeds that are not passed explicitly are drawn using a random number generator seeded with seed.
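A minimal sketch of this fallback behaviour, assuming a hypothetical helper resolve_seeds that is not part of MISCore: explicitly passed seeds take precedence, and the remaining required seeds are drawn from a random number generator seeded with seed.

```python
import random

def resolve_seeds(required: list[str], explicit: dict[str, int], seed: int) -> dict[str, int]:
    """Hypothetical sketch, not MISCore's implementation: use explicit seeds
    where given and draw the remaining ones from an RNG seeded with `seed`."""
    rng = random.Random(seed)
    resolved = {}
    for name in required:
        drawn = rng.getrandbits(64)  # always draw, so later seeds do not shift
        resolved[name] = explicit.get(name, drawn)
    return resolved

# Fix the seed of "oc" only; "birth" and "ec" fall back to drawn seeds.
seeds = resolve_seeds(["birth", "oc", "ec"], {"oc": 9321}, seed=123)
print(seeds)
```

The sketch draws a seed for every name, even an overridden one, so that overriding one seed does not change the seeds drawn for the others. Whether MISCore makes the same design choice is not stated here.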

Requirements for reproducibility

To ensure that a model run reproduces the same results, you should be aware of a few parameters. Besides the seed(s), the values of n and block_size passed to run() must also be identical across runs.
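Why block_size matters can be illustrated with a hypothetical blocking scheme, sketched below; it is not MISCore's implementation. If each block of simulated individuals gets its own random number generator, derived from the seed and the block index, then changing block_size changes which random numbers each individual receives, even when the seed is the same.

```python
import random

def simulate_blocks(seed: int, n: int, block_size: int) -> list[float]:
    """Hypothetical sketch, not MISCore's scheme: split n people into blocks
    and give each block its own RNG derived from the seed and block index."""
    draws = []
    for block_index, start in enumerate(range(0, n, block_size)):
        rng = random.Random(f"{seed}-{block_index}")  # str seeds are deterministic
        size = min(block_size, n - start)
        draws.extend(rng.random() for _ in range(size))
    return draws

run_a = simulate_blocks(seed=123, n=10000, block_size=2000)
run_b = simulate_blocks(seed=123, n=10000, block_size=2000)
run_c = simulate_blocks(seed=123, n=10000, block_size=1000)

print(run_a == run_b)  # True: same seed, n and block_size
print(run_a == run_c)  # False: same seed, different block_size
```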

Results can also differ between versions of MISCore. To make sure that you are able to replicate your model, it is strongly advised to save the Result object as a .result file after your simulation. This object records the MISCore version used, n, block_size and all the seeds.
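If you also want a quick way to check whether two runs are comparable, a small record of these settings can help. The sketch below assumes a hypothetical RunSettings dataclass and mismatches helper, neither of which is part of MISCore; it complements, rather than replaces, saving the Result object as a .result file.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class RunSettings:
    """Hypothetical record, not a MISCore class: the settings listed above
    as required to reproduce a run."""
    version: str
    n: int
    block_size: int
    seeds_properties: dict[str, int] = field(default_factory=dict)
    seeds_properties_tmp: dict[str, int] = field(default_factory=dict)
    seeds_random: dict[str, int] = field(default_factory=dict)

def mismatches(a: RunSettings, b: RunSettings) -> list[str]:
    """Names of the settings that differ between two runs."""
    da, db = asdict(a), asdict(b)
    return [key for key in da if da[key] != db[key]]

# Values taken from the multiple-seeds example output above.
original = RunSettings(
    version="0.27.0",
    n=10000,
    block_size=2000,
    seeds_properties={"birth": 1646, "oc": 9321, "ec": 4841},
    seeds_properties_tmp={"ec": 8010, "ec_screening": 1234},
)

# A JSON round trip keeps every setting intact.
restored = RunSettings(**json.loads(json.dumps(asdict(original))))
print(mismatches(original, restored))  # []

# A rerun with a different block_size would not be a faithful reproduction.
rerun = RunSettings(**{**asdict(original), "block_size": 1000})
print(mismatches(original, rerun))  # ['block_size']
```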