Random numbers and properties¶

Sometimes, when you modify your process, you need to draw new random numbers. You should draw them using the MISCore structure. This has the following advantages:

It is possible to draw the random numbers for many simulated individuals simultaneously, which is faster than drawing them one by one;
It is easy to seed the random number streams so you can ensure reproducibility.

There are three ways to generate random numbers: properties(), properties_tmp() and random streams. It is also possible to import properties from an external model.

Properties¶

The properties() are useful when you need a random number for every individual for a certain element. As you have seen in the MISCore structure tutorial, the properties are generated before the simulation starts. Therefore, for every person, the random number are drawn at the same time, before the simulation which saves running time. Also, the properties can be stored and output after the simulation (see Model output).

To add a property you should add it to the properties() function in your Process(). For example, the following draws a relative risk from a normal distribution for each individual in the simulation. Note that you should also add your property in the return statement.

def properties(self, rng, n, properties):
    relative_risk = rng.normal(mean=1, sd=2, size=n)

    return {
        "relative_risk": relative_risk
    }

Next, if you want to use your property in the simulation, you can retrieve it by

r = individual.properties("relative_risk")

Properties_tmp¶

Sometimes you want to draw random number for each lesion instead of each individual. In a disease for which individuals can get many lesions, generating the random numbers for all individuals of your simulation might take up a lot of memory. To prevent that, you can use the properties_tmp(). As shown in the MISCore structure tutorial, the properties_tmp are generated for a block of individuals. By default the block_size is 2000, which means that these numbers are drawn for 2000 people at once. The numbers are discarded after the block has been simulated. This could save a lot of memory space. The drawback of this approach is that you cannot output the properties_tmp after the simulation.

Adding a temporary property is similar to adding a property. For example, the following draws 100 relative_risk from a normal distribution for each individual in the simulation.

def properties(self, rng, n, properties):
    sojourn_times = rng.normal(mean=1, sd=2, size=(100, n)).T

    return {
        "random_normal": sojourn_times
    }

You can retrieve the 10th value of your temporary property with:

r = individual.properties_tmp("sojourn_times")[9]

Note

Python starts counting at 0. Therefore, the first value is retrieved with individual.properties_tmp("sojourn_times")[0] and the 10th with individual.properties_tmp("sojourn_times")[9].

Note

The temporary property above is a 2-dimensional property: for each individual, we drew 100 values. In such cases, it is recommended to use size=(100, n) and then transpose your matrix using .T, as shown above. This makes sure that the properties of your first individuals remain the same when you increase your sample size. For example, when you first run 1 million individuals and then run 2 million, the above approach ensures that the properties of the first 1 million individuals are the same in both simulations.

Random stream¶

There will also be cases that it is hard to draw random numbers in advance. For example, the random values for false positive or false negative results. Often, it is not known before the simulation when an individual will do a test. That depends on the screening strategy, but that also depends on the findings at previous tests and whether someone went to surveillance. Also, the sensitivity of a test often depends on the health state of an individual, or even on the age. Then it is impossible to draw a random value for possible combination of test, health state and age.

With the random streams, it is possible to draw random numbers “on the go” during the simulation. This might save you from drawing a lot of unused random numbers.

To include random number streams in your process, you should add a list with names of the streams in your process:

class Disease(Process)
    ...
    random_number_generators = [
        "disease_random_stream",
    ]

Then in your disease Process, you can use the following to draw random numbers:

r = individual.random["disease_random_stream"].random
random_value = r()

Note

You can seed the properties, properties_tmp and random streams of all Processes. This is shown in the Reproducibility tutorial.

Overwriting properties¶

You can also overwrite properties from your Process. This could be useful in two cases.

The first is when you want to import properties externally. For example, for some cancers, the risk of getting it increases when an individual has some kind of infection. When an individual’s age of getting an infection is modelled with an external, sophisticated model, it might be hard to incorporate this model in your Process. Alternatively, this external model can generate a list of when individuals get an infection which you can import in MISCore.

To that end, you should import your properties as a NumPy array. Then you should include this array in a dictionary. Then you can insert this imported array in the properties parameter in the run() function. Finally, make sure to adapt your disease Process such that you use the imported values in your natural history.

The second use case is when you want to change the way your properties are drawn without modifying the Process. For example, in the case of EC, you may want to model a scenario in which all women have an onset age between age 24 and 26. You could do this by rewriting the properties in the EC process, but then you have to subclass the Process which might be a hassle. Alternatively, you could generate these values outside the model.

The following example shows how to do this:

import numpy as np

from miscore import Model, processes, Universe

# Generate 1000 random EC onset ages between 24 and 26
random_ages = np.random.uniform(low=24, high=26, size=1000)

# Put the random hazards in a dictionary.
# The keys of the dictionary must correspond to names of properties
custom_properties = {
    "ec_onset_age": random_ages
}

# Define the OC and EC processes
oc = processes.OC(life_table=processes.oc.data.nl_2017.life_table_female)

ec = processes.EC.from_data(
    processes.ec.data.us,
    name="ec"
)

# Define the universe and the model
universe = Universe(processes=[oc, ec],
                    name="universe")
model = Model(universes=[universe])

# Run the model and import the properties.
# We also indicate that this property should be returned so we can do a check.
result = model.run(
    n=1e3,
    properties=custom_properties,
    return_properties=["ec_onset_age"]
)

# Check if indeed the EC onset ages were between 24 and 26
print(f"Maximum onset age: {max(result.properties['ec_onset_age'])}")
print(f"Minimum onset age: {min(result.properties['ec_onset_age'])}")

Note that we return the ec_onset_age property to check if the property was correctly overwritten. For big runs, it is advised to not do this because it takes a lot of the memory. The output is:

Maximum onset age: 25.99627621232709
Minimum onset age: 24.001851652922895

Indeed, the onset ages that the EC Process used were between 24 and 26.