Calibration
The calibration module offers a flexible toolbox to calibrate any model.
- class miscore.tools.calibration.GoodnessOfFit
The base class for goodness of fit functions.
See also:
van der Steen, A., van Rosmalen, J., Kroep, S., van Hees, F., Steyerberg, E. W., de Koning, H. J., … & Lansdorp-Vogelaar, I. (2016). Calibrating parameters for microsimulation disease models: a review and comparison of different goodness-of-fit criteria. Medical Decision Making, 36(5), 652-665.
- __call__(obs, sim, n, m, weights=None)
Validates the input using the check_input() method. Then, it calculates the goodness of fit value using the calculate() method and returns the resulting value.
- Parameters:
obs (ndarray) – The observed outcomes in absolute numbers.
sim (ndarray) – The simulated outcomes in absolute numbers.
n (ndarray) – The total number of individuals relevant to the observed outcomes.
m (ndarray) – The total number of individuals relevant to the simulated outcomes.
weights (ndarray | None) – The weights with which the deviance should be multiplied. Should be a vector with age-specific weights.
- Return type:
- Returns:
The goodness of fit value.
- static check_input(obs, sim, n, m, weights=None)
Checks the input for deviance functions and raises an error if the input is invalid.
- Parameters:
obs (ndarray) – The observed outcomes in absolute numbers.
sim (ndarray) – The simulated outcomes in absolute numbers.
n (ndarray) – The total number of individuals relevant to the observed outcomes.
m (ndarray) – The total number of individuals relevant to the simulated outcomes.
weights (ndarray | None) – The weights with which the deviance should be multiplied.
- Raises:
AssertionError – An AssertionError is raised if the input is invalid.
- Return type:
- abstract calculate(obs, sim, n, m, weights)
When subclassing, override this method to define how the goodness of fit is calculated.
- Parameters:
obs (ndarray) – The observed outcomes in absolute numbers.
sim (ndarray) – The simulated outcomes in absolute numbers.
n (ndarray) – The total number of individuals relevant to the observed outcomes.
m (ndarray) – The total number of individuals relevant to the simulated outcomes.
weights (ndarray) – The weights with which the deviance should be multiplied.
- Return type:
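The validate-then-calculate pattern described for `__call__`, `check_input()` and `calculate()` can be sketched without the library. In the block below, `GoodnessOfFitSketch` is a hypothetical stand-in for the real base class, the checks inside `check_input` are assumptions (the actual validation rules are not listed here), and `SquaredProportionError` is an invented criterion used only to show where a subclass plugs in.

```python
from abc import ABC, abstractmethod


class GoodnessOfFitSketch(ABC):
    """Hypothetical stand-in for the GoodnessOfFit base class."""

    def __call__(self, obs, sim, n, m, weights=None):
        # Validate first, then delegate to the subclass-specific calculation.
        self.check_input(obs, sim, n, m, weights)
        return self.calculate(obs, sim, n, m, weights)

    @staticmethod
    def check_input(obs, sim, n, m, weights=None):
        # Assumed checks; the real method may validate different properties.
        assert len(obs) == len(sim) == len(n) == len(m), "lengths differ"
        assert all(v > 0 for v in list(n) + list(m)), "n and m must be positive"
        assert weights is None or len(weights) == len(obs), "one weight per entry"

    @abstractmethod
    def calculate(self, obs, sim, n, m, weights):
        """Return the goodness of fit value."""


class SquaredProportionError(GoodnessOfFitSketch):
    """Illustrative criterion: weighted squared difference of proportions."""

    def calculate(self, obs, sim, n, m, weights):
        if weights is None:
            weights = [1.0] * len(obs)
        return sum(
            w * (o / n_t - s / m_t) ** 2
            for o, s, n_t, m_t, w in zip(obs, sim, n, m, weights)
        )
```

Calling `SquaredProportionError()(obs, sim, n, m)` runs the checks before the calculation, so a subclass only has to implement `calculate`.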
- class miscore.tools.calibration.PoissonDeviance(eps=0.01)
Bases: GoodnessOfFit
Implementation of Poisson deviance. Often used for incidence.
If \(sim_{t,i} > m_{t}\) in period \(t\) and category \(i\), \(sim_{t,i}\) is set equal to \(m_{t}\). The same is done for the observed values. Simulated values \(sim_{t,i}\) are scaled with a factor of \(\frac{n_{t}}{m_{t}}\) in each period \(t\). After scaling, simulated values are set to \(\epsilon\) if \(sim_{t,i} < \epsilon\).
Let \(obs_t=\sum_{i=1}^{N}{obs_{t,i}}\) and \(sim_t=\sum_{i=1}^{N}{sim_{t,i}}\). The deviance \(D_{t}\) in period \(t\) is then calculated as follows:
\[\begin{split}D_{t} = 2 &\left[ obs_t \ln \left( \frac{obs_t}{sim_t} \right) - (obs_t - sim_t) \right. \\ &\left. + \sum_{i=1}^{N} obs_{t,i} \left( \ln \left( \frac{obs_{t, i}}{obs_t} \right) - \ln \left( \frac{sim_{t, i}}{sim_t} \right) \right) \right].\end{split}\]
It is assumed that \(\lim_{x \to 0} x\ln(x)=0\).
The deviance per period is summed to obtain the total deviance \(D\):
\[D = \sum_{t=1}^{T}{D_{t}}.\]
- Parameters:
eps (float) – The minimum value of sim (after scaling). The default value is 1e-2.
- Raises:
AssertionError – Epsilon should be non-negative.
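The Poisson deviance formula can be traced with a small pure-Python sketch. This is not the library's implementation: the function name `poisson_deviance_sketch`, the nested-list layout (`obs[t][i]`, `n[t]`) and the helper `xlog_ratio` are assumptions for illustration, observed values are assumed to be capped at \(n_t\), and `eps` is assumed to be strictly positive so that every logarithm is defined.

```python
import math


def xlog_ratio(x, num, den):
    """Return x * ln(num / den), using the convention that the term is 0 when x == 0."""
    return 0.0 if x == 0 else x * math.log(num / den)


def poisson_deviance_sketch(obs, sim, n, m, eps=0.01):
    """obs[t][i] and sim[t][i] are counts; n[t] and m[t] are population sizes."""
    total = 0.0
    for obs_t, sim_t, n_t, m_t in zip(obs, sim, n, m):
        # Cap the counts at the population size (assumed to use n_t for obs).
        obs_t = [min(o, n_t) for o in obs_t]
        # Cap sim at m_t, scale by n_t / m_t, then floor at eps.
        sim_t = [max(min(s, m_t) * n_t / m_t, eps) for s in sim_t]
        obs_sum, sim_sum = sum(obs_t), sum(sim_t)
        # First part of D_t: totals over all categories.
        d_t = xlog_ratio(obs_sum, obs_sum, sim_sum) - (obs_sum - sim_sum)
        # Second part of D_t: distribution over the categories.
        for o, s in zip(obs_t, sim_t):
            d_t += xlog_ratio(o, o, obs_sum) - xlog_ratio(o, s, sim_sum)
        total += 2.0 * d_t
    return total
```

Expanding the logarithms shows that \(D_t\) equals \(2\left[\sum_i obs_{t,i} \ln(obs_{t,i}/sim_{t,i}) - (obs_t - sim_t)\right]\), which gives a convenient cross-check for the sketch.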
- class miscore.tools.calibration.BinomialDeviance(eps=1e-10)
Bases: GoodnessOfFit
Implementation of binomial deviance. Often used for prevalence.
If \(sim_{t,i} > m_{t}\) in period \(t\) and category \(i\), \(sim_{t,i}\) is set equal to \(m_{t}\). If the proportion \(\frac{sim_{t,i}}{m_{t}}\) is equal to \(0\) or \(1\), the binomial deviance is not defined. Therefore, it is clipped to the interval \([\epsilon, 1 - \epsilon]\), where \(\epsilon\) is a value in \([0, 0.5)\). Similar operations are performed on the observed values.
The deviance \(D_{t}\) in period \(t\) is then calculated as follows:
\[\begin{split}D_{t} = 2 \sum_{i=1}^{N} &\left[ obs_{t,i} \left( \ln \left( \frac{obs_{t,i}}{n_{t}} \right) - \ln \left( \frac{sim_{t,i}}{m_{t}} \right) \right) \right. \\ &\left. + (n_{t} - obs_{t,i}) \left( \ln \left(1 - \frac{obs_{t,i}}{n_{t}} \right) - \ln \left(1 - \frac{sim_{t,i}}{m_{t}} \right) \right) \right].\end{split}\]
The deviance per period is summed to obtain the total deviance \(D\):
\[D = \sum_{t=1}^{T}{D_{t}}.\]
- Parameters:
eps (float) – Binomial deviance is not defined for proportions of 0 and 1. Therefore, these are clipped to the interval [eps, 1 - eps]. The default value is 1e-10.
- Raises:
AssertionError – Epsilon should be a value in [0, 0.5).
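A minimal sketch of the binomial deviance, written directly from the formula above. The function name `binomial_deviance_sketch` and the nested-list input layout are assumptions; the sketch also assumes `eps` is strictly positive and that observed counts are capped at \(n_t\) in the same way simulated counts are capped at \(m_t\).

```python
import math


def binomial_deviance_sketch(obs, sim, n, m, eps=1e-10):
    """obs[t][i] and sim[t][i] are counts; n[t] and m[t] are population sizes."""

    def clip(p):
        # Keep proportions inside [eps, 1 - eps] so both logarithms exist.
        return min(max(p, eps), 1.0 - eps)

    total = 0.0
    for obs_t, sim_t, n_t, m_t in zip(obs, sim, n, m):
        for o, s in zip(obs_t, sim_t):
            o = min(o, n_t)
            p_obs = clip(o / n_t)
            p_sim = clip(min(s, m_t) / m_t)
            total += 2.0 * (
                o * (math.log(p_obs) - math.log(p_sim))
                + (n_t - o) * (math.log(1.0 - p_obs) - math.log(1.0 - p_sim))
            )
    return total
```

Because of the clipping, a simulated count of zero yields a large but finite deviance instead of an error.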
- class miscore.tools.calibration.MultinomialDeviance(eps=1e-10, diff=0)
Bases: GoodnessOfFit
Implementation of multinomial deviance. Often used for stage distributions.
If the proportion \(\frac{sim_{t,i}}{m_{t}}\) is equal to \(0\), the multinomial deviance is not defined. Therefore, the proportion is replaced with \(\epsilon\) wherever \(\frac{sim_{t,i}}{m_{t}} < \epsilon\). Here, \(\epsilon\) is a value in \([0, 1]\). Similar operations are performed on the observed values.
The deviance \(D_{t}\) in period \(t\) is then calculated as follows:
\[D_{t} = 2 \sum_{i=1}^{N} \left[ obs_{t,i} \left( \ln \left( \frac{obs_{t,i}}{n_{t}} \right) - \ln \left( \frac{sim_{t,i}}{m_{t}} \right) \right) \right].\]
The deviance per period is summed to obtain the total deviance \(D\):
\[D = \sum_{t=1}^{T}{D_{t}}.\]
- Parameters:
eps (float) – Multinomial deviance is not defined for proportions of 0. Therefore, the maximum of the proportion and eps is used. The default value is 1e-10.
diff (float) – The maximum allowed difference between the sum of sim and m in each period. The default is 0; it can be set to a small number (e.g. 0.01) to avoid raising an error due to rounding of non-integer simulated outcomes such as durations.
- Raises:
AssertionError – Epsilon should be a value in [0, 1].
AssertionError – Sum of obs should match n in each period.
AssertionError – Sum of sim should match m in each period.
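The multinomial deviance and its consistency checks can be sketched as follows. The function name `multinomial_deviance_sketch` and the nested-list layout are assumptions, `eps` is assumed to be strictly positive, and the observed counts are compared to `n` exactly while `diff` is applied to the simulated counts only, which is one reading of the parameter description.

```python
import math


def multinomial_deviance_sketch(obs, sim, n, m, eps=1e-10, diff=0.0):
    """obs[t][i] and sim[t][i] are counts per category; n[t] and m[t] are totals."""
    total = 0.0
    for obs_t, sim_t, n_t, m_t in zip(obs, sim, n, m):
        # The categories must partition the population in each period.
        assert sum(obs_t) == n_t, "Sum of obs should match n in each period."
        assert abs(sum(sim_t) - m_t) <= diff, "Sum of sim should match m in each period."
        for o, s in zip(obs_t, sim_t):
            # Replace proportions below eps with eps so the logarithm is defined.
            p_obs = max(o / n_t, eps)
            p_sim = max(s / m_t, eps)
            total += 2.0 * o * (math.log(p_obs) - math.log(p_sim))
    return total
```

Setting `diff` to a small positive value lets slightly inconsistent simulated totals through, for example when durations are rounded.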
- class miscore.tools.calibration.Target(x, obs, n, result_to_sim, result_to_m, goodness_of_fit, weight=1.0, name=None, category_names=None)
Use this class to define calibration targets.
- Parameters:
x (Sequence[SupportsFloat]) – The x-values (i.e. ages or years) corresponding to obs and n.
obs (Sequence[Sequence[SupportsFloat]] | Sequence[SupportsFloat]) – The observed outcomes in absolute numbers.
n (Sequence[SupportsFloat]) – The number of evaluated persons relevant to the observed outcomes.
result_to_sim (Callable[[Result], Sequence[SupportsFloat]]) – A callable that returns the simulated outcomes based on a Result instance.
result_to_m (Callable[[Result], Sequence[SupportsFloat]]) – A callable that returns the total number of individuals relevant to the simulated outcomes based on a Result instance.
goodness_of_fit (GoodnessOfFit) – An implementation of the GoodnessOfFit base class.
weight (SupportsFloat | Sequence[SupportsFloat]) – The factor or list of factors with which the deviance should be multiplied.
category_names (Sequence[str] | None) – The name of each category.
- __call__(result)
Return the goodness of fit for a simulation result.
- Parameters:
result (Result) – The simulation result for which the (goodness of fit of the) target should be evaluated.
- Return type:
- Returns:
The weighted goodness of fit.
- Raises:
AssertionError – An error may be raised by the goodness of fit function.
- confidence_interval(distribution, method='normal', alpha=0.05)
Calculates confidence intervals for the observation data.
The following distributions and calculation methods are available. Here, \(\hat{p}=obs/n\) and \(z=\Phi^{-1}(1-\alpha/2)\).
Binomial distribution (distribution='binomial')
Normal approximation (method='normal')
\[\hat{p} \pm z \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\]
Wilson score interval (method='wilson')
\[\frac{\hat{p} +\frac{z^2}{2n}}{1 + \frac{z^2}{n}} \pm \frac{z}{1 + \frac{z^2}{n}} \sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z^2}{4n^2}}\]
Poisson distribution (distribution='poisson')
Normal approximation (method='normal')
\[\hat{p} \pm z \sqrt{\frac{\hat{p}}{n}}\]
- Parameters:
distribution (str) – The assumed distribution of the data (options: 'binomial', 'poisson').
method (str) – The method to be used to calculate the intervals (options for distribution='binomial': 'normal', 'wilson'; options for distribution='poisson': 'normal').
alpha (float) – The significance level (default: 0.05).
- Return type:
- Returns:
The lower and upper bounds of the confidence intervals.
- Raises:
NotImplementedError – An error is raised when the distribution is not implemented or if the method is not implemented for the chosen distribution.
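The three interval formulas can be checked numerically with a standalone sketch. The function `confidence_interval_sketch` below is hypothetical and works on a single (obs, n) pair rather than on a Target; it uses `statistics.NormalDist` from the Python standard library for \(z=\Phi^{-1}(1-\alpha/2)\).

```python
import math
from statistics import NormalDist


def confidence_interval_sketch(obs, n, distribution, method="normal", alpha=0.05):
    """Return (lower, upper) for the proportion obs / n."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    p = obs / n
    if distribution == "binomial" and method == "normal":
        half = z * math.sqrt(p * (1.0 - p) / n)
        return p - half, p + half
    if distribution == "binomial" and method == "wilson":
        denom = 1.0 + z * z / n
        centre = (p + z * z / (2.0 * n)) / denom
        half = z / denom * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n))
        return centre - half, centre + half
    if distribution == "poisson" and method == "normal":
        half = z * math.sqrt(p / n)
        return p - half, p + half
    raise NotImplementedError(f"{method!r} is not available for {distribution!r}")
```

Unlike the normal approximation, the Wilson interval stays within [0, 1] even when obs is 0, up to floating-point rounding.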
- plot(result=None, x=None, sim=None, m=None, confidence_interval=None, show=False, outcome_when_m_is_zero=0, marker_simulated_values='')
Plots the observed data. It is possible to include confidence intervals. Simulated data can also be plotted, either by passing result or by passing x, sim and m.
- Parameters:
x (Sequence[SupportsFloat] | None) – The x-values (i.e. ages or years) corresponding to sim and m.
sim (Sequence[SupportsFloat] | None) – The simulated outcomes in absolute numbers.
m (Sequence[SupportsFloat] | None) – The number of evaluated persons relevant to the simulated outcomes.
confidence_interval (Tuple[ndarray, ndarray] | None) – Confidence intervals to plot around the observed data.
show (bool) – Whether or not to show the plot (default: False).
outcome_when_m_is_zero (float) – The value to use for the division of sim by m when m is zero (default: 0). Another value can be given to show that value instead, or numpy.nan to show nothing.
marker_simulated_values (str) – Marker of the simulated values. Can be any marker from matplotlib. Default is "" (no marker) if multiple x-values are used and "p" (pentagon) if only one x-value is used.
- Return type:
- Returns:
The created plot.
- Raises:
AssertionError – An error is raised when the passed simulation data is invalid.
- class miscore.tools.calibration.Fit(model_from_x, targets, best_n=50, **run_arguments)
This class can be used to construct a fitness value that can be used in any optimization algorithm. It returns the total goodness of fit for any input array.
Warning: if you use multiprocessing in an outer calibration loop to divide the Fit calls across multiple cores, Fit.found and Fit.best will give unreliable results. Alternatively, you could use the cores parameter when initializing your Fit instance for multiprocessing within the model run.
- Parameters:
model_from_x (Callable) – This callable should return a Model (or any other object with a similar run() method) when called with a sequence of input values.
best_n (int) – The size of the 'best' list, which holds the best solutions found so far.
run_arguments – Any other keyword arguments will be passed to the run() method to run the Model instances created using the model_from_x argument.
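The idea behind a Fit-style objective (build a model from the input values, run it, sum the goodness of fit over all targets, and remember the best inputs) can be sketched as below. `FitSketch` and `ToyModel` are hypothetical stand-ins; how the real class stores its `best` list is not described here, so the bookkeeping shown is an assumption.

```python
class FitSketch:
    """Hypothetical objective function that tracks the best inputs seen so far."""

    def __init__(self, model_from_x, targets, best_n=50, **run_arguments):
        self.model_from_x = model_from_x
        self.targets = targets
        self.best_n = best_n
        self.run_arguments = run_arguments
        self.best = []  # (value, x) pairs, lowest value first

    def __call__(self, x):
        # Build and run the model, then sum the goodness of fit of every target.
        result = self.model_from_x(x).run(**self.run_arguments)
        value = sum(target(result) for target in self.targets)
        # Keep only the best_n lowest values; this state lives in this process only.
        self.best.append((value, tuple(x)))
        self.best.sort(key=lambda pair: pair[0])
        del self.best[self.best_n:]
        return value


class ToyModel:
    """Stand-in for a model; run() returns the sum of the inputs as its 'result'."""

    def __init__(self, x):
        self.x = x

    def run(self, **run_arguments):
        return sum(self.x)


fit = FitSketch(ToyModel, targets=[lambda result: (result - 3) ** 2], best_n=2)
values = [fit([1, 1]), fit([1, 2]), fit([5, 5])]
```

Because the `best` list is ordinary in-process state, calls made in separate worker processes each update their own copy, which is one way to see why such bookkeeping becomes unreliable under outer-loop multiprocessing.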
- miscore.tools.calibration.events_from_result(tags, ages=None, years=None, conditions=None)
Returns a function that extracts events from a Result instance and sums events at ages or years. Can be passed to Target as result_to_sim or result_to_m.
- Parameters:
tags (Sequence[Collection[str]]) – Tags to extract.
ages (Sequence[Collection[float]] | None) – Exact ages in the age column in a Result.events instance for which to sum the events. Specify either ages or years.
years (Sequence[Collection[float]] | None) – Exact years in the year column in a Result.events instance for which to sum the events. Specify either ages or years.
conditions (Sequence[Tuple[str, str, str | float]] | None) – Conditions to the events, e.g. [("year", ">=", 1985), ("universe", "==", "screening")]. Only events that adhere to all conditions are included.
- Return type:
- Returns:
The function that extracts events from a Result instance.
- Raises:
ValueError – Raised when neither ages nor years is specified.
- miscore.tools.calibration.durations_from_result(tags, ages=None, years=None, conditions=None)
Returns a function that extracts durations from a Result instance and sums durations at ages or years. Can be passed to Target as result_to_sim or result_to_m.
- Parameters:
tags (Sequence[Collection[str]]) – Tags to extract.
ages (Sequence[Collection[float]] | None) – Exact ages in the age column in a Result.durations instance for which to sum the durations. Specify either ages or years.
years (Sequence[Collection[float]] | None) – Exact years in the year column in a Result.durations instance for which to sum the durations. Specify either ages or years.
conditions (Sequence[Tuple[str, str, str | float]] | None) – Conditions to the durations, e.g. [("year", ">=", 1985), ("universe", "==", "screening")]. Only durations that adhere to all conditions are included.
- Return type:
- Returns:
The function that extracts durations from a Result instance.
- Raises:
ValueError – Raised when neither ages nor years is specified.
- miscore.tools.calibration.snapshots_ages_from_result(tags, ages, conditions=None)
Returns a function that extracts and sums age snapshots from a Result instance. Can be passed to Target as result_to_sim or result_to_m.
- Parameters:
tags (Sequence[Collection[str]]) – Tags to extract.
ages (Sequence[Collection[float]]) – Exact ages in the age column in a Result.snapshots_ages instance for which to sum the snapshots.
conditions (Sequence[Tuple[str, str, str | float]] | None) – Conditions to the snapshots_ages to extract, e.g. [("year", ">=", 1985), ("universe", "==", "screening")]. Only snapshots that adhere to all conditions are included.
- Return type:
- Returns:
The function that extracts snapshots by age from a Result instance.
- miscore.tools.calibration.snapshots_years_from_result(tags, years, conditions=None)
Returns a function that extracts and sums year snapshots from a Result instance. Can be passed to Target as result_to_sim or result_to_m.
- Parameters:
tags (Sequence[Collection[str]]) – Tags to extract.
years (Sequence[Collection[float]]) – Exact years in the year column in a Result.snapshots_years instance for which to sum the snapshots.
conditions (Sequence[Tuple[str, str, str | float]] | None) – Conditions to the snapshots_years to extract, e.g. [("year", ">=", 1985), ("universe", "==", "screening")]. Only snapshots that adhere to all conditions are included.
- Return type:
- Returns:
The function that extracts snapshots by year from a Result instance.