Cost-effectiveness analysis

Cost-effectiveness analysis package.

miscore.tools.cea.add_latent_events(events, rates, use_regex=False)[source]

add_latent_events adds events to a DataFrame that may occur in proportion to simulated events but are not generated by the simulation itself.

Parameters:
  • events (DataFrame) – result.events DataFrame.

  • rates (Dict[str, Tuple[Tuple[str, float]]]) – Dictionary of latent event rates. Each key is the name of a latent event; each value is a tuple of (tag, rate) pairs, where tag is a tag present in the events DataFrame and rate is the rate at which events with that tag generate the latent event.

  • use_regex (bool) – Whether to use regex to match tags to a latent event, rather than equality. Defaults to False.

Returns:

New events DataFrame that includes latent events.
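The shape of the rates argument can be illustrated with a small pure-Python sketch. The tag names, the rates, the counts and the helper expected_latent_counts are hypothetical. The helper only mimics the proportional logic described above on plain counts; it is not the library's implementation, which operates on the events DataFrame. How use_regex matches tags internally is not stated here, so the full-match behaviour below is an assumption.

```python
import re

# Hypothetical rates: one latent event ("complication") generated by two
# simulated event tags, each with its own rate.
rates = {
    "complication": (("colonoscopy", 0.002), ("polypectomy", 0.01)),
}

# Hypothetical number of simulated events per tag.
simulated = {"colonoscopy": 1000, "polypectomy": 200, "biopsy": 50}


def expected_latent_counts(simulated, rates, use_regex=False):
    """Return latent event counts proportional to the matching simulated events."""
    latent = {}
    for latent_tag, tag_rates in rates.items():
        total = 0.0
        for pattern, rate in tag_rates:
            for tag, count in simulated.items():
                # Assumption: with use_regex=True the pattern must match the whole tag.
                if use_regex:
                    matched = re.fullmatch(pattern, tag) is not None
                else:
                    matched = tag == pattern
                if matched:
                    total += rate * count
        latent[latent_tag] = total
    return latent


latent = expected_latent_counts(simulated, rates)
print(latent)  # "complication" is 1000 * 0.002 + 200 * 0.01
```

With use_regex=True, a single pattern such as ".*oscopy" would stand in for every tag it matches, which keeps the rates dictionary short when many tags share one rate.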

miscore.tools.cea.find_pareto(df, costs_label='costs', effects_label='QALY')[source]

Find the strongly dominating universes (Pareto Optimal Frontier) in the given DataFrame, in terms of the input quantities.

Parameters:
  • df (DataFrame) – Pandas DataFrame containing the CEA outcomes per modelled universe, preferably use the result of cost_effectiveness_analysis().

  • costs_label (str) – Outcome that should be minimised in optimal strategies. Defaults to ‘costs’.

  • effects_label (str) – Outcome that should be maximised in optimal strategies. Defaults to ‘QALY’.

Return type:

DataFrame

Returns:

A new DataFrame, containing the rows of the strongly dominating universes only.
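As a rough illustration of what a Pareto frontier means here, the sketch below filters plain dictionaries instead of a DataFrame. It assumes the usual definition of strong dominance (another universe is at least as cheap and at least as effective, and strictly better on one of the two). The helper pareto_rows and the universe names are hypothetical and are not part of MISCore.

```python
def pareto_rows(rows, costs_label="costs", effects_label="QALY"):
    """Return the names of the rows that no other row strongly dominates."""
    frontier = []
    for name, row in rows.items():
        dominated = any(
            other[costs_label] <= row[costs_label]
            and other[effects_label] >= row[effects_label]
            and (
                other[costs_label] < row[costs_label]
                or other[effects_label] > row[effects_label]
            )
            for other_name, other in rows.items()
            if other_name != name
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Hypothetical CEA outcomes per universe.
universes = {
    "no_screening": {"costs": 100.0, "QALY": 10.0},
    "screen_10y": {"costs": 150.0, "QALY": 10.5},
    "screen_5y": {"costs": 220.0, "QALY": 10.4},  # costlier and less effective than screen_10y
    "screen_2y": {"costs": 300.0, "QALY": 10.8},
}

print(pareto_rows(universes))  # ['no_screening', 'screen_10y', 'screen_2y']
```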

miscore.tools.cea.find_efficient(df, costs_label='costs', effects_label='QALY', near_efficiency_margin=None)[source]

Find the weakly dominating strategies (efficient frontier) in the given DataFrame and their corresponding ICERs (Incremental Cost-Effectiveness Ratio). Optionally, also returns near-efficient strategies.

Parameters:
  • df (DataFrame) – Pandas DataFrame containing the CEA outcomes per modelled universe, preferably use the result of cost_effectiveness_analysis().

  • costs_label (str) – Outcome that should be minimised in optimal strategies. Defaults to ‘costs’.

  • effects_label (str) – Outcome that should be maximised in optimal strategies. Defaults to ‘QALY’.

  • near_efficiency_margin (float | None) – Strategies whose ‘effects_label’ value is within near_efficiency_margin*100% of the frontier (near-efficient strategies) are included in the frontier. Must be a number in the interval [0, 1].

Return type:

DataFrame

Returns:

A new DataFrame, containing the rows of the efficient strategies and an extra column with the ICERs of the efficient strategies. If near_efficiency_margin was specified, the DataFrame also includes columns with the vertical and horizontal distance from the frontier for near-efficient strategies.

Raises:

NotImplementedError – near_efficiency_margin is not yet implemented for PSA/USA CEA results.
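The idea of an efficient frontier with ICERs can be sketched in pure Python as follows. This is a minimal sketch of the textbook procedure (drop strongly dominated strategies, then drop strategies ruled out by extended dominance), not the library's implementation. The helpers icer and efficient_frontier, the strategy names and the numbers are hypothetical. Reporting None as the ICER of the cheapest strategy and dropping strategies that lie exactly on a frontier segment are both assumptions of this sketch.

```python
def icer(cheaper, costlier):
    """Incremental costs divided by incremental effects of two (name, costs, effects) tuples."""
    return (costlier[1] - cheaper[1]) / (costlier[2] - cheaper[2])


def efficient_frontier(strategies):
    """Return [(name, icer)] for the strategies on the efficient frontier."""
    # Sort by costs; for equal costs the most effective strategy comes first.
    ordered = sorted(strategies, key=lambda s: (s[1], -s[2]))
    frontier = []
    for strategy in ordered:
        # Strong dominance: not more effective than a cheaper frontier strategy.
        if frontier and strategy[2] <= frontier[-1][2]:
            continue
        # Extended dominance: remove the last frontier strategy while its ICER
        # is at least as high as the ICER of the new strategy relative to it.
        while (
            len(frontier) >= 2
            and icer(frontier[-2], frontier[-1]) >= icer(frontier[-1], strategy)
        ):
            frontier.pop()
        frontier.append(strategy)
    if not frontier:
        return []
    result = [(frontier[0][0], None)]  # the cheapest strategy has no comparator
    for previous, current in zip(frontier, frontier[1:]):
        result.append((current[0], icer(previous, current)))
    return result


# Hypothetical (name, costs, effects) outcomes.
strategies = [
    ("A", 0.0, 0.0),
    ("B", 100.0, 1.0),
    ("C", 300.0, 1.5),  # ruled out by extended dominance (between B and D)
    ("D", 400.0, 3.0),
    ("E", 350.0, 0.5),  # strongly dominated by B
]

print(efficient_frontier(strategies))  # [('A', None), ('B', 100.0), ('D', 150.0)]
```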

miscore.tools.cea.scatter_ce(df, tags=None, save_location=None, x_label='costs', y_label='QALY', **kwargs)[source]

Make a scatter plot of a DataFrame of universes. NOTE: End your script with cea.show() so that all plots are drawn.

Parameters:
  • df (DataFrame) – Pandas DataFrame containing the quantities per modelled universe, preferably use the output of cost_effectiveness_analysis(), find_pareto() or find_efficient().

  • tags (str | None) – Each scatter dot can be accompanied by a tag. Set it to ‘index’ if the tag should be the related index or to ‘number’ if it should give each dot an integer number. Otherwise don’t specify.

  • save_location (str | None) – The obtained image is saved to this location if specified. Requires a (raw) string. (For example ‘D:/Master_Thesis/scatterplot.png’)

  • x_label (str) – CEA outcome displayed on horizontal axis. Defaults to ‘costs’.

  • y_label (str) – CEA outcome displayed on vertical axis. Defaults to ‘QALY’.

  • kwargs – Any keyword arguments that should be passed to the matplotlib package.

miscore.tools.cea.show()[source]

Calls matplotlib.pyplot.show() so that matplotlib need not be imported in the main script.

miscore.tools.cea.add_population_norms(data_quantities, data_labels, country=None, sex=None, norm_data=None, sex_as_condition=None, method='EQ5D', disutility_tags=None, disutility_method='nominal', n_psa_draws=None, return_usa_draws=False, usa_quantiles=[0.025, 0.05, 0.25, 0.5, 0.75, 0.95, 0.975], psa_seed=None)[source]

add_population_norms adds population norm utility values to CEA datasets. This allows the analysis to offset particular health state utilities against population reference values by age and sex category. Health utilities are read from the population_norms.csv file in the MISCore CEA data folder. Currently, values by age and sex are included for Switzerland, the Netherlands, the UK, and the United States.

Parameters:
  • data_quantities (Dict[str, Dict]) – CEA data set.

  • data_labels (List) – CEA data_labels for data_quantities.

  • country (str | None) – Country code of country, as specified in population_norms.csv.

  • sex (str | None) – Sex that the dataset pertains to; can be ‘male’ or ‘female’. Can also be ‘total’ to return a population average, although this is currently not available for Switzerland.

  • norm_data (DataFrame | None) – If not using the norm utilities integrated in population_norms.csv, DataFrame with population norm health utilities. Requisite columns are ‘Agemin’, ‘Agemax’, ‘Yearmin’, ‘Yearmax’, ‘Method’, and ‘Utility’.

  • sex_as_condition (bool | None) – Boolean whether to include sex as a condition in data_quantities. Ensure that your durations and events DataFrame has a sex column with male and female categories.

  • method (Literal['EQ5D', 'VAS']) – Method of data utility, currently supports ‘EQ5D’ or ‘VAS’.

  • disutility_tags (Iterable | None) – Tags in data_quantities for which QALY inputs are given as a disutility relative to the life utility, and which may need to be adjusted to reflect the category norm utility. Often, a large tuple of tags relates to a single disutility value. In that case, a single element of the tuple can be supplied to identify the disutility tag; ensure that no such single tag occurs in more than one tuple.

  • disutility_method (Literal['nominal', 'nominal_censored', 'proportional']) – Method used to adjust the given disutility_tags. Options are “nominal” (e.g. a disutility of -0.3 is adjusted to -0.2 for a norm utility of 0.9 compared to 1.0), “nominal_censored” (the nominal method, but health-state specific disutilities cannot exceed the population norm disutility), or “proportional” (e.g. a disutility of -0.3 is adjusted to -0.27 for a norm utility of 0.9 compared to 1.0). Defaults to “nominal”.

  • n_psa_draws (int | None) – Number of random values to draw from the distributions of the disutility and norm utility values, consistent with the disutility_tags argument. If not specified, no random values are drawn.

  • return_usa_draws (bool) – Whether to return quantile draws from the supplied distributions of the disutilities and the norm utilities.

  • usa_quantiles (List) – Quantiles to use for the USA draws, if applicable.

  • psa_seed (int | None) – Seed to use for the PSA draws, if applicable.

Returns:

New CEA dataset which uses conditions to include population norms.
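The arithmetic behind the “nominal” and “proportional” options of disutility_method can be reproduced with a few lines of Python. This sketch only restates the two worked examples given for that parameter; the helper adjust_disutility is hypothetical and not part of MISCore. The “nominal_censored” option is left out because its exact censoring rule is not spelled out with an example here.

```python
def adjust_disutility(disutility, norm_utility, method="nominal"):
    """Adjust a disutility given relative to a utility of 1.0 to a population norm."""
    if method == "nominal":
        # Shift by the gap between a utility of 1.0 and the norm utility.
        return disutility + (1.0 - norm_utility)
    if method == "proportional":
        # Scale by the norm utility.
        return disutility * norm_utility
    raise ValueError(f"method not covered by this sketch: {method!r}")


nominal = adjust_disutility(-0.3, 0.9, "nominal")            # approximately -0.2
proportional = adjust_disutility(-0.3, 0.9, "proportional")  # approximately -0.27
print(round(nominal, 6), round(proportional, 6))
```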

class miscore.tools.cea.CEAResult(cea_output, psa_n_samples, miscore_version, psa_seed, psa_seeds, psa_values, descriptive_table, cea_data, cea_labels, discount_by, discount_start, discount_rate, division_factor, reference, use_regex, usa_values=None)[source]

Simulation result containing all properties from the CEA. All arguments to cost_effectiveness_analysis() that are relevant to posterior interpretation are included, as well as the calculated costs and effects, and optionally the generated PSA values and their sample distributions.

Parameters:
  • cea_output (DataFrame) – The output of the CEA tool.

  • psa_n_samples (int) – The number of PSA samples.

  • miscore_version (str) – The version of MISCore used to obtain the PSA result from the given MISCore results.

  • psa_seed (bool) – The seed used to generate label-specific RNGs to draw the random PSA values.

  • psa_seeds (Dict[Tuple[str, str, str, str], int]) – The label-specific seeds used to draw the random PSA values for the individual CEA entries.

  • psa_values (DataFrame) – A DataFrame containing the generated values for each label in the CEA data set. The index enumerates the draws from the sample distribution. The multilevel columns report the relevant table, CEA set, event/duration tag and outcome label, respectively.

  • usa_values (DataFrame | None) – In the case of a univariate sensitivity analysis, a DataFrame is included to give the cea_set, cea_table, cea_tag, quantile and value, corresponding to a given outcome set. The usa_set column of the cea_output table can be used to correspond the result to each of these characteristics.

  • descriptive_table (DataFrame) – A table of descriptive statistics of the sample distribution of each of the labels in the cea data set, by output table, cea_set, tag and outcome label.

  • cea_data (Dict[str, Dict]) – Input to cost_effectiveness_analysis() that contains costs/effects per tag (see tutorials of cost_effectiveness_analysis() for more details).

  • cea_labels (list) – List with the names of the quantities that are calculated.

  • discount_by (str) – Whether results were discounted by ‘year’ or ‘age’.

  • discount_start (float) – Start year or age of discounting.

  • discount_rate (float) – The discount rate(s), either a float (one rate for all labels) or a tuple specifying a rate per label.

  • division_factor (float) – The division factor used by cost_effectiveness_analysis().

  • reference (float | str) – If specified, the results are calculated with respect to a reference strategy.

  • use_regex (bool) – Whether regular expressions were used to interpret the CEA input.

save(path)[source]

Save the result to a file.

Use pickle.load() to load the result from the file.

Parameters:

path (str) – The path to the file.
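Loading a saved result with pickle.load() might look like the sketch below. To keep the example runnable without MISCore, a plain dictionary stands in for the CEAResult and is written with pickle.dump(), on the assumption that save() produces an ordinary pickle file. The file name is hypothetical.

```python
import os
import pickle
import tempfile

# Stand-in for a CEAResult; any picklable object behaves the same way here.
stand_in_result = {"discount_rate": 0.03, "cea_labels": ["costs", "QALY"]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "cea_result.pkl")  # hypothetical file name

    # Mimics what save(path) is assumed to do.
    with open(path, "wb") as file:
        pickle.dump(stand_in_result, file)

    # Loading: open the file in binary mode and call pickle.load().
    with open(path, "rb") as file:
        loaded = pickle.load(file)

print(loaded == stand_in_result)  # True
```

pickle needs the class of the stored object to be importable when loading, so MISCore should be installed in the environment that loads a real CEAResult. Only unpickle files from sources you trust.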

conditions()[source]

Get the conditions for each of the cea sets in cea_input.

Return type:

dict

ICERs(costs_label='costs', effects_label='QALY')[source]

Get the ICERs of the strategies on the efficient frontier for each psa_set.

Return type:

DataFrame

miscore.tools.cea.cost_effectiveness_analysis(cea_data, cea_labels, use_regex=False, result=None, durations=None, events=None, snapshots_years=None, snapshots_ages=None, discount_by='year', discount_start=None, discount_rate=None, division_factor=None, reference=None, group_by=None, verbose=True, optimize=False, perform_usa=False, quantiles_usa=(0.025, 0.05, 0.25, 0.5, 0.75, 0.95, 0.975), psa_n_samples=None, psa_seed=None, return_psa_seeds=False, return_psa_values=False, return_descriptives=False, return_zeroes=False)[source]

Process MISCore output. Allows calculation of costs/effects or any other quantity. It also allows for a probabilistic sensitivity analysis (PSA) with costs and QALYs. Features such as discounting and using a reference strategy are also supported.

Parameters:
  • cea_data (Dict[str, Dict]) – Input that contains costs/effects per tag (see tutorials for more details).

  • cea_labels (list) – List with the names of the quantities that are calculated.

  • use_regex (bool) – Allows regular expressions in the CEA input.

  • result (Result | None) – Result object from MISCore.

  • durations (DataFrame | None) – Durations dataframe from MISCore (cannot be specified when a Result object is specified).

  • events (DataFrame | None) – Events dataframe from MISCore (cannot be specified when a Result object is specified).

  • snapshots_years (DataFrame | None) – Snapshots_years dataframe from MISCore (cannot be specified when a Result object is specified).

  • snapshots_ages (DataFrame | None) – Snapshots_ages dataframe from MISCore (cannot be specified when a Result object is specified).

  • discount_by (str) – Whether to discount by ‘year’ or ‘age’.

  • discount_start (float | None) – Start year or age of discounting.

  • discount_rate (Iterable | float | None) – The discount rate(s), either a float (one rate for all labels) or a tuple specifying a rate per label.

  • division_factor (float | None) – If specified, the results are divided by the division_factor (e.g. a population size).

  • reference (Iterable | str | None) – If specified, the results are calculated with respect to a reference strategy. This can be given as the name of the universe or an iterable with reference values corresponding to each label in cea_labels.

  • group_by (List[str] | None) – Specifies the columns that should appear in the output DataFrame. If not provided, “universe” is set as default.

  • verbose (bool) – If True, shows the tags in the input data and MISCore output that are not used.

  • optimize (bool) – For a PSA, large matrix operations may be needed, which are performed with np.einsum. The optimized version of einsum may be used by setting optimize to True. This will be associated with a larger use of memory. Defaults to False.

  • perform_usa (bool) – Whether to perform a univariate sensitivity analysis, returning the outcomes at various quantiles of each of the specified PSA distributions, leaving all other values fixed.

  • quantiles_usa (Iterable) – The quantiles at which the univariate sensitivity analyses are performed, defaults to 2.5%, 5%, 25%, 50%, 75%, 95% and 97.5%.

  • psa_n_samples (int | None) – Number of samples of the costs/effects to be drawn for the PSA.

  • psa_seed – Seed with which the random costs/effects samples for the PSA are drawn.

  • return_psa_seeds (bool) – Include seeds used to draw PSA values in the CEAResult object.

  • return_psa_values (bool) – Include values drawn for the PSA in the CEAResult object.

  • return_descriptives (bool) – Whether to gather descriptive statistics on the sample PSA distributions, by CEA set, result table and tag.

  • return_zeroes (bool) – Whether the output dataframe should contain elements that have no contribution to one or any of the cea labels. Defaults to False.

Return type:

CEAResult

Returns:

CEAResult object with the cost-effectiveness analysis output.

Raises:
  • ValueError – Raised if a CEA tag is invalid, not enough discount rates are defined, or a PSA distribution cannot be found in SciPy.

  • AssertionError – Raised if group_by columns are not in all input tables.
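To make the roles of discount_start and discount_rate more concrete, the sketch below applies standard annual compound discounting with one rate per label. Whether MISCore uses exactly this formula is an assumption; the helper discount, the rates and the amounts are hypothetical and only illustrate the concept.

```python
def discount(value, time, discount_start, discount_rate):
    """Discount a value occurring at `time` (a year or an age) back to `discount_start`."""
    # Assumption: values occurring before discount_start are left undiscounted.
    elapsed = max(time - discount_start, 0.0)
    return value / (1.0 + discount_rate) ** elapsed


cea_labels = ["costs", "QALY"]
discount_rate = (0.04, 0.015)  # hypothetical: one rate per label, in the order of cea_labels

# Hypothetical undiscounted outcomes occurring in the year 2030.
undiscounted = {"costs": 1000.0, "QALY": 1.0}

discounted = {
    label: discount(undiscounted[label], 2030, 2020, rate)
    for label, rate in zip(cea_labels, discount_rate)
}
print(discounted)
```

Passing a single float instead of a tuple corresponds to using the same rate for every label.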