Cost-effectiveness analysis

This tutorial explains how to use the cost-effectiveness analysis tool presented in cea. The tool calculates the costs, QALYs gained or any other label of one or multiple Universe instances. Moreover, it is possible to use this tool for probabilistic sensitivity analysis (PSA) and univariate sensitivity analysis (USA) to incorporate uncertainty in cost and QALY estimates. It can also find the (strongly or weakly) dominating Universe instances and plot them. This allows simple post-processing of MISCore simulations.

We discuss:

  1. How to write a data file to specify the costs/effects of each logged event or duration;

  2. How to implement a regular CEA and find dominating strategies;

  3. How to do a probabilistic sensitivity analysis;

  4. How to do a univariate sensitivity analysis;

  5. How to use implement different QALY values for age and sex groups.

Note

If you use built-in data for the costs and effects, you can skip the first part on the CEA data.

The starting point of any cost-effectiveness analysis is to have events and durations results from which you can calculate costs and benefits of a given scenario. As an example, you may use the following model run in which we compare a scenario of screening for endometrial cancer with a non-screening scenario. You may run it yourself to obtain the dataframes used in the examples.

Show/hide full code

 1import pandas as pd
 2
 3from miscore import Model, processes, Universe
 4from miscore.tools import cea
 5from miscore.tools.cea.data import population_norms
 6
 7# Birth process
 8birth = processes.Birth(
 9    name="birth",
10    year=1975
11)
12
13# OC process
14oc = processes.OC(
15    name="oc",
16    life_table=processes.oc.data.us_2009.life_table,
17)
18
19# EC process
20ec = processes.EC.from_data(processes.ec.data.us)
21
22# EC screening process
23ec_screening = processes.EC_screening.from_data(processes.ec_screening.data.example)
24
25# Universes
26no_screening = Universe(
27    name="no_screening",
28    processes=[
29        birth,
30        oc,
31        ec
32    ]
33)
34
35screening = Universe(
36    name="screening",
37    processes=[
38        birth,
39        oc,
40        ec,
41        ec_screening
42    ]
43)
44
45# Model
46model = Model(
47    universes=[
48        no_screening,
49        screening
50    ]
51)
52
53# Run model
54result = model.run(
55    n=1e5,
56    seed=123,
57    event_ages=range(0, 101, 1),
58    duration_ages=range(0, 101, 1),
59    event_years=range(1975, 2075, 1),
60    duration_years=range(1975, 2075, 1),
61    verbose=True
62)

The complete code used in this tutorial is shown at the bottom of this page, you may also download the .py file cea_example_run.py.

Data file

In order to calculate the costs/effects of strategies, they must be specified in a data file. A data file consists of up to three elements (one of them is optional).

Labels

The post-processing code allows to calculate multiple quantities. For example the costs, the (Quality-Adjusted) Life-Years (QALY/LY) or the number of hysterectomies in each Universe. The name of each labels to be measured has to be specified in the parameter labels in a list.

65labels = ("costs", "LY", "QALY", "n_hysterectomies")

Discount rates (optional)

The data can be discounted (as will be discussed later in this tutorial). It is possible to either specify one discount rate for all quantities or specify separate rates for each label. In the latter case, the rates have to be specified in the data file with a list:

66discount_rates = (0.03, 0.03, 0.03, 0.00)

Note that the order of the rates must correspond with the order in labels. If the discount rate is the same for all quantities, discount_rates can simply be a floating point number.

Costs and effects data

In the data file, costs and effects should be specified for tags that are logged in the events, durations and snapshots tables. The costs/effects for a tag are specified in one or multiple data sets. The use of multiple datasets in the costs and effects specification allows varying costs and utilities by subpopulation. Each data set is a dictionary within the dictionary ce which consists of up to five elements: conditions, durations, events, snapshot_years and snapshot_ages.

We will discuss the following example:

68# We declare a cost-effectiveness dataset, associating tags of interest with benefits and harms.
69# The dataset consists of two parts: lifelong costs/utilities and utilities for women aged 45-.
70cea_data = {
71    "Lifelong": {
72        "conditions": [],
73        "durations": {
74            # Total number of Life Years
75            "life": (0, 1, 1, 0),
76        },
77
78        "events": {
79            # Costs of treatment
80            "ec_clinical": (35763, 0, 0, 1),
81            "ec_screening_hysterectomy": (15276, 0, 0, 1),
82
83            # Count number of deaths in population
84            ".*death": "deaths"
85        },
86    },
87    "Menopause": {
88        "conditions": [("age", "<", 45)],
89        "durations": {
90            # Disutilities due to surgically induced menopause
91            ("ec_clinical_month1", "ec_screening_hysterectomy_month1"): (0, 0, -0.44, 0),
92            ("ec_clinical_months2-3", "ec_screening_hysterectomy_months2-3"): (0, 0, -0.26, 0),
93            ("ec_clinical_cont", "ec_screening_hysterectomy_cont"): (0, 0, -0.12, 0),
94        },
95        "events": {}
96    },
97}

Conditions

These cost-effectiveness data consist of two parts: Lifelong and Menopause. The Lifelong part does not have any conditions. The Menopause part, however, only applies for events and durations that occurred before age 45. This part contains disutilities (i.e. negative QALYs) caused by surgically induced menopause. Since women aged under 45 likely had not had their menopause yet, these disutilities only apply to events and durations before age 45.

Note

It is allowed to specify multiple conditions in conditions, for example

"conditions": [("universe", "==", "screening"), ("age", "<", 65),  ("year", ">=", 1993)]

would only apply to people in the Universe screening that were younger than 65 in the years after 1993. Conditions can be applied to any column in the durations and events tables (including stratum, cohort, year, age and Universe). The feasible operators are <, <=, ==, !=, >=, >.

Durations and events

Next, both parts specify costs and utility data for various events and duration tags. For example, the Lifelong part specifies that an occurrence of the life tag in the durations yields 0 costs, 1 QALY, 1 LY and 0 hysterectomies. This represents the total number of lifeyears of individuals. Lifelong also specifies the costs of treatment and hysterectomies. The event tags ec_clinical and ec_screening_hysterectomy are associated with costs and with 1 hysterectomy. No (reduction in) lifeyears and QALYs are associated with these tags, however.

Finally, the Lifelong part counts the total number of deaths. By specifying a string instead of a tuple of values, the specified tags are aggregated and counted, without multiplying by costs or utility values. As a result, an extra label deaths will be added to the outputs. Note that we used .*death, which means “all tags that end with ‘death’”. This is an example of regex. Their documentation will show more examples of regex rules.

The Menopause part only specifies disutilities for six duration tags. The first month of surgically induced menopause leads to a 0.44 reduction in yearly QALYs (so effectively a 0.04 QALY reduction). The second and third month lead to a 0.26 reduction in yearly QALYs. After that, there is a 0.12 reduction in yearly QALYs, until the age of 45. Note that these disutilities are the same for hysterectomies due to clinical diagnosis and due to screening. By setting two or more tags within a tuple (in brackets), the same cost/effect values can be assigned to these tags.

Note

In general, the value for QALYs are negative for all tags except “life”.

Note

In this example, costs/effects are specified for durations and events only. In a similar way, costs/effects can also be specified for snapshot_years and snapshot_ages.

Note

It is also possible to specify the disutilities of surgically induced menopause using regex. The following would yield the same results. Note, however, that the use_regex parameter in the cost_effectiveness_analysis() function should be set to True.

# Disutilities due to surgically induced menopause
"ec_.*_month1": (0, 0, -0.44, 0),
"ec_.*_months2-3": (0, 0, -0.26, 0),
"ec_.*_cont": (0, 0, -0.12, 0),

Functions in the cost-effectiveness analysis module

The cost-effectiveness analysis module contains four functions which are discussed in this section. They require the events and durations output from the simulation run as inputs. If the simulation was run in the same script, they can be retrieved from the Result object as follows:

 99# Retrieve the events and durations tables obtained by simulation
100durations = result.durations
101events = result.events

Alternatively, in case the durations and events output were generated by another script, the .csv or .pkl output files can be imported as follows.

import pandas as pd

# Import the (concatenated) events and durations tables obtained by simulation
durations = pd.read_csv("durations.csv")
events = pd.read_csv("events.csv")
import pickle

# Import the (concatenated) events and durations tables obtained by simulation
durations = pickle.load(open('durations.pkl','rb'))
events = pickle.load(open('events.pkl','rb'))

Note

The cost-effectiveness module takes only one events and one durations table as an input. These should contain the durations and events of each Universe you want to evaluate.

Therefore, if your simulation was done with more than one Model, make sure to concatenate all events and durations to one table respectively. You can do so by adding all events/durations to two separate lists and using the concat() function to concatenate all events/durations dataframes:

# If the simulation was done with three models, there are three Result instances.
# First put all events and durations dataframes in separate lists
events_list = [result1.events, result2.events, result3.events]
durations_list = [result1.durations, result2.durations, result3.durations]

# Then concatenate the lists of events and durations into one dataframe
events = pd.concat(events_list)
durations = pd.concat(durations_list)

This note can be ignored if your simulation used multiple Universe instances instead of multiple Model runs.

Calculate costs and effects

The cost_effectiveness_analysis() function calculates the costs and effects for the different simulated Universe instances. It can be used as follows:

103ce_result = cea.cost_effectiveness_analysis(durations=durations,
104                                            events=events,
105                                            cea_data=cea_data,
106                                            cea_labels=labels,
107                                            division_factor=100000,
108                                            discount_by="age",
109                                            discount_start=40,
110                                            discount_rate=discount_rates,
111                                            group_by=["universe"],
112                                            use_regex=True,
113                                            verbose=True
114                                            )

The ce_result object is a CEAResult instance as the output. The CEAResult page shows the elements included in this class. The most relevant result is ce.cea_output, which contains outcomes by type of outcome and Universe. Printing these outcomes will show the table of cost-effectiveness outputs.

116print(ce_result.cea_output)

id

universe

label

number

0

no_screening

costs

8289.900573

1

no_screening

LY

60.960182

2

no_screening

QALY

60.898495

3

no_screening

nr_hysterectomy

0.359300

4

no_screening

deaths

1.000000

5

screening

costs

15376.644948

6

screening

LY

61.111235

7

screening

QALY

60.211758

8

screening

nr_hysterectomy

0.853680

9

screening

deaths

1.000000

The cost_effectiveness_analysis() function requires the following parameters:

  • cea_data and cea_labels should be used to specify the cea input data (cea_data and labels) from the data file.

  • Simulation output, such as the durations and events tables that were generated by the simulation, and/or the snapshots_ages and snapshots_years dataframes should be specified. Alternatively, the entire result object can be supplied, from which the output dataframes will be harvested.

Additionally, the function has a large variety of optional parameters.

  • Outcomes can be discounted by age or year by setting discount_by to either “year” or “age”. This also requires setting a first year/age to start discounting as discount_start, as well as a discount_rate, which may either be a single floating point rate or an iterable with a rate for each of the outcomes in cea_labels.

  • division_factor scales the CEA results by dividing all outcomes by the specified factor. For instance, to get results per individual alive at the start of simulation, specify the population size n of your simulation. Alternatively, to scale to a given number of individuals alive at the start of simulation, use n divided by the scale. For example if you want to report result per 1,000 individuals.

  • Next, it is possible to set a reference value for each of the quantities. Then the costs/effects of all other universes are calculated with respect to the reference Universe, i.e. the costs/effects of the reference are subtracted from each other Universe. It can be specified either by a name of a Universe in the durations/events tables or by a tuple with explicit values. For example the following two specifications would give the same result:

    118# Now we include a reference strategy. There are two methods:
    119reference_strategy = (8289.90057, 60.96018, 60.89849, 0.35930, 1.0)
    120reference_strategy = "no_screening"
    121
    122ce_result = cea.cost_effectiveness_analysis(durations=durations,
    123                                            events=events,
    124                                            cea_data=cea_data,
    125                                            cea_labels=labels,
    126                                            division_factor=1000000,
    127                                            discount_by="age",
    128                                            discount_start=40,
    129                                            discount_rate=discount_rates,
    130                                            group_by=["universe"],
    131                                            use_regex=True,
    132                                            reference=reference_strategy
    133                                            )
    134
    135print(ce_result.cea_output)
    

    id

    universe

    label

    number

    5

    screening

    costs

    708.674438

    6

    screening

    LY

    0.015105

    7

    screening

    QALY

    -0.068674

    8

    screening

    n_hysterectomies

    0.049438

    Note

    If the reference values are specified by a tuple, the values must match the scale per division_factor.

  • Costs and utilities can be grouped by variables other than Universe by specifying the group_by parameter. The following example will show the costs and effects per Universe and per year. The outcomes can be grouped by universe, cohort, stratum, year and tag (the labels of events and durations), and any of their combinations.

    137# Now we group the results by universe and year only.
    138group_by = ["universe", "year"]
    139ce_result = cea.cost_effectiveness_analysis(durations=durations,
    140                                            events=events,
    141                                            cea_data=cea_data,
    142                                            cea_labels=labels,
    143                                            division_factor=1000000,
    144                                            discount_by="age",
    145                                            discount_start=40,
    146                                            discount_rate=discount_rates,
    147                                            group_by=group_by,
    148                                            use_regex=True)
    149print(ce_result.cea_output)
    

    id

    universe

    year

    label

    number

    1

    no_screening

    0.0

    LY

    0.099671

    2

    no_screening

    0.0

    QALY

    0.099671

    4

    no_screening

    0.0

    deaths

    0.000649

    5

    no_screening

    1.0

    costs

    0.178815

    6

    no_screening

    1.0

    LY

    0.099331

    1001

    screening

    99.0

    LY

    0.000386

    1002

    screening

    99.0

    QALY

    0.000386

    1003

    screening

    99.0

    n_hysterectomies

    0.000005

    1004

    screening

    99.0

    deaths

    0.000819

    1009

    screening

    100.0

    deaths

    0.001797

  • The boolean verbose, set to True by default, will warn you of any tags that were in the simulation results but were not included in the cea_data dictionary. It will also communicate the progress of the cost effectiveness analysis.

    For the events, durations and snapshots tables separately, two types of tags are printed. First, unused tags are those that are specified in the data file, but are not used in the durations/events tables. Second, unspecified tags are those that are not specified in the data file, but are used in the durations/events tables. This option is useful when generating a new data file, however it also takes extra running time. Even though the option defaults to True, it is advised to turn it to False in final runs.

    Note

    When you use regex to associate tags with outcomes and effects, make sure you set the option use_regex to True!

Finding dominating universes

The dataframe that results from cost_effectiveness_analysis() contains the costs and effects of each Universe. If each Universe is generated with different screening strategies, one may be interested in the dominating Universe set. The functions find_pareto() and find_efficient() find the strongly and weakly dominating universes respectively.

The functions expect that you have somewhat transformed your ce.cea_output data, with the labels of your outcomes as columns. We can easily get such a table by using the pandas.pivot_table function. This has the added benefit that we can specify fill_value=0 to fill in any outcomes that do not occurr for each Universe:

152# Pivot results by universe to search for efficient universes
153universe_table = pd.pivot_table(ce_result.cea_output, aggfunc="sum",
154                                columns="label", values="number", index="universe")
155print(universe_table)

Note

pd.pivot_table by default calculates the mean across the index values (in this case the different Universe instances, supply aggfunc=”sum” to get the sum of all values in the number column.

This gives us the right type of table:

universe

costs

LY

QALY

nr_hysterectomy

deaths

no_screening

828.990057

6.096018

6.089849

0.035930

0.1

screening

1537.664495

6.111124

6.021176

0.085368

0.1

We can then supply this table to our functions that determine the efficiency frontiers, containing each efficient Universe.

156# Here we search for the dominating strategies
157dominating_strategies = cea.find_pareto(universe_table)
158efficient_strategies = cea.find_efficient(universe_table)
159print(efficient_strategies)

universe

costs

LY

QALY

nr_hysterectomy

deaths

icer

no_screening

828.990057

6.096018

6.089849

0.03593

0.1

-inf

Note

By default, the functions search for the Universe instances that minimise the costs and maximise the QALYs. However, users may alter these choices by specifying costs_label and effects_label respectively. For example, Universe instances that minimise the number of hysterectomies and maximise the LY are found by

dominating_strategies = find_pareto(universe_table, costs_label="nr_hysterectomy", effects_label="LY")

Note

The find_efficient() function also has a near_efficiency_margin parameter. This parameter takes a number between 0 and 1. Strategies of which the effects_label is within near_efficiency_margin * 100% of the frontier are considered near-efficient and included in the frontier.

efficient_strategies = find_pareto(universe_table, near_efficiency_margin=0.02)

Note

If you performed a PSA as part of your cost effectiveness analysis, you may be interested in the efficient frontier and the ICERs for each of the random draws. This allows you to estimate the probability per your stochastic sensitivity analysis of your strategy being on the frontier, as well as the probability that your ICER is below a given threshold. The find_efficient() function will automatically recognize that you have output from a PSA, if you have grouped your cea_output by both Universe and psa_set, with psa_set as the first index level:

psa_table = pd.pivot_table(ce.cea_output, columns="label", values="number", index=["psa_set", "universe"])
dominating_strategies_psa = find_pareto(universe_table)

Scatter plot of universes

Finally it is possible to make a scatter plot of the costs and effects of the calculated Universe with:

161# The scatter_ce function allows us to plot the universes on the cost-QALY plane
162cea.scatter_ce(universe_table, x_label="costs", y_label="QALY", tags="index")
163cea.show()

This code generates a scatter plot showing the costs of each Universe on the horizontal axis and QALYs on the vertical axis. Note that it is possible to vary the quantities on each of the axes with the parameters x_label and y_label.

Additionally, it is possible to show a string with each scatter dot in the graph by specifying the parameter tags. The value index will add the value of the index of the specified DataFrame to the dot (usually this is the name of the Universe). The value number will assign a number to each of the dots, this may improve the readability.

Finally, the function takes kwargs that are passed to matplotlib, allowing the plots to be styled.

The figure below shows the result. Note that in your analysis the plane may be populated by more Universe outcomes than only the two generated by our example run.

../../../../_images/scatter_all.png

Probabilistic Sensitivity Analysis (PSA)

In the above examples, costs and effects were specified as a single value. However, sometimes it can be useful to perform sensitivity analyses on costs and effects of one or multiple tags. For that, you can specify a probability distribution in place of the single value in the input dictionary:

165# Next, we do an example of a probabilistic sensitivity analysis
166labels = ["costs", "LY", "QALY", "n_hysterectomy"]
167discount_rates = (0.03, 0.01, 0.01, 0.00)
168
169# PSA-like input
170# We declare a cost-effectiveness dataset, associating tags of interest with costs and effects
171psa_data = {
172    "Lifelong": {
173        "conditions": [],
174        "durations": {
175            # Total number of Life Years
176            "life": (0, 1, (1, 0.8) * 1000, 0),
177        },
178        "events": {
179            # Costs of treatment
180            "ec_clinical": (("norm", {"loc": 35763, "scale": 5960}), 0, 0, 1),
181            "ec_screening_hysterectomy": (("norm", {"loc": 15276, "scale": 2546}), 0, 0, 1),
182
183            # Count number of deaths in population
184            ".*death": "deaths"
185        },
186    },
187    "Menopause": {
188        "conditions": [("age", "<", 45)],
189        "durations": {
190            # Disutilities due to surgically induced menopause
191            ("ec_clinical_month1", "ec_screening_hysterectomy_month1"): (0, 0, -0.44, 0),
192            ("ec_clinical_months2-3", "ec_screening_hysterectomy_months2-3"): (0, 0, -0.26, 0),
193            ("ec_clinical_cont", "ec_screening_hysterectomy_cont"):
194                (0, 0, ("uniform", {"loc": -0.12, "scale": 0.10}), 0),
195        },
196        "events": {}
197    },
198}

In this example, the disutilities for the durations ec_clinical_month1 and ec_screening_hysterectomy_month1 follow a normal distribution with mean -0.44 and standard deviation 0.05. Disutilities of ec_clinical_month23 and ec_screening_hysterectomy_month23 follow a uniform distribution with lowerbound -0.39 and upperbound 0.13. Any distribution available in scipy.stats can be used. All distributions and their parameters are listed here.

Another option is to specify a tuple of possible values for costs and effects as a user. An example of this is shown for event tag ec_clinical_cont and ec_screening_hysterectomy_cont. Disutilities for these durations alternate between -0.16 and 0.08.

Note

For larger analyses, like a PSA, efficiency is essential. To speed up your run, consider removing any unnecessary tags from the events dataframe before using it in your PSA.

We then need to tailor our inputs to the cost_effectiveness_analysis() function to perform a Probabilistic Sensitivity Analysis (PSA). To do so, set the parameter psa_n_samples to the desired number of draws from the distributions defined for each of the inputs. When there are user-supplied values for each of the draws (i.e. an array of a given length n was supplied to perform the PSA), make sure psa_n_samples matches the length of the user-supplied draws. To make sure your PSA draws are reproducible, set a value for psa_seed.

200# Calculate costs and effects for each PSA draw
201ce_result = cea.cost_effectiveness_analysis(result=result,
202                                            cea_data=psa_data, cea_labels=labels,
203                                            division_factor=1e5, discount_by="age",
204                                            discount_start=40,
205                                            discount_rate=discount_rates,
206                                            group_by=["universe"],
207                                            psa_n_samples=2000,
208                                            psa_seed=42195)
209print(ce_result.cea_output)

We are now returned a result DataFrame with separate outputs by label and Universe for each of the possible parameter draws out of the 2000 we simulated. The distribution of your outputs across these draws should inform you of the uncertainty surrounding your initial estimate.

id

universe

psa_set

label

number

0

no_screening

0

costs

8171.210975

1

no_screening

0

LY

69.894512

2

no_screening

0

QALY

69.846467

3

no_screening

0

n_hysterectomy

0.359300

5

no_screening

1

costs

7501.383817

19993

screening

1998

n_hysterectomy

0.853680

19995

screening

1999

costs

13180.019709

19996

screening

1999

LY

70.150378

19997

screening

1999

QALY

55.839194

19998

screening

1999

n_hysterectomy

0.853680

Note

The option of performing a PSA also allows for different outputs to be included in the CEAResult object.

Setting return_psa_seeds to True, the seeds for each of the different inputs are returned. The user-supplied seed is used to generate these seeds, to ensure that each input always starts the draw from the same point in the random number stream.

Setting return_psa_values to True, will return a DataFrame object with the exact values for each PSA draw and input tag. This can be valuable in plotting the relationship between your aggregated outcomes for each draw, and the values of the inputs in each of these draws. Depending on the number of inputs used and the value of psa_n_samples, this may be a memory-intensive DataFrame object.

Setting return_descriptives to True, will return a DataFrame object with descriptive sample statistics of each of the inputs, across all of the PSA draws. This may be valuable in evaluating whether the specified distribution for an input is resulting in the intended sample distribution, and whether psa_n_samples is sufficient to reduce the discrepancy between the sample distribution and the intended distribution.

Univariate Sensitivity Analysis (USA)

Alternatively, you may be interested in the effect of individual inputs, changed one at a time. To this end, the same dataset used for performing the PSA may be used for conducting a Univariate Sensitivity Analysis (USA). To do so, set perform_usa to True. Rather than drawing all the values simultaneously, every draw will only have one value changed at a time. In the DataFrame CEAResult.usa_values, you can find which value was changed to which input for each of the usa_set values in CEAResult.cea_output.

By default, the values are each set at their 2.5%, 5%, 25%, 50%, 75%, 95% and 97.5% percentile values, per the distribution used for the PSA. For PSA inputs for which a list of n values was supplied, the sample quantile values are used. The quantiles used can be specified with the argument usa_quantiles if needed.

An example of performing a Univariate Sensitivity Analysis is given below.

211# Alternatively, we may perform a Univariate Sensitivity Analysis, in which values
212# are changed one by one.
213ce_result = cea.cost_effectiveness_analysis(result=result,
214                                            cea_data=psa_data, cea_labels=labels,
215                                            division_factor=1e5, discount_by="age",
216                                            discount_start=40,
217                                            discount_rate=discount_rates,
218                                            reference="no_screening",
219                                            group_by=["universe"],
220                                            perform_usa=True)
221# The output is now grouped by `usa_set` rather than `psa_set`
222print(ce_result.cea_output)
223
224# The `usa_values` DataFrame shows which input corresponds to which `usa_set`
225print(ce_result.usa_values)

Show full output

The output is now given by:

idx

universe

usa_set

label

number

140

screening

0

costs

6624.022297

141

screening

0

LY

0.255866

142

screening

0

QALY

-0.283104

143

screening

0

n_hysterectomy

0.494380

145

screening

1

costs

6698.415765

273

screening

26

n_hysterectomy

0.494380

275

screening

27

costs

7086.744375

276

screening

27

LY

0.255866

277

screening

27

QALY

-0.602420

278

screening

27

n_hysterectomy

0.494380

Specified by the usa_values DataFrame, where it is specified which input is set to which value (corresponding to a particular quantile of its distribution) to generate the input related to the usa_set index:

id

usa_ceaset

usa_table

usa_tag

usa_label

usa_value

usa_quantile

0

Lifelong

events

ec_clinical

costs

47444.385348

0.025

1

Lifelong

events

ec_clinical

costs

45566.327617

0.050

2

Lifelong

events

ec_clinical

costs

39782.958911

0.250

3

Lifelong

events

ec_clinical

costs

35763.000000

0.500

4

Lifelong

events

ec_clinical

costs

31743.041089

0.750

5

Lifelong

events

ec_clinical

costs

25959.672383

0.950

6

Lifelong

events

ec_clinical

costs

24081.614652

0.975

7

Lifelong

events

ec_screening_hysterectomy

costs

20266.068305

0.025

8

Lifelong

events

ec_screening_hysterectomy

costs

19463.797334

0.050

9

Lifelong

events

ec_screening_hysterectomy

costs

16993.250904

0.250

10

Lifelong

events

ec_screening_hysterectomy

costs

15276.000000

0.500

11

Lifelong

events

ec_screening_hysterectomy

costs

13558.749096

0.750

12

Lifelong

events

ec_screening_hysterectomy

costs

11088.202666

0.950

13

Lifelong

events

ec_screening_hysterectomy

costs

10285.931695

0.975

14

Lifelong

durations

life

QALY

0.800000

0.025

15

Lifelong

durations

life

QALY

0.800000

0.050

16

Lifelong

durations

life

QALY

0.800000

0.250

17

Lifelong

durations

life

QALY

0.900000

0.500

18

Lifelong

durations

life

QALY

1.000000

0.750

19

Lifelong

durations

life

QALY

1.000000

0.950

20

Lifelong

durations

life

QALY

1.000000

0.975

21

Menopause

durations

(ec_clinical_cont, ec_screening_hysterectomy_cont)

QALY

-0.022500

0.025

22

Menopause

durations

(ec_clinical_cont, ec_screening_hysterectomy_cont)

QALY

-0.025000

0.050

23

Menopause

durations

(ec_clinical_cont, ec_screening_hysterectomy_cont)

QALY

-0.045000

0.250

24

Menopause

durations

(ec_clinical_cont, ec_screening_hysterectomy_cont)

QALY

-0.070000

0.500

25

Menopause

durations

(ec_clinical_cont, ec_screening_hysterectomy_cont)

QALY

-0.095000

0.750

26

Menopause

durations

(ec_clinical_cont, ec_screening_hysterectomy_cont)

QALY

-0.115000

0.950

27

Menopause

durations

(ec_clinical_cont, ec_screening_hysterectomy_cont)

QALY

-0.117500

0.975

Using population reference QALY values for different age- and sex groups

In performing cost-effectiveness analyses, it can often be easy to set decrements for health utility for the particular health states we are simulating, and to set the QALY value for our "life" tag to 1.0. This approach, however, assumes that anyone not experiencing the specific health state, is in perfect health. Especially in older populations, other morbidities are very likely to influence the general level of health utility, causing the average QALY value to never be as high as 1.0. Even younger populations often have an average health utility level around 0.90.

It is therefore crucial to adjust the population reference values for any cost-effectiveness analysis. This could be done by adjusting the "life" tag QALY value to some value below 1.0, but we also include a tool for doing this separately for each age category. For this, the data module includes the population_norms module with the add_population_norms() function, which adjusts your cost-effectiveness input dictionary for the average population health utility, by age and sex. Population norm utilities can be supplied by the user as a dataframe, or can be taken from the internally stored values from the literature for the US, the UK, NL or CH.

In this example, the add_population_norms() function uses the internally stored population norm values for the Dutch population. We can view this data by importing the dataset only.

from miscore.tools.cea.data.population_norms import data
data[(data.Country=="nl") & (data.Sex=="total") & (data.Method=="EQ5D")].head()

Country

Agemin

Agemax

Yearmin

Yearmax

Sex

Utility

SE

Method

nl

0

25

0

2200

total

0.938

0.021

EQ5D

nl

25

35

0

2200

total

0.910

0.012

EQ5D

nl

35

45

0

2200

total

0.922

0.013

EQ5D

nl

45

55

0

2200

total

0.874

0.019

EQ5D

nl

55

65

0

2200

total

0.869

0.027

EQ5D

Note

The method by which the utilities are derived in the literature may also differ, this is what the Method column reports. For most countries, Visual Analogue Scale (VAS) values are included, as well as values derived by means of population valuations of the EQ-5D survey. Although the EQ-5D is the most commonly used method, it may be valuable to your research to perform a sensitivity analysis in which results with both value sets are compared.

Note

Yearmin and Yearmax columns are included, which may report any conditions on the calendar year for which to apply the population health utilities. It could be that survey values were used which are updated over the years, such that separate calendar years are associated with new population health utility values. For the current internally included values, these limits are set to 0 and 2200, since only one survey year is used.

An example call of the add_population_norms() function is given below. We see that we supplied a CEA dataset called ce to this function. For each age category, new entries in ce are created in which the life QALY value, and any other disutility values is adjusted. These values which were previously relative to a health utility value of 1, are now set to be relative to the population norm utility. The adjustment of disutility values depends on the disutility_method parameter. The available methods are:

  • "proportional":

    Disutilities are adjusted proportionally to the difference between the original life utility (set to 1.0) and the population norm utility. Hence, for individuals ages 0 to 25, the disutility for ec_clinical changes from -0.1 to -0.0938.

  • "nominal":

    Disutilities are adjusted by the absolute difference between the life utility and the population norm utility. Now for individuals ages 0 to 25, the disutility for ec_clinical changes from -0.1 to -0.038. Because the adjusted disutility depends on the duration of the disutility, it is not possible to change event disutilities using this method.

  • "nominal_censored":

    Similar to the "nominal" method, but the disutility values are capped at the population norm utility level. This ensures that the simulated disease state does not result in a better health utility than the general population for individuals of the same age. Before using this method, it should be carefully considered whether your diseased population could be healthier than the average individual of the same age. Also similar to the "nominal" method, it is not possible to change event disutilities using this method.

Here is how you might use the add_population_norms() function with these adjustments: Note that the inputs in the new ce dictionary are stored with tuple indexes (a, b), where a is the name of the input dataset, in this case 'x', and b is the row index in the dataframe of population norm utilities.

227# Optionally correct our cost-effectiveness input data for population norm values
228cea_data['Lifelong']['durations']['ec_clinical'] = (0, 0, -0.1, 0)
229ce_input = population_norms.add_population_norms(
230    data_quantities=cea_data,
231    country="nl",
232    sex="total",
233    data_labels=processes.ec_screening.data.cea.labels,
234    disutility_tags=['ec_clinical'],
235    disutility_method="nominal"
236)
237print(ce_input[('Lifelong', 63)]['conditions'])
238print(ce_input[('Lifelong', 63)]['durations'])
[('age', '>=', 0), ('age', '<', 25)]
{'life': (0, 1, 0.938, 0), 'ec_clinical': (0, 0, -0.03799999999999995, 0)}

If you would like the health disutilities to be taken relative to whatever the reference health utility is, simply leave the argument disutility_tags blank to prevent any disutilities from being changed.

It is also possible to supply your own population norm utility values. To do so, use the argument norm_data. Make sure the input is a pandas DataFrame which includes all the columns shown in the example above.

Note

We include also the SE column, which contains standard errors of the health utility values. These may be used in order to generate Univariate or Probabilistic Sensitivity Analysis cea sets.

To generate an adjusted cea dataset that can be used in a Probabilistic Sensitivity Analysis, simply set the argument n_psa_draws to the same n that you will use for your PSA. Normally distributed norm utility values will be drawn, per the standard error in the SE column. The resulting dataset may simply be passed on to cost_effectiveness_analysis, and for each of the n draws the corresponding norm utility will be used. The disutility values for all disutility_tags will be adjusted so as to preserve the same overall utility in the disease-specific state (i.e. if the random draw results in a decrease of the norm utility by 0.1, the disutility will be adjusted by 0.1 to yield the same nominal disease-specific utility). You may simultaneously generate random draws for the disutility tags. To do so, simply specify a distribution for the disutility value as you would for any other PSA input. If a disutility tag is present but no distribution is given, a constant nominal health-specific utility will be assumed (i.e. the sum of the norm utility and the disutility will be held constant), and a warning will be given to report that your disutility is not part of your PSA.

To generate an adjusted cea dataset that can be used in a Univariate Sensitivity Analysis, set return_usa_draws to True. Similar to generating the psa draws, the dataset will now return an array of utility values rather than single floating point values. One at a time, the utility values will be set to varying quantile values per their normal distribution. The quantile values may be supplied through the argument usa_quantiles, and will default to [0.025, 0.05, 0.25, 0.5, 0.75, 0.95, 0.975]. If any disutility tags are associated with a distribution, these will also be adjusted to their quantile values, after accounting for any shift in the norm utility by age. A DataFrame summarizing which USA value corresponds to which draw in the cea dataset is returned, so that the CEA output may be associated with the corresponding univariate changes in the inputs.

Note that for both USA and PSA draws, any outlier in the norm utilities that yields a value outside the domain 0 to 1 will be clipped to the edge values.