Health economics analyses involve many stochastic elements, such as bootstrapping, probabilistic sensitivity analyses, and Monte Carlo simulations, which can make code slow and unwieldy to run.
Optimising your code can resolve these problems. It also makes errors easier to correct, and leaves your code neater and more readable.
Consider a dummy example where we want to estimate the average cost and standard error around this cost of treatment from a sample of 100 patients.
```r
## Create dummy dataset
n_sample <- 100
df <- data.frame(id = c(1:n_sample),
                 cost = rnorm(n_sample, 50, 3))
```
id | cost |
---|---|
1 | 48.68526 |
2 | 50.33959 |
3 | 49.67819 |
4 | 49.41089 |
5 | 42.49260 |
To obtain the standard error, we perform bootstrapping with 500 resamples.

```r
## Create empty vector to store mean cost in each bootstrap sample
boot_mean <- numeric()
## Sample with replacement and calculate mean for each bootstrap sample
boot_mean[1] <- mean(sample(df$cost, n_sample, replace = TRUE))
boot_mean[2] <- mean(sample(df$cost, n_sample, replace = TRUE))
## ... keep repeating this code 500 times ...
boot_mean[500] <- mean(sample(df$cost, n_sample, replace = TRUE))
```

(Note: bootstrapping samples *with* replacement; sampling all `n_sample` values without replacement would merely permute the data and give the same mean every time.)
Wrapping the task you want to run repetitively in a function:

```r
calculate_boot_mean <- function(data, n, replacement){
  mean(sample(data, n, replace = replacement))
}
```

Using a `for` loop to perform the task iteratively:

```r
boot_mean <- numeric()
for(i in c(1:500)){
  boot_mean[i] <- calculate_boot_mean(
    data = df$cost, n = n_sample, replacement = TRUE)
}
```
✓ Easier to debug and correct errors
? What is R actually doing in each iteration of the `for` loop?

```r
boot_mean <- numeric()
for(i in c(1:500)){
  boot_mean[i] <- calculate_boot_mean(
    data = df$cost, n = n_sample, replacement = TRUE)
}
```

- R keeps re-sizing `boot_mean` and re-allocating memory for it
- `i` is created and gets updated in the global environment

The same loop with a pre-allocated vector:

```r
boot_mean <- numeric(500)
for(i in c(1:500)){
  boot_mean[i] <- calculate_boot_mean(
    data = df$cost, n = n_sample, replacement = TRUE)
}
```
✓ Pre-allocate a vector that fits all values to avoid re-sizing/re-allocating memory
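Putting the pieces together, a self-contained sketch of the pre-allocated loop (the `set.seed()` call and the final `sd()` step are additions here; the bootstrap standard error is simply the standard deviation of the bootstrap means):

```r
set.seed(42)
n_sample <- 100
df <- data.frame(id = 1:n_sample, cost = rnorm(n_sample, 50, 3))

calculate_boot_mean <- function(data, n, replacement){
  mean(sample(data, n, replace = replacement))
}

boot_mean <- numeric(500)          # pre-allocated: no re-sizing inside the loop
for (i in 1:500) {
  boot_mean[i] <- calculate_boot_mean(df$cost, n_sample, replacement = TRUE)
}
boot_se <- sd(boot_mean)           # bootstrap SE of the mean, roughly 3/sqrt(100)
```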
Using the `apply()` family of functionals† runs the `for` loop internally in C:

```r
boot_mean <- lapply(c(1:500), function(i){
  calculate_boot_mean(data = df$cost, n = n_sample, replacement = TRUE)
})
```

[†] Read more about the `apply()` family of functions in Carlo Fanara | Datacamp tutorial on the R Apply Family. Functional programming can also be implemented using the `map()` family of functions in `purrr`. More details in Garrett Grolemund & Hadley Wickham | R for Data Science - Chapter 21 Iteration.
✓ No need to pre-define vector for storing results
✓ Variables in working environment unaffected
✓ Makes parallelisation easy
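One practical note, not spelled out above: `lapply()` returns a list, so the result needs flattening before the standard error can be computed. A minimal sketch, with `vapply()` as a type-stable alternative (the dummy `cost` vector here mirrors the document's `df$cost`):

```r
set.seed(1)
cost <- rnorm(100, mean = 50, sd = 3)   # stand-in for df$cost
n_sample <- 100

## lapply() returns a list: flatten it before computing sd()
boot_mean_list <- lapply(1:500, function(i){
  mean(sample(cost, n_sample, replace = TRUE))
})
boot_mean <- unlist(boot_mean_list)

## vapply() enforces the return type and yields a numeric vector directly
boot_mean2 <- vapply(1:500, function(i){
  mean(sample(cost, n_sample, replace = TRUE))
}, numeric(1))
```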
? Is it faster?
```r
boot_mean <- numeric(500)
for(i in c(1:500)){
  boot_mean[i] <- calculate_boot_mean(
    data = df$cost, n = n_sample, replacement = TRUE)
}
```

Median time taken = 10.16ms

```r
boot_mean <- lapply(c(1:500), function(i){
  calculate_boot_mean(data = df$cost, n = n_sample, replacement = TRUE)
})
```

Median time taken = 8.54ms
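As an aside (an addition here, not from the slides): base R's `replicate()`, a convenience wrapper around `sapply()`, expresses the whole bootstrap in a single call:

```r
set.seed(1)
cost <- rnorm(100, mean = 50, sd = 3)   # stand-in for df$cost

## One call replaces both the pre-allocation and the loop,
## and returns a numeric vector rather than a list
boot_mean <- replicate(500, mean(sample(cost, 100, replace = TRUE)))
```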
Sometimes, an explicit loop is not needed at all: many R operations are vectorised.

```r
vector1 <- c(1:4); vector2 <- c(6:9)

## Element-wise computation
sums <- numeric(length(vector1))
for(i in seq_along(vector1)) {
  sums[i] <- vector1[i] + vector2[i]
}

## Vectorised computation
vector1 + vector2
```
Suppose we want to test in our dummy cost dataset if the cost of treatment is more than £50.
```r
## Element-wise computation
lapply(seq_along(df$cost), function(i){ df$cost[i] > 50 })
```

Median time taken = 141.59µs

```r
## Vectorised computation
df$cost > 50
```

Median time taken = 1.51µs
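The same principle extends to matrices: an elementwise product followed by `rowSums()` replaces an explicit loop over rows, which is the trick the Markov case study applies later. A small sketch with made-up data:

```r
set.seed(1)
m1 <- matrix(runif(12), nrow = 4)   # think: 4 PSA samples x 3 states
m2 <- matrix(runif(12), nrow = 4)

## Loop version: one sum of elementwise products per row
out_loop <- numeric(4)
for (i in 1:4) out_loop[i] <- sum(m1[i, ] * m2[i, ])

## Vectorised version: same result, no explicit loop
out_vec <- rowSums(m1 * m2)

all.equal(out_loop, out_vec)   # TRUE
```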
Parallelising the functional with `furrr::future_map()`:
```r
boot_mean <- furrr::future_map(c(1:500), function(i){
  calculate_boot_mean(data = df$cost, n = n_sample, replacement = TRUE)
})
```

Median time taken = 23.65ms (previously 8.54ms)

```r
furrr::future_map(seq_along(df$cost), function(i){ df$cost[i] > 50 })
```

Median time taken = 12.01ms (previously 0.14159ms)
? Parallelising here is actually SLOWER?
[†] Read more about this in Imre Gera | Parallelization caveats in R #1: performance issues
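For readers without `furrr`, here is the same pattern with base R's `parallel` package. This is a sketch only (the worker count of 2 is an arbitrary choice), and the explicit cluster set-up and export steps make the fixed overhead discussed above visible:

```r
library(parallel)

set.seed(1)
cost <- rnorm(100, mean = 50, sd = 3)   # stand-in for df$cost
n_sample <- 100

calculate_boot_mean <- function(data, n, replacement){
  mean(sample(data, n, replace = replacement))
}

cl <- makeCluster(2)                    # fixed cost: spin up 2 worker processes
clusterExport(cl, c("cost", "n_sample", "calculate_boot_mean"),
              envir = environment())    # fixed cost: ship objects to workers
boot_mean <- parLapply(cl, 1:500, function(i){
  calculate_boot_mean(data = cost, n = n_sample, replacement = TRUE)
})
stopCluster(cl)
```

For a bootstrap this cheap, the set-up and communication costs dominate, which is exactly why the parallel timings above come out slower.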
Reducing the work inside each iteration also helps: extracting the cost vector once avoids subsetting the data frame in every call.

```r
df_cost <- df$cost
boot_mean <- lapply(c(1:500), function(i){
  calculate_boot_mean(data = df_cost, n = n_sample, replacement = TRUE)
})
```

Median time taken = 8.1ms (previously 8.54ms)
e.g. a very simple simulation model (finite combination of states) predicting the state each patient is in, and performing cost and QALY calculations in each cycle
Sometimes R is just slow. But R has interfaces to other faster languages via packages.
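For illustration only: a Markov-style cycle loop moved to C++ with `Rcpp::cppFunction`. This is not the original `rcpp_loop()` (whose code is on the linked GitHub), just a sketch of the idea; it assumes Rcpp and a working C++ toolchain are installed:

```r
library(Rcpp)

cppFunction('
NumericMatrix cycle_loop(NumericMatrix mat_in, NumericMatrix transition, int n) {
  int n_states = mat_in.ncol();
  // Each cycle (row) is the previous row times the transition matrix
  for (int cycle = 1; cycle < n; cycle++) {
    for (int j = 0; j < n_states; j++) {
      double acc = 0.0;
      for (int k = 0; k < n_states; k++) {
        acc += mat_in(cycle - 1, k) * transition(k, j);
      }
      mat_in(cycle, j) = acc;
    }
  }
  return mat_in;
}
')

## Toy 2-state model: 10% chance per cycle of moving from state 1 to state 2
trans <- matrix(c(0.9, 0.1,
                  0.0, 1.0), nrow = 2, byrow = TRUE)
cohort <- matrix(0, nrow = 3, ncol = 2)   # 3 cycles x 2 states
cohort[1, ] <- c(1, 0)                    # everyone starts in state 1
cohort <- cycle_loop(cohort, trans, 3)
cohort[3, ]                               # c(0.81, 0.19)
```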
Materials used in this case study are based on work conducted as part of the 2019 R for Health Economics Hackathon. The team behind this work comprises:
The R code for the original work is accessible from GitHub.
For an implementation of this work, see Sam Abbott's package SpeedyMarkov (presentation on 12th October, 0910-0935).
Markov model with 10 states (time-homogeneous; all transitions permitted)
- `n.treatments = 2`: no. of treatment arms
- `n.samples = 25000`: no. of PSA samples
- `n.cycles = 100`: no. of cycles to run

Extract of code for Markov model (minimal, non-executable):
```r
# Loop over the treatment options
for(i.treatment in 1:n.treatments){
  # Extract transition matrix for treatment
  transition.matrices_tr <- transition.matrices[i.treatment,,,]
  # Loop over the PSA samples
  for(i.sample in 1:n.samples){
    transition.matrices_tr_sample <- transition.matrices_tr[i.sample,,]
    # Loop over the cycles
    for(i.cycle in 2:n.cycles){
      # Markov update
      cohort.vectors[i.treatment,i.sample,i.cycle,] <-
        cohort.vectors[i.treatment,i.sample,i.cycle-1,] %*%
        transition.matrices_tr_sample
    }
    cohort.vectors_tr_sample <- cohort.vectors[i.treatment,i.sample,,]
  }
}
```
Tx | PSA sample | Cycle | Median time across 20 iterations |
---|---|---|---|
for loop | for loop | for loop | 16.1s |
`lapply()` over treatments:

```r
# Functional loop over treatment options
lapply(c(1:n.treatments), function(i.treatment){
  transition.matrices_tr <- transition.matrices[i.treatment,,,]  ## Dimension reduction
  cohort.vectors_tr <- cohort.vectors[i.treatment,,,]            ## Dimension reduction
  for(i.sample in 1:n.samples){
    transition.matrices_tr_sample <- transition.matrices_tr[i.sample,,]
    cohort.vectors_tr_sample <- cohort.vectors_tr[i.sample,,]
    for(i.cycle in 2:n.cycles){
      cohort.vectors_tr_sample[i.cycle,] <-
        cohort.vectors_tr_sample[i.cycle-1,] %*% transition.matrices_tr_sample
    }
  }
})
```
Tx | PSA sample | Cycle | Median time across 20 iterations |
---|---|---|---|
for loop | for loop | for loop | 16.1s |
lapply | for loop | for loop | 11.3s |
`lapply()` over PSA samples:

```r
for(i.treatment in c(1:n.treatments)){
  transition.matrices_tr <- transition.matrices[i.treatment,,,]
  cohort.vectors_tr <- cohort.vectors[i.treatment,,,]
  # Functional loop over PSA samples
  lapply(c(1:n.samples), function(i.sample){
    transition.matrices_tr_sample <- transition.matrices_tr[i.sample,,]
    cohort.vectors_tr_sample <- cohort.vectors_tr[i.sample,,]
    for(i.cycle in 2:n.cycles){
      cohort.vectors_tr_sample[i.cycle,] <-
        cohort.vectors_tr_sample[i.cycle-1,] %*% transition.matrices_tr_sample
    }
  })
}
```
Tx | PSA sample | Cycle | Median time across 20 iterations |
---|---|---|---|
for loop | for loop | for loop | 16.1s |
lapply | for loop | for loop | 11.3s |
for loop | lapply | for loop | 14.3s |
```r
for(i.treatment in c(1:n.treatments)){
  transition.matrices_tr <- transition.matrices[i.treatment,,,]
  cohort.vectors_tr <- cohort.vectors[i.treatment,,,]
  # Parallelised over PSA samples
  furrr::future_map(c(1:n.samples), function(i.sample){
    transition.matrices_tr_sample <- transition.matrices_tr[i.sample,,]
    cohort.vectors_tr_sample <- cohort.vectors_tr[i.sample,,]
    for(i.cycle in 2:n.cycles){
      cohort.vectors_tr_sample[i.cycle,] <-
        cohort.vectors_tr_sample[i.cycle-1,] %*% transition.matrices_tr_sample
    }
  })
}
```
Tx | PSA sample | Cycle | Median time across 20 iterations |
---|---|---|---|
for loop | for loop | for loop | 16.1s |
lapply | for loop | for loop | 11.3s |
for loop | lapply | for loop | 14.3s |
for loop | parallel | for loop | 14.8s |
`lapply()` over treatments & vectorised operation over samples:

```r
# Functional loop over the treatment options
lapply(c(1:n.treatments), function(i.treatment){
  transition.matrices_tr <- transition.matrices[i.treatment,,,]
  cohort.vectors_tr <- cohort.vectors[i.treatment,,,]
  for(i.cycle in 2:n.cycles){
    ## Reformulated so the operation is vectorised over PSA samples
    for(i.state in 1:n.states){
      cohort.vectors_tr[,i.cycle,i.state] <-
        rowSums(cohort.vectors_tr[,i.cycle-1,] *
                transition.matrices_tr[,,i.state])
    }
  }
})
```
Tx | PSA sample | Cycle | Median time across 20 iterations |
---|---|---|---|
for loop | for loop | for loop | 16.06s |
lapply | for loop | for loop | 11.28s |
for loop | lapply | for loop | 14.27s |
for loop | parallel | for loop | 14.79s |
lapply | vectorised | for loop | 9.21s |
`lapply()` over treatments & `rcpp_loop()` over cycles:

```r
# Functional loop over the treatment options
lapply(c(1:n.treatments), function(i.treatment){
  transition.matrices_tr <- transition.matrices[i.treatment,,,]
  cohort.vectors_tr <- cohort.vectors[i.treatment,,,]
  for(i.sample in 1:n.samples){
    transition.matrices_tr_sample <- transition.matrices_tr[i.sample,,]
    cohort.vectors_tr_sample <- cohort.vectors_tr[i.sample,,]
    # Loop over the cycles using Rcpp
    cohort.vectors_tr_sample <- rcpp_loop(
      mat_in = cohort.vectors_tr_sample,
      transition = transition.matrices_tr_sample,
      n = n.cycles)
  }
})
```
Tx | PSA sample | Cycle | Median time across 20 iterations |
---|---|---|---|
for loop | for loop | for loop | 16.06s |
lapply | for loop | for loop | 11.28s |
for loop | lapply | for loop | 14.27s |
for loop | parallel | for loop | 14.79s |
lapply | vectorised | for loop | 9.21s |
lapply | for loop | rcpp | 5.85s |
Functionals | Vectorising | Parallelising etc.
Always aim for readability | Profile your code to find bottlenecks†
[†] One way to easily profile your code is with `profvis`. More about code profiling in Hadley Wickham | Advanced R - Chapter 23 Measuring performance.
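Besides `profvis`, base R ships a sampling profiler. A minimal sketch with `Rprof()` (the workload here is arbitrary, just enough for the profiler to record some samples):

```r
tmp <- tempfile(fileext = ".out")
Rprof(tmp)                                 # start sampling the call stack
x <- replicate(2000, mean(sample(rnorm(2000), 2000, replace = TRUE)))
Rprof(NULL)                                # stop profiling
prof <- summaryRprof(tmp)
head(prof$by.self)                         # where the time actually went
```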
First make it work correctly. Then, perhaps, optimise.
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. — Donald Knuth
Gillespie, C. and R. Lovelace (2017). Efficient R Programming: A Practical Guide to Smarter Programming. 1st ed. Sebastopol, CA: O'Reilly. ISBN: 978-1-4919-5078-4.
Wickham, H. and G. Grolemund (2016). R for Data Science. 1st ed. O'Reilly Media, Inc.
Wickham, H. (2019). Advanced R. 2nd ed. Chapman & Hall/CRC The R Series. Boca Raton. ISBN: 978-1-351-20129-2.