---
title: "Bootstrap p-values and confidence intervals for regression models"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{boot_summary}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Background
p-values can be computed by inverting the corresponding confidence intervals, as described in Section 14.2 of [Thulin (2024)](https://www.modernstatisticswithr.com/mathschap.html#confintequal) and Section 3.12 in [Hall (1992)](https://link.springer.com/book/10.1007/978-1-4612-4384-7). This package contains functions for computing bootstrap p-values in this way. The approach relies on the fact that:

- The p-value of the two-sided test for the parameter theta is the smallest alpha such that theta is not contained in the corresponding 1-alpha confidence interval,
- For a test of the parameter theta with significance level alpha, the set of values of theta that aren't rejected by the two-sided test (when used as the null hypothesis) is a 1-alpha confidence interval for theta.


## Summaries for regression models
Summary tables with confidence intervals and p-values for the coefficients of regression models can be obtained using the `boot_summary` (most models) and `censboot_summary` (models with censored response variables) functions. Currently, the following models are supported:

- Linear models fitted using `lm`,
- Generalised linear models fitted using `glm` or `MASS::glm.nb`,
- Nonlinear models fitted using `nls`,
- Robust linear models fitted using `MASS::rlm`,
- Ordered logistic or probit regression models fitted (without weights) using `MASS::polr`,
- Linear mixed models fitted using `lme4::lmer` or `lmerTest::lmer`,
- Generalised linear mixed models fitted using `lme4::glmer`.
- Cox PH regression models fitted using `survival::coxph` (using `censboot_summary`).
- Accelerated failure time models fitted using `survival::survreg` or `rms::psm` (using `censboot_summary`).
- Any regression model such that: `residuals(object, type="pearson")` returns Pearson residuals; `fitted(object)` returns fitted values; `hatvalues(object)` returns the leverages, or perhaps the value 1 which will effectively ignore setting the hatvalues. In addition, the `data` argument should contain no missing values among the columns actually used in fitting the model.

A number of examples are available in Chapters 8 and 9 of [Modern Statistics with R](https://www.modernstatisticswithr.com/).

Here are some simple examples with a linear regression model for the `mtcars` data:

```{r message=FALSE}
library(boot.pval)

# Bootstrap summary of a linear model for mtcars:
model <- lm(mpg ~ hp + vs, data = mtcars)
boot_summary(model)

# Use 9999 bootstrap replicates and adjust p-values for
# multiplicity using Holm's method:
boot_summary(model, R = 9999, adjust.method = "holm")

# Use case resampling instead of residual resampling:
boot_summary(model, method = "case")

# Export results to a gt table:
boot_summary(model, R = 9999) |>
  summary_to_gt()
```

See Davison and Hinkley (1997) for details about residual resampling (the default) and case resampling.

```{r eval = FALSE}
# Export results to a Word document:
library(flextable)
boot_summary(model, R = 9999) |>
  summary_to_flextable() |> 
  save_as_docx(path = "my_table.docx")
```

And a toy example for a generalised linear mixed model (using a small number of bootstrap repetitions):

```{r eval = FALSE}
library(lme4)
model <- glmer(TICKS ~ YEAR + (1|LOCATION),
           data = grouseticks, family = poisson)
boot_summary(model, R = 99)
```

## Speeding up computations
For complex models, speed can be greatly improved by using parallelisation. For `lmer` and `glmer` models, this is set using the `parallel` (available options are `"multicore"` and `"snow"`). The number of CPUs to use is set using `ncpus`.

```{r eval = FALSE}
model <- glmer(TICKS ~ YEAR + (1|LOCATION),
           data = grouseticks, family = poisson)
boot_summary(model, R = 999, parallel = "multicore", ncpus = 10)
```

For other models, use `ncores`:

```{r eval = FALSE}
model <- lm(mpg ~ hp + vs, data = mtcars)
boot_summary(model, R = 9999, ncores = 10)
```

## Survival models
Survival regression models should be fitted using the argument `model = TRUE`. A summary table can then be obtained using `censboot_summary`. By default, the table contains exponentiated coefficients (i.e. hazard ratios, in the case of a Cox PH model).

```{r message = FALSE}
library(survival)
# Weibull AFT model:
model <- survreg(formula = Surv(time, status) ~ age + sex, data = lung,
                dist = "weibull", model = TRUE)
# Table with exponentiated coefficients:
censboot_summary(model)

# Cox PH model:
model <- coxph(formula = Surv(time, status) ~ age + sex, data = lung,
               model = TRUE)
# Table with hazard ratios:
censboot_summary(model)
# Table with original coefficients:
censboot_summary(model, coef = "raw")
```

To speed up computations using parallelisation, use the `parallel` and `ncpus` arguments:

```{r eval = FALSE}
censboot_summary(model, parallel = "multicore", ncpus = 10)
```

## References
* Davison, A.C. and Hinkley, D.V. (1997) _Bootstrap Methods and Their Application_. Cambridge University Press.
* Hall P (1992). _The Bootstrap and Edgeworth Expansion_. Springer, New York.
* Thulin, M. (2024) _Modern Statistics with R_. Second edition. Chapman & Hall/CRC Press.