Package 'cv' reference manual

Package 'cv'

Title:	Cross-Validating Regression Models
Description:	Cross-validation methods of regression models that exploit features of various modeling functions to improve speed. Some of the methods implemented in the package are novel, as described in the package vignettes; for general introductions to cross-validation, see, for example, Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani (2021, ISBN 978-1-0716-1417-4, Secs. 5.1, 5.3), "An Introduction to Statistical Learning with Applications in R, Second Edition", and Trevor Hastie, Robert Tibshirani, and Jerome Friedman (2009, ISBN 978-0-387-84857-0, Sec. 7.10), "The Elements of Statistical Learning, Second Edition".
Authors:	John Fox [aut] , Georges Monette [aut, cre]
Maintainer:	Georges Monette <[email protected]>
License:	GPL (>= 2)
Version:	2.0.3
Built:	2025-03-12 15:26:22 UTC
Source:	https://github.com/gmonette/cv

Title:

Cross-Validating Regression Models

Description:

Cross-validation methods of regression models that exploit features of various modeling functions to improve speed. Some of the methods implemented in the package are novel, as described in the package vignettes; for general introductions to cross-validation, see, for example, Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani (2021, ISBN 978-1-0716-1417-4, Secs. 5.1, 5.3), "An Introduction to Statistical Learning with Applications in R, Second Edition", and Trevor Hastie, Robert Tibshirani, and Jerome Friedman (2009, ISBN 978-0-387-84857-0, Sec. 7.10), "The Elements of Statistical Learning, Second Edition".

Authors:

John Fox [aut]

, Georges Monette [aut, cre]

Maintainer:

Georges Monette <[email protected]>

License:

GPL (>= 2)

Version:

2.0.3

Built:

2025-03-12 15:26:22 UTC

Source:

https://github.com/gmonette/cv

Help Index

Cross-Validate Regression Models

Description

cv() is a parallelized generic k-fold (including n-fold, i.e., leave-one-out) cross-validation function, with a default method, specific methods for linear and generalized-linear models that can be much more computationally efficient, and a method for robust linear models. There are also cv() methods for mixed-effects models, for model-selection procedures, and for several models fit to the same data, which are documented separately.

Usage

cv(model, data, criterion, k, reps = 1L, seed, ...)

## Default S3 method:
cv(
  model,
  data = insight::get_data(model),
  criterion = mse,
  k = 10L,
  reps = 1L,
  seed = NULL,
  criterion.name = deparse(substitute(criterion)),
  details = k <= 10L,
  confint = n >= 400L,
  level = 0.95,
  ncores = 1L,
  type = "response",
  start = FALSE,
  model.function,
  ...
)

## S3 method for class 'lm'
cv(
  model,
  data = insight::get_data(model),
  criterion = mse,
  k = 10L,
  reps = 1L,
  seed = NULL,
  details = k <= 10L,
  confint = n >= 400L,
  level = 0.95,
  method = c("auto", "hatvalues", "Woodbury", "naive"),
  ncores = 1L,
  ...
)

## S3 method for class 'glm'
cv(
  model,
  data = insight::get_data(model),
  criterion = mse,
  k = 10L,
  reps = 1L,
  seed = NULL,
  details = k <= 10L,
  confint = n >= 400L,
  level = 0.95,
  method = c("exact", "hatvalues", "Woodbury"),
  ncores = 1L,
  start = FALSE,
  ...
)

## S3 method for class 'rlm'
cv(model, data, criterion, k, reps = 1L, seed, ...)

## S3 method for class 'cv'
print(x, digits = getOption("digits"), ...)

## S3 method for class 'cv'
summary(object, digits = getOption("digits"), ...)

## S3 method for class 'cvList'
print(x, ...)

## S3 method for class 'cvList'
summary(object, ...)

## S3 method for class 'cv'
plot(x, y, what = c("CV criterion", "coefficients"), ...)

## S3 method for class 'cvList'
plot(
  x,
  y,
  what = c("adjusted CV criterion", "CV criterion"),
  confint = TRUE,
  ...
)

cvInfo(object, what, ...)

## S3 method for class 'cv'
cvInfo(
  object,
  what = c("CV criterion", "adjusted CV criterion", "full CV criterion", "confint", "SE",
    "k", "seed", "method", "criterion name"),
  ...
)

## S3 method for class 'cvModList'
cvInfo(
  object,
  what = c("CV criterion", "adjusted CV criterion", "full CV criterion", "confint", "SE",
    "k", "seed", "method", "criterion name"),
  ...
)

## S3 method for class 'cvList'
cvInfo(
  object,
  what = c("CV criterion", "adjusted CV criterion", "full CV criterion", "confint", "SE",
    "k", "seed", "method", "criterion name"),
  ...
)

## S3 method for class 'cv'
as.data.frame(
  x,
  row.names = NULL,
  optional = TRUE,
  rows = c("cv", "folds"),
  columns = c("criteria", "coefficients"),
  ...
)

## S3 method for class 'cvList'
as.data.frame(x, row.names = NULL, optional = TRUE, ...)

## S3 method for class 'cvDataFrame'
print(x, digits = getOption("digits") - 2L, ...)

## S3 method for class 'cvDataFrame'
summary(
  object,
  formula,
  subset = NULL,
  fun = mean,
  include = c("cv", "folds", "all"),
  ...
)
cv(model, data, criterion, k, reps = 1L, seed, ...)

## Default S3 method:
cv(
  model,
  data = insight::get_data(model),
  criterion = mse,
  k = 10L,
  reps = 1L,
  seed = NULL,
  criterion.name = deparse(substitute(criterion)),
  details = k <= 10L,
  confint = n >= 400L,
  level = 0.95,
  ncores = 1L,
  type = "response",
  start = FALSE,
  model.function,
  ...
)

## S3 method for class 'lm'
cv(
  model,
  data = insight::get_data(model),
  criterion = mse,
  k = 10L,
  reps = 1L,
  seed = NULL,
  details = k <= 10L,
  confint = n >= 400L,
  level = 0.95,
  method = c("auto", "hatvalues", "Woodbury", "naive"),
  ncores = 1L,
  ...
)

## S3 method for class 'glm'
cv(
  model,
  data = insight::get_data(model),
  criterion = mse,
  k = 10L,
  reps = 1L,
  seed = NULL,
  details = k <= 10L,
  confint = n >= 400L,
  level = 0.95,
  method = c("exact", "hatvalues", "Woodbury"),
  ncores = 1L,
  start = FALSE,
  ...
)

## S3 method for class 'rlm'
cv(model, data, criterion, k, reps = 1L, seed, ...)

## S3 method for class 'cv'
print(x, digits = getOption("digits"), ...)

## S3 method for class 'cv'
summary(object, digits = getOption("digits"), ...)

## S3 method for class 'cvList'
print(x, ...)

## S3 method for class 'cvList'
summary(object, ...)

## S3 method for class 'cv'
plot(x, y, what = c("CV criterion", "coefficients"), ...)

## S3 method for class 'cvList'
plot(
  x,
  y,
  what = c("adjusted CV criterion", "CV criterion"),
  confint = TRUE,
  ...
)

cvInfo(object, what, ...)

## S3 method for class 'cv'
cvInfo(
  object,
  what = c("CV criterion", "adjusted CV criterion", "full CV criterion", "confint", "SE",
    "k", "seed", "method", "criterion name"),
  ...
)

## S3 method for class 'cvModList'
cvInfo(
  object,
  what = c("CV criterion", "adjusted CV criterion", "full CV criterion", "confint", "SE",
    "k", "seed", "method", "criterion name"),
  ...
)

## S3 method for class 'cvList'
cvInfo(
  object,
  what = c("CV criterion", "adjusted CV criterion", "full CV criterion", "confint", "SE",
    "k", "seed", "method", "criterion name"),
  ...
)

## S3 method for class 'cv'
as.data.frame(
  x,
  row.names = NULL,
  optional = TRUE,
  rows = c("cv", "folds"),
  columns = c("criteria", "coefficients"),
  ...
)

## S3 method for class 'cvList'
as.data.frame(x, row.names = NULL, optional = TRUE, ...)

## S3 method for class 'cvDataFrame'
print(x, digits = getOption("digits") - 2L, ...)

## S3 method for class 'cvDataFrame'
summary(
  object,
  formula,
  subset = NULL,
  fun = mean,
  include = c("cv", "folds", "all"),
  ...
)

Arguments

`model`	a regression model object (see Details).
`data`	data frame to which the model was fit (not usually necessary).
`criterion`	cross-validation criterion ("cost" or lack-of-fit) function of form `f(y, yhat)` where `y` is the observed values of the response and `yhat` the predicted values; the default is `mse` (the mean-squared error).
`k`	perform k-fold cross-validation (default is `10`); `k` may be a number or `"loo"` or `"n"` for n-fold (leave-one-out) cross-validation.
`reps`	number of times to replicate k-fold CV (default is `1`).
`seed`	for R's random number generator; optional, if not supplied a random seed will be selected and saved; not needed for n-fold cross-validation.
`...`	to match generic; passed to `predict()` for the default `cv()` method; passed to the `Tapply()` function in the car package for `summary.cvDataFrame()`; passed to default `plot()` method for `plot.cvList()` or `plot.cv()`.
`criterion.name`	a character string giving the name of the CV criterion function in the returned `"cv"` object (not usually needed).
`details`	if `TRUE` (the default if the number of folds `k <= 10`), save detailed information about the value of the CV criterion for the cases in each fold and the regression coefficients with that fold deleted.
`confint`	if `TRUE` (the default if the number of cases is 400 or greater), compute a confidence interval for the bias-corrected CV criterion, if the criterion is the average of casewise components; for `plot.cvList()`, whether to plot confidence intervals around the biased-adjusted CV criterion, defaulting to `TRUE` and applicable only if confidence intervals are included in the `"cv"` object.
`level`	confidence level (default `0.95`).
`ncores`	number of cores to use for parallel computations (default is `1`, i.e., computations aren't done in parallel).
`type`	for the default method, value to be passed to the `type` argument of `predict()`; the default is `type="response"`, which is appropriate, e.g., for a `"glm"` model and may be recognized or ignored by `predict()` methods for other model classes.
`start`	if `TRUE` (the default is `FALSE`), the `start` argument to `update()` is set to the vector of regression coefficients for the model fit to the full data, possibly making the CV updates faster, e.g., for a GLM.
`model.function`	a regression function, typically for a new `cv()` method that that calls `cv.default()` via `NextMethod()`, residing in a package that's not a declared dependency of the cv package, e.g., `nnet::multinom`. It's usually not necessary to specify `model.function` to make `cv.default()` work.
`method`	computational method to apply to a linear (i.e., `"lm"`) model or to a generalized linear (i.e., `"glm"`) model. See Details for an explanation of the available options.
`x`	a `"cv"`, `"cvList"`, or `"cvDataFrame"` object to be plotted or summarized.
`digits`	significant digits for printing, default taken from the `"digits"` option.
`object`	an object to summarize or a `"cv"`, `"cvModlist"`, or `"cvList"` object from which to extract information via `cvInfo()`.
`y`	to match the `plot()` generic function, ignored.
`what`	for `plot()` methods, what to plot: for the `"cv"` method, either `"CV criterion"` (the default), or `"coefficients"`; for the `"cvList"` method, either `"adjusted CV criterion"` (the default if present in the `"cv"` object) or `"CV object"`. For `cvInfo()`, the information to extract from a `"cv"`, `"cvModList"`, or `"cvList"` object, one of: `"CV criterion"`, `"adjusted CV criterion"`, `"full CV criterion"` (the CV criterion applied to the model fit to the full data set), `"SE"` (the standard error of the adjusted CV criterion), `"confint"` (confidence interval for the adjusted CV criterion), `"k"`, (the number of folds), `"seed"` (the seed employed for R's random-number generator), `"method"` (the computational method employed, e.g., for a `"lm"` model object), or `"criterion name"` (the CV criterion employed); not all of these elements may be present, in which case `cvInfo()` would return `NULL`. Partial matching is supported, so, e.g., `cvInfo(cv-object, "adjusted")` is equivalent to `cvInfo(cv-object, "adjusted CV criterion")`
`row.names`	optional row names for the result, defaults to `NULL`.
`optional`	to match the `as.data.frame()` generic function; if `FALSE` (the default is `TRUE`), then the names of the columns of the returned data frame, including the names of coefficients, are coerced to syntactically correct names.
`rows`	the rows of the resulting data frame to retain: setting `rows="cv"` retains rows pertaining to the overall CV result (marked as "`fold 0`" ); setting `rows="folds"` retains rows pertaining to individual folds 1 through k; the default is `rows = c("cv", "folds")`, which retains all rows.
`columns`	the columns of the resulting data frame to retain: setting `columns="critera"` retains columns pertaining to CV criteria; setting `columns="coefficients"` retains columns pertaining to model coefficients (broadly construed); the default is `columns = c("criteria", "coefficients")`, which retains both; and the columns `"model"`, `"rep"`, and `"fold"`, if present, are always retained.
`formula`	of the form `some.criterion ~ classifying.variable(s)` (see examples).
`subset`	a subsetting expression; the default (`NULL`) is not to subset the `"cvDataFrame"` object.
`fun`	summary function to apply, defaulting to `mean`.
`include`	which rows of the `"cvDataFrame"` to include in the summary. One of `"cv"` (the default), rows representing the overall CV results; `"folds"`, rows for individual folds; `"all"`, all rows (generally not sensible).

Details

The default cv() method uses update() to refit the model to each fold, and should work if there are appropriate update() and predict() methods, and if the default method for GetResponse() works or if a GetResponse() method is supplied.

The "lm" and "glm" methods can use much faster computational algorithms, as selected by the method argument. The linear-model method accommodates weighted linear models.

For both classes of models, for the leave-one-out (n-fold) case, fitted values for the folds can be computed from the hat-values via method="hatvalues" without refitting the model; for GLMs, this method is approximate, for LMs it is exact.

Again for both classes of models, when more than one case is omitted in each fold, fitted values may be obtained without refitting the model by exploiting the Woodbury matrix identity via method="Woodbury". As for hatvalues, this method is exact for LMs and approximate for GLMs.

The default for linear models is method="auto", which is equivalent to method="hatvalues" for n-fold cross-validation and method="Woodbury" otherwise; method="naive" refits the model via update() and is generally much slower. The default for generalized linear models is method="exact", which employs update(). This default is conservative, and it is usually safe to use method="hatvalues" for n-fold CV or method="Woodbury" for k-fold CV.

There is also a method for robust linear models fit by rlm() in the MASS package (to avoid inheriting the "lm" method for which the default "auto" computational method would be inappropriate).

For additional details, see the "Cross-validating regression models" vignette (vignette("cv", package="cv")).

cv() is designed to be extensible to other classes of regression models; see the "Extending the cv package" vignette (vignette("cv-extend", package="cv")).

Value

The cv() methods return an object of class "cv", with the CV criterion ("CV crit"), the bias-adjusted CV criterion ("adj CV crit"), the criterion for the model applied to the full data ("full crit"), the confidence interval and level for the bias-adjusted CV criterion ("confint"), the number of folds ("k"), and the seed for R's random-number generator ("seed"). If details=TRUE, then the returned object will also include a "details" component, which is a list of two elements: "criterion", containing the CV criterion computed for the cases in each fold; and "coefficients", regression coefficients computed for the model with each fold deleted. Some methods may return a subset of these components and may add additional information. If reps > 1, then an object of class "cvList" is returned, which is literally a list of "cv" objects.

Methods (by class)

cv(default): "default" method.
cv(lm): "lm" method.
cv(glm): "glm" method.
cv(rlm): "rlm" method (to avoid inheriting the "lm" method).

Methods (by generic)

print(cv): print() method for "cv" objects.
summary(cv): summary() method for "cv" objects.
plot(cv): plot() method for "cv" objects.
as.data.frame(cv): as.data.frame() method for "cv" objects.

Functions

print(cvList): print() method for "cvList" objects.
summary(cvList): summary() method for "cvList" objects.
plot(cvList): plot() method for "cvList" objects.
cvInfo(): extract information from a "cv" object.
as.data.frame(cvList): as.data.frame() method for "cvList" objects.
print(cvDataFrame): print() method for "cvDataFrame" objects.
summary(cvDataFrame): summary() method for "cvDataFrame" objects.

Examples

if (requireNamespace("ISLR2", quietly=TRUE)){
withAutoprint({
data("Auto", package="ISLR2")
m.auto <- lm(mpg ~ horsepower, data=Auto)
cv(m.auto,  k="loo")
summary(cv(m.auto,  k="loo"))
summary(cv.auto <- cv(m.auto, seed=1234))
compareFolds(cv.auto)
plot(cv.auto)
plot(cv.auto, what="coefficients")
summary(cv.auto.reps <- cv(m.auto, seed=1234, reps=3))
cvInfo(cv.auto.reps, what="adjusted CV criterion")
plot(cv.auto.reps)
plot(cv(m.auto, seed=1234, reps=10, confint=TRUE))
D.auto.reps <- as.data.frame(cv.auto.reps)
head(D.auto.reps)
summary(D.auto.reps, mse ~ rep + fold, include="folds")
summary(D.auto.reps, mse ~ rep + fold, include = "folds",
        subset = fold <= 5) # first 5 folds
summary(D.auto.reps, mse ~ rep, include="folds")
summary(D.auto.reps, mse ~ rep, fun=sd, include="folds")
})
} else {
cat("\n install 'ISLR2' package to run these examples\n")
}

if (requireNamespace("carData", quietly=TRUE)){
withAutoprint({
data("Mroz", package="carData")
m.mroz <- glm(lfp ~ ., data=Mroz, family=binomial)
summary(cv.mroz <- cv(m.mroz, criterion=BayesRule, seed=123))
cvInfo(cv.mroz)
cvInfo(cv.mroz, "adjusted")
cvInfo(cv.mroz, "confint")

data("Duncan", package="carData")
m.lm <- lm(prestige ~ income + education, data=Duncan)
m.rlm <- MASS::rlm(prestige ~ income + education,
                   data=Duncan)
summary(cv(m.lm, k="loo", method="Woodbury"))
summary(cv(m.rlm, k="loo"))
})
} else {
cat("\n install 'carData' package to run these examples\n")
}

if (requireNamespace("ISLR2", quietly=TRUE)){
withAutoprint({
data("Auto", package="ISLR2")
m.auto <- lm(mpg ~ horsepower, data=Auto)
cv(m.auto,  k="loo")
summary(cv(m.auto,  k="loo"))
summary(cv.auto <- cv(m.auto, seed=1234))
compareFolds(cv.auto)
plot(cv.auto)
plot(cv.auto, what="coefficients")
summary(cv.auto.reps <- cv(m.auto, seed=1234, reps=3))
cvInfo(cv.auto.reps, what="adjusted CV criterion")
plot(cv.auto.reps)
plot(cv(m.auto, seed=1234, reps=10, confint=TRUE))
D.auto.reps <- as.data.frame(cv.auto.reps)
head(D.auto.reps)
summary(D.auto.reps, mse ~ rep + fold, include="folds")
summary(D.auto.reps, mse ~ rep + fold, include = "folds",
        subset = fold <= 5) # first 5 folds
summary(D.auto.reps, mse ~ rep, include="folds")
summary(D.auto.reps, mse ~ rep, fun=sd, include="folds")
})
} else {
cat("\n install 'ISLR2' package to run these examples\n")
}

if (requireNamespace("carData", quietly=TRUE)){
withAutoprint({
data("Mroz", package="carData")
m.mroz <- glm(lfp ~ ., data=Mroz, family=binomial)
summary(cv.mroz <- cv(m.mroz, criterion=BayesRule, seed=123))
cvInfo(cv.mroz)
cvInfo(cv.mroz, "adjusted")
cvInfo(cv.mroz, "confint")

data("Duncan", package="carData")
m.lm <- lm(prestige ~ income + education, data=Duncan)
m.rlm <- MASS::rlm(prestige ~ income + education,
                   data=Duncan)
summary(cv(m.lm, k="loo", method="Woodbury"))
summary(cv(m.rlm, k="loo"))
})
} else {
cat("\n install 'carData' package to run these examples\n")
}

Cross-Validate a Model-Selection Procedure

Description

The cv() "function" method is a general function to cross-validate a model-selection procedure, such as the following: selectStepAIC() is a procedure that applies the stepAIC() model-selection function in the MASS package; selectTrans() is a procedure for selecting predictor and response transformations in regression, which uses the powerTransform() function in the car package; selectTransAndStepAIC() combines predictor and response transformations with predictor selection; and selectModelList() uses cross-validation to select a model from a list of models created by models() and employs (meta) cross-validation to assess the predictive accuracy of this procedure.

Usage

## S3 method for class ''function''
cv(
  model,
  data,
  criterion = mse,
  k = 10L,
  reps = 1L,
  seed = NULL,
  working.model = NULL,
  y.expression = NULL,
  confint = n >= 400L,
  level = 0.95,
  details = k <= 10L,
  save.model = FALSE,
  ncores = 1L,
  ...
)

selectStepAIC(
  data,
  indices,
  model,
  criterion = mse,
  AIC = TRUE,
  details = TRUE,
  save.model = FALSE,
  ...
)

selectTrans(
  data,
  indices,
  details = TRUE,
  save.model = FALSE,
  model,
  criterion = mse,
  predictors,
  response,
  family = c("bcPower", "bcnPower", "yjPower", "basicPower"),
  family.y = c("bcPower", "bcnPower", "yjPower", "basicPower"),
  rounded = TRUE,
  ...
)

selectTransStepAIC(
  data,
  indices,
  details = TRUE,
  save.model = FALSE,
  model,
  criterion = mse,
  predictors,
  response,
  family = c("bcPower", "bcnPower", "yjPower", "basicPower"),
  family.y = c("bcPower", "bcnPower", "yjPower", "basicPower"),
  rounded = TRUE,
  AIC = TRUE,
  ...
)

selectModelList(
  data,
  indices,
  model,
  criterion = mse,
  k = 10L,
  k.meta = k,
  details = k <= 10L,
  save.model = FALSE,
  seed = FALSE,
  quietly = TRUE,
  ...
)

compareFolds(object, digits = 3, ...)

## S3 method for class 'cvSelect'
coef(object, average, NAs = 0, ...)

## S3 method for class 'cvSelect'
cvInfo(
  object,
  what = c("CV criterion", "adjusted CV criterion", "full CV criterion", "confint", "SE",
    "k", "seed", "method", "criterion name", "selected model"),
  ...
)
## S3 method for class ''function''
cv(
  model,
  data,
  criterion = mse,
  k = 10L,
  reps = 1L,
  seed = NULL,
  working.model = NULL,
  y.expression = NULL,
  confint = n >= 400L,
  level = 0.95,
  details = k <= 10L,
  save.model = FALSE,
  ncores = 1L,
  ...
)

selectStepAIC(
  data,
  indices,
  model,
  criterion = mse,
  AIC = TRUE,
  details = TRUE,
  save.model = FALSE,
  ...
)

selectTrans(
  data,
  indices,
  details = TRUE,
  save.model = FALSE,
  model,
  criterion = mse,
  predictors,
  response,
  family = c("bcPower", "bcnPower", "yjPower", "basicPower"),
  family.y = c("bcPower", "bcnPower", "yjPower", "basicPower"),
  rounded = TRUE,
  ...
)

selectTransStepAIC(
  data,
  indices,
  details = TRUE,
  save.model = FALSE,
  model,
  criterion = mse,
  predictors,
  response,
  family = c("bcPower", "bcnPower", "yjPower", "basicPower"),
  family.y = c("bcPower", "bcnPower", "yjPower", "basicPower"),
  rounded = TRUE,
  AIC = TRUE,
  ...
)

selectModelList(
  data,
  indices,
  model,
  criterion = mse,
  k = 10L,
  k.meta = k,
  details = k <= 10L,
  save.model = FALSE,
  seed = FALSE,
  quietly = TRUE,
  ...
)

compareFolds(object, digits = 3, ...)

## S3 method for class 'cvSelect'
coef(object, average, NAs = 0, ...)

## S3 method for class 'cvSelect'
cvInfo(
  object,
  what = c("CV criterion", "adjusted CV criterion", "full CV criterion", "confint", "SE",
    "k", "seed", "method", "criterion name", "selected model"),
  ...
)

Arguments

`model`	a regression model object fit to data, or for the `cv()` `"function"` method, a model-selection procedure function (see Details).
`data`	full data frame for model selection.
`criterion`	a CV criterion ("cost" or lack-of-fit) function.
`k`	perform k-fold cross-validation (default is 10); `k` may be a number or `"loo"` or `"n"` for n-fold (leave-one-out) cross-validation.
`reps`	number of times to replicate k-fold CV (default is `1`)
`seed`	for R's random number generator; not used for n-fold cross-validation. If not explicitly set, a seed is randomly generated and saved to make the results reproducible. In some cases, for internal use only, `seed` is set to `FALSE` to suppress automatically setting the seed.
`working.model`	a regression model object fit to data, typically to begin a model-selection process; for use with `selectModelList()`, a list of competing models created by `models()`.
`y.expression`	normally the response variable is found from the `model` or `working.model` argument; but if, for a particular selection procedure, the `model` or `working.model` argument is absent, or if the response can't be inferred from the model, the response can be specified by an expression, such as `expression(log(income))`, to be evaluated within the data set provided by the `data` argument.
`confint`	if `TRUE` (the default if the number of cases is 400 or greater), compute a confidence interval for the bias-corrected CV criterion, if the criterion is the average of casewise components.
`level`	confidence level (default `0.95`).
`details`	if `TRUE`, save detailed information about the value of the CV criterion for the cases in each fold and the regression coefficients (and possibly other information) with that fold deleted; default is `TRUE` if `k` is 10 or smaller, `FALSE` otherwise.
`save.model`	save the model that's selected using the full data set (default, `FALSE`).
`ncores`	number of cores to use for parallel computations (default is `1`, i.e., computations aren't done in parallel)
`...`	for `cvSelect()` and the `cv()` `"function"` method, arguments to be passed to `procedure()`; for `selectStepAIC()` and `selectTransStepAIC()`, arguments to be passed to `stepAIC()`.
`indices`	indices of cases in data defining the current fold.
`AIC`	if `TRUE` (the default) use the AIC as the model-selection criterion; if `FALSE`, use the BIC. The `k` argument to `stepAIC()` is set accordingly (note that this is distinct from the number of folds `k`).
`predictors`	character vector of names of the predictors in the model to transform; if missing, no predictors will be transformed.
`response`	name of the response variable; if missing, the response won't be transformed.
`family`	transformation family for the predictors, one of `"bcPower", "bcnPower", "yjPower", "basicPower"`, with `"bcPower"` as the default. These are the names of transformation functions in the car package; see `bcPower()`.
`family.y`	transformation family for the response, with `"bcPower"` as the default.
`rounded`	if `TRUE` (the default) use nicely rounded versions of the estimated transformation parameters (see `bcPower()`).
`k.meta`	the number of folds for meta CV; defaults to the value of `k`; may be specified as `"loo"` or `"n"` as well as an integer.
`quietly`	if `TRUE` (the default), simple messages (for example about the value to which the random-number generator seed is set), but not warnings or errors, are suppressed.
`object`	an object of class `"cvSelect"`.
`digits`	significant digits for printing coefficients (default `3`).
`average`	if supplied, a function, such as `mean` or `median`, to use us in averaging estimates across folds; if missing, the estimates for each fold are returned.
`NAs`	values to substitute for `NA`s in calculating averaged estimates; the default, `0`, is appropriate, e.g., for regression coefficients; the value `1` might be appropriate for power-transformation estimates.
`what`	the information to extract from a `"cvSelect"` object, one of: `"CV criterion"`, `"adjusted CV criterion"`, `"full CV criterion"` (the CV criterion applied to the model fit to the full data set), `"SE"` (the standard error of the adjusted CV criterion), `"confint"` (confidence interval for the adjusted CV criterion), `"k"`, (the number of folds), `"seed"` (the seed employed for R's random-number generator), `"method"` (the computational method employed, e.g., for a `"lm"` model object), `"criterion name"` (the CV criterion employed), or `"selected model"` (the model object for the model that was selected); not all of these elements may be present, in which case `cvInfo()` would return `NULL`.

Details

The model-selection function supplied as the procedure (for cvSelect()) or model (for cv()) argument should accept the following arguments:

data: set to the data argument to cvSelect() or cv().
indices: the indices of the rows of data defining the current fold; if missing, the model-selection procedure is applied to the full data.
other arguments: to be passed via ... from cvSelect() or cv().

procedure() or model() should return a list with the following named elements: fit.i, the vector of predicted values for the cases in the current fold computed from the model omitting these cases; crit.all.i, the CV criterion computed for all of the cases using the model omitting the current fold; and (optionally) coefficients, parameter estimates from the model computed omitting the current fold.

When the indices argument is missing, procedure() returns the cross-validation criterion for all of the cases based on the model fit to all of the cases.

For examples of model-selection functions for the procedure argument, see the code for selectStepAIC(), selectTrans(), and selectTransAndStepAIC().

For additional information, see the "Cross-validating model selection" vignette (vignette("cv-select", package="cv")) and the "Extending the cv package" vignette (vignette("cv-extend", package="cv")).

Value

An object of class "cvSelect", inheriting from class "cv", with the CV criterion ("CV crit"), the bias-adjusted CV criterion ("adj CV crit"), the criterion for the model applied to the full data ("full crit"), the confidence interval and level for the bias-adjusted CV criterion ("confint"), the number of folds ("k"), the seed for R's random-number generator ("seed"), and (optionally) a list of coefficients (or, in the case of selectTrans(), estimated transformation parameters, and in the case of selectTransAndStepAIC(), both regression coefficients and transformation parameters) for the selected models for each fold ("coefficients"). If reps > 1, then an object of class c("cvSelectList", "cvList") is returned, which is literally a list of c("cvSelect", "cv") objects.

Functions

cv(`function`): cv() method for applying a model model-selection (or specification) procedure.
selectStepAIC(): select a regression model using the stepAIC() function in the MASS package.
selectTrans(): select transformations of the predictors and response using powerTransform() in the car package.
selectTransStepAIC(): select transformations of the predictors and response, and then select predictors.
selectModelList(): select a model using (meta) CV.
compareFolds(): print the coefficients from the selected models for the several folds.
coef(cvSelect): extract the coefficients from the selected models for the several folds and possibly average them.

Examples

if (requireNamespace("ISLR2", quietly=TRUE)){
withAutoprint({
data("Auto", package="ISLR2")
m.auto <- lm(mpg ~ . - name - origin, data=Auto)
cv(selectStepAIC, Auto, seed=123, working.model=m.auto)
cv(selectStepAIC, Auto, seed=123, working.model=m.auto,
         AIC=FALSE, k=5, reps=3) # via BIC
})
} else {
cat("\n install the 'ISLR2' package to run these examples\n")
}
if (requireNamespace("carData", quietly=TRUE)){
withAutoprint({
data("Prestige", package="carData")
m.pres <- lm(prestige ~ income + education + women,
             data=Prestige)
cvt <- cv(selectTrans, data=Prestige, working.model=m.pres, seed=123,
          predictors=c("income", "education", "women"),
          response="prestige", family="yjPower")
cvt
compareFolds(cvt)
coef(cvt, average=median, NAs=1) # NAs not really needed here
cv(m.pres, seed=123)
})
} else {
cat("install the 'carData' package to run these examples\n")
}
if (requireNamespace("ISLR2", quietly=TRUE)){
withAutoprint({
Auto$year <- as.factor(Auto$year)
Auto$origin <- factor(Auto$origin,
                      labels=c("America", "Europe", "Japan"))
rownames(Auto) <- make.names(Auto$name, unique=TRUE)
Auto$name <- NULL
m.auto <- lm(mpg ~ . , data=Auto)
cvs <- cv(selectTransStepAIC, data=Auto, seed=76692, working.model=m.auto,
          criterion=medAbsErr,
          predictors=c("cylinders", "displacement", "horsepower",
                       "weight", "acceleration"),
          response="mpg", AIC=FALSE)
cvs
compareFolds(cvs)
})
}
data("Duncan", package="carData")
m1 <- lm(prestige ~ income + education, data=Duncan)
m2 <- lm(prestige ~ income + education + type, data=Duncan)
m3 <- lm(prestige ~ (income + education)*type, data=Duncan)
summary(cv.sel <- cv(selectModelList, data=Duncan, seed=5963,
                     working.model=models(m1, m2, m3),
                     save.model=TRUE)) # meta CV
cvInfo(cv.sel, "selected model")

if (requireNamespace("ISLR2", quietly=TRUE)){
withAutoprint({
data("Auto", package="ISLR2")
m.auto <- lm(mpg ~ . - name - origin, data=Auto)
cv(selectStepAIC, Auto, seed=123, working.model=m.auto)
cv(selectStepAIC, Auto, seed=123, working.model=m.auto,
         AIC=FALSE, k=5, reps=3) # via BIC
})
} else {
cat("\n install the 'ISLR2' package to run these examples\n")
}
if (requireNamespace("carData", quietly=TRUE)){
withAutoprint({
data("Prestige", package="carData")
m.pres <- lm(prestige ~ income + education + women,
             data=Prestige)
cvt <- cv(selectTrans, data=Prestige, working.model=m.pres, seed=123,
          predictors=c("income", "education", "women"),
          response="prestige", family="yjPower")
cvt
compareFolds(cvt)
coef(cvt, average=median, NAs=1) # NAs not really needed here
cv(m.pres, seed=123)
})
} else {
cat("install the 'carData' package to run these examples\n")
}
if (requireNamespace("ISLR2", quietly=TRUE)){
withAutoprint({
Auto$year <- as.factor(Auto$year)
Auto$origin <- factor(Auto$origin,
                      labels=c("America", "Europe", "Japan"))
rownames(Auto) <- make.names(Auto$name, unique=TRUE)
Auto$name <- NULL
m.auto <- lm(mpg ~ . , data=Auto)
cvs <- cv(selectTransStepAIC, data=Auto, seed=76692, working.model=m.auto,
          criterion=medAbsErr,
          predictors=c("cylinders", "displacement", "horsepower",
                       "weight", "acceleration"),
          response="mpg", AIC=FALSE)
cvs
compareFolds(cvs)
})
}
data("Duncan", package="carData")
m1 <- lm(prestige ~ income + education, data=Duncan)
m2 <- lm(prestige ~ income + education + type, data=Duncan)
m3 <- lm(prestige ~ (income + education)*type, data=Duncan)
summary(cv.sel <- cv(selectModelList, data=Duncan, seed=5963,
                     working.model=models(m1, m2, m3),
                     save.model=TRUE)) # meta CV
cvInfo(cv.sel, "selected model")

Cross-Validate Mixed-Effects Model

Description

cv() methods for mixed-effect models of class "merMod", fit by the lmer() and glmer() functions in the lme4 package; for models of class "lme" fit by the lme() function in the nlme package; and for models of class "glmmTMB" fit by the glmmTMB() function in the glmmTMB package.

Usage

## S3 method for class 'merMod'
cv(
  model,
  data = insight::get_data(model),
  criterion = mse,
  k = NULL,
  reps = 1L,
  seed,
  details = NULL,
  ncores = 1L,
  clusterVariables,
  ...
)

## S3 method for class 'lme'
cv(
  model,
  data = insight::get_data(model),
  criterion = mse,
  k = NULL,
  reps = 1L,
  seed,
  details = NULL,
  ncores = 1L,
  clusterVariables,
  ...
)

## S3 method for class 'glmmTMB'
cv(
  model,
  data = insight::get_data(model),
  criterion = mse,
  k = NULL,
  reps = 1L,
  seed,
  details = NULL,
  ncores = 1L,
  clusterVariables,
  ...
)
## S3 method for class 'merMod'
cv(
  model,
  data = insight::get_data(model),
  criterion = mse,
  k = NULL,
  reps = 1L,
  seed,
  details = NULL,
  ncores = 1L,
  clusterVariables,
  ...
)

## S3 method for class 'lme'
cv(
  model,
  data = insight::get_data(model),
  criterion = mse,
  k = NULL,
  reps = 1L,
  seed,
  details = NULL,
  ncores = 1L,
  clusterVariables,
  ...
)

## S3 method for class 'glmmTMB'
cv(
  model,
  data = insight::get_data(model),
  criterion = mse,
  k = NULL,
  reps = 1L,
  seed,
  details = NULL,
  ncores = 1L,
  clusterVariables,
  ...
)

Arguments

`model`	a mixed-effects model object for which a `cv()` method is available.
`data`	data frame to which the model was fit (not usually necessary).
`criterion`	cross-validation ("cost" or lack-of-fit) criterion function of form `f(y, yhat)` where `y` is the observed values of the response and `yhat` the predicted values; the default is `mse` (the mean-squared error).
`k`	perform k-fold cross-validation; `k` may be a number or `"loo"` or `"n"` for n-fold (leave-one-out) cross-validation; the default is `10` if cross-validating individual cases and `"loo"` if cross-validating clusters.
`reps`	number of times to replicate k-fold CV (default is `1`), or greater), compute a confidence interval for the bias-corrected CV criterion, if the criterion is the average of casewise components.
`seed`	for R's random number generator; optional, if not supplied a random seed will be selected and saved; not needed for n-fold cross-validation.
`details`	if `TRUE` (the default if the number of folds `k <= 10`), save detailed information about the value of the CV criterion for the cases in each fold and the regression coefficients with that fold deleted.
`ncores`	number of cores to use for parallel computations (default is `1`, i.e., computations aren't done in parallel).
`clusterVariables`	a character vector of names of the variables defining clusters for a mixed model with nested or crossed random effects; if missing, cross-validation is performed for individual cases rather than for clusters.
`...`	for `cv()` methods, to match generic, and for `cvMixed()`, arguments to be passed to `update()`.

Details

For mixed-effects models, cross-validation can be done by "clusters" or by individual observations. If the former, predictions are based only on fixed effects; if the latter, predictions include the random effects (i.e., are the best linear unbiased predictors or "BLUPS").

Value

The methods cv.merMod(), cv.lme(), and cv.glmmTMB(), return objects of class "cv", or, if reps > 1, of class "cvList" (see cv()).

Functions

cv(merMod): cv() method for lmer() and glmer() models from the lme4 package.
cv(lme): cv() method for lme() models from the nlme package.
cv(glmmTMB): cv() method for glmmTMB() models from the glmmTMB package.

Examples

library("lme4")
# from ?lmer:
(fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy))
summary(cv(fm1, clusterVariables="Subject")) # LOO CV of clusters
summary(cv(fm1, seed=447)) # 10-fold CV of cases
summary(cv(fm1, clusterVariables="Subject", k=5,
   seed=834, reps=3)) # 5-fold CV of clusters, repeated 3 times

library("nlme")
# from ?lme
(fm2 <- lme(distance ~ age + Sex, data = Orthodont,
            random = ~ 1))
summary(cv(fm2)) # LOO CV of cases
summary(cv(fm2, clusterVariables="Subject",
        k=5, seed=321)) # 5-fold CV of clusters

library("glmmTMB")
# from ?glmmTMB
(m1 <- glmmTMB(count ~ mined + (1|site),
               zi=~mined,
               family=poisson, data=Salamanders))
summary(cv(m1, seed=97816, k=5,
          clusterVariables="site")) # 5-fold CV of clusters
summary(cv(m1, seed=34506, k=5)) # 5-fold CV of cases
library("lme4")
# from ?lmer:
(fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy))
summary(cv(fm1, clusterVariables="Subject")) # LOO CV of clusters
summary(cv(fm1, seed=447)) # 10-fold CV of cases
summary(cv(fm1, clusterVariables="Subject", k=5,
   seed=834, reps=3)) # 5-fold CV of clusters, repeated 3 times

library("nlme")
# from ?lme
(fm2 <- lme(distance ~ age + Sex, data = Orthodont,
            random = ~ 1))
summary(cv(fm2)) # LOO CV of cases
summary(cv(fm2, clusterVariables="Subject",
        k=5, seed=321)) # 5-fold CV of clusters

library("glmmTMB")
# from ?glmmTMB
(m1 <- glmmTMB(count ~ mined + (1|site),
               zi=~mined,
               family=poisson, data=Salamanders))
summary(cv(m1, seed=97816, k=5,
          clusterVariables="site")) # 5-fold CV of clusters
summary(cv(m1, seed=34506, k=5)) # 5-fold CV of cases

Cross-Validate Several Models Fit to the Same Data

Description

A cv() method for an object of class "modlist", created by the models() function. This cv() method simplifies the process of cross-validating several models on the same set of CV folds and may also be used for meta CV, where CV is used to select one from among several models. models() performs some "sanity" checks, warning if the models are of different classes, and reporting an error if they are fit to apparently different data sets or different response variables.

Usage

## S3 method for class 'modList'
cv(
  model,
  data,
  criterion = mse,
  k,
  reps = 1L,
  seed,
  quietly = TRUE,
  meta = FALSE,
  ...
)

models(...)

## S3 method for class 'cvModList'
print(x, ...)

## S3 method for class 'cvModList'
summary(object, ...)

## S3 method for class 'cvModList'
plot(
  x,
  y,
  spread = c("range", "sd"),
  confint = TRUE,
  xlab = "",
  ylab,
  main,
  axis.args = list(labels = names(x), las = 3L),
  col = palette()[2L],
  lwd = 2L,
  grid = TRUE,
  ...
)

## S3 method for class 'cvModList'
as.data.frame(x, row.names = NULL, optional = TRUE, ...)
## S3 method for class 'modList'
cv(
  model,
  data,
  criterion = mse,
  k,
  reps = 1L,
  seed,
  quietly = TRUE,
  meta = FALSE,
  ...
)

models(...)

## S3 method for class 'cvModList'
print(x, ...)

## S3 method for class 'cvModList'
summary(object, ...)

## S3 method for class 'cvModList'
plot(
  x,
  y,
  spread = c("range", "sd"),
  confint = TRUE,
  xlab = "",
  ylab,
  main,
  axis.args = list(labels = names(x), las = 3L),
  col = palette()[2L],
  lwd = 2L,
  grid = TRUE,
  ...
)

## S3 method for class 'cvModList'
as.data.frame(x, row.names = NULL, optional = TRUE, ...)

Arguments

`model`	a list of regression model objects, created by `models()`.
`data`	(required) the data set to which the models were fit.
`criterion`	the CV criterion ("cost" or lack-of-fit) function, defaults to `mse`.
`k`	the number of CV folds; may be omitted, in which case the value will depend on the default for the `cv()` method invoked for the individual models.
`reps`	number of replications of CV for each model (default is 1).
`seed`	(optional) seed for R's pseudo-random-number generator, to be used to create the same set of CV folds for all of the models; if omitted, a seed will be randomly generated and saved. Not used for leave-one-out CV.
`quietly`	if `TRUE` (the default), simple messages (for example about the value to which the random-number generator seed is set), but not warnings or errors, are suppressed.
`meta`	if `TRUE` (the default is `FALSE`), cross-validation is performed recursively to select a "best" model deleting each fold in turn by calculating the CV estimate of the criterion for the remaining folds; this is equivalent to employing the `selectModelList()` model-selection procedure.
`...`	for `cv.modList()`, additional arguments to be passed to the `cv()` method applied to each model. For `models()`, two or more competing models fit to the the same data; the several models may be named. It is also possible to specify a single argument, which should then be list of models (which has the effect of turning a list of models into a `"modList"` object). For the `print()` method, arguments to be passed to the `print()` method for the individual model cross-validations. For the `plot()` method, arguments to be passed to the base `plot()` function.
`x`	an object of class `"cvModList"` to be printed or plotted.
`object`	an object to summarize.
`y`	the name of the element in each `"cv"` object to be plotted; defaults to `"adj CV crit"`, if it exists, or to `"CV crit"`.
`spread`	if `"range"`, the default, show the range of CV criteria for each model along with their average; if `"sd"`, show the average plus or minus 1 standard deviation.
`confint`	if `TRUE` (the default) and if confidence intervals are in any of the `"cv"` objects, then plot the confidence intervals around the CV criteria.
`xlab`	label for the x-axis (defaults to blank).
`ylab`	label for the y-axis (if missing, a label is constructed).
`main`	main title for the graph (if missing, a label is constructed).
`axis.args`	a list of arguments for the `axis()` function, used to draw the horizontal axis. In addition to the axis arguments given explicitly, `side=1` (the horizontal axis) and `at=seq(along=x)` (i.e., 1 to the number of models) are used and can't be modified.
`col`	color for the line and points, defaults to the second element of the color palette; see `palette()`.
`lwd`	line width for the line (defaults to 2).
`grid`	if `TRUE` (the default), include grid lines on the graph.
`row.names`	optional row names for the result, defaults to `NULL`.
`optional`	to match the `as.data.frame()` generic function; if `FALSE` (the default is `TRUE`), then the names of the columns of the returned data frame, including the names of coefficients, are coerced to syntactically correct names.

Value

models() returns a "modList" object, the cv() method for which returns a "cvModList" object, or, when meta=TRUE, an object of class c("cvSelect", "cv").

Functions

cv(modList): cv() method for "modList" objects.
models(): create a list of models.
print(cvModList): print() method for "cvModList" objects.
summary(cvModList): summary() method for "cvModList" objects.
plot(cvModList): plot() method for "cvModList" objects.
as.data.frame(cvModList): as.data.frame() method for "cvModList" objects.

Examples

if (requireNamespace("carData", quietly=TRUE)){
withAutoprint({
data("Duncan", package="carData")
m1 <- lm(prestige ~ income + education, data=Duncan)
m2 <- lm(prestige ~ income + education + type, data=Duncan)
m3 <- lm(prestige ~ (income + education)*type, data=Duncan)
(cv.models <- cv(models(m1=m1, m2=m2, m3=m3),
                 data=Duncan, seed=7949, reps=5))
D.cv.models <- as.data.frame(cv.models)
head(D.cv.models)
summary(D.cv.models, criterion ~ model + rep, include="folds")
plot(cv.models)
(cv.models.ci <- cv(models(m1=m1, m2=m2, m3=m3),
                    data=Duncan, seed=5963, confint=TRUE, level=0.50))
                 # nb: n too small for accurate CIs
plot(cv.models.ci)
(cv.models.meta <- cv(models(m1=m1, m2=m2, m3=m3),
                      data=Duncan, seed=5963,
                      meta=TRUE, save.model=TRUE))
cvInfo(cv.models.meta, "selected model")
})
} else {
cat("install the 'carData' package to run these examples\n")
}
if (requireNamespace("carData", quietly=TRUE)){
withAutoprint({
data("Duncan", package="carData")
m1 <- lm(prestige ~ income + education, data=Duncan)
m2 <- lm(prestige ~ income + education + type, data=Duncan)
m3 <- lm(prestige ~ (income + education)*type, data=Duncan)
(cv.models <- cv(models(m1=m1, m2=m2, m3=m3),
                 data=Duncan, seed=7949, reps=5))
D.cv.models <- as.data.frame(cv.models)
head(D.cv.models)
summary(D.cv.models, criterion ~ model + rep, include="folds")
plot(cv.models)
(cv.models.ci <- cv(models(m1=m1, m2=m2, m3=m3),
                    data=Duncan, seed=5963, confint=TRUE, level=0.50))
                 # nb: n too small for accurate CIs
plot(cv.models.ci)
(cv.models.meta <- cv(models(m1=m1, m2=m2, m3=m3),
                      data=Duncan, seed=5963,
                      meta=TRUE, save.model=TRUE))
cvInfo(cv.models.meta, "selected model")
})
} else {
cat("install the 'carData' package to run these examples\n")
}

Utility Functions for the cv Package

Description

These functions are primarily useful for writing methods for the cv() generic function. They are used internally in the package and can also be used for extensions (see the vignette "Extending the cv package, vignette("cv-extend", package="cv")).

Usage

cvCompute(
  model,
  data = insight::get_data(model),
  criterion = mse,
  criterion.name,
  k = 10L,
  reps = 1L,
  seed,
  details = k <= 10L,
  confint,
  level = 0.95,
  method = NULL,
  ncores = 1L,
  type = "response",
  start = FALSE,
  f,
  fPara = f,
  locals = list(),
  model.function = NULL,
  model.function.name = NULL,
  ...
)

cvMixed(
  model,
  package,
  data = insight::get_data(model),
  criterion = mse,
  criterion.name,
  k,
  reps = 1L,
  confint,
  level = 0.95,
  seed,
  details,
  ncores = 1L,
  clusterVariables,
  predict.clusters.args = list(object = model, newdata = data),
  predict.cases.args = list(object = model, newdata = data),
  fixed.effects,
  ...
)

cvSelect(
  procedure,
  data,
  criterion = mse,
  criterion.name,
  model,
  y.expression,
  k = 10L,
  confint = n >= 400,
  level = 0.95,
  reps = 1L,
  save.coef,
  details = k <= 10L,
  save.model = FALSE,
  seed,
  ncores = 1L,
  ...
)

folds(n, k)

fold(folds, i, ...)

## S3 method for class 'folds'
fold(folds, i, ...)

## S3 method for class 'folds'
print(x, ...)

GetResponse(model, ...)

## Default S3 method:
GetResponse(model, ...)

## S3 method for class 'merMod'
GetResponse(model, ...)

## S3 method for class 'lme'
GetResponse(model, ...)

## S3 method for class 'glmmTMB'
GetResponse(model, ...)

## S3 method for class 'modList'
GetResponse(model, ...)
cvCompute(
  model,
  data = insight::get_data(model),
  criterion = mse,
  criterion.name,
  k = 10L,
  reps = 1L,
  seed,
  details = k <= 10L,
  confint,
  level = 0.95,
  method = NULL,
  ncores = 1L,
  type = "response",
  start = FALSE,
  f,
  fPara = f,
  locals = list(),
  model.function = NULL,
  model.function.name = NULL,
  ...
)

cvMixed(
  model,
  package,
  data = insight::get_data(model),
  criterion = mse,
  criterion.name,
  k,
  reps = 1L,
  confint,
  level = 0.95,
  seed,
  details,
  ncores = 1L,
  clusterVariables,
  predict.clusters.args = list(object = model, newdata = data),
  predict.cases.args = list(object = model, newdata = data),
  fixed.effects,
  ...
)

cvSelect(
  procedure,
  data,
  criterion = mse,
  criterion.name,
  model,
  y.expression,
  k = 10L,
  confint = n >= 400,
  level = 0.95,
  reps = 1L,
  save.coef,
  details = k <= 10L,
  save.model = FALSE,
  seed,
  ncores = 1L,
  ...
)

folds(n, k)

fold(folds, i, ...)

## S3 method for class 'folds'
fold(folds, i, ...)

## S3 method for class 'folds'
print(x, ...)

GetResponse(model, ...)

## Default S3 method:
GetResponse(model, ...)

## S3 method for class 'merMod'
GetResponse(model, ...)

## S3 method for class 'lme'
GetResponse(model, ...)

## S3 method for class 'glmmTMB'
GetResponse(model, ...)

## S3 method for class 'modList'
GetResponse(model, ...)

Arguments

`model`	a regression model object.
`data`	data frame to which the model was fit (not usually necessary, except for `cvSelect()`).
`criterion`	cross-validation criterion ("cost" or lack-of-fit) function of form `f(y, yhat)` where `y` is the observed values of the response and `yhat` the predicted values; the default is `mse` (the mean-squared error).
`criterion.name`	a character string giving the name of the CV criterion function in the returned `"cv"` object).
`k`	perform k-fold cross-validation (default is `10`); `k` may be a number or `"loo"` or `"n"` for n-fold (leave-one-out) cross-validation; for `folds()`, `k` must be a number.
`reps`	number of times to replicate k-fold CV (default is `1`).
`seed`	for R's random number generator; optional, if not supplied a random seed will be selected and saved; not needed for n-fold cross-validation.
`details`	if `TRUE` (the default if the number of folds `k <= 10`), save detailed information about the value of the CV criterion for the cases in each fold and the regression coefficients with that fold deleted.
`confint`	if `TRUE` (the default if the number of cases is 400 or greater), compute a confidence interval for the bias-corrected CV criterion, if the criterion is the average of casewise components.
`level`	confidence level (default `0.95`).
`method`	computational method to apply; use by some `cv()` methods.
`ncores`	number of cores to use for parallel computations (default is `1`, i.e., computations aren't done in parallel).
`type`	used by some `cv()` methods, such as the default method, where `type` is passed to the `type` argument of `predict()`; the default is `type="response"`, which is appropriate, e.g., for a `"glm"` model and may be recognized or ignored by `predict()` methods for other model classes.
`start`	used by some `cv()` methods; if `TRUE` (the default is `FALSE`), the `start` argument, set to the vector of regression coefficients for the model fit to the full data, is passed to `update()`, possibly making the CV updates faster, e.g. for a GLM.
`f`	function to be called by `cvCompute()` for each fold.
`fPara`	function to be called by `cvCompute()` for each fold using parallel computation.
`locals`	a named list of objects that are required in the local environment of `cvCompute()` for `f()` or `fPara()`.
`model.function`	a regression function, typically for a new `cv()` method, residing in a package that's not a declared dependency of the cv package, e.g., `nnet::multinom`.
`model.function.name`	the quoted name of the regression function, e.g., `"multinom"`.
`...`	to match generic; passed to `predict()` for the default method, and to `fPara()` (for parallel computations) in `cvCompute()`.
`package`	the name of the package in which mixed-modeling function (or functions) employed resides; used to get the namespace of the package.
`clusterVariables`	a character vector of names of the variables defining clusters for a mixed model with nested or crossed random effects; if missing, cross-validation is performed for individual cases rather than for clusters
`predict.clusters.args`	a list of arguments to be used to predict the whole data set from a mixed model when performing CV on clusters; the first two elements should be `model` and `newdata`; see the "Extending the cv package" vignette (`vignette("cv-extend", package="cv")`).
`predict.cases.args`	a list of arguments to be used to predict the whole data set from a mixed model when performing CV on cases; the first two elements should be `model` and `newdata`; see the "Extending the cv package" vignette (`vignette("cv-extend", package="cv")`).
`fixed.effects`	a function to be used to compute fixed-effect coefficients for cluster-based CV when `details = TRUE`.
`procedure`	a model-selection procedure function (see Details).
`y.expression`	normally the response variable is found from the `model` argument; but if, for a particular selection procedure, the `model` argument is absent, or if the response can't be inferred from the model, the response can be specified by an expression, such as `expression(log(income))`, to be evaluated within the data set provided by the `data` argument.
`save.coef`	save the coefficients from the selected models? Deprecated in favor of the `details` argument; if specified, `details` is set is set to the value of `save.coef`.
`save.model`	save the model that's selected using the full data set.
`n`	number of cases, for constructed folds.
`folds`	an object of class `"folds"`.
`i`	a fold number for an object of class `"folds"`.
`x`	a `"cv"`, `"cvList"`, or `"folds"` object to be printed

Value

The utility functions return various kinds of objects:

cvCompute() returns an object of class "cv", with the CV criterion ("CV crit"), the bias-adjusted CV criterion ("adj CV crit"), the criterion for the model applied to the full data ("full crit"), the confidence interval and level for the bias-adjusted CV criterion ("confint"), the number of folds ("k"), and the seed for R's random-number generator ("seed"). If details=TRUE, then the returned object will also include a "details" component, which is a list of two elements: "criterion", containing the CV criterion computed for the cases in each fold; and "coefficients", regression coefficients computed for the model with each fold deleted. Some cv() methods calling cvCompute() may return a subset of these components and may add additional information. If reps > 1, then an object of class "cvList" is returned, which is literally a list of "cv" objects.
cvMixed() also returns an object of class "cv" or "cvList".
cvSelect returns an object of class "cvSelect" inheriting from "cv", or an object of class "cvSelectList" inheriting from "cvList".
folds() returns an object of class folds, for which there are fold() and print() methods.
GetResponse() returns the (numeric) response variable from the model.

The supplied default method returns the model$y component of the model object, or, if model is an S4 object, the result returned by the get_response() function in the insight package. If this result is NULL, the result of model.response(model.frame(model)) is returned, checking in any case whether the result is a numeric vector.

There are also "lme", "merMod" and "glmmTMB" methods that convert factor responses to numeric 0/1 responses, as would be appropriate for a generalized linear mixed model with a binary response.

Functions

cvCompute(): used internally by cv() methods (not for direct use); exported to support new cv() methods.
cvMixed(): used internally by cv() methods for mixed-effect models (not for direct use); exported to support new cv() methods.
cvSelect(): used internally by cv() methods for cross-validating a model-selection procedure; may also be called directly for this purpose, but use via cv() is preferred. cvSelect() is exported primarily to support new model-selection procedures.
folds(): used internally by cv() methods (not for direct use).
fold(): to extract a fold from a "folds" object.
fold(folds): fold() method for "folds" objects.
print(folds): print() method for "folds" objects.
GetResponse(): function to return the response variable from a regression model.
GetResponse(default): default method.
GetResponse(merMod): "merMod" method.
GetResponse(lme): "lme" method.
GetResponse(glmmTMB): "glmmTMB" method.
GetResponse(modList): "modList" method.

Examples

fit <- lm(mpg ~ gear, mtcars)
GetResponse(fit)

set.seed(123)
(ffs <- folds(n=22, k=5))
fold(ffs, 2)

fit <- lm(mpg ~ gear, mtcars)
GetResponse(fit)

set.seed(123)
(ffs <- folds(n=22, k=5))
fold(ffs, 2)

Cost Functions for Fitted Regression Models

Description

Compute cost functions (cross-validation criteria) for fitted regression models.

Usage

mse(y, yhat)

rmse(y, yhat)

medAbsErr(y, yhat)

BayesRule(y, yhat)

BayesRule2(y, yhat)
mse(y, yhat)

rmse(y, yhat)

medAbsErr(y, yhat)

BayesRule(y, yhat)

BayesRule2(y, yhat)

Arguments

`y`	response
`yhat`	fitted value

Details

Cost functions (cross-validation criteria) are meant to measure lack-of-fit. Several cost functions are provided:

mse() returns the mean-squared error of prediction for a numeric response variable y and predictions yhat; and rmse() returns the root-mean-squared error and is just the square-root of mse().
medAbsErr() returns the median absolute error of prediction for a numeric response y and predictions yhat.
BayesRule() and BayesRule2() report the proportion of incorrect predictions for a dichotomous response variable y, assumed coded (or coercible to) 0 and 1. The yhat values are predicted probabilities and are rounded to 0 or 1. The distinction between BayesRule() and BayesRule2() is that the former checks that the y values are all either 0 or 1 and that the yhat values are all between 0 and 1, while the latter doesn't and is therefore faster.

Value

In general, cost functions should return a single numeric value measuring lack-of-fit. mse() returns the mean-squared error; rmse() returns the root-mean-squared error; medAbsErr() returns the median absolute error; and BayesRule() and BayesRule2() return the proportion of misclassified cases.

Functions

mse(): Mean-square error.
rmse(): Root-mean-square error.
medAbsErr(): Median absolute error.
BayesRule(): Bayes Rule for a binary response.
BayesRule2(): Bayes rule for a binary response (without bounds checking).

Examples

if (requireNamespace("carData", quietly=TRUE)){
withAutoprint({
data("Duncan", package="carData")
m.lm <- lm(prestige ~ income + education, data=Duncan)
mse(Duncan$prestige, fitted(m.lm))

data("Mroz", package="carData")
m.glm <- glm(lfp ~ ., data=Mroz, family=binomial)
BayesRule(Mroz$lfp == "yes", fitted(m.glm))
})
} else {
cat("\n install 'carData' package to run these examples\n")
}
if (requireNamespace("carData", quietly=TRUE)){
withAutoprint({
data("Duncan", package="carData")
m.lm <- lm(prestige ~ income + education, data=Duncan)
mse(Duncan$prestige, fitted(m.lm))

data("Mroz", package="carData")
m.glm <- glm(lfp ~ ., data=Mroz, family=binomial)
BayesRule(Mroz$lfp == "yes", fitted(m.glm))
})
} else {
cat("\n install 'carData' package to run these examples\n")
}

Body Weights of 48 Pigs in 9 Successive Weeks

Description

This data set appears in Table 3.1 of Diggle, Liang, and Zeger (1994).

Usage

data("Pigs", package = "cv")
data("Pigs", package = "cv")

Format

A data frame with 432 rows and 3 columns.

id: Pig id number, 1–48.
week: Week number, 1–9.
weight: Weight in kg.

Source

P. J. Diggle, K.-Y. Liang, and S. L. Zeger, Analysis of Longitudinal Data (Oxford, 1994).

Examples

library("lme4")
m.p <- lmer(weight ~ week + (1 | id) + (1 | week),
            data=Pigs, REML=FALSE,
            control=lmerControl(optimizer="bobyqa"))
summary(m.p)
cv(m.p, clusterVariables=c("id", "week"), k=10, seed=8469)
library("lme4")
m.p <- lmer(weight ~ week + (1 | id) + (1 | week),
            data=Pigs, REML=FALSE,
            control=lmerControl(optimizer="bobyqa"))
summary(m.p)
cv(m.p, clusterVariables=c("id", "week"), k=10, seed=8469)

Package 'cv'

Help Index

Cross-Validate Regression Models

Description

Usage

Arguments

Details

Value

Methods (by class)

Methods (by generic)

Functions

See Also

Examples

Cross-Validate a Model-Selection Procedure

Description

Usage

Arguments

Details

Value

Functions

See Also

Examples

Cross-Validate Mixed-Effects Model

Description

Usage

Arguments

Details

Value

Functions

See Also

Examples

Cross-Validate Several Models Fit to the Same Data

Description

Usage

Arguments

Value

Functions

See Also

Examples

Utility Functions for the cv Package

Description

Usage

Arguments

Value

Functions

See Also

Examples

Cost Functions for Fitted Regression Models

Description

Usage

Arguments

Details

Value

Functions

See Also

Examples

Body Weights of 48 Pigs in 9 Successive Weeks

Description

Usage

Format

Source

Examples