Title: | Stratified and Personalised Models Based on Model-Based Trees and Forests |
---|---|
Description: | Model-based trees for subgroup analyses in clinical trials and model-based forests for the estimation and prediction of personalised treatment effects (personalised models). Currently partitioning of linear models, lm(), generalised linear models, glm(), and Weibull models, survreg(), is supported. Advanced plotting functionality is supported for the trees and a test for parameter heterogeneity is provided for the personalised models. For details on model-based trees for subgroup analyses see Seibold, Zeileis and Hothorn (2016) <doi:10.1515/ijb-2015-0032>; for details on model-based forests for estimation of individual treatment effects see Seibold, Zeileis and Hothorn (2017) <doi:10.1177/0962280217693034>. |
Authors: | Heidi Seibold [aut, cre], Achim Zeileis [aut], Torsten Hothorn [aut] |
Maintainer: | Heidi Seibold <[email protected]> |
License: | GPL-2 | GPL-3 |
Version: | 0.9-8 |
Built: | 2024-11-06 19:14:38 UTC |
Source: | https://github.com/r-forge/partykit |
For internal use.
.add_modelinfo(x, nodeids, data, model, coeffun)
.add_modelinfo(x, nodeids, data, model, coeffun)
x |
constparty object. |
nodeids |
node ids, usually the terminal ids. |
data |
data. |
model |
model. |
coeffun |
function that takes the model object and returns the coefficients. Useful when coef() does not return all coefficients (e.g. survreg). |
tree with added info. Class still to be added.
Use update function to refit model and extract info such as coef, logLik and estfun.
.modelfit(model, data, coeffun = coef, weights, control, parm = NULL)
.modelfit(model, data, coeffun = coef, weights, control, parm = NULL)
model |
model object. |
data |
data. |
coeffun |
function that takes the model object and returns the coefficients. Useful when coef() does not return all coefficients (e.g. survreg). |
weights |
weights. |
control |
control options from |
parm |
which parameters should be used for instability test? |
A function returning a list of
coefficients |
|
objfun |
|
object |
the model object. |
converged |
Did the model converge? |
estfun |
|
Prepare input for ctree/cforest from input of pmtree/pmforest
.prepare_args(model, data, zformula, control, ...)
.prepare_args(model, data, zformula, control, ...)
model |
model. |
data |
an optional data frame. |
zformula |
ormula describing which variable should be used for partitioning. |
control |
ontrol parameters, see |
... |
other arguments. |
args to be passed to ctree/cforest.
Can be used on its own but is also useable as plotfun in
node_pmterminal
.
binomial_glm_plot( mod, data = NULL, plot_data = FALSE, theme = theme_classic(), ... )
binomial_glm_plot( mod, data = NULL, plot_data = FALSE, theme = theme_classic(), ... )
mod |
A model of class glm with binomial family. |
data |
optional data frame. If NULL the data stored in mod is used. |
plot_data |
should the data in form of a mosaic type plot be plotted? |
theme |
A ggplot2 theme. |
... |
ignored at the moment. |
set.seed(2017) # number of observations n <- 1000 # balanced binary treatment # trt <- factor(rep(c("C", "A"), each = n/2), # levels = c("C", "A")) # unbalanced binary treatment trt <- factor(c(rep("C", n/4), rep("A", 3*n/4)), levels = c("C", "A")) # some continuous variables x1 <- rnorm(n) x2 <- rnorm(n) # linear predictor lp <- -0.5 + 0.5*I(trt == "A") + 1*I(trt == "A")*I(x1 > 0) # compute probability with inverse logit function invlogit <- function(x) 1/(1 + exp(-x)) pr <- invlogit(lp) # bernoulli response variable y <- rbinom(n, 1, pr) dat <- data.frame(y, trt, x1, x2) # logistic regression model mod <- glm(y ~ trt, data = dat, family = "binomial") binomial_glm_plot(mod, plot_data = TRUE) # logistic regression model tree ltr <- pmtree(mod) plot(ltr, terminal_panel = node_pmterminal(ltr, plotfun = binomial_glm_plot, confint = TRUE, plot_data = TRUE))
set.seed(2017) # number of observations n <- 1000 # balanced binary treatment # trt <- factor(rep(c("C", "A"), each = n/2), # levels = c("C", "A")) # unbalanced binary treatment trt <- factor(c(rep("C", n/4), rep("A", 3*n/4)), levels = c("C", "A")) # some continuous variables x1 <- rnorm(n) x2 <- rnorm(n) # linear predictor lp <- -0.5 + 0.5*I(trt == "A") + 1*I(trt == "A")*I(x1 > 0) # compute probability with inverse logit function invlogit <- function(x) 1/(1 + exp(-x)) pr <- invlogit(lp) # bernoulli response variable y <- rbinom(n, 1, pr) dat <- data.frame(y, trt, x1, x2) # logistic regression model mod <- glm(y ~ trt, data = dat, family = "binomial") binomial_glm_plot(mod, plot_data = TRUE) # logistic regression model tree ltr <- pmtree(mod) plot(ltr, terminal_panel = node_pmterminal(ltr, plotfun = binomial_glm_plot, confint = TRUE, plot_data = TRUE))
This function is mostly useful for plotting a pmtree. The generic plotting does not show the estimate and confidence interval of the scale parameter. This one does.
coeftable.survreg(model, confint = TRUE, digits = 2, intree = FALSE)
coeftable.survreg(model, confint = TRUE, digits = 2, intree = FALSE)
model |
model of class |
confint |
should a confidence interval be computed? Default: TRUE |
digits |
integer, used for formating numbers. Default: 2 |
intree |
is the table plotted within a tree? Default: FALSE |
None.
if(require("survival") & require("TH.data")) { ## Load data data(GBSG2, package = "TH.data") ## Weibull model bmod <- survreg(Surv(time, cens) ~ horTh, data = GBSG2, model = TRUE) ## Coefficient table grid.newpage() coeftable.survreg(bmod) ## partitioned model tr <- pmtree(bmod) ## plot plot(tr, terminal_panel = node_pmterminal(tr, plotfun = survreg_plot, confint = TRUE, coeftable = coeftable.survreg)) }
if(require("survival") & require("TH.data")) { ## Load data data(GBSG2, package = "TH.data") ## Weibull model bmod <- survreg(Surv(time, cens) ~ horTh, data = GBSG2, model = TRUE) ## Coefficient table grid.newpage() coeftable.survreg(bmod) ## partitioned model tr <- pmtree(bmod) ## plot plot(tr, terminal_panel = node_pmterminal(tr, plotfun = survreg_plot, confint = TRUE, coeftable = coeftable.survreg)) }
Can be used on its own but is also useable as plotfun in
node_pmterminal
.
coxph_plot(mod, data = NULL, theme = theme_classic(), yrange = NULL)
coxph_plot(mod, data = NULL, theme = theme_classic(), yrange = NULL)
mod |
A model of class coxph. |
data |
optional data frame. If NULL the data stored in mod is used. |
theme |
A ggplot2 theme. |
yrange |
Range of the y variable to be used for plotting. If NULL it will be 0 to max(y). |
if(require("survival")) { coxph_plot(coxph(Surv(futime, fustat) ~ factor(rx), ovarian)) }
if(require("survival")) { coxph_plot(coxph(Surv(futime, fustat) ~ factor(rx), ovarian)) }
Can be used on its own but is also useable as plotfun in
node_pmterminal
.
lm_plot( mod, data = NULL, densest = FALSE, theme = theme_classic(), yrange = NULL )
lm_plot( mod, data = NULL, densest = FALSE, theme = theme_classic(), yrange = NULL )
mod |
A model of class lm. |
data |
optional data frame. If NULL the data stored in mod is used. |
densest |
should additional to the model density kernel density estimates
(see |
theme |
A ggplot2 theme. |
yrange |
Range of the y variable to be used for plotting. If NULL the range in the data will be used. |
In case of an offset, the value of the offset variable will be set to the median of the values in the data.
## example taken from ?lm ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14) trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69) group <- gl(2, 10, 20, labels = c("Ctl","Trt")) weight <- c(ctl, trt) data <- data.frame(weight, group) lm.D9 <- lm(weight ~ group, data = data) lm_plot(lm.D9) ## example taken from ?glm (modified version) data(anorexia, package = "MASS") anorexia$treatment <- factor(anorexia$Treat != "Cont") anorex.1 <- glm(Postwt ~ treatment + offset(Prewt), family = gaussian, data = anorexia) lm_plot(anorex.1)
## example taken from ?lm ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14) trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69) group <- gl(2, 10, 20, labels = c("Ctl","Trt")) weight <- c(ctl, trt) data <- data.frame(weight, group) lm.D9 <- lm(weight ~ group, data = data) lm_plot(lm.D9) ## example taken from ?glm (modified version) data(anorexia, package = "MASS") anorexia$treatment <- factor(anorexia$Treat != "Cont") anorex.1 <- glm(Postwt ~ treatment + offset(Prewt), family = gaussian, data = anorexia) lm_plot(anorex.1)
Extract sum of log-Likelihood contributions of all terminal nodes. By default the degrees of freedom from the models are used but optionally degrees of freedom for splits can be incorporated.
## S3 method for class 'pmtree' logLik(object, dfsplit = 0, newdata = NULL, weights = NULL, perm = NULL, ...)
## S3 method for class 'pmtree' logLik(object, dfsplit = 0, newdata = NULL, weights = NULL, perm = NULL, ...)
object |
pmtree object. |
dfsplit |
degrees of freedom per selected split. |
newdata |
an optional new data frame for which to compute the sum of objective functions. |
weights |
weights. |
perm |
the number of permutations performed (see |
... |
ignored. |
Returns an object of class logLik
.
objfun.pmtree
for the sum of contributions to the
objective function (not the same when partitioning linear models lm
)
The plot method for party and constparty objects are rather flexible and can be extended by panel functions. The pre-defined panel-generating function of class grapcon_generator for pmtrees is documented here.
node_pmterminal( obj, coeftable = TRUE, digits = 2, confint = TRUE, plotfun, nid = function(node) paste0(nam[id_node(node)], ", n = ", node$info$nobs), ... )
node_pmterminal( obj, coeftable = TRUE, digits = 2, confint = TRUE, plotfun, nid = function(node) paste0(nam[id_node(node)], ", n = ", node$info$nobs), ... )
obj |
an object of class party. |
coeftable |
logical or function. If logical: should a table with
coefficients be added to the plot (TRUE/FALSE)? If function: A function
comparable to |
digits |
integer, used for formating numbers. |
confint |
Should a confidence interval be computed. |
plotfun |
Plotting function to be used. Needs to be of format
|
nid |
function to retrieve info on what is plottet as node ids. |
... |
arguments passed on to plotfun. |
if(require("survival")) { ## compute survreg model mod_surv <- survreg(Surv(futime, fustat) ~ factor(rx), ovarian, dist = 'weibull') survreg_plot(mod_surv) ## partition model and plot tr_surv <- pmtree(mod_surv) plot(tr_surv, terminal_panel = node_pmterminal(tr_surv, plotfun = survreg_plot, confint = TRUE)) } if(require("survival") & require("TH.data")) { ## Load data data(GBSG2, package = "TH.data") ## Weibull model bmod <- survreg(Surv(time, cens) ~ horTh, data = GBSG2, model = TRUE) ## Coefficient table grid.newpage() coeftable.survreg(bmod) ## partitioned model tr <- pmtree(bmod) ## plot with specific coeftable plot(tr, terminal_panel = node_pmterminal(tr, plotfun = survreg_plot, confint = TRUE, coeftable = coeftable.survreg)) }
if(require("survival")) { ## compute survreg model mod_surv <- survreg(Surv(futime, fustat) ~ factor(rx), ovarian, dist = 'weibull') survreg_plot(mod_surv) ## partition model and plot tr_surv <- pmtree(mod_surv) plot(tr_surv, terminal_panel = node_pmterminal(tr_surv, plotfun = survreg_plot, confint = TRUE)) } if(require("survival") & require("TH.data")) { ## Load data data(GBSG2, package = "TH.data") ## Weibull model bmod <- survreg(Surv(time, cens) ~ horTh, data = GBSG2, model = TRUE) ## Coefficient table grid.newpage() coeftable.survreg(bmod) ## partitioned model tr <- pmtree(bmod) ## plot with specific coeftable plot(tr, terminal_panel = node_pmterminal(tr, plotfun = survreg_plot, confint = TRUE, coeftable = coeftable.survreg)) }
Get the contributions of an objective function. For glm
these are the (weighted) log-likelihood contributions, for lm
the
negative (weighted) squared error.
objfun(x, ...) ## S3 method for class 'survreg' objfun(x, newdata = NULL, weights = NULL, ...) ## S3 method for class 'lm' objfun(x, newdata = NULL, weights = NULL, ...) ## S3 method for class 'glm' objfun(x, newdata = NULL, weights = NULL, log = TRUE, ...)
objfun(x, ...) ## S3 method for class 'survreg' objfun(x, newdata = NULL, weights = NULL, ...) ## S3 method for class 'lm' objfun(x, newdata = NULL, weights = NULL, ...) ## S3 method for class 'glm' objfun(x, newdata = NULL, weights = NULL, log = TRUE, ...)
x |
model object. |
... |
further arguments passed on to |
newdata |
optional. New data frame. Can be useful for model evaluation / benchmarking. |
weights |
|
log |
should the log-Likelihood contributions or the Likelhood contributions be returned? |
vector of objective function contributions.
## Example taken from ?stats::glm ## Dobson (1990) Page 93: Randomized Controlled Trial : counts <- c(18,17,15,20,10,20,25,13,12) outcome <- gl(3,1,9) treatment <- gl(3,3) print(d.AD <- data.frame(treatment, outcome, counts)) glm.D93 <- glm(counts ~ outcome + treatment, family = poisson()) logLik_contributions <- objfun(glm.D93) sum(logLik_contributions) logLik(glm.D93) if(require("survival")) { x <- survreg(Surv(futime, fustat) ~ rx, ovarian, dist = "weibull") newdata <- ovarian[3:5, ] sum(objfun(x)) x$loglik objfun(x, newdata = newdata) }
## Example taken from ?stats::glm ## Dobson (1990) Page 93: Randomized Controlled Trial : counts <- c(18,17,15,20,10,20,25,13,12) outcome <- gl(3,1,9) treatment <- gl(3,3) print(d.AD <- data.frame(treatment, outcome, counts)) glm.D93 <- glm(counts ~ outcome + treatment, family = poisson()) logLik_contributions <- objfun(glm.D93) sum(logLik_contributions) logLik(glm.D93) if(require("survival")) { x <- survreg(Surv(futime, fustat) ~ rx, ovarian, dist = "weibull") newdata <- ovarian[3:5, ] sum(objfun(x)) x$loglik objfun(x, newdata = newdata) }
Get the contributions of an objective function (e.g. likelihood contributions) and the sum thereof (e.g. log-Likelihood).
## S3 method for class 'pmodel_identity' objfun(x, ...) ## S3 method for class 'pmodel_identity' logLik(object, add_df = 0, ...)
## S3 method for class 'pmodel_identity' objfun(x, ...) ## S3 method for class 'pmodel_identity' logLik(object, add_df = 0, ...)
x , object
|
object of class pmodel_identity (obtained by |
... |
additional parameters passed on to |
add_df |
it is not very clear what the degrees of freedom are in personalised models.
With this argument you can add/substract degrees of freedom at your convenience. Default
is For examples see |
Returns the contributions to the objective function or the
sum thereof (if sum = TRUE
).
## S3 method for class 'pmtree' objfun(x, newdata = NULL, weights = NULL, perm = NULL, sum = FALSE, ...)
## S3 method for class 'pmtree' objfun(x, newdata = NULL, weights = NULL, perm = NULL, sum = FALSE, ...)
x |
pmtree object. |
newdata |
an optional new data frame for which to compute the sum of objective functions. |
weights |
weights. |
perm |
the number of permutations performed (see |
sum |
should the sum of objective functions be computed. |
... |
passed on to Note that |
objective function or the sum thereof
## generate data set.seed(2) n <- 1000 trt <- factor(rep(1:2, each = n/2)) age <- sample(40:60, size = n, replace = TRUE) eff <- -1 + I(trt == 2) + 1 * I(trt == 2) * I(age > 50) expit <- function(x) 1/(1 + exp(-x)) success <- rbinom(n = n, size = 1, prob = expit(eff)) dat <- data.frame(success, trt, age) ## compute base model bmod1 <- glm(success ~ trt, data = dat, family = binomial) ## copmute tree (tr1 <- pmtree(bmod1, data = dat)) ## compute log-Likelihood logLik(tr1) objfun(tr1, newdata = dat, sum = TRUE) objfun(tr1, sum = TRUE) ## log-Likelihood contributions of first ## 5 observations nd <- dat[1:5, ] objfun(tr1, newdata = nd)
## generate data set.seed(2) n <- 1000 trt <- factor(rep(1:2, each = n/2)) age <- sample(40:60, size = n, replace = TRUE) eff <- -1 + I(trt == 2) + 1 * I(trt == 2) * I(age > 50) expit <- function(x) 1/(1 + exp(-x)) success <- rbinom(n = n, size = 1, prob = expit(eff)) dat <- data.frame(success, trt, age) ## compute base model bmod1 <- glm(success ~ trt, data = dat, family = binomial) ## copmute tree (tr1 <- pmtree(bmod1, data = dat)) ## compute log-Likelihood logLik(tr1) objfun(tr1, newdata = dat, sum = TRUE) objfun(tr1, sum = TRUE) ## log-Likelihood contributions of first ## 5 observations nd <- dat[1:5, ] objfun(tr1, newdata = nd)
See https://stackoverflow.com/questions/50504386/check-that-model-has-only-one-factor-covariate/50514499#50514499
one_factor(object)
one_factor(object)
object |
model. |
Returns TRUE if model has a single factor covariate, FALSE otherwise.
Input a parametric model and get a forest.
pmforest( model, data = NULL, zformula = ~., ntree = 500L, perturb = list(replace = FALSE, fraction = 0.632), mtry = NULL, applyfun = NULL, cores = NULL, control = ctree_control(teststat = "quad", testtype = "Univ", mincriterion = 0, saveinfo = FALSE, lookahead = TRUE, ...), trace = FALSE, ... ) ## S3 method for class 'pmforest' gettree(object, tree = 1L, saveinfo = TRUE, coeffun = coef, ...)
pmforest( model, data = NULL, zformula = ~., ntree = 500L, perturb = list(replace = FALSE, fraction = 0.632), mtry = NULL, applyfun = NULL, cores = NULL, control = ctree_control(teststat = "quad", testtype = "Univ", mincriterion = 0, saveinfo = FALSE, lookahead = TRUE, ...), trace = FALSE, ... ) ## S3 method for class 'pmforest' gettree(object, tree = 1L, saveinfo = TRUE, coeffun = coef, ...)
model |
a model object. The model can be a parametric model with a single binary covariate. |
data |
data. If |
zformula |
formula describing which variable should be used for partitioning.
Default is to use all variables in data that are not in the model (i.e. |
ntree |
number of trees. |
perturb |
a list with arguments replace and fraction determining which type of
resampling with |
mtry |
number of input variables randomly sampled as candidates at each
node (Default |
applyfun |
see |
cores |
see |
control |
control parameters, see |
trace |
a logical indicating if a progress bar shall be printed while the forest grows. |
... |
additional parameters passed on to model fit such as weights. |
object |
an object returned by pmforest. |
tree |
an integer, the number of the tree to extract from the forest. |
saveinfo |
logical. Should the model info be stored in terminal nodes? |
coeffun |
function that takes the model object and returns the coefficients. Useful when coef() does not return all coefficients (e.g. survreg). |
cforest object
library("model4you") if(require("mvtnorm") & require("survival")) { ## function to simulate the data sim_data <- function(n = 500, p = 10, beta = 3, sd = 1){ ## treatment lev <- c("C", "A") a <- rep(factor(lev, labels = lev, levels = lev), length = n) ## correlated z variables sigma <- diag(p) sigma[sigma == 0] <- 0.2 ztemp <- rmvnorm(n, sigma = sigma) z <- (pnorm(ztemp) * 2 * pi) - pi colnames(z) <- paste0("z", 1:ncol(z)) z1 <- z[,1] ## outcome y <- 7 + 0.2 * (a %in% "A") + beta * cos(z1) * (a %in% "A") + rnorm(n, 0, sd) data.frame(y = y, a = a, z) } ## simulate data set.seed(123) beta <- 3 ntrain <- 500 ntest <- 50 simdata <- simdata_s <- sim_data(p = 5, beta = beta, n = ntrain) tsimdata <- tsimdata_s <- sim_data(p = 5, beta = beta, n = ntest) simdata_s$cens <- rep(1, ntrain) tsimdata_s$cens <- rep(1, ntest) ## base model basemodel_lm <- lm(y ~ a, data = simdata) ## forest frst_lm <- pmforest(basemodel_lm, ntree = 20, perturb = list(replace = FALSE, fraction = 0.632), control = ctree_control(mincriterion = 0)) ## personalised models # (1) return the model objects pmodels_lm <- pmodel(x = frst_lm, newdata = tsimdata, fun = identity) class(pmodels_lm) # (2) return coefficients only (default) coefs_lm <- pmodel(x = frst_lm, newdata = tsimdata) # compare predictive objective functions of personalised models versus # base model sum(objfun(pmodels_lm)) # -RSS personalised models sum(objfun(basemodel_lm, newdata = tsimdata)) # -RSS base model if(require("ggplot2")) { ## dependence plot dp_lm <- cbind(coefs_lm, tsimdata) ggplot(tsimdata) + stat_function(fun = function(z1) 0.2 + beta * cos(z1), aes(color = "true treatment\neffect")) + geom_point(data = dp_lm, aes(y = aA, x = z1, color = "estimates lm"), alpha = 0.5) + ylab("treatment effect") + xlab("patient characteristic z1") } }
library("model4you") if(require("mvtnorm") & require("survival")) { ## function to simulate the data sim_data <- function(n = 500, p = 10, beta = 3, sd = 1){ ## treatment lev <- c("C", "A") a <- rep(factor(lev, labels = lev, levels = lev), length = n) ## correlated z variables sigma <- diag(p) sigma[sigma == 0] <- 0.2 ztemp <- rmvnorm(n, sigma = sigma) z <- (pnorm(ztemp) * 2 * pi) - pi colnames(z) <- paste0("z", 1:ncol(z)) z1 <- z[,1] ## outcome y <- 7 + 0.2 * (a %in% "A") + beta * cos(z1) * (a %in% "A") + rnorm(n, 0, sd) data.frame(y = y, a = a, z) } ## simulate data set.seed(123) beta <- 3 ntrain <- 500 ntest <- 50 simdata <- simdata_s <- sim_data(p = 5, beta = beta, n = ntrain) tsimdata <- tsimdata_s <- sim_data(p = 5, beta = beta, n = ntest) simdata_s$cens <- rep(1, ntrain) tsimdata_s$cens <- rep(1, ntest) ## base model basemodel_lm <- lm(y ~ a, data = simdata) ## forest frst_lm <- pmforest(basemodel_lm, ntree = 20, perturb = list(replace = FALSE, fraction = 0.632), control = ctree_control(mincriterion = 0)) ## personalised models # (1) return the model objects pmodels_lm <- pmodel(x = frst_lm, newdata = tsimdata, fun = identity) class(pmodels_lm) # (2) return coefficients only (default) coefs_lm <- pmodel(x = frst_lm, newdata = tsimdata) # compare predictive objective functions of personalised models versus # base model sum(objfun(pmodels_lm)) # -RSS personalised models sum(objfun(basemodel_lm, newdata = tsimdata)) # -RSS base model if(require("ggplot2")) { ## dependence plot dp_lm <- cbind(coefs_lm, tsimdata) ggplot(tsimdata) + stat_function(fun = function(z1) 0.2 + beta * cos(z1), aes(color = "true treatment\neffect")) + geom_point(data = dp_lm, aes(y = aA, x = z1, color = "estimates lm"), alpha = 0.5) + ylab("treatment effect") + xlab("patient characteristic z1") } }
Compute personalised models from cforest object.
pmodel( x = NULL, model = NULL, newdata = NULL, OOB = TRUE, fun = coef, return_attr = c("modelcall", "data", "similarity") )
pmodel( x = NULL, model = NULL, newdata = NULL, OOB = TRUE, fun = coef, return_attr = c("modelcall", "data", "similarity") )
x |
cforest object or matrix of weights. |
model |
model object. If NULL the model in |
newdata |
new data. If NULL cforest learning data is used. Ignored if |
OOB |
In case of using the learning data, should patient similarities be computed out of bag? |
fun |
function to apply on the personalised model before returning. The
default |
return_attr |
which attributes to add to the object returned. If it contains
|
depends on fun.
library("model4you") if(require("mvtnorm") & require("survival")) { ## function to simulate the data sim_data <- function(n = 500, p = 10, beta = 3, sd = 1){ ## treatment lev <- c("C", "A") a <- rep(factor(lev, labels = lev, levels = lev), length = n) ## correlated z variables sigma <- diag(p) sigma[sigma == 0] <- 0.2 ztemp <- rmvnorm(n, sigma = sigma) z <- (pnorm(ztemp) * 2 * pi) - pi colnames(z) <- paste0("z", 1:ncol(z)) z1 <- z[,1] ## outcome y <- 7 + 0.2 * (a %in% "A") + beta * cos(z1) * (a %in% "A") + rnorm(n, 0, sd) data.frame(y = y, a = a, z) } ## simulate data set.seed(123) beta <- 3 ntrain <- 500 ntest <- 50 simdata <- simdata_s <- sim_data(p = 5, beta = beta, n = ntrain) tsimdata <- tsimdata_s <- sim_data(p = 5, beta = beta, n = ntest) simdata_s$cens <- rep(1, ntrain) tsimdata_s$cens <- rep(1, ntest) ## base model basemodel_lm <- lm(y ~ a, data = simdata) ## forest frst_lm <- pmforest(basemodel_lm, ntree = 20, perturb = list(replace = FALSE, fraction = 0.632), control = ctree_control(mincriterion = 0)) ## personalised models # (1) return the model objects pmodels_lm <- pmodel(x = frst_lm, newdata = tsimdata, fun = identity) class(pmodels_lm) # (2) return coefficients only (default) coefs_lm <- pmodel(x = frst_lm, newdata = tsimdata) # compare predictive objective functions of personalised models versus # base model sum(objfun(pmodels_lm)) # -RSS personalised models sum(objfun(basemodel_lm, newdata = tsimdata)) # -RSS base model if(require("ggplot2")) { ## dependence plot dp_lm <- cbind(coefs_lm, tsimdata) ggplot(tsimdata) + stat_function(fun = function(z1) 0.2 + beta * cos(z1), aes(color = "true treatment\neffect")) + geom_point(data = dp_lm, aes(y = aA, x = z1, color = "estimates lm"), alpha = 0.5) + ylab("treatment effect") + xlab("patient characteristic z1") } }
library("model4you") if(require("mvtnorm") & require("survival")) { ## function to simulate the data sim_data <- function(n = 500, p = 10, beta = 3, sd = 1){ ## treatment lev <- c("C", "A") a <- rep(factor(lev, labels = lev, levels = lev), length = n) ## correlated z variables sigma <- diag(p) sigma[sigma == 0] <- 0.2 ztemp <- rmvnorm(n, sigma = sigma) z <- (pnorm(ztemp) * 2 * pi) - pi colnames(z) <- paste0("z", 1:ncol(z)) z1 <- z[,1] ## outcome y <- 7 + 0.2 * (a %in% "A") + beta * cos(z1) * (a %in% "A") + rnorm(n, 0, sd) data.frame(y = y, a = a, z) } ## simulate data set.seed(123) beta <- 3 ntrain <- 500 ntest <- 50 simdata <- simdata_s <- sim_data(p = 5, beta = beta, n = ntrain) tsimdata <- tsimdata_s <- sim_data(p = 5, beta = beta, n = ntest) simdata_s$cens <- rep(1, ntrain) tsimdata_s$cens <- rep(1, ntest) ## base model basemodel_lm <- lm(y ~ a, data = simdata) ## forest frst_lm <- pmforest(basemodel_lm, ntree = 20, perturb = list(replace = FALSE, fraction = 0.632), control = ctree_control(mincriterion = 0)) ## personalised models # (1) return the model objects pmodels_lm <- pmodel(x = frst_lm, newdata = tsimdata, fun = identity) class(pmodels_lm) # (2) return coefficients only (default) coefs_lm <- pmodel(x = frst_lm, newdata = tsimdata) # compare predictive objective functions of personalised models versus # base model sum(objfun(pmodels_lm)) # -RSS personalised models sum(objfun(basemodel_lm, newdata = tsimdata)) # -RSS base model if(require("ggplot2")) { ## dependence plot dp_lm <- cbind(coefs_lm, tsimdata) ggplot(tsimdata) + stat_function(fun = function(z1) 0.2 + beta * cos(z1), aes(color = "true treatment\neffect")) + geom_point(data = dp_lm, aes(y = aA, x = z1, color = "estimates lm"), alpha = 0.5) + ylab("treatment effect") + xlab("patient characteristic z1") } }
This is a rudimentary test if there is heterogeneity in the model parameters. The null-hypothesis is: the base model is the correct model.
pmtest(forest, pmodels = NULL, data = NULL, B = 100) ## S3 method for class 'heterogeneity_test' plot(x, ...)
pmtest(forest, pmodels = NULL, data = NULL, B = 100) ## S3 method for class 'heterogeneity_test' plot(x, ...)
forest |
pmforest object. |
pmodels |
pmodel_identity object (pmodel(..., fun = identity)). |
data |
data. |
B |
number of bootstrap samples. |
x |
object of class heterogeneity_test. |
... |
ignored. |
list where the first element is the p-value und the second element is a data.frame with all neccessary infos to compute the p-value.
The test statistic is the difference in objective function between the base model and the personalised models. To compute the distribution under the Null we draw parametric bootstrap samples from the base model. For each bootstrap sample we again compute the difference in objective function between the base model and the personalised models. If the difference in the original data is greater than the difference in the bootstrap samples, we reject the null-hypothesis.
## Not run: set.seed(123) n <- 160 trt <- factor(rep(0:1, each = n/2)) y <- 4 + (trt == 1) + rnorm(n) z <- matrix(rnorm(n * 2), ncol = 2) dat <- data.frame(y, trt, z) mod <- lm(y ~ trt, data = dat) ## Note that ntree should usually be higher frst <- pmforest(mod, ntree = 20) pmods <- pmodel(frst, fun = identity) ## Note that B should be at least 100 ## The low B is just for demonstration ## purposes. tst <- pmtest(forest = frst, pmodels = pmods, B = 10) tst$pvalue tst plot(tst) ## End(Not run)
## Not run: set.seed(123) n <- 160 trt <- factor(rep(0:1, each = n/2)) y <- 4 + (trt == 1) + rnorm(n) z <- matrix(rnorm(n * 2), ncol = 2) dat <- data.frame(y, trt, z) mod <- lm(y ~ trt, data = dat) ## Note that ntree should usually be higher frst <- pmforest(mod, ntree = 20) pmods <- pmodel(frst, fun = identity) ## Note that B should be at least 100 ## The low B is just for demonstration ## purposes. tst <- pmtest(forest = frst, pmodels = pmods, B = 10) tst$pvalue tst plot(tst) ## End(Not run)
Input a parametric model and get a model-based tree.
pmtree( model, data = NULL, zformula = ~., control = ctree_control(), coeffun = coef, ... )
pmtree( model, data = NULL, zformula = ~., control = ctree_control(), coeffun = coef, ... )
model |
a model object. The model can be a parametric model with a binary covariate. |
data |
data. If NULL (default) the data from the model object are used. |
zformula |
formula describing which variable should be used for partitioning.
Default is to use all variables in data that are not in the model (i.e. |
control |
control parameters, see |
coeffun |
function that takes the model object and returns the coefficients.
Useful when |
... |
additional parameters passed on to model fit such as weights. |
Sometimes the number of participant in each treatment group needs to
be of a certain size. This can be accomplished by setting control$converged
.
See example below.
ctree object
if(require("TH.data") & require("survival")) { ## base model bmod <- survreg(Surv(time, cens) ~ horTh, data = GBSG2, model = TRUE) survreg_plot(bmod) ## partitioned model tr <- pmtree(bmod) plot(tr, terminal_panel = node_pmterminal(tr, plotfun = survreg_plot, confint = TRUE)) summary(tr) summary(tr, node = 1:2) logLik(bmod) logLik(tr) ## Sometimes the number of participant in each treatment group needs to ## be of a certain size. This can be accomplished using converged ## Each treatment group should have more than 33 observations ctrl <- ctree_control(lookahead = TRUE) ctrl$converged <- function(mod, data, subset) { all(table(data$horTh[subset]) > 33) } tr2 <- pmtree(bmod, control = ctrl) plot(tr2, terminal_panel = node_pmterminal(tr, plotfun = survreg_plot, confint = TRUE)) summary(tr2[[5]]$data$horTh) } if(require("psychotools")) { data("MathExam14W", package = "psychotools") ## scale points achieved to [0, 100] percent MathExam14W$tests <- 100 * MathExam14W$tests/26 MathExam14W$pcorrect <- 100 * MathExam14W$nsolved/13 ## select variables to be used MathExam <- MathExam14W[ , c("pcorrect", "group", "tests", "study", "attempt", "semester", "gender")] ## compute base model bmod_math <- lm(pcorrect ~ group, data = MathExam) lm_plot(bmod_math, densest = TRUE) ## compute tree (tr_math <- pmtree(bmod_math, control = ctree_control(maxdepth = 2))) plot(tr_math, terminal_panel = node_pmterminal(tr_math, plotfun = lm_plot, confint = FALSE)) plot(tr_math, terminal_panel = node_pmterminal(tr_math, plotfun = lm_plot, densest = TRUE, confint = TRUE)) ## predict newdat <- MathExam[1:5, ] # terminal nodes (nodes <- predict(tr_math, type = "node", newdata = newdat)) # response (pr <- predict(tr_math, type = "pass", newdata = newdat)) # response including confidence intervals, see ?predict.lm (pr1 <- predict(tr_math, type = "pass", newdata = newdat, predict_args = list(interval = "confidence"))) }
if(require("TH.data") & require("survival")) { ## base model bmod <- survreg(Surv(time, cens) ~ horTh, data = GBSG2, model = TRUE) survreg_plot(bmod) ## partitioned model tr <- pmtree(bmod) plot(tr, terminal_panel = node_pmterminal(tr, plotfun = survreg_plot, confint = TRUE)) summary(tr) summary(tr, node = 1:2) logLik(bmod) logLik(tr) ## Sometimes the number of participant in each treatment group needs to ## be of a certain size. This can be accomplished using converged ## Each treatment group should have more than 33 observations ctrl <- ctree_control(lookahead = TRUE) ctrl$converged <- function(mod, data, subset) { all(table(data$horTh[subset]) > 33) } tr2 <- pmtree(bmod, control = ctrl) plot(tr2, terminal_panel = node_pmterminal(tr, plotfun = survreg_plot, confint = TRUE)) summary(tr2[[5]]$data$horTh) } if(require("psychotools")) { data("MathExam14W", package = "psychotools") ## scale points achieved to [0, 100] percent MathExam14W$tests <- 100 * MathExam14W$tests/26 MathExam14W$pcorrect <- 100 * MathExam14W$nsolved/13 ## select variables to be used MathExam <- MathExam14W[ , c("pcorrect", "group", "tests", "study", "attempt", "semester", "gender")] ## compute base model bmod_math <- lm(pcorrect ~ group, data = MathExam) lm_plot(bmod_math, densest = TRUE) ## compute tree (tr_math <- pmtree(bmod_math, control = ctree_control(maxdepth = 2))) plot(tr_math, terminal_panel = node_pmterminal(tr_math, plotfun = lm_plot, confint = FALSE)) plot(tr_math, terminal_panel = node_pmterminal(tr_math, plotfun = lm_plot, densest = TRUE, confint = TRUE)) ## predict newdat <- MathExam[1:5, ] # terminal nodes (nodes <- predict(tr_math, type = "node", newdata = newdat)) # response (pr <- predict(tr_math, type = "pass", newdata = newdat)) # response including confidence intervals, see ?predict.lm (pr1 <- predict(tr_math, type = "pass", newdata = newdat, predict_args = list(interval = "confidence"))) }
Compute predictions from pmtree object.
## S3 method for class 'pmtree' predict( object, newdata = NULL, type = "node", predict_args = list(), perm = NULL, ... )
## S3 method for class 'pmtree' predict( object, newdata = NULL, type = "node", predict_args = list(), perm = NULL, ... )
object |
pmtree object. |
newdata |
an optional data frame in which to look for variables with
which to predict, if omitted, |
type |
character denoting the type of predicted value. The terminal node
is returned for |
predict_args |
If |
perm |
an optional character vector of variable names (or integer vector
of variable location in |
... |
passed on to predict.party (e.g. |
predictions
if(require("psychotools")) { data("MathExam14W", package = "psychotools") ## scale points achieved to [0, 100] percent MathExam14W$tests <- 100 * MathExam14W$tests/26 MathExam14W$pcorrect <- 100 * MathExam14W$nsolved/13 ## select variables to be used MathExam <- MathExam14W[ , c("pcorrect", "group", "tests", "study", "attempt", "semester", "gender")] ## compute base model bmod_math <- lm(pcorrect ~ group, data = MathExam) lm_plot(bmod_math, densest = TRUE) ## compute tree (tr_math <- pmtree(bmod_math, control = ctree_control(maxdepth = 2))) plot(tr_math, terminal_panel = node_pmterminal(tr_math, plotfun = lm_plot, confint = FALSE)) plot(tr_math, terminal_panel = node_pmterminal(tr_math, plotfun = lm_plot, densest = TRUE, confint = TRUE)) ## predict newdat <- MathExam[1:5, ] # terminal nodes (nodes <- predict(tr_math, type = "node", newdata = newdat)) # response (pr <- predict(tr_math, type = "pass", newdata = newdat)) # response including confidence intervals, see ?predict.lm (pr1 <- predict(tr_math, type = "pass", newdata = newdat, predict_args = list(interval = "confidence"))) }
if(require("psychotools")) { data("MathExam14W", package = "psychotools") ## scale points achieved to [0, 100] percent MathExam14W$tests <- 100 * MathExam14W$tests/26 MathExam14W$pcorrect <- 100 * MathExam14W$nsolved/13 ## select variables to be used MathExam <- MathExam14W[ , c("pcorrect", "group", "tests", "study", "attempt", "semester", "gender")] ## compute base model bmod_math <- lm(pcorrect ~ group, data = MathExam) lm_plot(bmod_math, densest = TRUE) ## compute tree (tr_math <- pmtree(bmod_math, control = ctree_control(maxdepth = 2))) plot(tr_math, terminal_panel = node_pmterminal(tr_math, plotfun = lm_plot, confint = FALSE)) plot(tr_math, terminal_panel = node_pmterminal(tr_math, plotfun = lm_plot, densest = TRUE, confint = TRUE)) ## predict newdat <- MathExam[1:5, ] # terminal nodes (nodes <- predict(tr_math, type = "node", newdata = newdat)) # response (pr <- predict(tr_math, type = "pass", newdata = newdat)) # response including confidence intervals, see ?predict.lm (pr1 <- predict(tr_math, type = "pass", newdata = newdat, predict_args = list(interval = "confidence"))) }
Print and summary methods for pmtree objects.
## S3 method for class 'pmtree' print( x, node = NULL, FUN = NULL, digits = getOption("digits") - 4L, footer = TRUE, ... ) ## S3 method for class 'pmtree' summary(object, node = NULL, ...) ## S3 method for class 'summary.pmtree' print(x, digits = 4, ...) ## S3 method for class 'pmtree' coef(object, node = NULL, ...)
## S3 method for class 'pmtree' print( x, node = NULL, FUN = NULL, digits = getOption("digits") - 4L, footer = TRUE, ... ) ## S3 method for class 'pmtree' summary(object, node = NULL, ...) ## S3 method for class 'summary.pmtree' print(x, digits = 4, ...) ## S3 method for class 'pmtree' coef(object, node = NULL, ...)
x |
object. |
node |
node number, if any. |
FUN |
formatinfo function. |
digits |
number of digits. |
footer |
should footer be included? |
... |
further arguments passed on to |
object |
object. |
Returns the sum of the squared residuals for a given object.
rss(object, ...) ## Default S3 method: rss(object, ...)
rss(object, ...) ## Default S3 method: rss(object, ...)
object |
model object. |
... |
passed on to specific methods. |
sum of the squared residuals.
## example from ?lm ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14) trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69) group <- gl(2, 10, 20, labels = c("Ctl","Trt")) weight <- c(ctl, trt) lm.D9 <- lm(weight ~ group) rss(lm.D9)
## example from ?lm ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14) trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69) group <- gl(2, 10, 20, labels = c("Ctl","Trt")) weight <- c(ctl, trt) lm.D9 <- lm(weight ~ group) rss(lm.D9)
Can be used on its own but is also useable as plotfun in
node_pmterminal
.
survreg_plot(mod, data = NULL, theme = theme_classic(), yrange = NULL)
survreg_plot(mod, data = NULL, theme = theme_classic(), yrange = NULL)
mod |
A model of class survreg. |
data |
optional data frame. If NULL the data stored in mod is used. |
theme |
A ggplot2 theme. |
yrange |
Range of the y variable to be used for plotting. If NULL it will be 0 to max(y). |
if(require("survival")) { survreg_plot(survreg(Surv(futime, fustat) ~ factor(rx), ovarian)) }
if(require("survival")) { survreg_plot(survreg(Surv(futime, fustat) ~ factor(rx), ovarian)) }
See varimp.cforest
.
## S3 method for class 'pmforest' varimp( object, nperm = 1L, OOB = TRUE, risk = function(x, ...) -objfun(x, sum = TRUE, ...), conditional = FALSE, threshold = 0.2, ... )
## S3 method for class 'pmforest' varimp( object, nperm = 1L, OOB = TRUE, risk = function(x, ...) -objfun(x, sum = TRUE, ...), conditional = FALSE, threshold = 0.2, ... )
object |
DESCRIPTION. |
nperm |
the number of permutations performed. |
OOB |
a logical determining whether the importance is computed from the out-of-bag sample or the learning sample (not suggested). |
risk |
the risk to be evaluated. By default the objective function (e.g. log-Likelihood) is used. |
conditional |
a logical determining whether unconditional or conditional computation of the importance is performed. |
threshold |
the value of the test statistic or 1 - p-value of the association between the variable of interest and a covariate that must be exceeded inorder to include the covariate in the conditioning scheme for the variable of interest (only relevant if conditional = TRUE). |
... |
passed on to |
A vector of 'mean decrease in accuracy' importance scores.