Title: | Regression Trees and Forests for Circular Responses |
---|---|
Description: | Infrastructure for fitting distributional trees and forests based on maximum-likelihood estimation of parameters for a circular response, as well as regression methods for a circular response based on maximum-likelihood estimation are provided. For both approaches the von Mises distribution is employed as circular response distribution. |
Authors: | Moritz N. Lang [aut, cre] , Lisa Schlosser [aut] , Achim Zeileis [aut] |
Maintainer: | Moritz N. Lang <[email protected]> |
License: | GPL-2 \| GPL-3 |
Version: | 0.1-0 |
Built: | 2024-11-06 19:13:58 UTC |
Source: | https://github.com/r-forge/partykit |
The function circfit carries out maximum-likelihood estimation of parameters for a circular response employing the von Mises distribution. The parameters can be transformed through link functions but do not depend on further covariates (i.e., are constant across observations).
circfit(y, weights = NULL, start = NULL, start.eta = NULL,
  response_range = NULL, vcov = TRUE,
  type.hessian = c("checklist", "analytic", "numeric"),
  method = "L-BFGS-B", estfun = TRUE, optim.control = list(), ...)
y | numeric vector of the response. |
weights | optional numeric vector of case weights. |
start | starting values for the distribution parameters handed over to |
start.eta | starting values for the distribution parameters on the link scale handed over to |
response_range | either a logical value indicating whether the response should be transformed to its original range (TRUE) or kept on the interval (-pi, pi] (FALSE), or a vector of length two specifying the range of the circular response. |
vcov | logical. Specifies whether or not a variance-covariance matrix should be calculated and returned. |
type.hessian | Can either be 'checklist', 'analytic' or 'numeric' to decide how the Hessian matrix should be calculated in the fitting process in |
method | Optimization method which should be applied in |
estfun | logical. Should the matrix of observation-wise score contributions (or empirical estimating functions) be returned? |
optim.control | A list with |
... | further arguments passed to |
The function circfit fits the parameters of the von Mises distribution to a circular response variable by applying distfit.
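The following base-R sketch illustrates what an intercept-only von Mises fit by maximum likelihood amounts to. It is not the code used by circfit or distfit: the function name vm_nll, the log link for kappa, the BFGS optimizer, and the simulated angles are all assumptions made for illustration.

```r
## Negative log-likelihood of the von Mises distribution.
## par = c(mu, log(kappa)); the log link for kappa is an assumption.
vm_nll <- function(par, y, weights = rep(1, length(y))) {
  mu <- par[1]
  kappa <- exp(par[2])
  ## log I0(kappa), computed via the exponentially scaled Bessel function
  ## to avoid overflow for large kappa
  logI0 <- log(besselI(kappa, nu = 0, expon.scaled = TRUE)) + kappa
  -sum(weights * (kappa * cos(y - mu) - log(2 * pi) - logI0))
}

## Illustrative angles in (-pi, pi], concentrated around 1 radian
set.seed(1)
y <- 1 + rnorm(200, sd = 0.4)
y <- atan2(sin(y), cos(y))

## The circular mean serves as starting value for mu
mu_start <- atan2(mean(sin(y)), mean(cos(y)))
fit <- optim(c(mu_start, 0), vm_nll, y = y, method = "BFGS")

mu_hat <- fit$par[1]          # estimated location
kappa_hat <- exp(fit$par[2])  # estimated concentration
```

For the von Mises distribution the maximum-likelihood estimate of mu coincides with the circular mean, so mu_hat should stay very close to mu_start; only kappa requires numerical optimization.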
An object of S3 class circfit inheriting from class distfit.
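As a rough illustration of the response_range idea, the sketch below maps a response recorded on an arbitrary circular scale (degrees, hours) to the interval (-pi, pi] and back. The helpers to_param_range and from_param_range are hypothetical and not part of the package; they assume that zero on the original scale corresponds to an angle of zero, which the package may handle differently.

```r
## Hypothetical helpers, not part of the package API.
## Assumption: 0 on the original scale corresponds to angle 0.

to_param_range <- function(y, rng = c(0, 2 * pi)) {
  ## rescale to radians, then wrap onto the circle
  rad <- y / diff(rng) * 2 * pi
  atan2(sin(rad), cos(rad))
}

from_param_range <- function(theta, rng = c(0, 2 * pi)) {
  ## rescale to the original units, then wrap into [rng[1], rng[2])
  val <- theta / (2 * pi) * diff(rng)
  rng[1] + (val - rng[1]) %% diff(rng)
}

deg <- to_param_range(c(90, 270), rng = c(0, 360))  # approximately pi/2 and -pi/2
hrs <- from_param_range(-pi / 2, rng = c(0, 24))    # 18 on a 24-hour clock
```

Applying from_param_range to the output of to_param_range with the same rng returns the original values, up to wrapping at the upper end of the range.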
## example on parameter range:
sdat.par <- circtree_simulate(response_range = c(-pi, pi))
cf.par <- circfit(sdat.par$y)

## example on response range (0, 2pi):
sdat.rad <- circtree_simulate(response_range = c(0, 2*pi))
cf.rad <- circfit(sdat.rad$y)

## example on response range (0, 360):
sdat.deg <- circtree_simulate(response_range = c(0, 360))
cf.deg <- circfit(sdat.deg$y)
Distributional forests based on maximum-likelihood estimation of parameters for a circular response employing the von Mises distribution.
circforest(formula, data, response_range = NULL, subset, na.action = na.pass,
  weights, offset, cluster, strata,
  control = disttree_control(teststat = "quad", testtype = "Univ",
    mincriterion = 0, saveinfo = FALSE, minsplit = 20, minbucket = 7,
    splittry = 2, ...),
  ntree = 500L, fit.par = FALSE,
  perturb = list(replace = FALSE, fraction = 0.632),
  mtry = ceiling(sqrt(nvar)), applyfun = NULL, cores = NULL, trace = FALSE, ...)

## S3 method for class 'circforest'
predict(object, newdata = NULL,
  type = c("parameter", "response", "weights", "node"),
  OOB = TRUE, scale = TRUE, response_range = FALSE, ...)
formula | a symbolic description of the model to be fit. This should be of type |
data | an optional data frame containing the variables in the model. |
response_range | either a logical value indicating whether the response should be transformed to its original range (TRUE) or kept on the interval (-pi, pi] (FALSE), or a vector of length two specifying the range of the circular response. |
subset | an optional vector specifying a subset of observations to be used in the fitting process. |
na.action | a function which indicates what should happen when the data contain missing values. |
weights | optional numeric vector of case weights. |
offset | an optional vector of offset values. |
cluster | an optional factor indicating independent clusters. Highly experimental, use at your own risk. |
strata | an optional factor for stratified sampling. |
control | a list with control parameters passed to |
ntree | number of trees to grow for the forest. |
fit.par | logical. If TRUE, fitted and predicted values and predicted parameters are calculated for the learning data (together with the log-likelihood). |
perturb | a list with arguments |
mtry | number of input variables randomly sampled as candidates at each node for random forest like algorithms. Bagging, as a special case of a random forest without random input variable sampling, can be performed by setting |
applyfun | an optional |
cores | numeric. If set to an integer the |
trace | a logical indicating if a progress bar shall be printed while the forest grows. |
object | an object as returned by |
newdata | an optional data frame containing test data. |
type | a character string denoting the type of predicted value returned. For |
OOB | a logical defining out-of-bag predictions (only if |
scale | a logical indicating scaling of the nearest neighbor weights by the sum of weights in the corresponding terminal node of each tree. In the simple regression forest, predicting the conditional mean by nearest neighbor weights is equivalent to (but slower than) the aggregation of means. |
... | arguments to be used to form the default |
Distributional regression forests for a circular response are an application of model-based recursive partitioning and unbiased recursive partitioning based on the implementation in distforest using the infrastructure of extree_fit.
An object of S3 class circforest inheriting from class distforest.
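To make the defaults perturb = list(replace = FALSE, fraction = 0.632) and mtry = ceiling(sqrt(nvar)) more concrete, here is a small base-R sketch of per-tree subsampling. It only illustrates the idea of drawing a fraction of the observations without replacement; the object names (in_bag, oob) are hypothetical, and the actual resampling inside circforest and distforest may differ in detail.

```r
set.seed(42)
n <- 1000      # number of observations (assumed for illustration)
nvar <- 2      # number of input variables, e.g. x1 and x2
ntree <- 5     # kept small here; the default in the usage above is 500

perturb <- list(replace = FALSE, fraction = 0.632)
mtry <- ceiling(sqrt(nvar))  # candidate variables tried at each node

## size of each tree's learning sample
size <- if (perturb$replace) n else floor(perturb$fraction * n)

## one set of in-bag row indices per tree
in_bag <- lapply(seq_len(ntree), function(b) {
  sample.int(n, size = size, replace = perturb$replace)
})

## the remaining rows are out-of-bag for that tree
oob <- lapply(in_bag, function(idx) setdiff(seq_len(n), idx))
```

With these settings each tree sees 632 of the 1000 rows, and the rows it does not see are the ones an out-of-bag prediction (OOB = TRUE) would rely on.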
distforest, disttree, distfit, extree_fit
#sdat <- circtree_simulate()
#cf <- circforest(y ~ x1 + x2, data = sdat, ntree = 50)
Fit a regression model for a circular response by maximum likelihood estimation employing the von Mises distribution.
circmax(formula, data, subset, na.action, model = TRUE, y = TRUE, x = FALSE,
  control = circmax_control(...), ...)

circmax_fit(x, y, z = NULL, control)

circmax_control(maxit = 5000, start = NULL, method = "Nelder-Mead",
  solve_kappa = "Newton-Fourier", gradient = FALSE, hessian = TRUE, ...)
formula | a formula expression of the form |
data | an optional data frame containing the variables occurring in the formulas; y has to be given in radians. |
subset | an optional vector specifying a subset of observations to be used for fitting. |
na.action | a function which indicates what should happen when the data contain |
model | logical. If |
x, y | for |
z | a design matrix with regressors for the concentration. |
... | arguments to be used to form the default |
control, maxit, start | a list of control parameters passed to |
method | The |
solve_kappa | Which kappa solver should be used for the starting values for kappa. By default a Newton Fourier is used ( |
gradient | logical. Should gradients be used for optimization? If |
hessian | logical or character. Should a numeric approximation of the (negative) Hessian matrix by |
circmax fits a regression model for a circular response assuming a von Mises distribution. circmax_fit is the lower level function where the parameters of the von Mises distribution are fitted by maximum likelihood estimation.
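The sketch below shows one way a negative log-likelihood for such a regression model could be written in base R. The link functions are assumptions that this page does not confirm: a Fisher-Lee-type location link, mu = beta0 + 2 * atan(x %*% beta), and a log link for the concentration, kappa = exp(z %*% gamma). The function name circ_nll and the toy data are hypothetical.

```r
## x: location regressors WITHOUT an intercept column
## z: concentration design matrix WITH an intercept column
## par = c(beta0, beta, gamma)
circ_nll <- function(par, y, x, z) {
  p <- ncol(x)
  beta0 <- par[1]
  beta <- par[1 + seq_len(p)]
  gamma <- par[-seq_len(1 + p)]
  mu <- beta0 + 2 * atan(drop(x %*% beta))  # assumed location link
  kappa <- exp(drop(z %*% gamma))           # assumed log link
  logI0 <- log(besselI(kappa, nu = 0, expon.scaled = TRUE)) + kappa
  -sum(kappa * cos(y - mu) - log(2 * pi) - logI0)
}

## Toy inputs, used only to evaluate the function once
set.seed(7)
n <- 50
x <- cbind(x1 = runif(n, -1, 1), x2 = runif(n, -1, 1))
z <- cbind("(Intercept)" = 1, x3 = runif(n, -1, 1))
y <- runif(n, -pi, pi)  # placeholder angles in radians

nll0 <- circ_nll(rep(0, 5), y, x, z)
```

A function of this shape could be handed to optim with method = "Nelder-Mead", mirroring the default named in circmax_control; at par = 0 it reduces to the von Mises likelihood with mu = 0 and kappa = 1.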
An object of class "circmax".
## Example 1: Simulated Data:
sdat <- circmax_simulate(n = 1000, beta = c(3, 5, 2), gamma = c(3, 3))
(m1.circmax <- circmax(y ~ x1 + x2 | x3, data = sdat))

## Example 2: Periwinkle Dataset of Fisher and Lee, 1992:
require("circular")
distance <- c(107, 46, 33, 67, 122, 69, 43, 30, 12, 25, 37, 69, 5, 83, 68, 38,
  21, 1, 71, 60, 71, 71, 57, 53, 38, 70, 7, 48, 7, 21, 27)
directdeg <- c(67, 66, 74, 61, 58, 60, 100, 89, 171, 166, 98, 60, 197, 98, 86,
  123, 165, 133, 101, 105, 71, 84, 75, 98, 83, 71, 74, 91, 38, 200, 56)
cdirect <- circular(directdeg * 2 * pi/360)
plot(as.numeric(cdirect) ~ distance, ylim = c(0, 4*pi), pch = 20)
points(as.numeric(cdirect) + 2*pi ~ distance, pch = 20)
(m2.circ <- lm.circular(type = "c-l", y = cdirect, x = distance, init = 0.0))
(m2.circmax <- circmax(cdirect ~ distance, data = data.frame(cbind(distance, cdirect))))
circmax
This function creates an artificial data set for testing regression models for a circular response fitted by maximum likelihood estimation.
circmax_simulate(n = 1000, beta = c(3, 5, 2), gamma = c(3, 3), seed = 111)
n | The number of observations. |
beta | The coefficients for the intercept and the covariates of the location part. |
gamma | The coefficients for the intercept and the covariates of the concentration part. |
seed | Sets the 'seed' to a numeric value. |
Data frame with simulated covariates and the corresponding response.
Distributional trees based on maximum-likelihood estimation of parameters for a circular response employing the von Mises distribution.
circtree(formula, data, response_range = NULL, subset, na.action = na.pass,
  weights, offset, cluster, control = disttree_control(...),
  converged = NULL, scores = NULL, doFit = TRUE, ...)
formula | a symbolic description of the model to be fit. This should be of type |
data | an optional data frame containing the variables in the model. |
response_range | either a logical value indicating whether the response should be transformed to its original range (TRUE) or kept on the interval (-pi, pi] (FALSE), or a vector of length two specifying the range of the circular response. |
subset | an optional vector specifying a subset of observations to be used in the fitting process. |
na.action | a function which indicates what should happen when the data contain missing values. |
weights | optional numeric vector of case weights. |
offset | an optional vector of offset values. |
cluster | an optional factor indicating independent clusters. Highly experimental, use at your own risk. |
control | control arguments passed to |
converged | an optional function for checking user-defined criteria before splits are implemented. |
scores | an optional named list of scores to be attached to ordered factors. |
doFit | a logical indicating if the tree shall be grown (TRUE) or not (FALSE). |
... | arguments to be used to form the default |
Distributional regression trees for a circular response are an application of model-based recursive partitioning and unbiased recursive partitioning based on the implementation in disttree using the infrastructure of extree_fit.
An object of S3 class circtree inheriting from class disttree.
## example on parameter range:
sdat.par <- circtree_simulate(response_range = c(-pi, pi))
ct.par <- circtree(y ~ x1 + x2, data = sdat.par)
plot(ct.par)

## example on response range (0, 2pi):
sdat.rad <- circtree_simulate(response_range = c(0, 2*pi))
ct.rad <- circtree(y ~ x1 + x2, data = sdat.rad)
## default: type = "response"
plot(ct.rad, tp_args = list(response_range = FALSE))
plot(ct.rad, tp_args = list(response_range = TRUE))
plot(ct.rad, tp_args = list(response_range = c(0, 24)))

## example on response range (0, 360):
sdat.deg <- circtree_simulate(response_range = c(0, 360))
ct.deg <- circtree(y ~ x1 + x2, data = sdat.deg)
plot(ct.deg, tp_args = list(response_range = FALSE))
plot(ct.deg, tp_args = list(response_range = TRUE))
plot(ct.deg, tp_args = list(template = "geographics"))

## example on response range (0, 24):
sdat.hour <- circtree_simulate(response_range = c(0, 24))
ct.hour <- circtree(y ~ x1 + x2, data = sdat.hour, response_range = c(0, 24))
plot(ct.hour, tp_args = list(response_range = FALSE))
plot(ct.hour, tp_args = list(template = "clock24"))
plot(ct.hour, tp_args = list(template = "clock24", circlab = c("no", "mo", "mi", "ev")))
circtree
This function creates an artificial data set for testing regression trees employing a von Mises distribution.
circtree_simulate(n = 1000, mu = c(0, 2, 5), kappa = c(3, 3, 1), response_range = c(0, 2 * pi), seed = 111)
n | The number of observations. |
mu | The distribution parameters for the location part. Currently, exactly three parameters are required. |
kappa | The distribution parameters for the concentration part. Currently, exactly three parameters are required. |
response_range | Defines the range of the simulated response. |
seed | Sets the 'seed' to a numeric value. |
Data frame with simulated covariates and the corresponding response.
Methods for extracting information from fitted circmax objects.
## S3 method for class 'circmax'
coef(object, model = c("full", "location", "concentration"), ...)

## S3 method for class 'circmax'
terms(x, model = c("location", "concentration", "full"), ...)
object, x | an object of class |
model | model for which coefficients shall be returned. |
... | further arguments passed to or from other methods. |
In addition to the methods above, a set of standard extractor functions for "circmax" objects is available, including methods for the generic functions print, logLik, and model.frame. Additionally, estfun and vcov provide methods for 'robust' inference.
disttree
Exported von Mises family for implementation in disttree.
dist_vonmises(useC = FALSE, ncores = 1)
useC | logical; if TRUE, C routines are used. |
ncores | Number of cores for parallelization with OpenMP (no big improvement in terms of running time). |
Density function for the von Mises distribution with location parameter mu and concentration parameter kappa.
dvonmises(y, mu, kappa, log = FALSE)
y | vector of observations. |
mu | vector of location parameters. |
kappa | vector of concentration parameters. |
log | logical; if TRUE, probabilities p are given as log(p). |
Von Mises Density
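For reference, the von Mises density with location mu and concentration kappa is exp(kappa * cos(y - mu)) / (2 * pi * I0(kappa)), where I0 is the modified Bessel function of the first kind of order zero. The base-R sketch below evaluates this formula; the name dvm is hypothetical and chosen to avoid any suggestion that this is the package's own dvonmises implementation.

```r
## Von Mises density in base R (illustrative sketch)
dvm <- function(y, mu, kappa, log = FALSE) {
  ## log I0(kappa) via the exponentially scaled Bessel function,
  ## which stays finite for large kappa
  logI0 <- log(besselI(kappa, nu = 0, expon.scaled = TRUE)) + kappa
  ld <- kappa * cos(y - mu) - log(2 * pi) - logI0
  if (log) ld else exp(ld)
}

## The density integrates to one over a full circle
total <- integrate(dvm, lower = -pi, upper = pi, mu = 0.5, kappa = 2)$value

## For kappa = 0 it reduces to the circular uniform density 1 / (2 * pi)
unif <- dvm(1, mu = 0, kappa = 0)
```

Working on the log scale and exponentiating at the end keeps the computation stable for strongly concentrated distributions.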
This function plots regression trees with a circular response based on plot.constparty.
## S3 method for class 'circtree'
plot(x, terminal_panel = node_circular, tp_args = list(), tnex = NULL,
  drop_terminal = NULL, ...)
x | Object of class |
terminal_panel | Do not change. |
tp_args | Do not change. |
tnex | Do not change. |
drop_terminal | Do not change. |
... | Do not change. |
Obtains various types of predictions for circmax models.
## S3 method for class 'circmax'
predict(object, newdata = NULL,
  type = c("location", "concentration", "parameter"),
  na.action = na.pass, ...)
object | an object of class |
newdata | an optional data frame in which to look for variables with which to predict. |
type | type of prediction: |
na.action | a function which indicates what should happen when the data contain |
... | further arguments passed to or from other methods. |
For type "location" or "concentration", a vector with either the location or the concentration of the predicted distribution.
bamlss
Exported von Mises family for implementation in bamlss.
vonmises_bamlss(...)
... | Not used. |