Title: | Beta Regression |
---|---|
Description: | Beta regression for modeling beta-distributed dependent variables on the open unit interval (0, 1), e.g., rates and proportions, see Cribari-Neto and Zeileis (2010) <doi:10.18637/jss.v034.i02>. Moreover, extended-support beta regression models can accommodate dependent variables with boundary observations at 0 and/or 1. For the classical beta regression model, alternative specifications are provided: Bias-corrected and bias-reduced estimation, finite mixture models, and recursive partitioning for beta regression, see Grün, Kosmidis, and Zeileis (2012) <doi:10.18637/jss.v048.i11>. |
Authors: | Achim Zeileis [aut, cre] , Francisco Cribari-Neto [aut] , Bettina Grün [aut] , Ioannis Kosmidis [aut] , Alexandre B. Simas [ctb] (earlier version by), Andrea V. Rocha [ctb] (earlier version by) |
Maintainer: | Achim Zeileis <[email protected]> |
License: | GPL-2 | GPL-3 |
Version: | 3.2-2 |
Built: | 2024-11-15 02:51:40 UTC |
Source: | https://github.com/r-forge/betareg |
Density, distribution function, quantile function, and random generation for the zero- and/or one-inflated beta distribution in regression parameterization.
dbeta01(x, mu, phi, p0 = 0, p1 = 0, log = FALSE) pbeta01(q, mu, phi, p0 = 0, p1 = 0, lower.tail = TRUE, log.p = FALSE) qbeta01(p, mu, phi, p0 = 0, p1 = 0, lower.tail = TRUE, log.p = FALSE) rbeta01(n, mu, phi, p0 = 0, p1 = 0)
dbeta01(x, mu, phi, p0 = 0, p1 = 0, log = FALSE) pbeta01(q, mu, phi, p0 = 0, p1 = 0, lower.tail = TRUE, log.p = FALSE) qbeta01(p, mu, phi, p0 = 0, p1 = 0, lower.tail = TRUE, log.p = FALSE) rbeta01(n, mu, phi, p0 = 0, p1 = 0)
x , q
|
numeric. Vector of quantiles. |
p |
numeric. Vector of probabilities. |
n |
numeric. Number of observations. If |
mu |
numeric. The mean of the beta distribution (on the open unit interval). |
phi |
numeric. The precision parameter of the beta distribution. |
p0 |
numeric. The probability for an observation of zero (often referred to as zero inflation). |
p1 |
numeric. The probability for an observation of one (often referred to as one inflation). |
log , log.p
|
logical. If TRUE, probabilities p are given as log(p). |
lower.tail |
logical. If TRUE (default), probabilities are P[X <= x] otherwise, P[X > x]. |
The zero- and/or one-inflated beta distribution is obtained by adding point masses at zero and/or one to a standard beta distribution.
Note that the support of the standard beta distribution is the open unit interval where values of exactly zero or one cannot occur. Thus, the inflation jargon is rather misleading as there is no probability that could be inflated. It is rather a hurdle or two-part (or three-part) model.
dbeta01
gives the density, pbeta01
gives the distribution
function, qbeta01
gives the quantile function, and rbeta01
generates random deviates.
Class and methods for zero- and/or one-inflated beta distributions in regression specification using the workflow from the distributions3 package.
Beta01(mu, phi, p0 = 0, p1 = 0)
Beta01(mu, phi, p0 = 0, p1 = 0)
mu |
numeric. The mean of the beta distribution (on the open unit interval). |
phi |
numeric. The precision parameter of the beta distribution. |
p0 |
numeric. The probability for an observation of zero (often referred to as zero inflation). |
p1 |
numeric. The probability for an observation of one (often referred to as one inflation). |
The zero- and/or one-inflated beta distribution is obtained by adding point masses at zero and/or one to a standard beta distribution.
Note that the support of the standard beta distribution is the open unit interval where values of exactly zero or one cannot occur. Thus, the inflation jargon is rather misleading as there is no probability that could be inflated. It is rather a hurdle or two-part (or three-part) model.
A Beta01
distribution object.
## package and random seed library("distributions3") set.seed(6020) ## three beta distributions X <- Beta01( mu = c(0.25, 0.50, 0.75), phi = c(1, 1, 2), p0 = c(0.1, 0, 0), p1 = c(0, 0, 0.3) ) X ## compute moments of the distribution mean(X) variance(X) ## support interval (minimum and maximum) support(X) ## simulate random variables random(X, 5) ## histograms of 1,000 simulated observations x <- random(X, 1000) hist(x[1, ]) hist(x[2, ]) hist(x[3, ]) ## probability density function (PDF) and log-density (or log-likelihood) x <- c(0.25, 0.5, 0.75) pdf(X, x) pdf(X, x, log = TRUE) log_pdf(X, x) ## cumulative distribution function (CDF) cdf(X, x) ## quantiles quantile(X, 0.5) ## cdf() and quantile() are inverses cdf(X, quantile(X, 0.5)) quantile(X, cdf(X, 1)) ## point mass probabilities (if any) on boundary cdf(X, 0, lower.tail = TRUE) cdf(X, 1, lower.tail = FALSE) ## all methods above can either be applied elementwise or for ## all combinations of X and x, if length(X) = length(x), ## also the result can be assured to be a matrix via drop = FALSE p <- c(0.05, 0.5, 0.95) quantile(X, p, elementwise = FALSE) quantile(X, p, elementwise = TRUE) quantile(X, p, elementwise = TRUE, drop = FALSE) ## compare theoretical and empirical mean from 1,000 simulated observations cbind( "theoretical" = mean(X), "empirical" = rowMeans(random(X, 1000)) )
## package and random seed library("distributions3") set.seed(6020) ## three beta distributions X <- Beta01( mu = c(0.25, 0.50, 0.75), phi = c(1, 1, 2), p0 = c(0.1, 0, 0), p1 = c(0, 0, 0.3) ) X ## compute moments of the distribution mean(X) variance(X) ## support interval (minimum and maximum) support(X) ## simulate random variables random(X, 5) ## histograms of 1,000 simulated observations x <- random(X, 1000) hist(x[1, ]) hist(x[2, ]) hist(x[3, ]) ## probability density function (PDF) and log-density (or log-likelihood) x <- c(0.25, 0.5, 0.75) pdf(X, x) pdf(X, x, log = TRUE) log_pdf(X, x) ## cumulative distribution function (CDF) cdf(X, x) ## quantiles quantile(X, 0.5) ## cdf() and quantile() are inverses cdf(X, quantile(X, 0.5)) quantile(X, cdf(X, 1)) ## point mass probabilities (if any) on boundary cdf(X, 0, lower.tail = TRUE) cdf(X, 1, lower.tail = FALSE) ## all methods above can either be applied elementwise or for ## all combinations of X and x, if length(X) = length(x), ## also the result can be assured to be a matrix via drop = FALSE p <- c(0.05, 0.5, 0.95) quantile(X, p, elementwise = FALSE) quantile(X, p, elementwise = TRUE) quantile(X, p, elementwise = TRUE, drop = FALSE) ## compare theoretical and empirical mean from 1,000 simulated observations cbind( "theoretical" = mean(X), "empirical" = rowMeans(random(X, 1000)) )
Density, distribution function, quantile function, and random generation for the 4-parameter beta distribution in regression parameterization.
dbeta4(x, mu, phi, theta1 = 0, theta2 = 1 - theta1, log = FALSE) pbeta4(q, mu, phi, theta1 = 0, theta2 = 1 - theta1, lower.tail = TRUE, log.p = FALSE) qbeta4(p, mu, phi, theta1 = 0, theta2 = 1 - theta1, lower.tail = TRUE, log.p = FALSE) rbeta4(n, mu, phi, theta1 = 0, theta2 = 1 - theta1)
dbeta4(x, mu, phi, theta1 = 0, theta2 = 1 - theta1, log = FALSE) pbeta4(q, mu, phi, theta1 = 0, theta2 = 1 - theta1, lower.tail = TRUE, log.p = FALSE) qbeta4(p, mu, phi, theta1 = 0, theta2 = 1 - theta1, lower.tail = TRUE, log.p = FALSE) rbeta4(n, mu, phi, theta1 = 0, theta2 = 1 - theta1)
x , q
|
numeric. Vector of quantiles. |
p |
numeric. Vector of probabilities. |
n |
numeric. Number of observations. If |
mu |
numeric. The mean of the beta distribution that is extended to support [theta1, theta2]. |
phi |
numeric. The precision parameter of the beta distribution that is extended to support [theta1, theta2]. |
theta1 , theta2
|
numeric. The minimum and maximum, respectively,
of the 4-parameter beta distribution. By default a symmetric support is
chosen by |
log , log.p
|
logical. If TRUE, probabilities p are given as log(p). |
lower.tail |
logical. If TRUE (default), probabilities are P[X <= x] otherwise, P[X > x]. |
The distribution is obtained by a linear transformation of a beta-distributed
random variable with intercept theta1
and slope theta2 - theta1
.
dbeta4
gives the density, pbeta4
gives the distribution
function, qbeta4
gives the quantile function, and rbeta4
generates random deviates.
Class and methods for 4-parameter beta distributions in regression specification using the workflow from the distributions3 package.
Beta4(mu, phi, theta1 = 0, theta2 = 1 - theta1)
Beta4(mu, phi, theta1 = 0, theta2 = 1 - theta1)
mu |
numeric. The mean of the beta distribution that is extended to support [theta1, theta2]. |
phi |
numeric. The precision parameter of the beta distribution that is extended to support [theta1, theta2]. |
theta1 , theta2
|
numeric. The minimum and maximum, respectively,
of the 4-parameter beta distribution. By default a symmetric support is
chosen by |
The distribution is obtained by a linear transformation of a beta-distributed
random variable with intercept theta1
and slope theta2 - theta1
.
A Beta4
distribution object.
## package and random seed library("distributions3") set.seed(6020) ## three beta distributions X <- Beta4( mu = c(0.25, 0.50, 0.75), phi = c(1, 1, 2), theta1 = c(0, -0.1, -0.1), theta2 = c(1, 1.1, 1.5) ) X ## compute moments of the distribution mean(X) variance(X) ## support interval (minimum and maximum) support(X) ## simulate random variables random(X, 5) ## histograms of 1,000 simulated observations x <- random(X, 1000) hist(x[1, ]) hist(x[2, ]) hist(x[3, ]) ## probability density function (PDF) and log-density (or log-likelihood) x <- c(0.25, 0.5, 0.75) pdf(X, x) pdf(X, x, log = TRUE) log_pdf(X, x) ## cumulative distribution function (CDF) cdf(X, x) ## quantiles quantile(X, 0.5) ## cdf() and quantile() are inverses cdf(X, quantile(X, 0.5)) quantile(X, cdf(X, 1)) ## all methods above can either be applied elementwise or for ## all combinations of X and x, if length(X) = length(x), ## also the result can be assured to be a matrix via drop = FALSE p <- c(0.05, 0.5, 0.95) quantile(X, p, elementwise = FALSE) quantile(X, p, elementwise = TRUE) quantile(X, p, elementwise = TRUE, drop = FALSE) ## compare theoretical and empirical mean from 1,000 simulated observations cbind( "theoretical" = mean(X), "empirical" = rowMeans(random(X, 1000)) )
## package and random seed library("distributions3") set.seed(6020) ## three beta distributions X <- Beta4( mu = c(0.25, 0.50, 0.75), phi = c(1, 1, 2), theta1 = c(0, -0.1, -0.1), theta2 = c(1, 1.1, 1.5) ) X ## compute moments of the distribution mean(X) variance(X) ## support interval (minimum and maximum) support(X) ## simulate random variables random(X, 5) ## histograms of 1,000 simulated observations x <- random(X, 1000) hist(x[1, ]) hist(x[2, ]) hist(x[3, ]) ## probability density function (PDF) and log-density (or log-likelihood) x <- c(0.25, 0.5, 0.75) pdf(X, x) pdf(X, x, log = TRUE) log_pdf(X, x) ## cumulative distribution function (CDF) cdf(X, x) ## quantiles quantile(X, 0.5) ## cdf() and quantile() are inverses cdf(X, quantile(X, 0.5)) quantile(X, cdf(X, 1)) ## all methods above can either be applied elementwise or for ## all combinations of X and x, if length(X) = length(x), ## also the result can be assured to be a matrix via drop = FALSE p <- c(0.05, 0.5, 0.95) quantile(X, p, elementwise = FALSE) quantile(X, p, elementwise = TRUE) quantile(X, p, elementwise = TRUE, drop = FALSE) ## compare theoretical and empirical mean from 1,000 simulated observations cbind( "theoretical" = mean(X), "empirical" = rowMeans(random(X, 1000)) )
Fit finite mixtures of beta regression models for rates and proportions via maximum likelihood with the EM algorithm using a parametrization with mean (depending through a link function on the covariates) and precision parameter (called phi).
betamix(formula, data, k, subset, na.action, weights, offset, link = c("logit", "probit", "cloglog", "cauchit", "log", "loglog"), link.phi = "log", control = betareg.control(...), cluster = NULL, FLXconcomitant = NULL, FLXcontrol = list(), verbose = FALSE, nstart = if (is.null(cluster)) 3 else 1, which = "BIC", ID, fixed, extra_components, ...) extraComponent(type = c("uniform", "betareg"), coef, delta, link = "logit", link.phi = "log")
betamix(formula, data, k, subset, na.action, weights, offset, link = c("logit", "probit", "cloglog", "cauchit", "log", "loglog"), link.phi = "log", control = betareg.control(...), cluster = NULL, FLXconcomitant = NULL, FLXcontrol = list(), verbose = FALSE, nstart = if (is.null(cluster)) 3 else 1, which = "BIC", ID, fixed, extra_components, ...) extraComponent(type = c("uniform", "betareg"), coef, delta, link = "logit", link.phi = "log")
formula |
symbolic description of the model (of type |
data , subset , na.action
|
arguments controlling formula processing
via |
weights |
optional numeric vector of integer case weights. |
offset |
optional numeric vector with an a priori known component to be included in the linear predictor for the mean. |
k |
a vector of integers indicating the number of components of
the finite mixture; passed in turn to the |
link |
character specification of the link function in
the mean model (mu). Currently, |
link.phi |
character specification of the link function in
the precision model (phi). Currently, |
control |
a list of control arguments specified via
|
cluster |
Either a matrix with |
FLXconcomitant |
concomitant variable model; object of class
|
FLXcontrol |
object of class |
verbose |
a logical; if |
nstart |
for each value of |
which |
number of model to get if |
ID |
grouping variable indicating if observations are from the same individual, i.e. the component membership is restricted to be the same for these observations. |
fixed |
symbolic description of the model for the parameters
fixed over components (of type |
extra_components |
a list containing objects returned by
|
... |
arguments passed to |
type |
specifies if the component follows a uniform distribution or a beta regression model. |
coef |
a vector with the coefficients to determine the midpoint of the uniform distribution or names list with the coefficients for the mean and precision of the beta regression model. |
delta |
numeric; half-length of the interval of the uniform distribution. |
The arguments and the model specification are similar to
betareg
. Internally stepFlexmix
is called with suitable arguments to fit the finite mixture model with
the EM algorithm. See Grün et al. (2012) for more details.
extra_components
is a list where each element corresponds to a
component where the parameters are fixed a-priori.
An object of class "flexmix"
containing the best model with
respect to the log likelihood or the one selected according to
which
if k
is a vector of integers longer than 1.
Bettina Grün and Achim Zeileis
Cribari-Neto F, Zeileis A (2010). Beta Regression in R. Journal of Statistical Software, 34(2), 1–24. doi:10.18637/jss.v034.i02
Grün B, Kosmidis I, Zeileis A (2012). Extended Beta Regression in R: Shaken, Stirred, Mixed, and Partitioned. Journal of Statistical Software, 48(11), 1–25. doi:10.18637/jss.v048.i11
Grün B, Leisch F (2008). FlexMix Version 2: Finite Mixtures with Concomitant Variables and Varying and Constant Parameters. Journal of Statistical Software, 28(4), 1–35. doi:10.18637/jss.v028.i04
Leisch F (2004). FlexMix: A General Framework for Finite Mixture Models and Latent Class Regression in R. Journal of Statistical Software, 11(8), 1–18. doi:10.18637/jss.v011.i08
options(digits = 4) ## data with two groups of dyslexic and non-dyslexic children data("ReadingSkills", package = "betareg") suppressWarnings(RNGversion("3.5.0")) set.seed(4040) ## try to capture accuracy ~ iq relationship (without using dyslexia ## information) using two beta regression components and one additional ## extra component for a perfect reading score rs_mix <- betamix(accuracy ~ iq, data = ReadingSkills, k = 3, nstart = 10, extra_components = extraComponent(type = "uniform", coef = 0.99, delta = 0.01)) ## visualize result ## intensities based on posterior probabilities prob <- 2 * (posterior(rs_mix)[cbind(1:nrow(ReadingSkills), clusters(rs_mix))] - 0.5) ## associated HCL colors col0 <- hcl(c(260, 0, 130), 65, 45, fixup = FALSE) col1 <- col0[clusters(rs_mix)] col2 <- hcl(c(260, 0, 130)[clusters(rs_mix)], 65 * abs(prob)^1.5, 95 - 50 * abs(prob)^1.5, fixup = FALSE) ## scatter plot plot(accuracy ~ iq, data = ReadingSkills, col = col2, pch = 19, cex = 1.5, xlim = c(-2, 2)) points(accuracy ~ iq, data = ReadingSkills, cex = 1.5, pch = 1, col = col1) ## fitted lines iq <- -30:30/10 cf <- rbind(coef(rs_mix, model = "mean", component = 1:2), c(qlogis(0.99), 0)) for(i in 1:3) lines(iq, plogis(cf[i, 1] + cf[i, 2] * iq), lwd = 2, col = col0[i]) ## refit the model including a concomitant variable model using the ## dyslexia information with some noise to avoid complete separation ## between concomitant variable and component memberships set.seed(4040) w <- rnorm(nrow(ReadingSkills), c(-1, 1)[as.integer(ReadingSkills$dyslexia)]) ## The argument FLXconcomitant can be omitted when specifying ## the model via a three part formula given by ## accuracy ~ iq | 1 | w ## The posteriors from the previously fitted model are used ## for initialization. library("flexmix") rs_mix2 <- betamix(accuracy ~ iq, data = ReadingSkills, extra_components = extraComponent(type = "uniform", coef = 0.99, delta = 0.01), cluster = posterior(rs_mix), FLXconcomitant = FLXPmultinom(~w)) coef(rs_mix2, which = "concomitant") summary(rs_mix2, which = "concomitant")
options(digits = 4) ## data with two groups of dyslexic and non-dyslexic children data("ReadingSkills", package = "betareg") suppressWarnings(RNGversion("3.5.0")) set.seed(4040) ## try to capture accuracy ~ iq relationship (without using dyslexia ## information) using two beta regression components and one additional ## extra component for a perfect reading score rs_mix <- betamix(accuracy ~ iq, data = ReadingSkills, k = 3, nstart = 10, extra_components = extraComponent(type = "uniform", coef = 0.99, delta = 0.01)) ## visualize result ## intensities based on posterior probabilities prob <- 2 * (posterior(rs_mix)[cbind(1:nrow(ReadingSkills), clusters(rs_mix))] - 0.5) ## associated HCL colors col0 <- hcl(c(260, 0, 130), 65, 45, fixup = FALSE) col1 <- col0[clusters(rs_mix)] col2 <- hcl(c(260, 0, 130)[clusters(rs_mix)], 65 * abs(prob)^1.5, 95 - 50 * abs(prob)^1.5, fixup = FALSE) ## scatter plot plot(accuracy ~ iq, data = ReadingSkills, col = col2, pch = 19, cex = 1.5, xlim = c(-2, 2)) points(accuracy ~ iq, data = ReadingSkills, cex = 1.5, pch = 1, col = col1) ## fitted lines iq <- -30:30/10 cf <- rbind(coef(rs_mix, model = "mean", component = 1:2), c(qlogis(0.99), 0)) for(i in 1:3) lines(iq, plogis(cf[i, 1] + cf[i, 2] * iq), lwd = 2, col = col0[i]) ## refit the model including a concomitant variable model using the ## dyslexia information with some noise to avoid complete separation ## between concomitant variable and component memberships set.seed(4040) w <- rnorm(nrow(ReadingSkills), c(-1, 1)[as.integer(ReadingSkills$dyslexia)]) ## The argument FLXconcomitant can be omitted when specifying ## the model via a three part formula given by ## accuracy ~ iq | 1 | w ## The posteriors from the previously fitted model are used ## for initialization. library("flexmix") rs_mix2 <- betamix(accuracy ~ iq, data = ReadingSkills, extra_components = extraComponent(type = "uniform", coef = 0.99, delta = 0.01), cluster = posterior(rs_mix), FLXconcomitant = FLXPmultinom(~w)) coef(rs_mix2, which = "concomitant") summary(rs_mix2, which = "concomitant")
Density, distribution function, quantile function, and random generation for the beta distribution in regression parameterization.
dbetar(x, mu, phi, log = FALSE) pbetar(q, mu, phi, lower.tail = TRUE, log.p = FALSE) qbetar(p, mu, phi, lower.tail = TRUE, log.p = FALSE) rbetar(n, mu, phi)
dbetar(x, mu, phi, log = FALSE) pbetar(q, mu, phi, lower.tail = TRUE, log.p = FALSE) qbetar(p, mu, phi, lower.tail = TRUE, log.p = FALSE) rbetar(n, mu, phi)
x , q
|
numeric. Vector of quantiles. |
p |
numeric. Vector of probabilities. |
n |
numeric. Number of observations. If |
mu |
numeric. The mean of the beta distribution. |
phi |
numeric. The precision parameter of the beta distribution. |
log , log.p
|
logical. If TRUE, probabilities p are given as log(p). |
lower.tail |
logical. If TRUE (default), probabilities are P[X <= x] otherwise, P[X > x]. |
This is the reparameterization of the beta distribution with mean mu
and precision phi
, as employed in beta regression. The classic
parameterization of the beta distribution is obtained by setting
shape1 = mu * phi
and shape2 = (1 - mu) * phi
,
respectively.
dbetar
gives the density, pbetar
gives the distribution
function, qbetar
gives the quantile function, and rbetar
generates random deviates.
Class and methods for beta distributions in regression specification using the workflow from the distributions3 package.
BetaR(mu, phi)
BetaR(mu, phi)
mu |
numeric. The mean of the beta distribution. |
phi |
numeric. The precision parameter of the beta distribution. |
Alternative parameterization of the classic beta distribution in
terms of its mean mu
and precision parameter phi
.
Thus, the distribution provided by BetaR
is equivalent to
the Beta
distribution with parameters
alpha = mu * phi
and beta = (1 - mu) * phi
.
A BetaR
distribution object.
## package and random seed library("distributions3") set.seed(6020) ## three beta distributions X <- BetaR( mu = c(0.25, 0.50, 0.75), phi = c(1, 1, 2) ) X ## compute moments of the distribution mean(X) variance(X) skewness(X) kurtosis(X) ## support interval (minimum and maximum) support(X) ## simulate random variables random(X, 5) ## histograms of 1,000 simulated observations x <- random(X, 1000) hist(x[1, ]) hist(x[2, ]) hist(x[3, ]) ## probability density function (PDF) and log-density (or log-likelihood) x <- c(0.25, 0.5, 0.75) pdf(X, x) pdf(X, x, log = TRUE) log_pdf(X, x) ## cumulative distribution function (CDF) cdf(X, x) ## quantiles quantile(X, 0.5) ## cdf() and quantile() are inverses (except at censoring points) cdf(X, quantile(X, 0.5)) quantile(X, cdf(X, 1)) ## all methods above can either be applied elementwise or for ## all combinations of X and x, if length(X) = length(x), ## also the result can be assured to be a matrix via drop = FALSE p <- c(0.05, 0.5, 0.95) quantile(X, p, elementwise = FALSE) quantile(X, p, elementwise = TRUE) quantile(X, p, elementwise = TRUE, drop = FALSE) ## compare theoretical and empirical mean from 1,000 simulated observations cbind( "theoretical" = mean(X), "empirical" = rowMeans(random(X, 1000)) )
## package and random seed library("distributions3") set.seed(6020) ## three beta distributions X <- BetaR( mu = c(0.25, 0.50, 0.75), phi = c(1, 1, 2) ) X ## compute moments of the distribution mean(X) variance(X) skewness(X) kurtosis(X) ## support interval (minimum and maximum) support(X) ## simulate random variables random(X, 5) ## histograms of 1,000 simulated observations x <- random(X, 1000) hist(x[1, ]) hist(x[2, ]) hist(x[3, ]) ## probability density function (PDF) and log-density (or log-likelihood) x <- c(0.25, 0.5, 0.75) pdf(X, x) pdf(X, x, log = TRUE) log_pdf(X, x) ## cumulative distribution function (CDF) cdf(X, x) ## quantiles quantile(X, 0.5) ## cdf() and quantile() are inverses (except at censoring points) cdf(X, quantile(X, 0.5)) quantile(X, cdf(X, 1)) ## all methods above can either be applied elementwise or for ## all combinations of X and x, if length(X) = length(x), ## also the result can be assured to be a matrix via drop = FALSE p <- c(0.05, 0.5, 0.95) quantile(X, p, elementwise = FALSE) quantile(X, p, elementwise = TRUE) quantile(X, p, elementwise = TRUE, drop = FALSE) ## compare theoretical and empirical mean from 1,000 simulated observations cbind( "theoretical" = mean(X), "empirical" = rowMeans(random(X, 1000)) )
Family objects for distributional regression with the (extended-support) beta distribution via bamlss or gamlss2.
betar_family(link = "logit", link.phi = "log", ...) xbetax_family(link = "logit", link.phi = "log", link.nu = "log", quad = 20, tol = .Machine$double.eps^0.7, ...)
betar_family(link = "logit", link.phi = "log", ...) xbetax_family(link = "logit", link.phi = "log", link.nu = "log", quad = 20, tol = .Machine$double.eps^0.7, ...)
link |
character specification of the link function in
the mean model (mu). Currently, |
link.phi |
character specification of the link function in
the precision model (phi). Currently, |
link.nu |
character specification of the link function in
the exceedence model (nu). Currently, |
quad |
numeric. The number of quadrature points for numeric
integration of the continuous mixture in |
tol |
numeric. Accuracy (convergence tolerance) for numerically
determining quantiles based on |
... |
Arguments passed to functions that are called within the family object. |
Family objects for bamlss (Umlauf et al. 2019, 2021) and gamlss2
(Umlauf et al. 2024) are essentially lists of functions providing a standardized
interface to the d/p/q/r functions of distributions. Hence, betar_family
interfaces the classical beta distribution in regression specification, see
dbetar
. Analogously, xbetax_family
interfaces the
extended-support beta mixture specification (Kosmidis and Zeileis 2024),
see dxbetax
.
A list of class family.bamlss
.
Kosmidis I, Zeileis A (2024). Extended-Support Beta Regression for [0, 1] Responses. 2409.07233, arXiv.org E-Print Archive. doi:10.48550/arXiv.2409.07233
Umlauf N, Klein N, Zeileis A (2019). BAMLSS: Bayesian Additive Models for Location, Scale and Shape (and Beyond). Journal of Computational and Graphical Statistics, 27(3), 612–627. doi:10.1080/10618600.2017.1407325
Umlauf N, Klein N, Simon T, Zeileis A (2021). bamlss: A Lego Toolbox for Flexible Bayesian Regression (and Beyond). Journal of Statistical Software, 100(4), 1–53. doi:10.18637/jss.v100.i04
Umlauf N, Stasinopoulos M, Rigby R, Stauffer R (2024). gamlss2: Infrastructure for Flexible Distributional Regression. R package version 0.1-0. https://gamlss-dev.github.io/gamlss2/
dbetar
, dxbetax
, family.bamlss
## package and data library("betareg") library("bamlss") data("ReadingSkills", package = "betareg") ## classical beta regression via ML rs1 <- betareg(accuracy ~ dyslexia * iq | dyslexia + iq, data = ReadingSkills) ## IGNORE_RDIFF_BEGIN ## Bayesian additive model (with low number of iterations to speed up the example) set.seed(0) rs2 <- bamlss(accuracy ~ s(iq, by = dyslexia) | dyslexia + iq, data = ReadingSkills, family = betar_family(), eps = 1e-7, n.iter = 400, burnin = 100) ## Bayesian model shrinks the effects compared to ML plot(accuracy ~ iq, data = ReadingSkills, pch = 19, col = dyslexia) nd <- data.frame( iq = rep(-19:20/10, 2), dyslexia = factor(rep(c("no", "yes"), each = 40), levels = c("no", "yes")) ) nd$betareg <- predict(rs1, newdata = nd, type = "response") nd$bamlss <- predict(rs2, newdata = nd, type = "parameter", model = "mu") lines(betareg ~ iq, data = nd, subset = dyslexia == "no", col = 1, lwd = 2, lty = 1) lines(betareg ~ iq, data = nd, subset = dyslexia == "yes", col = 2, lwd = 2, lty = 1) lines(bamlss ~ iq, data = nd, subset = dyslexia == "no", col = 1, lwd = 2, lty = 2) lines(bamlss ~ iq, data = nd, subset = dyslexia == "yes", col = 2, lwd = 2, lty = 2) legend("topleft", c("Dyslexia: no", "Dyslexia: yes", "betareg", "bamlss"), lty = c(0, 0, 1, 2), pch = c(19, 19, NA, NA), col = c(1, 2, 1, 1), bty = "n") ## IGNORE_RDIFF_END ## xbetax_family(): requires more time due to Gaussian quadrature ## for gamlss2: install.packages("gamlss2", repos = "https://gamlss-dev.R-universe.dev")
## package and data library("betareg") library("bamlss") data("ReadingSkills", package = "betareg") ## classical beta regression via ML rs1 <- betareg(accuracy ~ dyslexia * iq | dyslexia + iq, data = ReadingSkills) ## IGNORE_RDIFF_BEGIN ## Bayesian additive model (with low number of iterations to speed up the example) set.seed(0) rs2 <- bamlss(accuracy ~ s(iq, by = dyslexia) | dyslexia + iq, data = ReadingSkills, family = betar_family(), eps = 1e-7, n.iter = 400, burnin = 100) ## Bayesian model shrinks the effects compared to ML plot(accuracy ~ iq, data = ReadingSkills, pch = 19, col = dyslexia) nd <- data.frame( iq = rep(-19:20/10, 2), dyslexia = factor(rep(c("no", "yes"), each = 40), levels = c("no", "yes")) ) nd$betareg <- predict(rs1, newdata = nd, type = "response") nd$bamlss <- predict(rs2, newdata = nd, type = "parameter", model = "mu") lines(betareg ~ iq, data = nd, subset = dyslexia == "no", col = 1, lwd = 2, lty = 1) lines(betareg ~ iq, data = nd, subset = dyslexia == "yes", col = 2, lwd = 2, lty = 1) lines(bamlss ~ iq, data = nd, subset = dyslexia == "no", col = 1, lwd = 2, lty = 2) lines(bamlss ~ iq, data = nd, subset = dyslexia == "yes", col = 2, lwd = 2, lty = 2) legend("topleft", c("Dyslexia: no", "Dyslexia: yes", "betareg", "bamlss"), lty = c(0, 0, 1, 2), pch = c(19, 19, NA, NA), col = c(1, 2, 1, 1), bty = "n") ## IGNORE_RDIFF_END ## xbetax_family(): requires more time due to Gaussian quadrature ## for gamlss2: install.packages("gamlss2", repos = "https://gamlss-dev.R-universe.dev")
Fit beta regression models for rates and proportions via maximum likelihood using a parametrization with mean (depending through a link function on the covariates) and precision parameter (called phi).
betareg(formula, data, subset, na.action, weights, offset, link = c("logit", "probit", "cloglog", "cauchit", "log", "loglog"), link.phi = NULL, type = c("ML", "BC", "BR"), dist = NULL, nu = NULL, control = betareg.control(...), model = TRUE, y = TRUE, x = FALSE, ...) betareg.fit(x, y, z = NULL, weights = NULL, offset = NULL, link = "logit", link.phi = "log", type = "ML", control = betareg.control(), dist = NULL, nu = NULL)
betareg(formula, data, subset, na.action, weights, offset, link = c("logit", "probit", "cloglog", "cauchit", "log", "loglog"), link.phi = NULL, type = c("ML", "BC", "BR"), dist = NULL, nu = NULL, control = betareg.control(...), model = TRUE, y = TRUE, x = FALSE, ...) betareg.fit(x, y, z = NULL, weights = NULL, offset = NULL, link = "logit", link.phi = "log", type = "ML", control = betareg.control(), dist = NULL, nu = NULL)
formula |
symbolic description of the model, either of type |
data , subset , na.action
|
arguments controlling formula processing
via |
weights |
optional numeric vector of case weights. |
offset |
optional numeric vector with an a priori known component to be
included in the linear predictor for the mean. In |
link |
character specification of the link function in
the mean model (mu). Currently, |
link.phi |
character specification of the link function in
the precision model (phi). Currently, |
type |
character specification of the type of estimator. Currently,
maximum likelihood ( |
dist |
character specification of the response distribution.
Usually, this does not have to be set by the user because by default
the classical |
nu |
numeric. The fixed value of the expected exceedence parameter |
control |
a list of control arguments specified via
|
model , y , x
|
logicals. If |
z |
numeric matrix. Regressor matrix for the precision model, defaulting to an intercept only. |
... |
arguments passed to |
Beta regression as suggested by Ferrari and Cribari-Neto (2004) and extended
by Simas, Barreto-Souza, and Rocha (2010) is implemented in betareg
.
It is useful in situations where the dependent variable is continuous and restricted to
the unit interval (0, 1), e.g., resulting from rates or proportions. It is modeled to be
beta-distributed with parametrization using mean and precision parameter (called mu and
phi, respectively). The mean mu is linked, as in generalized linear models (GLMs), to the
explanatory variables through a link function and a linear predictor. Additionally, the
precision parameter phi can be linked to another (potentially overlapping) set of
regressors through a second link function, resulting in a model with variable dispersion
(see Cribari-Neto and Zeileis 2010).
Estimation is performed by default using maximum likelihood (ML) via optim
with
analytical gradients and starting values from an auxiliary linear regression
of the transformed response. Subsequently, the optim
result may be enhanced
by an additional Fisher scoring iteration using analytical gradients and expected information.
Alternative estimation methods are bias-corrected (BC) or bias-reduced (BR)
maximum likelihood (see Grün, Kosmidis, and Zeileis 2012). For ML and BC the Fisher
scoring is just a refinement to move the gradients even closer to zero and can be
disabled by setting fsmaxit = 0
in the control arguments. For BR the Fisher scoring
is needed to solve the bias-adjusted estimating equations.
In the beta regression as introduced by Ferrari and Cribari-Neto (2004), the mean of
the response is linked to a linear predictor described by y ~ x1 + x2
using
a link
function while the precision parameter phi is assumed to be
constant. Simas et al. (2009) suggest to extend this model by linking phi to an
additional set of regressors (z1 + z2
, say): In betareg
this can be
specified in a formula of type y ~ x1 + x2 | z1 + z2
where the regressors
in the two parts can be overlapping. In the precision model (for phi), the link
function link.phi
is used. The default is a "log"
link unless no
precision model is specified. In the latter case (i.e., when the formula is of type
y ~ x1 + x2
), the "identity"
link is used by default for backward
compatibility.
Kosmidis and Zeileis (2024) introduce a generalization of the classic beta regression
model with extended support [0, 1].
Specifically, the extended-support beta distribution ("xbeta"
) leverages an underlying
symmetric four-parameter beta distribution with exceedence parameter nu
to obtain support [-nu, 1 + nu] that is subsequently censored to [0, 1] in order
to obtain point masses at the boundary values 0 and 1. The extended-support
beta mixture distribution ("xbetax"
) is a continuous mixture of extended-support
beta distributions where the exceedence parameter follows an exponential distribution
with mean nu (rather than a fixed value of nu). The latter "xbetax"
specification is used by default in case of boundary observations at 0 and/or 1.
The "xbeta"
specification with fixed nu is mostly for testing and
debugging purposes.
A set of standard extractor functions for fitted model objects is available for
objects of class "betareg"
, including methods to the generic functions
print
, summary
, plot
, coef
,
vcov
, logLik
, residuals
,
predict
, terms
,
model.frame
, model.matrix
,
cooks.distance
and hatvalues
(see influence.measures
),
gleverage
(new generic), estfun
and
bread
(from the sandwich package), and
coeftest
(from the lmtest package).
See predict.betareg
, residuals.betareg
, plot.betareg
,
and summary.betareg
for more details on all methods.
The main parameters of interest are the coefficients in the linear predictor of the mean
model. The additional parameters in the precision model (phi) can either
be treated as full model parameters (default) or as nuisance parameters. In the latter case
the estimation does not change, only the reported information in output from print
,
summary
, or coef
(among others) will be different. See also betareg.control
.
The implemented algorithms for bias correction/reduction follow Kosmidis and Firth (2010).
Technical note: In case, either bias correction or reduction is requested,
the second derivative of the inverse link function is required for link
and
link.phi
. If the two links are specified by their names (as done by default
in betareg
), then the "link-glm"
objects are enhanced automatically
by the required additional d2mu.deta
function. However, if a "link-glm"
object is supplied directly by the user, it needs to have the d2mu.deta
function or, for backward compatibility, dmu.deta
.
The original version of the package was written by Alexandre B. Simas and Andrea V. Rocha (up to version 1.2). Starting from version 2.0-0 the code was rewritten by Achim Zeileis.
betareg
returns an object of class "betareg"
, i.e., a list with components as follows.
For classic beta regressions (dist = "beta"
) several elements are lists with the names "mean"
and "precision"
for the information from the respective submodels. For extended-support
beta regressions (dist = "xbetax"
or "xbeta"
), the corresponding names are
"mu"
and "phi"
because they are not exactly the mean and precision due to the
censoring in the response variable.
betareg.fit
returns an unclassed list with components up to converged
.
coefficients |
a list with elements |
residuals |
a vector of raw residuals (observed - fitted), |
fitted.values |
a vector of fitted means, |
optim |
output from the |
method |
the method argument passed to the |
control |
the control arguments passed to the |
start |
the starting values for the parameters passed to the |
weights |
the weights used (if any), |
offset |
a list of offset vectors used (if any), |
n |
number of observations, |
nobs |
number of observations with non-zero weights, |
df.null |
residual degrees of freedom in the null model (constant mean and dispersion),
i.e., |
df.residual |
residual degrees of freedom in the fitted model, |
phi |
logical indicating whether the precision (phi) coefficients will be
treated as full model parameters or nuisance parameters in subsequent calls to
|
loglik |
log-likelihood of the fitted model, |
vcov |
covariance matrix of all parameters in the model, |
pseudo.r.squared |
pseudo R-squared value (squared correlation of linear predictor and link-transformed response), |
link |
a list with elements |
converged |
logical indicating successful convergence of |
call |
the original function call, |
formula |
the original formula, |
terms |
a list with elements |
levels |
a list with elements |
contrasts |
a list with elements |
model |
the full model frame (if |
y |
the response proportion vector (if |
x |
a list with elements |
Cribari-Neto F, Zeileis A (2010). Beta Regression in R. Journal of Statistical Software, 34(2), 1–24. doi:10.18637/jss.v034.i02
Ferrari SLP, Cribari-Neto F (2004). Beta Regression for Modeling Rates and Proportions. Journal of Applied Statistics, 31(7), 799–815.
Grün B, Kosmidis I, Zeileis A (2012). Extended Beta Regression in R: Shaken, Stirred, Mixed, and Partitioned. Journal of Statistical Software, 48(11), 1–25. doi:10.18637/jss.v048.i11
Kosmidis I, Firth D (2010). A Generic Algorithm for Reducing Bias in Parametric Estimation. Electronic Journal of Statistics, 4, 1097–1112.
Kosmidis I, Zeileis A (2024). Extended-Support Beta Regression for [0, 1] Responses. 2409.07233, arXiv.org E-Print Archive. doi:10.48550/arXiv.2409.07233
Simas AB, Barreto-Souza W, Rocha AV (2010). Improved Estimators for a General Class of Beta Regression Models. Computational Statistics & Data Analysis, 54(2), 348–366.
summary.betareg
, predict.betareg
, residuals.betareg
,
Formula
options(digits = 4) ## Section 4 from Ferrari and Cribari-Neto (2004) data("GasolineYield", package = "betareg") data("FoodExpenditure", package = "betareg") ## Table 1 gy <- betareg(yield ~ batch + temp, data = GasolineYield) summary(gy) ## Table 2 fe_lin <- lm(I(food/income) ~ income + persons, data = FoodExpenditure) library("lmtest") bptest(fe_lin) fe_beta <- betareg(I(food/income) ~ income + persons, data = FoodExpenditure) summary(fe_beta) ## nested model comparisons via Wald and LR tests fe_beta2 <- betareg(I(food/income) ~ income, data = FoodExpenditure) lrtest(fe_beta, fe_beta2) waldtest(fe_beta, fe_beta2) ## Section 3 from online supplements to Simas et al. (2010) ## mean model as in gy above ## precision model with regressor temp gy2 <- betareg(yield ~ batch + temp | temp, data = GasolineYield) ## MLE column in Table 19 summary(gy2) ## LRT row in Table 18 lrtest(gy, gy2)
options(digits = 4) ## Section 4 from Ferrari and Cribari-Neto (2004) data("GasolineYield", package = "betareg") data("FoodExpenditure", package = "betareg") ## Table 1 gy <- betareg(yield ~ batch + temp, data = GasolineYield) summary(gy) ## Table 2 fe_lin <- lm(I(food/income) ~ income + persons, data = FoodExpenditure) library("lmtest") bptest(fe_lin) fe_beta <- betareg(I(food/income) ~ income + persons, data = FoodExpenditure) summary(fe_beta) ## nested model comparisons via Wald and LR tests fe_beta2 <- betareg(I(food/income) ~ income, data = FoodExpenditure) lrtest(fe_beta, fe_beta2) waldtest(fe_beta, fe_beta2) ## Section 3 from online supplements to Simas et al. (2010) ## mean model as in gy above ## precision model with regressor temp gy2 <- betareg(yield ~ batch + temp | temp, data = GasolineYield) ## MLE column in Table 19 summary(gy2) ## LRT row in Table 18 lrtest(gy, gy2)
Various parameters that control fitting of beta regression models
using betareg
.
betareg.control(phi = TRUE, method = "BFGS", maxit = 5000, gradient = NULL, hessian = FALSE, trace = FALSE, start = NULL, fsmaxit = 200, fstol = 1e-8, quad = 20, ...)
betareg.control(phi = TRUE, method = "BFGS", maxit = 5000, gradient = NULL, hessian = FALSE, trace = FALSE, start = NULL, fsmaxit = 200, fstol = 1e-8, quad = 20, ...)
phi |
logical indicating whether the precision parameter
phi should be treated as a full model parameter ( |
method |
characters string specifying the |
maxit |
integer specifying the |
trace |
logical or integer controlling whether tracing information on
the progress of the optimization should be produced (passed to |
gradient |
logical. Should the analytical gradient be used for optimizing
the log-likelihood? If set to |
hessian |
logical. Should the numerical Hessian matrix from the |
start |
an optional vector with starting values for all parameters (including phi). |
fsmaxit |
integer specifying maximal number of additional (quasi) Fisher scoring iterations. For details see below. |
fstol |
numeric tolerance for convergence in (quasi) Fisher scoring. For details see below. |
quad |
numeric. The number of quadrature points for numeric
integration in case of |
... |
arguments passed to |
All parameters in betareg
are estimated by maximum likelihood
using optim
with control options set in betareg.control
.
Most arguments are passed on directly to optim
, and start
controls
how optim
is called.
After the optim
maximization, an additional (quasi) Fisher scoring
can be perfomed to further enhance the result or to perform additional bias reduction.
If fsmaxit
is greater than zero, this additional optimization is
performed and it converges if the threshold fstol
is attained
for the cross-product of the step size.
Starting values can be supplied via start
or estimated by
lm.wfit
, using the link-transformed response.
Covariances are in general derived analytically. Only if type = "ML"
and
hessian = TRUE
, they are determined numerically using the Hessian matrix
returned by optim
. In the latter case no Fisher scoring iterations are
performed.
The main parameters of interest are the coefficients in the linear predictor of the
model and the additional precision parameter phi which can either
be treated as a full model parameter (default) or as a nuisance parameter. In the latter case
the estimation does not change, only the reported information in output from print
,
summary
, or coef
(among others) will be different. See also examples.
A list with the arguments specified.
options(digits = 4) data("GasolineYield", package = "betareg") ## regression with phi as full model parameter gy1 <- betareg(yield ~ batch + temp, data = GasolineYield) gy1 ## regression with phi as nuisance parameter gy2 <- betareg(yield ~ batch + temp, data = GasolineYield, phi = FALSE) gy2 ## compare reported output coef(gy1) coef(gy2) summary(gy1) summary(gy2)
options(digits = 4) data("GasolineYield", package = "betareg") ## regression with phi as full model parameter gy1 <- betareg(yield ~ batch + temp, data = GasolineYield) gy1 ## regression with phi as nuisance parameter gy2 <- betareg(yield ~ batch + temp, data = GasolineYield, phi = FALSE) gy2 ## compare reported output coef(gy1) coef(gy2) summary(gy1) summary(gy2)
Fit beta regression trees via model-based recursive partitioning.
betatree(formula, partition, data, subset = NULL, na.action = na.omit, weights, offset, cluster, link = "logit", link.phi = "log", control = betareg.control(), ...)
betatree(formula, partition, data, subset = NULL, na.action = na.omit, weights, offset, cluster, link = "logit", link.phi = "log", control = betareg.control(), ...)
formula |
symbolic description of the model of type |
partition |
symbolic description of the partitioning variables,
e.g., |
data , subset , na.action , weights , offset , cluster
|
arguments controlling
data/model processing passed to |
link |
character specification of the link function in
the mean model (mu). Currently, |
link.phi |
character specification of the link function in
the precision model (phi). Currently, |
control |
a list of control arguments for the beta regression specified via
|
... |
further control arguments for the recursive partitioning
passed to |
Beta regression trees are an application of model-based recursive partitioning
(implemented in mob
, see Zeileis et al. 2008) to
beta regression (implemented in betareg
, see Cribari-Neto
and Zeileis 2010). See also Grün at al. (2012) for more details.
Various methods are provided for "betatree"
objects, most of them
inherit their behavior from "mob"
objects (e.g., print
, summary
,
coef
, etc.). The plot
method employs the node_bivplot
panel-generating function.
betatree()
returns an object of S3 class "betatree"
which
inherits from "modelparty"
.
Cribari-Neto F, Zeileis A (2010). Beta Regression in R. Journal of Statistical Software, 34(2), 1–24. doi:10.18637/jss.v034.i02
Grün B, Kosmidis I, Zeileis A (2012). Extended Beta Regression in R: Shaken, Stirred, Mixed, and Partitioned. Journal of Statistical Software, 48(11), 1–25. doi:10.18637/jss.v048.i11
Zeileis A, Hothorn T, Hornik K (2008). Model-Based Recursive Partitioning. Journal of Computational and Graphical Statistics, 17(2), 492–514.
options(digits = 4) suppressWarnings(RNGversion("3.5.0")) ## data with two groups of dyslexic and non-dyslexic children data("ReadingSkills", package = "betareg") ## additional random noise (not associated with reading scores) set.seed(1071) ReadingSkills$x1 <- rnorm(nrow(ReadingSkills)) ReadingSkills$x2 <- runif(nrow(ReadingSkills)) ReadingSkills$x3 <- factor(rnorm(nrow(ReadingSkills)) > 0) ## fit beta regression tree: in each node ## - accurcay's mean and precision depends on iq ## - partitioning is done by dyslexia and the noise variables x1, x2, x3 ## only dyslexia is correctly selected for splitting bt <- betatree(accuracy ~ iq | iq, ~ dyslexia + x1 + x2 + x3, data = ReadingSkills, minsize = 10) plot(bt) ## inspect result coef(bt) if(require("strucchange")) sctest(bt) ## IGNORE_RDIFF_BEGIN summary(bt, node = 2) summary(bt, node = 3) ## IGNORE_RDIFF_END ## add a numerical variable with relevant information for splitting ReadingSkills$x4 <- rnorm(nrow(ReadingSkills), c(-1.5, 1.5)[ReadingSkills$dyslexia]) bt2 <- betatree(accuracy ~ iq | iq, ~ x1 + x2 + x3 + x4, data = ReadingSkills, minsize = 10) plot(bt2) ## inspect result coef(bt2) if(require("strucchange")) sctest(bt2) ## IGNORE_RDIFF_BEGIN summary(bt2, node = 2) summary(bt2, node = 3) ## IGNORE_RDIFF_END
options(digits = 4) suppressWarnings(RNGversion("3.5.0")) ## data with two groups of dyslexic and non-dyslexic children data("ReadingSkills", package = "betareg") ## additional random noise (not associated with reading scores) set.seed(1071) ReadingSkills$x1 <- rnorm(nrow(ReadingSkills)) ReadingSkills$x2 <- runif(nrow(ReadingSkills)) ReadingSkills$x3 <- factor(rnorm(nrow(ReadingSkills)) > 0) ## fit beta regression tree: in each node ## - accurcay's mean and precision depends on iq ## - partitioning is done by dyslexia and the noise variables x1, x2, x3 ## only dyslexia is correctly selected for splitting bt <- betatree(accuracy ~ iq | iq, ~ dyslexia + x1 + x2 + x3, data = ReadingSkills, minsize = 10) plot(bt) ## inspect result coef(bt) if(require("strucchange")) sctest(bt) ## IGNORE_RDIFF_BEGIN summary(bt, node = 2) summary(bt, node = 3) ## IGNORE_RDIFF_END ## add a numerical variable with relevant information for splitting ReadingSkills$x4 <- rnorm(nrow(ReadingSkills), c(-1.5, 1.5)[ReadingSkills$dyslexia]) bt2 <- betatree(accuracy ~ iq | iq, ~ x1 + x2 + x3 + x4, data = ReadingSkills, minsize = 10) plot(bt2) ## inspect result coef(bt2) if(require("strucchange")) sctest(bt2) ## IGNORE_RDIFF_BEGIN summary(bt2, node = 2) summary(bt2, node = 3) ## IGNORE_RDIFF_END
In this study participants were asked to judge how likely it is that a customer trades in a coupe or that a customer buys a car form a specific salesperson out of four possible salespersons.
data("CarTask", package = "betareg")
data("CarTask", package = "betareg")
A data frame with 155 observations on the following 3 variables.
task
a factor with levels Car
and
Salesperson
indicating the condition.
probability
a numeric vector of the estimated probability.
NFCCscale
a numeric vector of the NFCC scale.
All participants in the study were undergraduate students at The Australian National University, some of whom obtained course credit in first-year Psychology for their participation in the study.
The NFCC scale is a combined scale of the Need for Closure and Need for Certainty scales which are strongly correlated.
For task
the questions were:
What is the probability that a customer trades in a coupe?
What is the probability that a customer buys a car from Carlos?
Taken from Smithson et al. (2011) supplements.
Smithson, M., Merkle, E.C., and Verkuilen, J. (2011). Beta Regression Finite Mixture Models of Polarization and Priming. Journal of Educational and Behavioral Statistics, 36(6), 804–831. doi:10.3102/1076998610396893
Smithson, M., and Segale, C. (2009). Partition Priming in Judgments of Imprecise Probabilities. Journal of Statistical Theory and Practice, 3(1), 169–181.
data("CarTask", package = "betareg") library("flexmix") car_betamix <- betamix(probability ~ 1, data = CarTask, k = 3, extra_components = list(extraComponent(type = "uniform", coef = 1/2, delta = 0.01), extraComponent(type = "uniform", coef = 1/4, delta = 0.01)), FLXconcomitant = FLXPmultinom(~ task))
data("CarTask", package = "betareg") library("flexmix") car_betamix <- betamix(probability ~ 1, data = CarTask, k = 3, extra_components = list(extraComponent(type = "uniform", coef = 1/2, delta = 0.01), extraComponent(type = "uniform", coef = 1/4, delta = 0.01)), FLXconcomitant = FLXPmultinom(~ task))
Data on proportion of income spent on food for a random sample of 38 households in a large US city.
data("FoodExpenditure", package = "betareg")
data("FoodExpenditure", package = "betareg")
A data frame containing 38 observations on 3 variables.
household expenditures for food.
household income.
number of persons living in household.
Taken from Griffiths et al. (1993, Table 15.4).
Cribari-Neto, F., and Zeileis, A. (2010). Beta Regression in R. Journal of Statistical Software, 34(2), 1–24. doi:10.18637/jss.v034.i02
Ferrari, S.L.P., and Cribari-Neto, F. (2004). Beta Regression for Modeling Rates and Proportions. Journal of Applied Statistics, 31(7), 799–815.
Griffiths, W.E., Hill, R.C., and Judge, G.G. (1993). Learning and Practicing Econometrics New York: John Wiley and Sons.
data("FoodExpenditure", package = "betareg") ## Ferrari and Cribari-Neto (2004) ## Section 4 fe_lin <- lm(I(food/income) ~ income + persons, data = FoodExpenditure) library("lmtest") bptest(fe_lin) ## Table 2 fe_beta <- betareg(I(food/income) ~ income + persons, data = FoodExpenditure) summary(fe_beta)
data("FoodExpenditure", package = "betareg") ## Ferrari and Cribari-Neto (2004) ## Section 4 fe_lin <- lm(I(food/income) ~ income + persons, data = FoodExpenditure) library("lmtest") bptest(fe_lin) ## Table 2 fe_beta <- betareg(I(food/income) ~ income + persons, data = FoodExpenditure) summary(fe_beta)
Operational data of the proportion of crude oil converted to gasoline after distillation and fractionation.
data("GasolineYield", package = "betareg")
data("GasolineYield", package = "betareg")
A data frame containing 32 observations on 6 variables.
proportion of crude oil converted to gasoline after distillation and fractionation.
crude oil gravity (degrees API).
vapor pressure of crude oil (lbf/in2).
temperature (degrees F) at which 10 percent of crude oil has vaporized.
temperature (degrees F) at which all gasoline has vaporized.
factor indicating unique batch of conditions gravity
,
pressure
, and temp10
.
This dataset was collected by Prater (1956), its dependent variable is the proportion of crude oil after distillation and fractionation. This dataset was analyzed by Atkinson (1985), who used the linear regression model and noted that there is “indication that the error distribution is not quite symmetrical, giving rise to some unduly large and small residuals” (p. 60).
The dataset contains 32 observations on the response and on the independent
variables. It has been noted (Daniel and Wood, 1971, Chapter 8) that there are only
ten sets of values of the first three explanatory variables which correspond to
ten different crudes and were subjected to experimentally controlled distillation
conditions. These conditions are captured in variable batch
and
the data were ordered according to the ascending order of temp10
.
Taken from Prater (1956).
Atkinson, A.C. (1985). Plots, Transformations and Regression: An Introduction to Graphical Methods of Diagnostic Regression Analysis. New York: Oxford University Press.
Cribari-Neto, F., and Zeileis, A. (2010). Beta Regression in R. Journal of Statistical Software, 34(2), 1–24. doi:10.18637/jss.v034.i02
Daniel, C., and Wood, F.S. (1971). Fitting Equations to Data. New York: John Wiley and Sons.
Ferrari, S.L.P., and Cribari-Neto, F. (2004). Beta Regression for Modeling Rates and Proportions. Journal of Applied Statistics, 31(7), 799–815.
Prater, N.H. (1956). Estimate Gasoline Yields from Crudes. Petroleum Refiner, 35(5), 236–238.
## IGNORE_RDIFF_BEGIN data("GasolineYield", package = "betareg") gy1 <- betareg(yield ~ gravity + pressure + temp10 + temp, data = GasolineYield) summary(gy1) ## Ferrari and Cribari-Neto (2004) gy2 <- betareg(yield ~ batch + temp, data = GasolineYield) ## Table 1 summary(gy2) ## Figure 2 par(mfrow = c(3, 2)) plot(gy2, which = 1, type = "pearson", sub.caption = "") plot(gy2, which = 1, type = "deviance", sub.caption = "") plot(gy2, which = 5, type = "deviance", sub.caption = "") plot(gy2, which = 4, type = "pearson", sub.caption = "") plot(gy2, which = 2:3) par(mfrow = c(1, 1)) ## exclude 4th observation gy2a <- update(gy2, subset = -4) gy2a summary(gy2a) ## IGNORE_RDIFF_END
## IGNORE_RDIFF_BEGIN data("GasolineYield", package = "betareg") gy1 <- betareg(yield ~ gravity + pressure + temp10 + temp, data = GasolineYield) summary(gy1) ## Ferrari and Cribari-Neto (2004) gy2 <- betareg(yield ~ batch + temp, data = GasolineYield) ## Table 1 summary(gy2) ## Figure 2 par(mfrow = c(3, 2)) plot(gy2, which = 1, type = "pearson", sub.caption = "") plot(gy2, which = 1, type = "deviance", sub.caption = "") plot(gy2, which = 5, type = "deviance", sub.caption = "") plot(gy2, which = 4, type = "pearson", sub.caption = "") plot(gy2, which = 2:3) par(mfrow = c(1, 1)) ## exclude 4th observation gy2a <- update(gy2, subset = -4) gy2a summary(gy2a) ## IGNORE_RDIFF_END
Compute the generalized leverages values for fitted models.
gleverage(model, ...)
gleverage(model, ...)
model |
a model object. |
... |
further arguments passed to methods. |
gleverage
is a new generic for computing generalized leverage values as suggested by
Wei, Hu, and Fung (1998). Currently, there is only a method for betareg
models, implementing
the formulas from Rocha and Simas (2011) which are consistent with the formulas from
Ferrari and Cribari-Neto (2004) for the fixed dispersion case.
Currently, the vector of generalized leverages requires computations and
storage of order .
Ferrari SLP, Cribari-Neto F (2004). Beta Regression for Modeling Rates and Proportions. Journal of Applied Statistics, 31(7), 799–815.
Rocha AV, Simas AB (2011). Influence Diagnostics in a General Class of Beta Regression Models. Test, 20(1), 95–119. doi:10.1007/s11749-010-0189-z
Wei BC, Hu, YQ, Fung WK (1998). Generalized Leverage and Its Applications. Scandinavian Journal of Statistics, 25, 25–37.
options(digits = 4) data("GasolineYield", package = "betareg") gy <- betareg(yield ~ batch + temp, data = GasolineYield) gleverage(gy)
options(digits = 4) data("GasolineYield", package = "betareg") gy <- betareg(yield ~ batch + temp, data = GasolineYield) gleverage(gy)
In this study participants were asked to estimate upper and lower probabilities for event to occur and not to occur.
data("ImpreciseTask", package = "betareg")
data("ImpreciseTask", package = "betareg")
A data frame with 242 observations on the following 3 variables.
task
a factor with levels Boeing stock
and Sunday weather
.
location
a numeric vector of the average of the lower estimate for the event not to occur and the upper estimate for the event to occur.
difference
a numeric vector of the differences of the lower and upper estimate for the event to occur.
All participants in the study were either first- or second-year undergraduate students in psychology, none of whom had a strong background in probability or were familiar with imprecise probability theories.
For the sunday weather task see WeatherTask
. For the Boeing
stock task participants were asked to estimate the probability that
Boeing's stock would rise more than those in a list of 30 companies.
For each task participants were asked to provide lower and upper estimates for the event to occur and not to occur.
Taken from Smithson et al. (2011) supplements.
Smithson M, Merkle EC, Verkuilen J (2011). Beta Regression Finite Mixture Models of Polarization and Priming. Journal of Educational and Behavioral Statistics, 36(6), 804–831. doi:10.3102/1076998610396893
Smithson M, Segale C (2009). Partition Priming in Judgments of Imprecise Probabilities. Journal of Statistical Theory and Practice, 3(1), 169–181.
data("ImpreciseTask", package = "betareg") library("flexmix") wt_betamix <- betamix(location ~ difference * task, data = ImpreciseTask, k = 2, extra_components = extraComponent(type = "betareg", coef = list(mean = 0, precision = 8)), FLXconcomitant = FLXPmultinom(~ task))
data("ImpreciseTask", package = "betareg") library("flexmix") wt_betamix <- betamix(location ~ difference * task, data = ImpreciseTask, k = 2, extra_components = extraComponent(type = "betareg", coef = list(mean = 0, precision = 8)), FLXconcomitant = FLXPmultinom(~ task))
Data from a behavioral economics experiment assessing the extent of myopic loss aversion among adolescents (mostly aged 11 to 19).
data("LossAversion", package = "betareg")
data("LossAversion", package = "betareg")
A data frame containing 570 observations on 7 variables.
numeric. Average proportion of tokens invested across all 9 rounds.
factor. Gender of the player (or team of players).
factor. Was (at least one of) the player(s) male (in the team)?
numeric. Age in years (averaged for teams).
factor. Type of treatment: long vs. short.
factor. School grades: 6-8 (11-14 years) vs. 10-12 (15-18 years).
factor. Is the player a single player or team of two?
Myopic loss aversion is a phenomenon in behavioral economics, where individuals do not behave economically rationally when making short-term decisions under uncertainty. Example: In lotteries with positive expected payouts investments are lower than the maximum possible (loss aversion). This effect is enhanced for short-term investments (myopia or short-sightedness).
The data in LossAversion
were collected by Matthias Sutter and
Daniela Glätzle-Rützler (Universität Innsbruck) in an experiment with
high-school students in Tyrol, Austria (Schwaz and Innsbruck). The students
could invest X tokens (0-100) in each of 9 rounds in a lottery. The payouts
were 100 + 2.5 * X tokens with probability 1/3 and 100 - X tokens with
probability 2/3. Thus, the expected payouts were 100 + 1/6 * X tokens.
Depending on the treatment in the experiment, the investments could either be
modified in each round (treatment: "short") or only in round 1, 4, 7
(treatment "long"). Decisions were either made alone or in teams of two. The
tokens were converted to monetary payouts using a conversion of
EUR 0.5 per 100 tokens for lower grades (Unterstufe, 6-8) or EUR 1.0 per 100
tokens for upper grades (Oberstufe, 10-12).
From the myopic loss aversion literature (on adults) one would expect that the investments of the players (either single players or teams of two) would depend on all factors: Investments should be
lower in the short treatment (which would indicate myopia),
higher for teams (indicating a reduction in loss aversion),
higher for (teams with) male players,
increase with age/grade.
See Glätzle-Rützler et al. (2015) for more details and references to the literature. In their original analysis, the investments are analyzes using a panel structure (i.e., 9 separate investments for each team). Here, the data are averaged across rounds for each player, leading to qualitatively similar results. The full data along with replication materials are available in the Harvard Dataverse.
Kosmidis and Zeileis (2024) revisit the data using extended-support beta mixture (XBX) regression, which can simultaneously capture both the probability of rational behavior and the mean amount of loss aversion.
Glätzle-Rützler D, Sutter M, Zeileis A (2020). Replication Data for: No Myopic Loss Aversion in Adolescents? - An Experimental Note. Harvard Dataverse, UNF:6:6hVtbHavJAFYfL7dDl7jqA==. doi:10.7910/DVN/IHFZAK
Glätzle-Rützler D, Sutter M, Zeileis A (2015). No Myopic Loss Aversion in Adolescents? – An Experimental Note. Journal of Economic Behavior & Organization, 111, 169–176. doi:10.1016/j.jebo.2014.12.021
Kosmidis I, Zeileis A (2024). Extended-Support Beta Regression for [0, 1] Responses. 2409.07233, arXiv.org E-Print Archive. doi:10.48550/arXiv.2409.07233
options(digits = 4) ## data and add ad-hoc scaling (a la Smithson & Verkuilen) data("LossAversion", package = "betareg") LossAversion <- transform(LossAversion, invests = (invest * (nrow(LossAversion) - 1) + 0.5)/nrow(LossAversion)) ## models: normal (with constant variance), beta, extended-support beta mixture la_n <- lm(invest ~ grade * (arrangement + age) + male, data = LossAversion) summary(la_n) la_b <- betareg(invests ~ grade * (arrangement + age) + male | arrangement + male + grade, data = LossAversion) summary(la_b) la_xbx <- betareg(invest ~ grade * (arrangement + age) + male | arrangement + male + grade, data = LossAversion) summary(la_xbx) ## coefficients in XBX are typically somewhat shrunken compared to beta cbind(XBX = coef(la_xbx), Beta = c(coef(la_b), NA)) ## predictions on subset: (at least one) male players, higher grades, around age 16 la <- subset(LossAversion, male == "yes" & grade == "10-12" & age >= 15 & age <= 17) la_nd <- data.frame(arrangement = c("single", "team"), male = "yes", age = 16, grade = "10-12") ## empirical vs fitted E(Y) la_nd$mean_emp <- aggregate(invest ~ arrangement, data = la, FUN = mean)$invest la_nd$mean_n <- predict(la_n, la_nd) la_nd$mean_b <- predict(la_b, la_nd) la_nd$mean_xbx <- predict(la_xbx, la_nd) la_nd ## visualization: all means rather similar la_mod <- c("Emp", "N", "B", "XBX") la_col <- unname(palette.colors())[c(1, 2, 4, 4)] la_lty <- c(1, 5, 5, 1) matplot(la_nd[, paste0("mean_", tolower(la_mod))], type = "l", col = la_col, lty = la_lty, lwd = 2, ylab = "E(Y)", main = "E(Y)", xaxt = "n") axis(1, at = 1:2, labels = la_nd$arrangement) legend("topleft", la_mod, col = la_col, lty = la_lty, lwd = 2, bty = "n") ## empirical vs. fitted P(Y > 0.95) la_nd$prob_emp <- aggregate(invest >= 0.95 ~ arrangement, data = la, FUN = mean)$invest la_nd$prob_n <- pnorm(0.95, mean = la_nd$mean_n, sd = summary(la_n)$sigma, lower.tail = FALSE) la_nd$prob_b <- 1 - predict(la_b, la_nd, type = "probability", at = 0.95) la_nd$prob_xbx <- 1 - predict(la_xbx, la_nd, type = "probability", at = 0.95) la_nd[, -(5:8)] ## visualization: only XBX works well matplot(la_nd[, paste0("prob_", tolower(la_mod))], type = "l", col = la_col, lty = la_lty, lwd = 2, ylab = "P(Y > 0.95)", main = "P(Y > 0.95)", xaxt = "n") axis(1, at = 1:2, labels = la_nd$arrangement) legend("topleft", la_mod, col = la_col, lty = la_lty, lwd = 2, bty = "n")
options(digits = 4) ## data and add ad-hoc scaling (a la Smithson & Verkuilen) data("LossAversion", package = "betareg") LossAversion <- transform(LossAversion, invests = (invest * (nrow(LossAversion) - 1) + 0.5)/nrow(LossAversion)) ## models: normal (with constant variance), beta, extended-support beta mixture la_n <- lm(invest ~ grade * (arrangement + age) + male, data = LossAversion) summary(la_n) la_b <- betareg(invests ~ grade * (arrangement + age) + male | arrangement + male + grade, data = LossAversion) summary(la_b) la_xbx <- betareg(invest ~ grade * (arrangement + age) + male | arrangement + male + grade, data = LossAversion) summary(la_xbx) ## coefficients in XBX are typically somewhat shrunken compared to beta cbind(XBX = coef(la_xbx), Beta = c(coef(la_b), NA)) ## predictions on subset: (at least one) male players, higher grades, around age 16 la <- subset(LossAversion, male == "yes" & grade == "10-12" & age >= 15 & age <= 17) la_nd <- data.frame(arrangement = c("single", "team"), male = "yes", age = 16, grade = "10-12") ## empirical vs fitted E(Y) la_nd$mean_emp <- aggregate(invest ~ arrangement, data = la, FUN = mean)$invest la_nd$mean_n <- predict(la_n, la_nd) la_nd$mean_b <- predict(la_b, la_nd) la_nd$mean_xbx <- predict(la_xbx, la_nd) la_nd ## visualization: all means rather similar la_mod <- c("Emp", "N", "B", "XBX") la_col <- unname(palette.colors())[c(1, 2, 4, 4)] la_lty <- c(1, 5, 5, 1) matplot(la_nd[, paste0("mean_", tolower(la_mod))], type = "l", col = la_col, lty = la_lty, lwd = 2, ylab = "E(Y)", main = "E(Y)", xaxt = "n") axis(1, at = 1:2, labels = la_nd$arrangement) legend("topleft", la_mod, col = la_col, lty = la_lty, lwd = 2, bty = "n") ## empirical vs. fitted P(Y > 0.95) la_nd$prob_emp <- aggregate(invest >= 0.95 ~ arrangement, data = la, FUN = mean)$invest la_nd$prob_n <- pnorm(0.95, mean = la_nd$mean_n, sd = summary(la_n)$sigma, lower.tail = FALSE) la_nd$prob_b <- 1 - predict(la_b, la_nd, type = "probability", at = 0.95) la_nd$prob_xbx <- 1 - predict(la_xbx, la_nd, type = "probability", at = 0.95) la_nd[, -(5:8)] ## visualization: only XBX works well matplot(la_nd[, paste0("prob_", tolower(la_mod))], type = "l", col = la_col, lty = la_lty, lwd = 2, ylab = "P(Y > 0.95)", main = "P(Y > 0.95)", xaxt = "n") axis(1, at = 1:2, labels = la_nd$arrangement) legend("topleft", la_mod, col = la_col, lty = la_lty, lwd = 2, bty = "n")
Data with responses of naive mock jurors to the conventional conventional two-option verdict (guilt vs. acquittal) versus a three-option verdict setup (the third option was the Scottish 'not proven' alternative), in the presence/absence of conflicting testimonial evidence.
data("MockJurors", package = "betareg")
data("MockJurors", package = "betareg")
A data frame containing 104 observations on 3 variables.
factor indicating whether a two-option or three-option verdict is requested. (A sum contrast rather than treatment contrast is employed.)
factor. Is there conflicting testimonial evidence? (A sum contrast rather than treatment contrast is employed.)
jurors degree of confidence in his/her verdict, scaled to the open unit interval (see below).
The data were collected by Daily (2004) among first-year psychology
students at Australian National University. Smithson and Verkuilen (2006)
employed the data scaling the original confidence (on a scale 0–100)
to the open unit interval: ((original_confidence/100) * 103 - 0.5) / 104
.
The original coding of conflict
in the data provided from Smithson's
homepage is -1/1 which Smithson and Verkuilen (2006) describe to mean
no/yes. However, all their results (sample statistics, histograms, etc.)
suggest that it actually means yes/no which was employed in MockJurors
.
Example 1 from Smithson and Verkuilen (2006) supplements.
Deady S (2004). The Psychological Third Verdict: 'Not Proven' or 'Not Willing to Make a Decision'? Unpublished honors thesis, The Australian National University, Canberra.
Smithson M, Verkuilen J (2006). A Better Lemon Squeezer? Maximum-Likelihood Regression with Beta-Distributed Dependent Variables. Psychological Methods, 11(7), 54–71.
betareg
, ReadingSkills
, StressAnxiety
data("MockJurors", package = "betareg") library("lmtest") ## Smithson & Verkuilen (2006, Table 1) ## variable dispersion model ## (NOTE: numerical rather than analytical Hessian is used for replication, ## Smithson & Verkuilen erroneously compute one-sided p-values) mj_vd <- betareg(confidence ~ verdict * conflict | verdict * conflict, data = MockJurors, hessian = TRUE) summary(mj_vd) ## model selection for beta regression: null model, fixed dispersion model (p. 61) mj_null <- betareg(confidence ~ 1 | 1, data = MockJurors) mj_fd <- betareg(confidence ~ verdict * conflict | 1, data = MockJurors) lrtest(mj_null, mj_fd) lrtest(mj_null, mj_vd) ## McFadden's pseudo-R-squared 1 - as.vector(logLik(mj_null)/logLik(mj_vd)) ## visualization if(require("lattice")) { histogram(~ confidence | conflict + verdict, data = MockJurors, col = "lightgray", breaks = 0:10/10, type = "density") } ## see demo("SmithsonVerkuilen2006", package = "betareg") for more details
data("MockJurors", package = "betareg") library("lmtest") ## Smithson & Verkuilen (2006, Table 1) ## variable dispersion model ## (NOTE: numerical rather than analytical Hessian is used for replication, ## Smithson & Verkuilen erroneously compute one-sided p-values) mj_vd <- betareg(confidence ~ verdict * conflict | verdict * conflict, data = MockJurors, hessian = TRUE) summary(mj_vd) ## model selection for beta regression: null model, fixed dispersion model (p. 61) mj_null <- betareg(confidence ~ 1 | 1, data = MockJurors) mj_fd <- betareg(confidence ~ verdict * conflict | 1, data = MockJurors) lrtest(mj_null, mj_fd) lrtest(mj_null, mj_vd) ## McFadden's pseudo-R-squared 1 - as.vector(logLik(mj_null)/logLik(mj_vd)) ## visualization if(require("lattice")) { histogram(~ confidence | conflict + verdict, data = MockJurors, col = "lightgray", breaks = 0:10/10, type = "density") } ## see demo("SmithsonVerkuilen2006", package = "betareg") for more details
Various types of standard diagnostic plots can be produced, involving various types of residuals, influence measures etc.
## S3 method for class 'betareg' plot(x, which = 1:4, caption = c("Residuals vs indices of obs.", "Cook's distance plot", "Generalized leverage vs predicted values", "Residuals vs linear predictor", "Half-normal plot of residuals", "Predicted vs observed values"), sub.caption = paste(deparse(x$call), collapse = "\n"), main = "", ask = prod(par("mfcol")) < length(which) && dev.interactive(), ..., type = "quantile", nsim = 100, level = 0.9)
## S3 method for class 'betareg' plot(x, which = 1:4, caption = c("Residuals vs indices of obs.", "Cook's distance plot", "Generalized leverage vs predicted values", "Residuals vs linear predictor", "Half-normal plot of residuals", "Predicted vs observed values"), sub.caption = paste(deparse(x$call), collapse = "\n"), main = "", ask = prod(par("mfcol")) < length(which) && dev.interactive(), ..., type = "quantile", nsim = 100, level = 0.9)
x |
fitted model object of class |
which |
numeric. If a subset of the plots is required, specify a subset of the numbers |
caption |
character. Captions to appear above the plots. |
sub.caption |
character. Common title-above figures if there are multiple. |
main |
character. Title to each plot in addition to the above |
ask |
logical. If |
... |
other parameters to be passed through to plotting functions. |
type |
character indicating type of residual to be used, see |
nsim |
numeric. Number of simulations in half-normal plots. |
level |
numeric. Confidence level in half-normal plots. |
The plot
method for betareg
objects produces various types
of diagnostic plots. Most of these are standard for regression models and involve
various types of residuals, influence measures etc. See Ferrari and Cribari-Neto (2004)
for a discussion of some of these displays.
The which
argument can be used to select a subset of currently six supported
types of displays. The corresponding element of caption
contains a brief
description. In some more detail, the displays are: Residuals (as selected by
type
) vs indices of observations (which = 1
). Cook's distances
vs indices of observations (which = 2
). Generalized leverage vs
predicted values (which = 3
). Residuals vs linear predictor (which = 4
).
Half-normal plot of residuals (which = 5
), which is obtained using a simulation
approach. Predicted vs observed values (which = 6
).
Cribari-Neto F, Zeileis A (2010). Beta Regression in R. Journal of Statistical Software, 34(2), 1–24. doi:10.18637/jss.v034.i02
Ferrari SLP, Cribari-Neto F (2004). Beta Regression for Modeling Rates and Proportions. Journal of Applied Statistics, 31(7), 799–815.
data("GasolineYield", package = "betareg") gy <- betareg(yield ~ gravity + pressure + temp10 + temp, data = GasolineYield) par(mfrow = c(3, 2)) plot(gy, which = 1:6) par(mfrow = c(1, 1))
data("GasolineYield", package = "betareg") gy <- betareg(yield ~ gravity + pressure + temp10 + temp, data = GasolineYield) par(mfrow = c(3, 2)) plot(gy, which = 1:6) par(mfrow = c(1, 1))
Extract various types of predictions from beta regression models: First, GLM-style predictions on the scale of responses in (0, 1) or the scale of the linear predictor are provided. Second, various quantities based on the predicted beta distributions are available, e.g., moments, quantiles, probabilities, densities, etc.
## S3 method for class 'betareg' predict(object, newdata = NULL, type = c("response", "link", "precision", "variance", "parameters", "distribution", "density", "probability", "quantile"), na.action = na.pass, at = 0.5, elementwise = NULL, ...)
## S3 method for class 'betareg' predict(object, newdata = NULL, type = c("response", "link", "precision", "variance", "parameters", "distribution", "density", "probability", "quantile"), na.action = na.pass, at = 0.5, elementwise = NULL, ...)
object |
fitted model object of class |
newdata |
optionally, a data frame in which to look for variables with which to predict. If omitted, the original observations are used. |
type |
character indicating type of predictions: fitted means of the response
(default, |
na.action |
function determining what should be done with missing values
in |
at |
numeric vector at which the predictions should be evaluated if
|
elementwise |
logical. Should each element of the distribution only be evaluated at the
corresponding element of |
... |
further arguments when |
Each prediction for a betareg
model internally first computes the parameters
for all observations in newdata
(or the original data if newdata
is missing).
These parameters correspond to a predicted beta distribution (or extended-support beta distribution)
for each observation. Then the actual predictions can also be moments of the distributions
or standard quantities such as densities, cumulative probabilities, or quantiles.
The latter are computed with the d
/p
/q
functions such as
dbetar
(or dxbetax
or dxbeta
).
Either a vector or matrix of predictions with the same number of observations as rows
in newdata
.
options(digits = 4) data("GasolineYield", package = "betareg") gy2 <- betareg(yield ~ batch + temp | temp, data = GasolineYield) cbind( predict(gy2, type = "response"), predict(gy2, type = "link"), predict(gy2, type = "precision"), predict(gy2, type = "variance"), predict(gy2, type = "quantile", at = c(0.25, 0.5, 0.75)) ) ## evaluate cumulative _p_robabilities for (small) new data set gyd <- GasolineYield[c(1, 5, 10), ] ## CDF at 0.1 for each observation predict(gy2, newdata = gyd, type = "probability", at = 0.1) ## CDF at each combination of 0.1/0.2 and observations predict(gy2, newdata = gyd, type = "probability", at = c(0.1, 0.2)) ## CDF at elementwise combinations of 0.1/0.2/0.3 and observations predict(gy2, newdata = gyd, type = "probability", at = c(0.1, 0.2, 0.3)) predict(gy2, newdata = gyd, type = "probability", at = c(0.1, 0.2, 0.3), elementwise = TRUE) ## CDF at all combinations of 0.1/0.2/0.3 and observations predict(gy2, newdata = gyd, type = "probability", at = c(0.1, 0.2, 0.3), elementwise = FALSE)
options(digits = 4) data("GasolineYield", package = "betareg") gy2 <- betareg(yield ~ batch + temp | temp, data = GasolineYield) cbind( predict(gy2, type = "response"), predict(gy2, type = "link"), predict(gy2, type = "precision"), predict(gy2, type = "variance"), predict(gy2, type = "quantile", at = c(0.25, 0.5, 0.75)) ) ## evaluate cumulative _p_robabilities for (small) new data set gyd <- GasolineYield[c(1, 5, 10), ] ## CDF at 0.1 for each observation predict(gy2, newdata = gyd, type = "probability", at = 0.1) ## CDF at each combination of 0.1/0.2 and observations predict(gy2, newdata = gyd, type = "probability", at = c(0.1, 0.2)) ## CDF at elementwise combinations of 0.1/0.2/0.3 and observations predict(gy2, newdata = gyd, type = "probability", at = c(0.1, 0.2, 0.3)) predict(gy2, newdata = gyd, type = "probability", at = c(0.1, 0.2, 0.3), elementwise = TRUE) ## CDF at all combinations of 0.1/0.2/0.3 and observations predict(gy2, newdata = gyd, type = "probability", at = c(0.1, 0.2, 0.3), elementwise = FALSE)
Data for assessing the contribution of non-verbal IQ to children's reading skills in dyslexic and non-dyslexic children.
data("ReadingSkills", package = "betareg")
data("ReadingSkills", package = "betareg")
A data frame containing 44 observations on 3 variables.
numeric. Reading score with maximum restricted to be 0.99 rather than 1 (see below).
factor. Is the child dyslexic? (A sum contrast rather than treatment contrast is employed.)
numeric. Non-verbal intelligence quotient transformed to z-scores.
numeric. Unrestricted reading score with a maximum of 1 (see below).
The data were collected by Pammer and Kevan (2004) and employed by
Smithson and Verkuilen (2006). The original reading accuracy score was transformed
by Smithson and Verkuilen (2006) so that accuracy
is in the open unit
interval (0, 1) and beta regression can be employed. First, the original accuracy
was scaled using the minimal and maximal score (a
and b
, respectively)
that can be obtained in the test: accuracy1 = (original_accuracy - a) / (b - a)
(a
and b
are not provided). Subsequently, accuracy
was obtained
from accuracy1
by replacing all observations with a value of 1 with 0.99.
Kosmidis and Zeileis (2024) propose to investigate the original unrestricted
accuracy1
variable using their extended-support beta mixture regression.
Example 3 from Smithson and Verkuilen (2006) supplements.
Cribari-Neto F, Zeileis A (2010). Beta Regression in R. Journal of Statistical Software, 34(2), 1–24. doi:10.18637/jss.v034.i02
Grün B, Kosmidis I, Zeileis A (2012). Extended Beta Regression in R: Shaken, Stirred, Mixed, and Partitioned. Journal of Statistical Software, 48(11), 1–25. doi:10.18637/jss.v048.i11
Kosmidis I, Zeileis A (2024). Extended-Support Beta Regression for [0, 1] Responses. 2409.07233, arXiv.org E-Print Archive. doi:10.48550/arXiv.2409.07233
Pammer K, Kevan A (2004). The Contribution of Visual Sensitivity, Phonological Processing and Non-Verbal IQ to Children's Reading. Unpublished manuscript, The Australian National University, Canberra.
Smithson M, Verkuilen J (2006). A Better Lemon Squeezer? Maximum-Likelihood Regression with Beta-Distributed Dependent Variables. Psychological Methods, 11(7), 54–71.
betareg
, MockJurors
, StressAnxiety
options(digits = 4) data("ReadingSkills", package = "betareg") ## Smithson & Verkuilen (2006, Table 5) ## OLS regression ## (Note: typo in iq coefficient: 0.3954 instead of 0.3594) rs_ols <- lm(qlogis(accuracy) ~ dyslexia * iq, data = ReadingSkills) summary(rs_ols) ## Beta regression (with numerical rather than analytic standard errors) ## (Note: Smithson & Verkuilen erroneously compute one-sided p-values) rs_beta <- betareg(accuracy ~ dyslexia * iq | dyslexia + iq, data = ReadingSkills, hessian = TRUE) summary(rs_beta) ## Extended-support beta mixture regression (Kosmidis & Zeileis 2024) rs_xbx <- betareg(accuracy1 ~ dyslexia * iq | dyslexia + iq, data = ReadingSkills) summary(rs_xbx) ## Coefficients in XBX are typically somewhat shrunken compared to beta cbind(XBX = coef(rs_xbx), Beta = c(coef(rs_beta), NA)) ## Visualization plot(accuracy1 ~ iq, data = ReadingSkills, col = c(4, 2)[dyslexia], pch = 19) nd <- data.frame(dyslexia = "no", iq = -30:30/10) lines(nd$iq, predict(rs_xbx, nd), col = 4) lines(nd$iq, predict(rs_beta, nd), col = 4, lty = 5) lines(nd$iq, plogis(predict(rs_ols, nd)), col = 4, lty = 3) nd <- data.frame(dyslexia = "yes", iq = -30:30/10) lines(nd$iq, predict(rs_xbx, nd), col = 2) lines(nd$iq, predict(rs_beta, nd), col = 2, lty = 5) lines(nd$iq, plogis(predict(rs_ols, nd)), col = 2, lty = 3) legend("topleft", c("Dyslexia: no", "Dyslexia: yes", "OLS", "XBX", "Beta"), lty = c(0, 0, 3, 1, 5), pch = c(19, 19, NA, NA, NA), col = c(4, 2, 1, 1, 1), bty = "n") ## see demo("SmithsonVerkuilen2006", package = "betareg") for further details
options(digits = 4) data("ReadingSkills", package = "betareg") ## Smithson & Verkuilen (2006, Table 5) ## OLS regression ## (Note: typo in iq coefficient: 0.3954 instead of 0.3594) rs_ols <- lm(qlogis(accuracy) ~ dyslexia * iq, data = ReadingSkills) summary(rs_ols) ## Beta regression (with numerical rather than analytic standard errors) ## (Note: Smithson & Verkuilen erroneously compute one-sided p-values) rs_beta <- betareg(accuracy ~ dyslexia * iq | dyslexia + iq, data = ReadingSkills, hessian = TRUE) summary(rs_beta) ## Extended-support beta mixture regression (Kosmidis & Zeileis 2024) rs_xbx <- betareg(accuracy1 ~ dyslexia * iq | dyslexia + iq, data = ReadingSkills) summary(rs_xbx) ## Coefficients in XBX are typically somewhat shrunken compared to beta cbind(XBX = coef(rs_xbx), Beta = c(coef(rs_beta), NA)) ## Visualization plot(accuracy1 ~ iq, data = ReadingSkills, col = c(4, 2)[dyslexia], pch = 19) nd <- data.frame(dyslexia = "no", iq = -30:30/10) lines(nd$iq, predict(rs_xbx, nd), col = 4) lines(nd$iq, predict(rs_beta, nd), col = 4, lty = 5) lines(nd$iq, plogis(predict(rs_ols, nd)), col = 4, lty = 3) nd <- data.frame(dyslexia = "yes", iq = -30:30/10) lines(nd$iq, predict(rs_xbx, nd), col = 2) lines(nd$iq, predict(rs_beta, nd), col = 2, lty = 5) lines(nd$iq, plogis(predict(rs_ols, nd)), col = 2, lty = 3) legend("topleft", c("Dyslexia: no", "Dyslexia: yes", "OLS", "XBX", "Beta"), lty = c(0, 0, 3, 1, 5), pch = c(19, 19, NA, NA, NA), col = c(4, 2, 1, 1, 1), bty = "n") ## see demo("SmithsonVerkuilen2006", package = "betareg") for further details
Extract various types of residuals from beta regression models: raw response residuals (observed - fitted), Pearson residuals (raw residuals scaled by square root of variance function), deviance residuals (scaled log-likelihood contributions), and different kinds of weighted residuals suggested by Espinheira et al. (2008).
## S3 method for class 'betareg' residuals(object, type = c("quantile", "deviance", "pearson", "response", "weighted", "sweighted", "sweighted2"), ...)
## S3 method for class 'betareg' residuals(object, type = c("quantile", "deviance", "pearson", "response", "weighted", "sweighted", "sweighted2"), ...)
object |
fitted model object of class |
type |
character indicating type of residuals. |
... |
currently not used. |
The default residuals (starting from version 3.2-0) are quantile residuals as proposed by Dunn and Smyth (1996) and explored in the context of beta regression by Pereira (2017). In case of extended-support beta regression with boundary observations at 0 and/or 1, the quantile residuals for the boundary observations are randomized.
The definitions of all other residuals are provided in Espinheira et al. (2008):
Equation 2 for "pearson"
, last equation on page 409 for "deviance"
,
Equation 6 for "weighted"
, Equation 7 for "sweighted"
, and
Equation 8 for "sweighted2"
.
Espinheira et al. (2008) recommend to use "sweighted2"
, hence this was
the default prior to version 3.2-0. However, these are rather burdensome to
compute because they require operations of and hence are typically
prohibitively costly in large sample. Also they are not available for
extended-support beta regression. Finally, Pereira (2017) found quantile
residuals to have better distributional properties.
Cribari-Neto F, Zeileis A (2010). Beta Regression in R. Journal of Statistical Software, 34(2), 1–24. doi:10.18637/jss.v034.i02
Dunn PK, Smyth GK (1996). Randomized Quantile Residuals. Journal of Computational and Graphical Statistics, 5(3), 236–244. doi:10.2307/1390802
Espinheira PL, Ferrari SLP, Cribari-Neto F (2008). On Beta Regression Residuals. Journal of Applied Statistics, 35(4), 407–419. doi:10.1080/02664760701834931
Ferrari SLP, Cribari-Neto F (2004). Beta Regression for Modeling Rates and Proportions. Journal of Applied Statistics, 31(7), 799–815. doi:10.1080/0266476042000214501
Pereira GHA (2017). On Quantile Residuals in Beta Regression. Communications in Statistics – Simulation and Computation, 48(1), 302–316. doi:10.1080/03610918.2017.1381740
Kosmidis I, Zeileis A (2024). Extended-Support Beta Regression for [0, 1] Responses. 2409.07233, arXiv.org E-Print Archive. doi:10.48550/arXiv.2409.07233
options(digits = 4) data("GasolineYield", package = "betareg") gy <- betareg(yield ~ gravity + pressure + temp10 + temp, data = GasolineYield) gy_res <- cbind( "quantile" = residuals(gy, type = "quantile"), "pearson" = residuals(gy, type = "pearson"), "deviance" = residuals(gy, type = "deviance"), "response" = residuals(gy, type = "response"), "weighted" = residuals(gy, type = "weighted"), "sweighted" = residuals(gy, type = "sweighted"), "sweighted2" = residuals(gy, type = "sweighted2") ) pairs(gy_res) cor(gy_res)
options(digits = 4) data("GasolineYield", package = "betareg") gy <- betareg(yield ~ gravity + pressure + temp10 + temp, data = GasolineYield) gy_res <- cbind( "quantile" = residuals(gy, type = "quantile"), "pearson" = residuals(gy, type = "pearson"), "deviance" = residuals(gy, type = "deviance"), "response" = residuals(gy, type = "response"), "weighted" = residuals(gy, type = "weighted"), "sweighted" = residuals(gy, type = "sweighted"), "sweighted2" = residuals(gy, type = "sweighted2") ) pairs(gy_res) cor(gy_res)
Stress and anxiety among nonclinical women in Townsville, Queensland, Australia.
data("StressAnxiety", package = "betareg")
data("StressAnxiety", package = "betareg")
A data frame containing 166 observations on 2 variables.
score, linearly transformed to the open unit interval (see below).
score, linearly transformed to the open unit interval (see below).
Both variables were assess on the Depression Anxiety Stress Scales, ranging from 0 to 42. Smithson and Verkuilen (2006) transformed these to the open unit interval (without providing details about this transformation).
Example 2 from Smithson and Verkuilen (2006) supplements.
Smithson M, Verkuilen J (2006). A Better Lemon Squeezer? Maximum-Likelihood Regression with Beta-Distributed Dependent Variables. Psychological Methods, 11(7), 54–71.
betareg
, MockJurors
, ReadingSkills
data("StressAnxiety", package = "betareg") StressAnxiety <- StressAnxiety[order(StressAnxiety$stress),] ## Smithson & Verkuilen (2006, Table 4) sa_null <- betareg(anxiety ~ 1 | 1, data = StressAnxiety, hessian = TRUE) sa_stress <- betareg(anxiety ~ stress | stress, data = StressAnxiety, hessian = TRUE) summary(sa_null) summary(sa_stress) AIC(sa_null, sa_stress) 1 - as.vector(logLik(sa_null)/logLik(sa_stress)) ## visualization attach(StressAnxiety) plot(jitter(anxiety) ~ jitter(stress), xlab = "Stress", ylab = "Anxiety", xlim = c(0, 1), ylim = c(0, 1)) lines(lowess(anxiety ~ stress)) lines(fitted(sa_stress) ~ stress, lty = 2) lines(fitted(lm(anxiety ~ stress)) ~ stress, lty = 3) legend("topleft", c("lowess", "betareg", "lm"), lty = 1:3, bty = "n") detach(StressAnxiety) ## see demo("SmithsonVerkuilen2006", package = "betareg") for more details
data("StressAnxiety", package = "betareg") StressAnxiety <- StressAnxiety[order(StressAnxiety$stress),] ## Smithson & Verkuilen (2006, Table 4) sa_null <- betareg(anxiety ~ 1 | 1, data = StressAnxiety, hessian = TRUE) sa_stress <- betareg(anxiety ~ stress | stress, data = StressAnxiety, hessian = TRUE) summary(sa_null) summary(sa_stress) AIC(sa_null, sa_stress) 1 - as.vector(logLik(sa_null)/logLik(sa_stress)) ## visualization attach(StressAnxiety) plot(jitter(anxiety) ~ jitter(stress), xlab = "Stress", ylab = "Anxiety", xlim = c(0, 1), ylim = c(0, 1)) lines(lowess(anxiety ~ stress)) lines(fitted(sa_stress) ~ stress, lty = 2) lines(fitted(lm(anxiety ~ stress)) ~ stress, lty = 3) legend("topleft", c("lowess", "betareg", "lm"), lty = 1:3, bty = "n") detach(StressAnxiety) ## see demo("SmithsonVerkuilen2006", package = "betareg") for more details
Methods for extracting information from fitted beta
regression model objects of class "betareg"
.
## S3 method for class 'betareg' summary(object, phi = NULL, type = "quantile", ...) ## S3 method for class 'betareg' coef(object, model = c("full", "mean", "precision"), phi = NULL, ...) ## S3 method for class 'betareg' vcov(object, model = c("full", "mean", "precision"), phi = NULL, ...) ## S3 method for class 'betareg' bread(x, phi = NULL, ...) ## S3 method for class 'betareg' estfun(x, phi = NULL, ...)
## S3 method for class 'betareg' summary(object, phi = NULL, type = "quantile", ...) ## S3 method for class 'betareg' coef(object, model = c("full", "mean", "precision"), phi = NULL, ...) ## S3 method for class 'betareg' vcov(object, model = c("full", "mean", "precision"), phi = NULL, ...) ## S3 method for class 'betareg' bread(x, phi = NULL, ...) ## S3 method for class 'betareg' estfun(x, phi = NULL, ...)
object , x
|
fitted model object of class |
phi |
logical indicating whether the parameters in the precision model
(for phi) should be reported as full model parameters ( |
type |
character specifying type of residuals to be included in the
summary output, see |
model |
character specifying for which component of the model coefficients/covariance
should be extracted. (Only used if |
... |
currently not used. |
A set of standard extractor functions for fitted model objects is available for
objects of class "betareg"
, including methods to the generic functions
print
and summary
which print the estimated
coefficients along with some further information. The summary
in particular
supplies partial Wald tests based on the coefficients and the covariance matrix.
As usual, the summary
method returns an object of class "summary.betareg"
containing the relevant summary statistics which can subsequently be printed
using the associated print
method.
A logLik
method is provided, hence AIC
can be called to compute information criteria.
Cribari-Neto F, Zeileis A (2010). Beta Regression in R. Journal of Statistical Software, 34(2), 1–24. doi:10.18637/jss.v034.i02
Ferrari SLP, Cribari-Neto F (2004). Beta Regression for Modeling Rates and Proportions. Journal of Applied Statistics, 31(7), 799–815.
Simas AB, Barreto-Souza W, Rocha AV (2010). Improved Estimators for a General Class of Beta Regression Models. Computational Statistics & Data Analysis, 54(2), 348–366.
options(digits = 4) data("GasolineYield", package = "betareg") gy2 <- betareg(yield ~ batch + temp | temp, data = GasolineYield) summary(gy2) coef(gy2) vcov(gy2) logLik(gy2) AIC(gy2) coef(gy2, model = "mean") coef(gy2, model = "precision") summary(gy2, phi = FALSE)
options(digits = 4) data("GasolineYield", package = "betareg") gy2 <- betareg(yield ~ batch + temp | temp, data = GasolineYield) summary(gy2) coef(gy2) vcov(gy2) logLik(gy2) AIC(gy2) coef(gy2, model = "mean") coef(gy2, model = "precision") summary(gy2, phi = FALSE)
In this study participants were asked to judge how likely Sunday is to be the hottest day of the week.
data("WeatherTask", package = "betareg")
data("WeatherTask", package = "betareg")
A data frame with 345 observations on the following 3 variables.
priming
a factor with levels two-fold
(case
prime) and seven-fold
(class prime).
eliciting
a factor with levels precise
and
imprecise
(lower and upper limit).
agreement
a numeric vector, probability indicated by participants or the average between minimum and maximum probability indicated.
All participants in the study were either first- or second-year undergraduate students in psychology, none of whom had a strong background in probability or were familiar with imprecise probability theories.
For priming
the questions were:
[What is the probability that] the temperature at Canberra airport on Sunday will be higher than every other day next week?
[What is the probability that] the highest temperature of the week at Canberra airport will occur on Sunday?
For eliciting
the instructions were if
to assign a probability estimate,
to assign a lower and upper probability estimate.
Taken from Smithson et al. (2011) supplements.
Smithson M, Merkle EC, Verkuilen J (2011). Beta Regression Finite Mixture Models of Polarization and Priming. Journal of Educational and Behavioral Statistics, 36(6), 804–831. doi:10.3102/1076998610396893
Smithson M, Segale C (2009). Partition Priming in Judgments of Imprecise Probabilities. Journal of Statistical Theory and Practice, 3(1), 169–181.
data("WeatherTask", package = "betareg") library("flexmix") wt_betamix <- betamix(agreement ~ 1, data = WeatherTask, k = 2, extra_components = extraComponent(type = "betareg", coef = list(mean = 0, precision = 2)), FLXconcomitant = FLXPmultinom(~ priming + eliciting))
data("WeatherTask", package = "betareg") library("flexmix") wt_betamix <- betamix(agreement ~ 1, data = WeatherTask, k = 2, extra_components = extraComponent(type = "betareg", coef = list(mean = 0, precision = 2)), FLXconcomitant = FLXPmultinom(~ priming + eliciting))
Density, distribution function, quantile function, and random generation for the extended-support beta distribution (in regression parameterization) on [0, 1].
dxbeta(x, mu, phi, nu = 0, log = FALSE) pxbeta(q, mu, phi, nu = 0, lower.tail = TRUE, log.p = FALSE) qxbeta(p, mu, phi, nu = 0, lower.tail = TRUE, log.p = FALSE) rxbeta(n, mu, phi, nu = 0)
dxbeta(x, mu, phi, nu = 0, log = FALSE) pxbeta(q, mu, phi, nu = 0, lower.tail = TRUE, log.p = FALSE) qxbeta(p, mu, phi, nu = 0, lower.tail = TRUE, log.p = FALSE) rxbeta(n, mu, phi, nu = 0)
x , q
|
numeric. Vector of quantiles. |
p |
numeric. Vector of probabilities. |
n |
numeric. Number of observations. If |
mu |
numeric. The mean of the underlying beta distribution on [-nu, 1 + nu]. |
phi |
numeric. The precision parameter of the underlying beta distribution on [-nu, 1 + nu]. |
nu |
numeric. Exceedence parameter for the support of the underlying beta distribution on [-nu, 1 + nu] that is censored to [0, 1]. |
log , log.p
|
logical. If TRUE, probabilities p are given as log(p). |
lower.tail |
logical. If TRUE (default), probabilities are P[X <= x] otherwise, P[X > x]. |
In order to obtain an extended-support beta distribution on [0, 1]
an additional exceedence parameter nu
is introduced. If nu > 0
,
this scales the underlying beta distribution to the interval [-nu, 1 + nu]
where the tails are subsequently censored to the unit interval [0, 1] with
point masses on the boundaries 0 and 1. Thus, nu
controls how likely
boundary observations are and for nu = 0
(the default), the distribution
reduces to the classic beta distribution (in regression parameterization)
without boundary observations.
dxbeta
gives the density, pxbeta
gives the distribution
function, qxbeta
gives the quantile function, and rxbeta
generates random deviates.
Class and methods for extended-support beta distributions using the workflow from the distributions3 package.
XBeta(mu, phi, nu = 0)
XBeta(mu, phi, nu = 0)
mu |
numeric. The mean of the underlying beta distribution on [-nu, 1 + nu]. |
phi |
numeric. The precision parameter of the underlying beta distribution on [-nu, 1 + nu]. |
nu |
numeric. Exceedence parameter for the support of the underlying beta distribution on [-nu, 1 + nu] that is censored to [0, 1]. |
In order to obtain an extended-support beta distribution on [0, 1]
an additional exceedence parameter nu
is introduced. If nu > 0
,
this scales the underlying beta distribution to the interval [-nu, 1 + nu]
where the tails are subsequently censored to the unit interval [0, 1] with
point masses on the boundaries 0 and 1. Thus, nu
controls how likely
boundary observations are and for nu = 0
(the default), the distribution
reduces to the classic beta distribution (in regression parameterization)
without boundary observations.
A XBeta
distribution object.
## package and random seed library("distributions3") set.seed(6020) ## three beta distributions X <- XBeta( mu = c(0.25, 0.50, 0.75), phi = c(1, 1, 2), nu = c(0, 0.1, 0.2) ) X ## compute moments of the distribution mean(X) variance(X) ## support interval (minimum and maximum) support(X) ## it is only continuous when there are no point masses on the boundary is_continuous(X) cdf(X, 0) cdf(X, 1, lower.tail = FALSE) ## simulate random variables random(X, 5) ## histograms of 1,000 simulated observations x <- random(X, 1000) hist(x[1, ]) hist(x[2, ]) hist(x[3, ]) ## probability density function (PDF) and log-density (or log-likelihood) x <- c(0.25, 0.5, 0.75) pdf(X, x) pdf(X, x, log = TRUE) log_pdf(X, x) ## cumulative distribution function (CDF) cdf(X, x) ## quantiles quantile(X, 0.5) ## cdf() and quantile() are inverses (except at censoring points) cdf(X, quantile(X, 0.5)) quantile(X, cdf(X, 1)) ## all methods above can either be applied elementwise or for ## all combinations of X and x, if length(X) = length(x), ## also the result can be assured to be a matrix via drop = FALSE p <- c(0.05, 0.5, 0.95) quantile(X, p, elementwise = FALSE) quantile(X, p, elementwise = TRUE) quantile(X, p, elementwise = TRUE, drop = FALSE) ## compare theoretical and empirical mean from 1,000 simulated observations cbind( "theoretical" = mean(X), "empirical" = rowMeans(random(X, 1000)) )
## package and random seed library("distributions3") set.seed(6020) ## three beta distributions X <- XBeta( mu = c(0.25, 0.50, 0.75), phi = c(1, 1, 2), nu = c(0, 0.1, 0.2) ) X ## compute moments of the distribution mean(X) variance(X) ## support interval (minimum and maximum) support(X) ## it is only continuous when there are no point masses on the boundary is_continuous(X) cdf(X, 0) cdf(X, 1, lower.tail = FALSE) ## simulate random variables random(X, 5) ## histograms of 1,000 simulated observations x <- random(X, 1000) hist(x[1, ]) hist(x[2, ]) hist(x[3, ]) ## probability density function (PDF) and log-density (or log-likelihood) x <- c(0.25, 0.5, 0.75) pdf(X, x) pdf(X, x, log = TRUE) log_pdf(X, x) ## cumulative distribution function (CDF) cdf(X, x) ## quantiles quantile(X, 0.5) ## cdf() and quantile() are inverses (except at censoring points) cdf(X, quantile(X, 0.5)) quantile(X, cdf(X, 1)) ## all methods above can either be applied elementwise or for ## all combinations of X and x, if length(X) = length(x), ## also the result can be assured to be a matrix via drop = FALSE p <- c(0.05, 0.5, 0.95) quantile(X, p, elementwise = FALSE) quantile(X, p, elementwise = TRUE) quantile(X, p, elementwise = TRUE, drop = FALSE) ## compare theoretical and empirical mean from 1,000 simulated observations cbind( "theoretical" = mean(X), "empirical" = rowMeans(random(X, 1000)) )
Density, distribution function, quantile function, and random generation for the extended-support beta mixture distribution (in regression parameterization) on [0, 1].
dxbetax(x, mu, phi, nu = 0, log = FALSE, quad = 20) pxbetax(q, mu, phi, nu = 0, lower.tail = TRUE, log.p = FALSE, quad = 20) qxbetax(p, mu, phi, nu = 0, lower.tail = TRUE, log.p = FALSE, quad = 20, tol = .Machine$double.eps^0.7) rxbetax(n, mu, phi, nu = 0)
dxbetax(x, mu, phi, nu = 0, log = FALSE, quad = 20) pxbetax(q, mu, phi, nu = 0, lower.tail = TRUE, log.p = FALSE, quad = 20) qxbetax(p, mu, phi, nu = 0, lower.tail = TRUE, log.p = FALSE, quad = 20, tol = .Machine$double.eps^0.7) rxbetax(n, mu, phi, nu = 0)
x , q
|
numeric. Vector of quantiles. |
p |
numeric. Vector of probabilities. |
n |
numeric. Number of observations. If |
mu |
numeric. The mean of the underlying beta distribution on [-nu, 1 + nu]. |
phi |
numeric. The precision parameter of the underlying beta distribution on [-nu, 1 + nu]. |
nu |
numeric. Mean of the exponentially-distributed exceedence parameter for the underlying beta distribution on [-nu, 1 + nu] that is censored to [0, 1]. |
log , log.p
|
logical. If TRUE, probabilities p are given as log(p). |
lower.tail |
logical. If TRUE (default), probabilities are P[X <= x] otherwise, P[X > x]. |
quad |
numeric. The number of quadrature points for numeric integration of the continuous mixture. Alternatively, a matrix with nodes and weights for the quadrature points can be specified. |
tol |
numeric. Accuracy (convergence tolerance) for numerically
determining quantiles based on |
The extended-support beta mixture distribution is a continuous mixture of
extended-support beta distributions on [0, 1] where the underlying exceedence
parameter is exponentially distributed with mean nu
. Thus, if nu > 0
,
the resulting distribution has point masses on the boundaries 0 and 1 with larger
values of nu
leading to higher boundary probabilities. For nu = 0
(the default), the distribution reduces to the classic beta distribution (in
regression parameterization) without boundary observations.
dxbetax
gives the density, pxbetax
gives the distribution
function, qxbetax
gives the quantile function, and rxbetax
generates random deviates.
Class and methods for extended-support beta distributions using the workflow from the distributions3 package.
XBetaX(mu, phi, nu = 0)
XBetaX(mu, phi, nu = 0)
mu |
numeric. The mean of the underlying beta distribution on [-nu, 1 + nu]. |
phi |
numeric. The precision parameter of the underlying beta distribution on [-nu, 1 + nu]. |
nu |
numeric. Mean of the exponentially-distributed exceedence parameter for the underlying beta distribution on [-nu, 1 + nu] that is censored to [0, 1]. |
The extended-support beta mixture distribution is a continuous mixture of
extended-support beta distributions on [0, 1] where the underlying exceedence
parameter is exponentially distributed with mean nu
. Thus, if nu > 0
,
the resulting distribution has point masses on the boundaries 0 and 1 with larger
values of nu
leading to higher boundary probabilities. For nu = 0
(the default), the distribution reduces to the classic beta distribution (in
regression parameterization) without boundary observations.
A XBetaX
distribution object.
## package and random seed library("distributions3") set.seed(6020) ## three beta distributions X <- XBetaX( mu = c(0.25, 0.50, 0.75), phi = c(1, 1, 2), nu = c(0, 0.1, 0.2) ) X ## compute moments of the distribution mean(X) variance(X) ## support interval (minimum and maximum) support(X) ## it is only continuous when there are no point masses on the boundary is_continuous(X) cdf(X, 0) cdf(X, 1, lower.tail = FALSE) ## simulate random variables random(X, 5) ## histograms of 1,000 simulated observations x <- random(X, 1000) hist(x[1, ]) hist(x[2, ]) hist(x[3, ]) ## probability density function (PDF) and log-density (or log-likelihood) x <- c(0.25, 0.5, 0.75) pdf(X, x) pdf(X, x, log = TRUE) log_pdf(X, x) ## cumulative distribution function (CDF) cdf(X, x) ## quantiles quantile(X, 0.5) ## cdf() and quantile() are inverses (except at censoring points) cdf(X, quantile(X, 0.5)) quantile(X, cdf(X, 1)) ## all methods above can either be applied elementwise or for ## all combinations of X and x, if length(X) = length(x), ## also the result can be assured to be a matrix via drop = FALSE p <- c(0.05, 0.5, 0.95) quantile(X, p, elementwise = FALSE) quantile(X, p, elementwise = TRUE) quantile(X, p, elementwise = TRUE, drop = FALSE) ## compare theoretical and empirical mean from 1,000 simulated observations cbind( "theoretical" = mean(X), "empirical" = rowMeans(random(X, 1000)) )
## package and random seed library("distributions3") set.seed(6020) ## three beta distributions X <- XBetaX( mu = c(0.25, 0.50, 0.75), phi = c(1, 1, 2), nu = c(0, 0.1, 0.2) ) X ## compute moments of the distribution mean(X) variance(X) ## support interval (minimum and maximum) support(X) ## it is only continuous when there are no point masses on the boundary is_continuous(X) cdf(X, 0) cdf(X, 1, lower.tail = FALSE) ## simulate random variables random(X, 5) ## histograms of 1,000 simulated observations x <- random(X, 1000) hist(x[1, ]) hist(x[2, ]) hist(x[3, ]) ## probability density function (PDF) and log-density (or log-likelihood) x <- c(0.25, 0.5, 0.75) pdf(X, x) pdf(X, x, log = TRUE) log_pdf(X, x) ## cumulative distribution function (CDF) cdf(X, x) ## quantiles quantile(X, 0.5) ## cdf() and quantile() are inverses (except at censoring points) cdf(X, quantile(X, 0.5)) quantile(X, cdf(X, 1)) ## all methods above can either be applied elementwise or for ## all combinations of X and x, if length(X) = length(x), ## also the result can be assured to be a matrix via drop = FALSE p <- c(0.05, 0.5, 0.95) quantile(X, p, elementwise = FALSE) quantile(X, p, elementwise = TRUE) quantile(X, p, elementwise = TRUE, drop = FALSE) ## compare theoretical and empirical mean from 1,000 simulated observations cbind( "theoretical" = mean(X), "empirical" = rowMeans(random(X, 1000)) )