Title: | Infrastructure for Forecasting and Assessment of Probabilistic Models |
---|---|
Description: | Unified infrastructure for probabilistic models and distributional regressions: Probabilistic forecasting of in-sample and out-of-sample of probabilities, densities, quantiles, and moments. Probabilistic residuals and scoring via log-score (or log-likelihood), (continuous) ranked probability score, etc. Diagnostic graphics like rootograms, PIT histograms, (randomized) quantile residual Q-Q plots, and reliagrams (reliability diagrams). |
Authors: | Achim Zeileis [aut, cre] , Moritz N. Lang [aut] , Reto Stauffer [aut] , Christian Kleiber [ctb] , Ioannis Kosmidis [ctb] , Jakob Messner [ctb] |
Maintainer: | Achim Zeileis <[email protected]> |
License: | GPL-2 | GPL-3 |
Version: | 0.3-0 |
Built: | 2024-11-20 05:51:27 UTC |
Source: | https://github.com/r-forge/topmodels |
Method to the crps
generic function from
the scoringRules package for numerically evaluating the (continuous) ranked probability
score (CRPS) of any probability distributions3 object.
crps.distribution( y, x, drop = TRUE, elementwise = NULL, gridsize = 500L, batchsize = 10000L, applyfun = NULL, cores = NULL, method = NULL, ... ) crps.Beta(y, x, drop = TRUE, elementwise = NULL, ...) crps.Bernoulli(y, x, drop = TRUE, elementwise = NULL, ...) crps.Binomial(y, x, drop = TRUE, elementwise = NULL, ...) crps.Erlang(y, x, drop = TRUE, elementwise = NULL, ...) crps.Exponential(y, x, drop = TRUE, elementwise = NULL, ...) crps.Gamma(y, x, drop = TRUE, elementwise = NULL, ...) crps.GEV(y, x, drop = TRUE, elementwise = NULL, ...) crps.Geometric(y, x, drop = TRUE, elementwise = NULL, ...) crps.Gumbel(y, x, drop = TRUE, elementwise = NULL, ...) crps.HyperGeometric(y, x, drop = TRUE, elementwise = NULL, ...) crps.Logistic(y, x, drop = TRUE, elementwise = NULL, ...) crps.LogNormal(y, x, drop = TRUE, elementwise = NULL, ...) crps.NegativeBinomial(y, x, drop = TRUE, elementwise = NULL, ...) crps.Normal(y, x, drop = TRUE, elementwise = NULL, ...) crps.Poisson(y, x, drop = TRUE, elementwise = NULL, ...) crps.StudentsT(y, x, drop = TRUE, elementwise = NULL, ...) crps.Uniform(y, x, drop = TRUE, elementwise = NULL, ...) crps.XBetaX(y, x, drop = TRUE, elementwise = NULL, method = "cdf", ...) crps.GAMLSS(y, x, drop = TRUE, elementwise = NULL, ...) crps.BAMLSS(y, x, drop = TRUE, elementwise = NULL, ...)
crps.distribution( y, x, drop = TRUE, elementwise = NULL, gridsize = 500L, batchsize = 10000L, applyfun = NULL, cores = NULL, method = NULL, ... ) crps.Beta(y, x, drop = TRUE, elementwise = NULL, ...) crps.Bernoulli(y, x, drop = TRUE, elementwise = NULL, ...) crps.Binomial(y, x, drop = TRUE, elementwise = NULL, ...) crps.Erlang(y, x, drop = TRUE, elementwise = NULL, ...) crps.Exponential(y, x, drop = TRUE, elementwise = NULL, ...) crps.Gamma(y, x, drop = TRUE, elementwise = NULL, ...) crps.GEV(y, x, drop = TRUE, elementwise = NULL, ...) crps.Geometric(y, x, drop = TRUE, elementwise = NULL, ...) crps.Gumbel(y, x, drop = TRUE, elementwise = NULL, ...) crps.HyperGeometric(y, x, drop = TRUE, elementwise = NULL, ...) crps.Logistic(y, x, drop = TRUE, elementwise = NULL, ...) crps.LogNormal(y, x, drop = TRUE, elementwise = NULL, ...) crps.NegativeBinomial(y, x, drop = TRUE, elementwise = NULL, ...) crps.Normal(y, x, drop = TRUE, elementwise = NULL, ...) crps.Poisson(y, x, drop = TRUE, elementwise = NULL, ...) crps.StudentsT(y, x, drop = TRUE, elementwise = NULL, ...) crps.Uniform(y, x, drop = TRUE, elementwise = NULL, ...) crps.XBetaX(y, x, drop = TRUE, elementwise = NULL, method = "cdf", ...) crps.GAMLSS(y, x, drop = TRUE, elementwise = NULL, ...) crps.BAMLSS(y, x, drop = TRUE, elementwise = NULL, ...)
y |
A distribution object, e.g., as created by
|
x |
A vector of elements whose CRPS should be determined given the
distribution |
drop |
logical. Should the result be simplified to a vector if possible? |
elementwise |
logical. Should each distribution in |
gridsize |
positive size of the grid used to approximate the CDF for the numerical calculation of the CRPS. |
batchsize |
maximum batch size. Used to split the input into batches. Lower values reduce required memory but may increase computation time. |
applyfun |
an optional |
cores |
numeric. If set to an integer the |
method |
character. Should the grid be set up on the observation scale
and |
... |
currently not used. |
The (continuous) ranked probability score (CRPS) for (univariate) probability
distributions can be computed based on the the object-oriented infrastructure
provided by the distributions3 package. The general crps.distribution
method does so by using numeric integration based on the cdf
and/or quantile
methods (for more details see below). Additionally, if dedicated closed-form
CRPS computations are provided by the scoringRules package for the specified
distribution, then these are used because they are both computationally faster
and numerically more precise. For example, the crps
method for Normal
objects leverages crps_norm
rather than relying on
numeric integration.
The general method for any distribution
object uses the following strategy
for numerical CRPS computation. By default (if the method
argument is NULL
),
it distinguishes distributions whose entire support is continuous, or whose entire
support is discrete, or mixed discrete-continuous distribution using
is_continuous
and is_discrete
,
respectively.
For continuous and mixed distributions, an equidistant grid of gridsize + 5
probabilities is drawn for which the corresponding quantile
s for each
distribution y
are calculated (including the observation x
). The
calculation of the CRPS then uses a trapezoidal approximation for the
numeric integration. For discrete distributions, gridsize
equidistant quantiles
(in steps of 1) are drawn and the corresponding probabilities from the cdf
are calculated for each distribution y
(including the observation x
)
and the CRPS calculated using numeric integration. If the gridsize
in steps of 1 is not
sufficient to cover the required range, the method falls back to the procedure used for
continuous and mixed distributions to approximate the CRPS.
If the method
argument is set to either "cdf"
or "quantile"
,
then the specific strategy for setting up the grid of observations and corresponding
probabilities can be enforced. This can be useful if for a certain distribution
class, only a cdf
or only a quantile
method is available or only
one of them is numerically stable or computationally efficient etc.
The numeric approximation requires to set up a matrix of dimension
length(y) * (gridsize + 5)
(or length(y) * (gridsize + 1)
) which may be very
memory intensive if length(y)
and/or gridsize
are large. Thus, the data is
split batches of (approximately) equal size, not larger than batchsize
.
Thus, the memory requirement is reduced to batchsize * (gridsize + 5)
in each step.
Hence, a smaller value of batchsize
will reduce memory footprint but will
slightly increase computation time.
The error (deviation between numerical approximation and analytic solution)
has been shown to be in the order of 1e-2
for a series of distributions
tested. Accuracy can be increased by increasing gridsize
and will be lower
for a smaller gridsize
.
For parallelization of the numeric computations, a suitable applyfun
can be
provided that carries out the integration for each element of y
. To facilitate
setting up a suitable applyfun
using the basic parallel package, the
argument cores
is provided for convenience. When used, y
is split into B
equidistant batches; at least B = cores
batches or
a multiple of cores
with a maximum size of batchsize
. On systems running
Windows parlapply
is used, else mclapply
.
In case of a single distribution object, either a numeric
vector of length(x)
(if drop = TRUE
, default) or a matrix with
length(x)
columns (if drop = FALSE
). In case of a vectorized distribution
object, a matrix with length(x)
columns containing all possible combinations.
set.seed(6020) ## three normal distributions X and observations x library("distributions3") X <- Normal(mu = c(0, 1, 2), sigma = c(2, 1, 1)) x <- c(0, 0, 1) ## evaluate crps ## using infrastructure from scoringRules (based on closed-form analytic equations) library("scoringRules") crps(X, x) ## using general distribution method explicitly (based on numeric integration) crps.distribution(X, x) ## analogously for Poisson distribution Y <- Poisson(c(0.5, 1, 2)) crps(Y, x) crps.distribution(Y, x)
set.seed(6020) ## three normal distributions X and observations x library("distributions3") X <- Normal(mu = c(0, 1, 2), sigma = c(2, 1, 1)) x <- c(0, 0, 1) ## evaluate crps ## using infrastructure from scoringRules (based on closed-form analytic equations) library("scoringRules") crps(X, x) ## using general distribution method explicitly (based on numeric integration) crps.distribution(X, x) ## analogously for Poisson distribution Y <- Poisson(c(0.5, 1, 2)) crps(Y, x) crps.distribution(Y, x)
An empirical distribution consists of a series of N
observations
out of a typically unknown distribution, i.e., a random sample 'X'.
Draws n
random values from the empirical ensemble
with replacement.
Please see the documentation of [Empirical()] for some properties of the empircal ensemble distribution, as well as extensive examples showing to how calculate p-values and confidence intervals.
Please see the documentation of [Empirical()] for some properties of the Empirical distribution, as well as extensive examples showing to how calculate p-values and confidence intervals. 'quantile()'
TODO(RETO): Check description
Empirical(x) pempirical(q, y, lower.tail = TRUE, log.p = FALSE, na.rm = TRUE) dempirical(x, y, log = FALSE, method = "hist", ...) qempirical(p, y, lower.tail = TRUE, log.p = FALSE, na.rm = TRUE, ...) rempirical(n, y, na.rm = TRUE) ## S3 method for class 'Empirical' mean(x, ...) ## S3 method for class 'Empirical' variance(x, ...) ## S3 method for class 'Empirical' skewness(x, type = 1L, ...) ## S3 method for class 'Empirical' kurtosis(x, type = 3L, ...) ## S3 method for class 'Empirical' random(x, n = 1L, drop = TRUE, ...) ## S3 method for class 'Empirical' pdf(d, x, drop = TRUE, elementwise = NULL, ...) ## S3 method for class 'Empirical' log_pdf(d, x, drop = TRUE, elementwise = NULL, ...) ## S3 method for class 'Empirical' cdf(d, x, drop = TRUE, elementwise = NULL, ...) ## S3 method for class 'Empirical' quantile(x, probs, drop = TRUE, elementwise = NULL, ...) ## S3 method for class 'Empirical' support(d, drop = TRUE, ...)
Empirical(x) pempirical(q, y, lower.tail = TRUE, log.p = FALSE, na.rm = TRUE) dempirical(x, y, log = FALSE, method = "hist", ...) qempirical(p, y, lower.tail = TRUE, log.p = FALSE, na.rm = TRUE, ...) rempirical(n, y, na.rm = TRUE) ## S3 method for class 'Empirical' mean(x, ...) ## S3 method for class 'Empirical' variance(x, ...) ## S3 method for class 'Empirical' skewness(x, type = 1L, ...) ## S3 method for class 'Empirical' kurtosis(x, type = 3L, ...) ## S3 method for class 'Empirical' random(x, n = 1L, drop = TRUE, ...) ## S3 method for class 'Empirical' pdf(d, x, drop = TRUE, elementwise = NULL, ...) ## S3 method for class 'Empirical' log_pdf(d, x, drop = TRUE, elementwise = NULL, ...) ## S3 method for class 'Empirical' cdf(d, x, drop = TRUE, elementwise = NULL, ...) ## S3 method for class 'Empirical' quantile(x, probs, drop = TRUE, elementwise = NULL, ...) ## S3 method for class 'Empirical' support(d, drop = TRUE, ...)
x |
A vector of elements whose cumulative probabilities you would like to determine given the distribution 'd'. |
q |
vector of quantiles. |
y |
vector of observations of the empirical distribution with two or more non-missing finite values. |
lower.tail |
logical; if |
na.rm |
logical evaluating to |
log , log.p
|
logical; if |
method |
character; the method to calculate the empirical density. Either |
... |
Currently not used. |
p |
vector of probabilities. |
n |
The number of samples to draw. Defaults to '1L'. |
type |
integer between |
drop |
logical. Should the result be simplified to a vector if possible? |
d |
An 'Empirical' object created by a call to [Empirical()]. |
elementwise |
logical. Should each distribution in |
probs |
A vector of probabilities. |
The creation function [Empirical()] allows for a variety of different objects
as main input x
.
* Vector: Assumes that the vector contains a series of observations from one empirical distribution.
* List (named or unnamed) of vectors: Each element in the list describes one empirical distribution defined by the numeric values in each of the vectors.
* Matrix or data.frame: Each row corresponds to one empirical distribution, whilst the columns contain the individual observations.
Missing values are allowed, however, each distribution requires at least two
finite observations (-Inf
/Inf
is replaced by NA
).
**Support**: , the set of all real numbers
**Mean**:
**Variance**:
**Skewness**:
(only defined for
)
(default)
For more details about the different types of sample skewness see Joanes and Gill (1998).
**Kurtosis**:
(only defined for
)
(default)
For more details about the different types of sample kurtosis see Joanes and Gill (1998).
**TODO(RETO)**: Add empirical distribution function information (step-function 1/N)
**Probability density function (p.d.f)**:
This function returns the same values that you get from a Z-table. Note 'quantile()' is the inverse of 'cdf()'. Please see the documentation of [Empirical()] for some properties of the Empirical distribution, as well as extensive examples showing to how calculate p-values and confidence intervals.
An 'Empirical' object.
In case of a single distribution object or 'n = 1', either a numeric vector of length 'n' (if 'drop = TRUE', default) or a 'matrix' with 'n' columns (if 'drop = FALSE').
In case of a single distribution object, either a numeric vector of length 'probs' (if 'drop = TRUE', default) or a 'matrix' with 'length(x)' columns (if 'drop = FALSE'). In case of a vectorized distribution object, a matrix with 'length(x)' columns containing all possible combinations.
In case of a single distribution object, either a numeric vector of length 'probs' (if 'drop = TRUE', default) or a 'matrix' with 'length(x)' columns (if 'drop = FALSE'). In case of a vectorized distribution object, a matrix with 'length(x)' columns containing all possible combinations.
In case of a single distribution object, either a numeric vector of length 'probs' (if 'drop = TRUE', default) or a 'matrix' with 'length(probs)' columns (if 'drop = FALSE'). In case of a vectorized distribution object, a matrix with 'length(probs)' columns containing all possible combinations.
In case of a single distribution object, a numeric vector of length 2 with the minimum and maximum value of the support (if 'drop = TRUE', default) or a 'matrix' with 2 columns. In case of a vectorized distribution object, a matrix with 2 columns containing all minima and maxima.
Joanes DN and Gill CA (1998). “Comparing Measures of Sample Skewness and Kurtosis.” Journal of the Royal Statistical Society D, 47(1), 183–189. doi:10.1111/1467-9884.00122
require("distributions3") set.seed(28) X <- Empirical(rnorm(50)) X mean(X) variance(X) skewness(X) kurtosis(X) random(X, 10) pdf(X, 2) log_pdf(X, 2) cdf(X, 4) quantile(X, 0.7) ### example: allowed types/classes of input arguments ## Single vector (will be coerced to numeric) Y1 <- rnorm(3, mean = -10) d1 <- Empirical(Y1) d1 mean(d1) ## Unnamed list of vectors Y2 <- list(as.character(rnorm(3, mean = -10)), runif(6), rpois(4, lambda = 15)) d2 <- Empirical(Y2) d2 mean(d2) ## Named list of vectors Y3 <- list("Normal" = as.character(rnorm(3, mean = -10)), "Uniform" = runif(6), "Poisson" = rpois(4, lambda = 15)) d3 <- Empirical(Y3) d3 mean(d3) ## Matrix or data.frame Y4 <- matrix(rnorm(20), ncol = 5, dimnames = list(sprintf("D_%d", 1:4), sprintf("obs_%d", 1:5))) d4 <- Empirical(Y4) d4 d5 <- Empirical(as.data.frame(Y4)) d5
require("distributions3") set.seed(28) X <- Empirical(rnorm(50)) X mean(X) variance(X) skewness(X) kurtosis(X) random(X, 10) pdf(X, 2) log_pdf(X, 2) cdf(X, 4) quantile(X, 0.7) ### example: allowed types/classes of input arguments ## Single vector (will be coerced to numeric) Y1 <- rnorm(3, mean = -10) d1 <- Empirical(Y1) d1 mean(d1) ## Unnamed list of vectors Y2 <- list(as.character(rnorm(3, mean = -10)), runif(6), rpois(4, lambda = 15)) d2 <- Empirical(Y2) d2 mean(d2) ## Named list of vectors Y3 <- list("Normal" = as.character(rnorm(3, mean = -10)), "Uniform" = runif(6), "Poisson" = rpois(4, lambda = 15)) d3 <- Empirical(Y3) d3 mean(d3) ## Matrix or data.frame Y4 <- matrix(rnorm(20), ncol = 5, dimnames = list(sprintf("D_%d", 1:4), sprintf("obs_%d", 1:5))) d4 <- Empirical(Y4) d4 d5 <- Empirical(as.data.frame(Y4)) d5
geom_*
and stat_*
for Producing Quantile Residual Q-Q Plots with 'ggplot2'Various geom_*
and stat_*
used within
autoplot
for producing quantile residual Q-Q plots.
geom_qqrplot( mapping = NULL, data = NULL, stat = "identity", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, ... ) stat_qqrplot_simint( mapping = NULL, data = NULL, geom = "qqrplot_simint", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, ... ) geom_qqrplot_simint( mapping = NULL, data = NULL, stat = "qqrplot_simint", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, ... ) stat_qqrplot_ref( mapping = NULL, data = NULL, geom = "qqrplot_ref", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, detrend = FALSE, identity = TRUE, probs = c(0.25, 0.75), scale = c("normal", "uniform"), ... ) geom_qqrplot_ref( mapping = NULL, data = NULL, stat = "qqrplot_ref", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, detrend = FALSE, identity = TRUE, probs = c(0.25, 0.75), scale = c("normal", "uniform"), ... ) geom_qqrplot_confint( mapping = NULL, data = NULL, stat = "qqrplot_confint", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, detrend = FALSE, type = c("pointwise_internal", "pointwise", "ks", "ell"), level = 0.95, identity = TRUE, probs = c(0.25, 0.75), scale = c("normal", "uniform"), style = c("polygon", "line"), ... ) GeomQqrplotConfint
geom_qqrplot( mapping = NULL, data = NULL, stat = "identity", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, ... ) stat_qqrplot_simint( mapping = NULL, data = NULL, geom = "qqrplot_simint", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, ... ) geom_qqrplot_simint( mapping = NULL, data = NULL, stat = "qqrplot_simint", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, ... ) stat_qqrplot_ref( mapping = NULL, data = NULL, geom = "qqrplot_ref", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, detrend = FALSE, identity = TRUE, probs = c(0.25, 0.75), scale = c("normal", "uniform"), ... ) geom_qqrplot_ref( mapping = NULL, data = NULL, stat = "qqrplot_ref", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, detrend = FALSE, identity = TRUE, probs = c(0.25, 0.75), scale = c("normal", "uniform"), ... ) geom_qqrplot_confint( mapping = NULL, data = NULL, stat = "qqrplot_confint", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, detrend = FALSE, type = c("pointwise_internal", "pointwise", "ks", "ell"), level = 0.95, identity = TRUE, probs = c(0.25, 0.75), scale = c("normal", "uniform"), style = c("polygon", "line"), ... ) GeomQqrplotConfint
mapping |
Set of aesthetic mappings created by |
data |
The data to be displayed in this layer. There are three options: If A A |
stat |
The statistical transformation to use on the data for this layer.
When using a
|
position |
A position adjustment to use on the data for this layer. This
can be used in various ways, including to prevent overplotting and
improving the display. The
|
na.rm |
If |
show.legend |
logical. Should this layer be included in the legends?
|
inherit.aes |
If |
... |
Other arguments passed on to
|
geom |
The geometric object to use to display the data for this layer.
When using a
|
detrend |
logical, default |
identity |
logical. Should the identity line be plotted or a theoretical line
which passes through |
probs |
numeric vector of length two, representing probabilities of reference line used. |
scale |
character. Scale on which the quantile residuals will
be shown: |
type |
character. Method for creating the confidence intervals. By default, an internal point-wise routine ('pointwise_internal') is used. Alternatively, band methods provided by [qqconf::get_qq_band()] can be used: "ell" uses the equal local level method, "ks" uses the Kolmogorov-Smirnov test, and "pointwise" uses a slightly alternative implementation of pointwise bands. Note that for uniform scales, the identity line must be used for reference ('ref_identity = TRUE'). |
level |
numeric. The confidence level required, defaults to |
style |
character. Style for plotting confidence intervals. Either |
An object of class GeomQqrplotConfint
(inherits from Geom
, ggproto
, gg
) of length 6.
if (require("ggplot2")) { ## Fit model data("CrabSatellites", package = "countreg") m1_pois <- glm(satellites ~ width + color, data = CrabSatellites, family = poisson) m2_pois <- glm(satellites ~ color, data = CrabSatellites, family = poisson) ## Compute qqrplot q1 <- qqrplot(m1_pois, plot = FALSE) q2 <- qqrplot(m2_pois, plot = FALSE) d <- c(q1, q2) ## Get label names xlab <- unique(attr(d, "xlab")) ylab <- unique(attr(d, "ylab")) main <- attr(d, "main") main <- make.names(main, unique = TRUE) d$group <- factor(d$group, labels = main) ## Polygon CI around identity line used as reference gg1 <- ggplot(data = d, aes(x = expected, y = observed, na.rm = TRUE)) + geom_qqrplot_ref() + geom_qqrplot_confint(fill = "red") + geom_qqrplot() + geom_qqrplot_simint( aes( x = simint_expected, ymin = simint_observed_lwr, ymax = simint_observed_upr, group = group ) ) + xlab(xlab) + ylab(ylab) gg1 gg1 + facet_wrap(~group) ## Polygon CI around robust reference line gg2 <- ggplot(data = d, aes(x = expected, y = observed, na.rm = TRUE)) + geom_qqrplot_ref(identity = FALSE, scale = attr(d, "scale")) + geom_qqrplot_confint(identity = FALSE, scale = attr(d, "scale"), style = "line") + geom_qqrplot() + geom_qqrplot_simint( aes( x = simint_expected, ymin = simint_observed_lwr, ymax = simint_observed_upr, group = group ) ) + xlab(xlab) + ylab(ylab) gg2 gg2 + facet_wrap(~group) ## Use different `scale`s with confidence intervals q1 <- qqrplot(m1_pois, scale = "uniform", plot = FALSE) q2 <- qqrplot(m2_pois, plot = FALSE) gg3 <- ggplot(data = q1, aes(x = expected, y = observed, na.rm = TRUE)) + geom_qqrplot_ref() + geom_qqrplot_confint(fill = "red", scale = "uniform") + geom_qqrplot() gg3 gg4 <- ggplot(data = q2, aes(x = expected, y = observed, na.rm = TRUE)) + geom_qqrplot_ref() + geom_qqrplot_confint(fill = "red", scale = "uniform") + geom_qqrplot() gg4 }
if (require("ggplot2")) { ## Fit model data("CrabSatellites", package = "countreg") m1_pois <- glm(satellites ~ width + color, data = CrabSatellites, family = poisson) m2_pois <- glm(satellites ~ color, data = CrabSatellites, family = poisson) ## Compute qqrplot q1 <- qqrplot(m1_pois, plot = FALSE) q2 <- qqrplot(m2_pois, plot = FALSE) d <- c(q1, q2) ## Get label names xlab <- unique(attr(d, "xlab")) ylab <- unique(attr(d, "ylab")) main <- attr(d, "main") main <- make.names(main, unique = TRUE) d$group <- factor(d$group, labels = main) ## Polygon CI around identity line used as reference gg1 <- ggplot(data = d, aes(x = expected, y = observed, na.rm = TRUE)) + geom_qqrplot_ref() + geom_qqrplot_confint(fill = "red") + geom_qqrplot() + geom_qqrplot_simint( aes( x = simint_expected, ymin = simint_observed_lwr, ymax = simint_observed_upr, group = group ) ) + xlab(xlab) + ylab(ylab) gg1 gg1 + facet_wrap(~group) ## Polygon CI around robust reference line gg2 <- ggplot(data = d, aes(x = expected, y = observed, na.rm = TRUE)) + geom_qqrplot_ref(identity = FALSE, scale = attr(d, "scale")) + geom_qqrplot_confint(identity = FALSE, scale = attr(d, "scale"), style = "line") + geom_qqrplot() + geom_qqrplot_simint( aes( x = simint_expected, ymin = simint_observed_lwr, ymax = simint_observed_upr, group = group ) ) + xlab(xlab) + ylab(ylab) gg2 gg2 + facet_wrap(~group) ## Use different `scale`s with confidence intervals q1 <- qqrplot(m1_pois, scale = "uniform", plot = FALSE) q2 <- qqrplot(m2_pois, plot = FALSE) gg3 <- ggplot(data = q1, aes(x = expected, y = observed, na.rm = TRUE)) + geom_qqrplot_ref() + geom_qqrplot_confint(fill = "red", scale = "uniform") + geom_qqrplot() gg3 gg4 <- ggplot(data = q2, aes(x = expected, y = observed, na.rm = TRUE)) + geom_qqrplot_ref() + geom_qqrplot_confint(fill = "red", scale = "uniform") + geom_qqrplot() gg4 }
Generic function and methods for extracting response variables from new data based on fitted model objects.
newresponse(object, ...) ## Default S3 method: newresponse(object, newdata, na.action = na.pass, ...) ## S3 method for class 'glm' newresponse(object, newdata, na.action = na.pass, initialize = NULL, ...) ## S3 method for class 'distribution' newresponse(object, newdata, ...)
newresponse(object, ...) ## Default S3 method: newresponse(object, newdata, na.action = na.pass, ...) ## S3 method for class 'glm' newresponse(object, newdata, na.action = na.pass, initialize = NULL, ...) ## S3 method for class 'distribution' newresponse(object, newdata, ...)
object |
a fitted model object. For the |
... |
further arguments passed to methods. |
newdata |
optionally, a data frame in which to look for variables with which to predict. If omitted, the original observations are used. |
na.action |
function determining how to handle missing
values in |
initialize |
logical. Should the response variable from |
newresponse
is a convenience function that supports functions like
proscore
or proresiduals
which assess
discrepancies between predictions/forecasts on new data and the corresponding
observed response variables.
The default method takes an approach that is similar to many predict
methods which rebuild the model.frame
after dropping the
response from the terms
of a model object. However, here
only the response variable is preserved and all explanatory variables are dropped.
Missing values values are typically preserved (i.e., using na.pass
).
If the new model.frame
contains a variable "(weights)"
,
it is preserved along with the response variable(s).
A method for distribution
objects is provided which expects that
newdata
is essentially already the corresponding new response.
Thus, it needs to be a vector (or data frame) of the same length as distribution
.
If it is not a data frame, yet, it is transformed to one but no further
modifications are made.
A data.frame
(model.frame
) containing the response variable
(and optionally a variable with "(weights)"
).
## linear regression model m <- lm(dist ~ speed, data = cars) ## extract response variable on data used for model fitting newresponse(m) ## extract response variable on "new" data newresponse(m, newdata = cars[1:3, ])
## linear regression model m <- lm(dist ~ speed, data = cars) ## extract response variable on data used for model fitting newresponse(m) ## extract response variable on "new" data newresponse(m, newdata = cars[1:3, ])
Probability integral transform (PIT) histograms graphically
compare empirical probabilities from fitted models
with a uniform distribution. If plot = TRUE
, the resulting object of
class "pithist"
is plotted by plot.pithist
or
autoplot.pithist
depending on whether the
package ggplot2
is loaded, before the "pithist"
object is returned.
pithist(object, ...) ## Default S3 method: pithist( object, newdata = NULL, plot = TRUE, class = NULL, scale = c("uniform", "normal"), breaks = NULL, type = c("expected", "random"), nsim = 1L, delta = NULL, simint = NULL, simint_level = 0.95, simint_nrep = 250, style = c("bar", "line"), freq = FALSE, expected = TRUE, confint = TRUE, xlab = "PIT", ylab = if (freq) "Frequency" else "Density", main = NULL, ... )
pithist(object, ...) ## Default S3 method: pithist( object, newdata = NULL, plot = TRUE, class = NULL, scale = c("uniform", "normal"), breaks = NULL, type = c("expected", "random"), nsim = 1L, delta = NULL, simint = NULL, simint_level = 0.95, simint_nrep = 250, style = c("bar", "line"), freq = FALSE, expected = TRUE, confint = TRUE, xlab = "PIT", ylab = if (freq) "Frequency" else "Density", main = NULL, ... )
object |
an object from which probability integral transforms can be
extracted using the generic function |
... |
further graphical parameters forwarded to the plotting functions. |
newdata |
an optional data frame in which to look for variables with which to predict. If omitted, the original observations are used. |
plot |
logical or character. Should the |
class |
should the invisible return value be either a |
scale |
controls the scale on which the PIT residuals are computed: on
the probability scale ( |
breaks |
|
type |
character. In case of discrete distributions, should an expected
(non-normal) PIT histogram be computed according to Czado et al. (2009)
( |
nsim |
positive integer, defaults to |
delta |
|
simint |
|
simint_level |
numeric, defaults to |
simint_nrep |
numeric, defaults to |
style |
character specifying plotting style. For |
freq |
logical. If |
expected |
logical. Should the expected values be plotted as reference? |
confint |
logical. Should confident intervals be drawn? |
xlab , ylab , main
|
graphical parameters passed to
|
PIT histograms graphically evaluate the probability integral transform (PIT),
i.e., the value that the predictive CDF attains at the observation, with a
uniform distribution. For a well calibrated model fit, the PIT will have a
standard uniform distribution.
For computation, pithist
leverages the function
proresiduals
employing the procast
generic and then
essentially draws a hist
.
In addition to the plot
and autoplot
method for
pithist objects, it is also possible to combine two (or more) PIT histograms by
c
/rbind
, which creates a set of PIT histograms that can then be
plotted in one go.
An object of class "pithist"
inheriting from
data.frame
or tbl_df
conditional on the argument class
including the following variables:
x |
histogram interval midpoints on the x-axis, |
y |
bottom coordinate of the histogram bars, |
width |
widths of the histogram bars, |
confint_lwr |
lower bound of the confidence interval, |
confint_upr |
upper bound of the confidence interval, |
expected |
y-coordinate of the expected curve. |
Additionally, freq
, xlab
, ylab
, main
, and
confint_level
are stored as attributes.
Agresti A, Coull AB (1998). “Approximate is Better than “Exact” for Interval Estimation of Binomial Proportions.” The American Statistician, 52(2), 119–126. doi:10.1080/00031305.1998.10480550
Czado C, Gneiting T, Held L (2009). “Predictive Model Assessment for Count Data.” Biometrics, 65(4), 1254–1261. doi:10.1111/j.1541-0420.2009.01191.x
Dawid AP (1984). “Present Position and Potential Developments: Some Personal Views: Statistical Theory: The Prequential Approach”, Journal of the Royal Statistical Society: Series A (General), 147(2), 278–292. doi:10.2307/2981683
Diebold FX, Gunther TA, Tay AS (1998). “Evaluating Density Forecasts with Applications to Financial Risk Management”. International Economic Review, 39(4), 863–883. doi:10.2307/2527342
Gneiting T, Balabdaoui F, Raftery AE (2007). “Probabilistic Forecasts, Calibration and Sharpness”. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 69(2), 243–268. doi:10.1111/j.1467-9868.2007.00587.x
plot.pithist
, proresiduals
, procast
## speed and stopping distances of cars m1_lm <- lm(dist ~ speed, data = cars) ## compute and plot pithist pithist(m1_lm) #------------------------------------------------------------------------------- ## determinants for male satellites to nesting horseshoe crabs data("CrabSatellites", package = "countreg") ## linear poisson model m1_pois <- glm(satellites ~ width + color, data = CrabSatellites, family = poisson) m2_pois <- glm(satellites ~ color, data = CrabSatellites, family = poisson) ## compute and plot pithist as base graphic p1 <- pithist(m1_pois, plot = FALSE) p2 <- pithist(m2_pois, plot = FALSE) ## plot combined pithist as "ggplot2" graphic ggplot2::autoplot(c(p1, p2), single_graph = TRUE, style = "line", col = c(1, 2))
## speed and stopping distances of cars m1_lm <- lm(dist ~ speed, data = cars) ## compute and plot pithist pithist(m1_lm) #------------------------------------------------------------------------------- ## determinants for male satellites to nesting horseshoe crabs data("CrabSatellites", package = "countreg") ## linear poisson model m1_pois <- glm(satellites ~ width + color, data = CrabSatellites, family = poisson) m2_pois <- glm(satellites ~ color, data = CrabSatellites, family = poisson) ## compute and plot pithist as base graphic p1 <- pithist(m1_pois, plot = FALSE) p2 <- pithist(m2_pois, plot = FALSE) ## plot combined pithist as "ggplot2" graphic ggplot2::autoplot(c(p1, p2), single_graph = TRUE, style = "line", col = c(1, 2))
Generic plotting functions for probability integral transform (PIT)
histograms of the class "pithist"
computed by link{pithist}
.
## S3 method for class 'pithist' plot( x, single_graph = FALSE, style = NULL, freq = NULL, expected = TRUE, confint = NULL, confint_level = 0.95, confint_type = c("exact", "approximation"), simint = NULL, xlim = c(NA, NA), ylim = c(0, NA), xlab = NULL, ylab = NULL, main = NULL, axes = TRUE, box = TRUE, col = "black", border = "black", lwd = NULL, lty = 1, alpha_min = 0.2, expected_col = NULL, expected_lty = NULL, expected_lwd = 1.75, confint_col = NULL, confint_lty = 2, confint_lwd = 1.75, confint_alpha = NULL, simint_col = "black", simint_lty = 1, simint_lwd = 1.75, ... ) ## S3 method for class 'pithist' lines( x, freq = NULL, expected = FALSE, confint = FALSE, confint_level = 0.95, confint_type = c("exact", "approximation"), simint = FALSE, col = "black", lwd = 2, lty = 1, expected_col = "black", expected_lty = 2, expected_lwd = 1.75, confint_col = "black", confint_lty = 1, confint_lwd = 1.75, confint_alpha = 1, simint_col = "black", simint_lty = 1, simint_lwd = 1.75, ... ) ## S3 method for class 'pithist' autoplot( object, single_graph = FALSE, style = NULL, freq = NULL, expected = NULL, confint = NULL, confint_level = 0.95, confint_type = c("exact", "approximation"), simint = NULL, xlim = c(NA, NA), ylim = c(0, NA), xlab = NULL, ylab = NULL, main = NULL, legend = FALSE, theme = NULL, colour = NULL, fill = NULL, size = NULL, linetype = NULL, alpha = NULL, expected_colour = NULL, expected_size = 0.75, expected_linetype = NULL, expected_alpha = NA, confint_colour = NULL, confint_fill = NULL, confint_size = 0.75, confint_linetype = NULL, confint_alpha = NULL, simint_colour = "black", simint_size = 0.5, simint_linetype = 1, simint_alpha = NA, ... )
## S3 method for class 'pithist' plot( x, single_graph = FALSE, style = NULL, freq = NULL, expected = TRUE, confint = NULL, confint_level = 0.95, confint_type = c("exact", "approximation"), simint = NULL, xlim = c(NA, NA), ylim = c(0, NA), xlab = NULL, ylab = NULL, main = NULL, axes = TRUE, box = TRUE, col = "black", border = "black", lwd = NULL, lty = 1, alpha_min = 0.2, expected_col = NULL, expected_lty = NULL, expected_lwd = 1.75, confint_col = NULL, confint_lty = 2, confint_lwd = 1.75, confint_alpha = NULL, simint_col = "black", simint_lty = 1, simint_lwd = 1.75, ... ) ## S3 method for class 'pithist' lines( x, freq = NULL, expected = FALSE, confint = FALSE, confint_level = 0.95, confint_type = c("exact", "approximation"), simint = FALSE, col = "black", lwd = 2, lty = 1, expected_col = "black", expected_lty = 2, expected_lwd = 1.75, confint_col = "black", confint_lty = 1, confint_lwd = 1.75, confint_alpha = 1, simint_col = "black", simint_lty = 1, simint_lwd = 1.75, ... ) ## S3 method for class 'pithist' autoplot( object, single_graph = FALSE, style = NULL, freq = NULL, expected = NULL, confint = NULL, confint_level = 0.95, confint_type = c("exact", "approximation"), simint = NULL, xlim = c(NA, NA), ylim = c(0, NA), xlab = NULL, ylab = NULL, main = NULL, legend = FALSE, theme = NULL, colour = NULL, fill = NULL, size = NULL, linetype = NULL, alpha = NULL, expected_colour = NULL, expected_size = 0.75, expected_linetype = NULL, expected_alpha = NA, confint_colour = NULL, confint_fill = NULL, confint_size = 0.75, confint_linetype = NULL, confint_alpha = NULL, simint_colour = "black", simint_size = 0.5, simint_linetype = 1, simint_alpha = NA, ... )
single_graph |
logical. Should all computed extended reliability
diagrams be plotted in a single graph? If yes, |
style |
|
freq |
|
expected |
logical. Should the expected values be plotted as reference? |
confint |
|
confint_level |
numeric in |
confint_type |
character. Which type of confidence interval should be plotted: '"exact"' or '"approximation"'. According to Agresti and Coull (1998), for interval estimation of binomial proportions an approximation can be better than exact. |
simint |
|
xlim , ylim , xlab , ylab , main , axes , box
|
graphical parameters. |
col , border , lwd , lty , alpha_min
|
graphical parameters for the main part of the base plot. |
simint_col , simint_lty , simint_lwd , confint_col , confint_lty , confint_lwd , confint_alpha , expected_col , expected_lty , expected_lwd
|
Further graphical parameters for the 'confint' and 'simint' line/polygon in the base plot. |
... |
further graphical parameters passed to the plotting function. |
object , x
|
an object of class |
legend |
logical. Should a legend be added in the |
theme |
Which 'ggplot2' theme should be used. If not set, |
colour , fill , size , linetype , alpha
|
graphical parameters for the histogram style part in the |
simint_colour , simint_size , simint_linetype , simint_alpha , confint_colour , confint_fill , confint_size , confint_linetype , expected_colour , expected_size , expected_linetype , expected_alpha
|
Further graphical parameters for the 'confint' and 'simint' line/polygon using |
PIT histograms graphically evaluate the probability integral transform (PIT), i.e., the value that the predictive CDF attains at the observation, with a uniform distribution. For a well calibrated model fit, the observation will be drawn from the predictive distribution and the PIT will have a standard uniform distribution.
PIT histograms can be rendered as ggplot2
or base R graphics by using
the generics autoplot
or plot
.
For a single base R graphically panel, lines
adds an additional PIT histogram.
Agresti A, Coull AB (1998). “Approximate is Better than “Exact” for Interval Estimation of Binomial Proportions.” The American Statistician, 52(2), 119–126. doi:10.1080/00031305.1998.10480550
Czado C, Gneiting T, Held L (2009). “Predictive Model Assessment for Count Data.” Biometrics, 65(4), 1254–1261. doi:10.2307/2981683
Dawid AP (1984). “Present Position and Potential Developments: Some Personal Views: Statistical Theory: The Prequential Approach”, Journal of the Royal Statistical Society: Series A (General), 147(2), 278–292. doi:10.2307/2981683
Diebold FX, Gunther TA, Tay AS (1998). “Evaluating Density Forecasts with Applications to Financial Risk Management”. International Economic Review, 39(4), 863–883. doi:10.2307/2527342
Gneiting T, Balabdaoui F, Raftery AE (2007). “Probabilistic Forecasts, Calibration and Sharpness”. Journal of the Royal Statistical Society: Series B (Methodological). 69(2), 243–268. doi:10.1111/j.1467-9868.2007.00587.x
## speed and stopping distances of cars m1_lm <- lm(dist ~ speed, data = cars) ## compute and plot pithist pithist(m1_lm) ## customize colors and style pithist(m1_lm, expected_col = "blue", lty = 2, pch = 20, style = "line") ## add separate model if (require("crch", quietly = TRUE)) { m1_crch <- crch(dist ~ speed | speed, data = cars) #lines(pithist(m1_crch, plot = FALSE), col = 2, lty = 2, confint_col = 2) #FIXME } #------------------------------------------------------------------------------- if (require("crch")) { ## precipitation observations and forecasts for Innsbruck data("RainIbk", package = "crch") RainIbk <- sqrt(RainIbk) RainIbk$ensmean <- apply(RainIbk[, grep("^rainfc", names(RainIbk))], 1, mean) RainIbk$enssd <- apply(RainIbk[, grep("^rainfc", names(RainIbk))], 1, sd) RainIbk <- subset(RainIbk, enssd > 0) ## linear model w/ constant variance estimation m2_lm <- lm(rain ~ ensmean, data = RainIbk) ## logistic censored model m2_crch <- crch(rain ~ ensmean | log(enssd), data = RainIbk, left = 0, dist = "logistic") ## compute pithists pit2_lm <- pithist(m2_lm, plot = FALSE) pit2_crch <- pithist(m2_crch, plot = FALSE) ## plot in single graph with style "line" plot(c(pit2_lm, pit2_crch), col = c(1, 2), confint_col = c(1, 2), expected_col = 3, style = "line", single_graph = TRUE ) } #------------------------------------------------------------------------------- ## determinants for male satellites to nesting horseshoe crabs data("CrabSatellites", package = "countreg") ## linear poisson model m3_pois <- glm(satellites ~ width + color, data = CrabSatellites, family = poisson) ## compute and plot pithist as "ggplot2" graphic pithist(m3_pois, plot = "ggplot2")
## speed and stopping distances of cars m1_lm <- lm(dist ~ speed, data = cars) ## compute and plot pithist pithist(m1_lm) ## customize colors and style pithist(m1_lm, expected_col = "blue", lty = 2, pch = 20, style = "line") ## add separate model if (require("crch", quietly = TRUE)) { m1_crch <- crch(dist ~ speed | speed, data = cars) #lines(pithist(m1_crch, plot = FALSE), col = 2, lty = 2, confint_col = 2) #FIXME } #------------------------------------------------------------------------------- if (require("crch")) { ## precipitation observations and forecasts for Innsbruck data("RainIbk", package = "crch") RainIbk <- sqrt(RainIbk) RainIbk$ensmean <- apply(RainIbk[, grep("^rainfc", names(RainIbk))], 1, mean) RainIbk$enssd <- apply(RainIbk[, grep("^rainfc", names(RainIbk))], 1, sd) RainIbk <- subset(RainIbk, enssd > 0) ## linear model w/ constant variance estimation m2_lm <- lm(rain ~ ensmean, data = RainIbk) ## logistic censored model m2_crch <- crch(rain ~ ensmean | log(enssd), data = RainIbk, left = 0, dist = "logistic") ## compute pithists pit2_lm <- pithist(m2_lm, plot = FALSE) pit2_crch <- pithist(m2_crch, plot = FALSE) ## plot in single graph with style "line" plot(c(pit2_lm, pit2_crch), col = c(1, 2), confint_col = c(1, 2), expected_col = 3, style = "line", single_graph = TRUE ) } #------------------------------------------------------------------------------- ## determinants for male satellites to nesting horseshoe crabs data("CrabSatellites", package = "countreg") ## linear poisson model m3_pois <- glm(satellites ~ width + color, data = CrabSatellites, family = poisson) ## compute and plot pithist as "ggplot2" graphic pithist(m3_pois, plot = "ggplot2")
Generic plotting functions for Q-Q residual plots for objects of class "qqrplot"
returned by link{qqrplot}
.
## S3 method for class 'qqrplot' plot( x, single_graph = FALSE, detrend = NULL, simint = NULL, confint = NULL, confint_type = c("pointwise_internal", "pointwise", "ks", "ell"), confint_level = 0.95, ref = NULL, ref_type = NULL, xlim = c(NA, NA), ylim = c(NA, NA), xlab = NULL, ylab = NULL, main = NULL, axes = TRUE, box = TRUE, col = "black", pch = 19, simint_col = "black", simint_alpha = 0.2, confint_col = "black", confint_lty = 2, confint_lwd = 1.25, confint_alpha = NULL, ref_col = "black", ref_lty = 2, ref_lwd = 1.25, ... ) ## S3 method for class 'qqrplot' points( x, detrend = NULL, simint = FALSE, col = "black", pch = 19, simint_col = "black", simint_alpha = 0.2, ... ) ## S3 method for class 'qqrplot' autoplot( object, single_graph = FALSE, detrend = NULL, simint = NULL, confint = NULL, confint_type = c("pointwise_internal", "pointwise", "ks", "ell"), confint_level = 0.95, ref = NULL, ref_type = NULL, xlim = c(NA, NA), ylim = c(NA, NA), xlab = NULL, ylab = NULL, main = NULL, legend = FALSE, theme = NULL, alpha = NA, colour = "black", fill = NA, shape = 19, size = 2, stroke = 0.5, simint_fill = "black", simint_alpha = 0.2, confint_colour = NULL, confint_fill = NULL, confint_size = NULL, confint_linetype = NULL, confint_alpha = NULL, ref_colour = "black", ref_size = 0.5, ref_linetype = 2, ... )
## S3 method for class 'qqrplot' plot( x, single_graph = FALSE, detrend = NULL, simint = NULL, confint = NULL, confint_type = c("pointwise_internal", "pointwise", "ks", "ell"), confint_level = 0.95, ref = NULL, ref_type = NULL, xlim = c(NA, NA), ylim = c(NA, NA), xlab = NULL, ylab = NULL, main = NULL, axes = TRUE, box = TRUE, col = "black", pch = 19, simint_col = "black", simint_alpha = 0.2, confint_col = "black", confint_lty = 2, confint_lwd = 1.25, confint_alpha = NULL, ref_col = "black", ref_lty = 2, ref_lwd = 1.25, ... ) ## S3 method for class 'qqrplot' points( x, detrend = NULL, simint = FALSE, col = "black", pch = 19, simint_col = "black", simint_alpha = 0.2, ... ) ## S3 method for class 'qqrplot' autoplot( object, single_graph = FALSE, detrend = NULL, simint = NULL, confint = NULL, confint_type = c("pointwise_internal", "pointwise", "ks", "ell"), confint_level = 0.95, ref = NULL, ref_type = NULL, xlim = c(NA, NA), ylim = c(NA, NA), xlab = NULL, ylab = NULL, main = NULL, legend = FALSE, theme = NULL, alpha = NA, colour = "black", fill = NA, shape = 19, size = 2, stroke = 0.5, simint_fill = "black", simint_alpha = 0.2, confint_colour = NULL, confint_fill = NULL, confint_size = NULL, confint_linetype = NULL, confint_alpha = NULL, ref_colour = "black", ref_size = 0.5, ref_linetype = 2, ... )
x , object
|
an object of class |
single_graph |
logical, defaults to |
detrend |
logical. Should the qqrplot be detrended, i.e, plotted as a
'wormplot()'? If |
simint |
logical or quantile specification. Should the simint of quantiles of the randomized quantile residuals be visualized? |
confint |
logical or character string describing the style for plotting 'c("polygon", "line")'. |
confint_type |
character. Method for creating the confidence intervals. By default, an internal point-wise routine ('pointwise_internal') is used. Alternatively, band methods provided by [qqconf::get_qq_band()] can be used: "ell" uses the equal local level method, "ks" uses the Kolmogorov-Smirnov test, and "pointwise" uses a slightly alternative implementation of pointwise bands. Note that for uniform scales, the identity line must be used for reference ('ref_identity = TRUE'). |
confint_level |
numeric. The confidence level required, defaults to |
ref |
logical. Should a reference line be plotted? |
ref_type |
character specifying that the "identity" line should be used as as a reference or the "quartiles" of the quantile residuals should be used for defining the reference line. Alternatively, also a numeric vector of length two can be used to define the probabilities to be used for defining the reference line. Note, that the reference is also used for detrending the quantile residuals. |
xlim , ylim , axes , box
|
additional graphical
parameters for base plots, whereby |
xlab , ylab , main , ...
|
graphical plotting parameters passed to
|
col , pch
|
graphical parameters for the main part of the base plot. |
simint_col , simint_alpha , confint_col , confint_lty , confint_lwd , ref_col , ref_lty , ref_lwd
|
Further graphical parameters for the 'confint' and 'simint' line/polygon in the base plot. |
legend |
logical. Should a legend be added in the |
theme |
name of the 'ggplot2' theme to be used. If |
colour , fill , alpha , shape , size , stroke
|
graphical parameters passed to |
simint_fill , confint_colour , confint_fill , confint_size , confint_linetype , confint_alpha , ref_colour , ref_size , ref_linetype
|
Further graphical parameters for the 'confint' and 'simint' line/polygon using |
Q-Q residuals plots draw quantile residuals (by default on the standard normal
scale) against theoretical quantiles from the same distribution.
Alternatively, quantile residuals can also be compared on the uniform scale
(scale = "uniform"
) using no transformation.
Q-Q residuals plots can be rendered as ggplot2
or base R graphics by
using the generics autoplot
or
plot
. points
(points.qqrplot
) can be used to add Q-Q residuals to an
existing base R graphics panel.
Dunn KP, Smyth GK (1996). “Randomized Quantile Residuals.” Journal of Computational and Graphical Statistics, 5(3), 236–244. doi:10.2307/1390802
qqrplot
, wormplot
,
proresiduals
, qqnorm
## speed and stopping distances of cars m1_lm <- lm(dist ~ speed, data = cars) ## compute and plot qqrplot qqrplot(m1_lm) ## customize colors qqrplot(m1_lm, plot = "base", ref_col = "blue", lty = 2, pch = 20) ## add separate model if (require("crch", quietly = TRUE)) { m1_crch <- crch(dist ~ speed | speed, data = cars) points(qqrplot(m1_crch, plot = FALSE), col = 2, lty = 2, simint = 2) } #------------------------------------------------------------------------------- if (require("crch")) { ## precipitation observations and forecasts for Innsbruck data("RainIbk", package = "crch") RainIbk <- sqrt(RainIbk) RainIbk$ensmean <- apply(RainIbk[,grep('^rainfc',names(RainIbk))], 1, mean) RainIbk$enssd <- apply(RainIbk[,grep('^rainfc',names(RainIbk))], 1, sd) RainIbk <- subset(RainIbk, enssd > 0) ## linear model w/ constant variance estimation m2_lm <- lm(rain ~ ensmean, data = RainIbk) ## logistic censored model m2_crch <- crch(rain ~ ensmean | log(enssd), data = RainIbk, left = 0, dist = "logistic") ## compute qqrplots qq2_lm <- qqrplot(m2_lm, plot = FALSE) qq2_crch <- qqrplot(m2_crch, plot = FALSE) ## plot in single graph plot(c(qq2_lm, qq2_crch), col = c(1, 2), simint_col = c(1, 2), single_graph = TRUE) } #------------------------------------------------------------------------------- ## determinants for male satellites to nesting horseshoe crabs data("CrabSatellites", package = "countreg") ## linear poisson model m3_pois <- glm(satellites ~ width + color, data = CrabSatellites, family = poisson) ## compute and plot qqrplot as "ggplot2" graphic qqrplot(m3_pois, plot = "ggplot2")
## speed and stopping distances of cars m1_lm <- lm(dist ~ speed, data = cars) ## compute and plot qqrplot qqrplot(m1_lm) ## customize colors qqrplot(m1_lm, plot = "base", ref_col = "blue", lty = 2, pch = 20) ## add separate model if (require("crch", quietly = TRUE)) { m1_crch <- crch(dist ~ speed | speed, data = cars) points(qqrplot(m1_crch, plot = FALSE), col = 2, lty = 2, simint = 2) } #------------------------------------------------------------------------------- if (require("crch")) { ## precipitation observations and forecasts for Innsbruck data("RainIbk", package = "crch") RainIbk <- sqrt(RainIbk) RainIbk$ensmean <- apply(RainIbk[,grep('^rainfc',names(RainIbk))], 1, mean) RainIbk$enssd <- apply(RainIbk[,grep('^rainfc',names(RainIbk))], 1, sd) RainIbk <- subset(RainIbk, enssd > 0) ## linear model w/ constant variance estimation m2_lm <- lm(rain ~ ensmean, data = RainIbk) ## logistic censored model m2_crch <- crch(rain ~ ensmean | log(enssd), data = RainIbk, left = 0, dist = "logistic") ## compute qqrplots qq2_lm <- qqrplot(m2_lm, plot = FALSE) qq2_crch <- qqrplot(m2_crch, plot = FALSE) ## plot in single graph plot(c(qq2_lm, qq2_crch), col = c(1, 2), simint_col = c(1, 2), single_graph = TRUE) } #------------------------------------------------------------------------------- ## determinants for male satellites to nesting horseshoe crabs data("CrabSatellites", package = "countreg") ## linear poisson model m3_pois <- glm(satellites ~ width + color, data = CrabSatellites, family = poisson) ## compute and plot qqrplot as "ggplot2" graphic qqrplot(m3_pois, plot = "ggplot2")
Generic plotting functions for reliability diagrams of the class "reliagram"
computed by link{reliagram}
.
## S3 method for class 'reliagram' plot( x, single_graph = FALSE, minimum = 0, confint = TRUE, ref = TRUE, xlim = c(0, 1), ylim = c(0, 1), xlab = NULL, ylab = NULL, main = NULL, col = "black", fill = adjustcolor("black", alpha.f = 0.2), alpha_min = 0.2, lwd = 2, pch = 19, lty = 1, type = NULL, add_hist = TRUE, add_info = TRUE, add_rug = TRUE, add_min = TRUE, axes = TRUE, box = TRUE, ... ) ## S3 method for class 'reliagram' lines( x, minimum = 0, confint = FALSE, ref = FALSE, col = "black", fill = adjustcolor("black", alpha.f = 0.2), alpha_min = 0.2, lwd = 2, pch = 19, lty = 1, type = "b", ... ) ## S3 method for class 'reliagram' autoplot( object, single_graph = FALSE, minimum = 0, confint = TRUE, ref = TRUE, xlim = c(0, 1), ylim = c(0, 1), xlab = NULL, ylab = NULL, main = NULL, colour = "black", fill = adjustcolor("black", alpha.f = 0.2), alpha_min = 0.2, size = 1, shape = 19, linetype = 1, type = NULL, add_hist = TRUE, add_info = TRUE, add_rug = TRUE, add_min = TRUE, legend = FALSE, ... )
## S3 method for class 'reliagram' plot( x, single_graph = FALSE, minimum = 0, confint = TRUE, ref = TRUE, xlim = c(0, 1), ylim = c(0, 1), xlab = NULL, ylab = NULL, main = NULL, col = "black", fill = adjustcolor("black", alpha.f = 0.2), alpha_min = 0.2, lwd = 2, pch = 19, lty = 1, type = NULL, add_hist = TRUE, add_info = TRUE, add_rug = TRUE, add_min = TRUE, axes = TRUE, box = TRUE, ... ) ## S3 method for class 'reliagram' lines( x, minimum = 0, confint = FALSE, ref = FALSE, col = "black", fill = adjustcolor("black", alpha.f = 0.2), alpha_min = 0.2, lwd = 2, pch = 19, lty = 1, type = "b", ... ) ## S3 method for class 'reliagram' autoplot( object, single_graph = FALSE, minimum = 0, confint = TRUE, ref = TRUE, xlim = c(0, 1), ylim = c(0, 1), xlab = NULL, ylab = NULL, main = NULL, colour = "black", fill = adjustcolor("black", alpha.f = 0.2), alpha_min = 0.2, size = 1, shape = 19, linetype = 1, type = NULL, add_hist = TRUE, add_info = TRUE, add_rug = TRUE, add_min = TRUE, legend = FALSE, ... )
single_graph |
logical. Should all computed extended reliability diagrams be plotted in a single graph? |
minimum , ref , xlim , ylim , col , fill , alpha_min , lwd , pch , lty , type , add_hist , add_info , add_rug , add_min , axes , box
|
additional graphical
parameters for base plots, whereby |
confint |
logical. Should confident intervals be calculated and drawn? |
xlab , ylab , main
|
graphical parameters. |
... |
further graphical parameters. |
object , x
|
an object of class |
colour , size , shape , linetype , legend
|
graphical parameters passed for
|
Reliagrams evaluate if a probability model is calibrated (reliable) by first partitioning the forecast probability for a binary event into a certain number of bins and then plotting (within each bin) the averaged forecast probability against the observered/empirical relative frequency.
For continous probability forecasts, reliability diagrams can be plotted either for a pre-specified threshold or for a specific quantile probability of the response values.
Reliagrams can be rendered as ggplot2
or base R graphics by using
the generics autoplot
or plot
.
For a single base R graphically panel, points
adds an additional
reliagram.
Wilks DS (2011) Statistical Methods in the Atmospheric Sciences, 3rd ed., Academic Press, 704 pp.
link{reliagram}
, procast
## speed and stopping distances of cars m1_lm <- lm(dist ~ speed, data = cars) ## compute and plot reliagram reliagram(m1_lm) ## customize colors reliagram(m1_lm, ref = "blue", lty = 2, pch = 20) ## add separate model if (require("crch", quietly = TRUE)) { m1_crch <- crch(dist ~ speed | speed, data = cars) lines(reliagram(m1_crch, plot = FALSE), col = 2, lty = 2, confint = 2) } #------------------------------------------------------------------------------- if (require("crch")) { ## precipitation observations and forecasts for Innsbruck data("RainIbk", package = "crch") RainIbk <- sqrt(RainIbk) RainIbk$ensmean <- apply(RainIbk[,grep('^rainfc',names(RainIbk))], 1, mean) RainIbk$enssd <- apply(RainIbk[,grep('^rainfc',names(RainIbk))], 1, sd) RainIbk <- subset(RainIbk, enssd > 0) ## linear model w/ constant variance estimation m2_lm <- lm(rain ~ ensmean, data = RainIbk) ## logistic censored model m2_crch <- crch(rain ~ ensmean | log(enssd), data = RainIbk, left = 0, dist = "logistic") ## compute reliagrams rel2_lm <- reliagram(m2_lm, plot = FALSE) rel2_crch <- reliagram(m2_crch, plot = FALSE) ## plot in single graph plot(c(rel2_lm, rel2_crch), col = c(1, 2), confint = c(1, 2), ref = 3, single_graph = TRUE) } #------------------------------------------------------------------------------- ## determinants for male satellites to nesting horseshoe crabs data("CrabSatellites", package = "countreg") ## linear poisson model m3_pois <- glm(satellites ~ width + color, data = CrabSatellites, family = poisson) ## compute and plot reliagram as "ggplot2" graphic reliagram(m3_pois, plot = "ggplot2")
## speed and stopping distances of cars m1_lm <- lm(dist ~ speed, data = cars) ## compute and plot reliagram reliagram(m1_lm) ## customize colors reliagram(m1_lm, ref = "blue", lty = 2, pch = 20) ## add separate model if (require("crch", quietly = TRUE)) { m1_crch <- crch(dist ~ speed | speed, data = cars) lines(reliagram(m1_crch, plot = FALSE), col = 2, lty = 2, confint = 2) } #------------------------------------------------------------------------------- if (require("crch")) { ## precipitation observations and forecasts for Innsbruck data("RainIbk", package = "crch") RainIbk <- sqrt(RainIbk) RainIbk$ensmean <- apply(RainIbk[,grep('^rainfc',names(RainIbk))], 1, mean) RainIbk$enssd <- apply(RainIbk[,grep('^rainfc',names(RainIbk))], 1, sd) RainIbk <- subset(RainIbk, enssd > 0) ## linear model w/ constant variance estimation m2_lm <- lm(rain ~ ensmean, data = RainIbk) ## logistic censored model m2_crch <- crch(rain ~ ensmean | log(enssd), data = RainIbk, left = 0, dist = "logistic") ## compute reliagrams rel2_lm <- reliagram(m2_lm, plot = FALSE) rel2_crch <- reliagram(m2_crch, plot = FALSE) ## plot in single graph plot(c(rel2_lm, rel2_crch), col = c(1, 2), confint = c(1, 2), ref = 3, single_graph = TRUE) } #------------------------------------------------------------------------------- ## determinants for male satellites to nesting horseshoe crabs data("CrabSatellites", package = "countreg") ## linear poisson model m3_pois <- glm(satellites ~ width + color, data = CrabSatellites, family = poisson) ## compute and plot reliagram as "ggplot2" graphic reliagram(m3_pois, plot = "ggplot2")
Generic plotting functions for rootograms of the class "rootogram"
computed by link{rootogram}
.
## S3 method for class 'rootogram' plot( x, style = NULL, scale = NULL, expected = NULL, ref = NULL, confint = NULL, confint_level = 0.95, confint_type = c("tukey", "pointwise", "simultaneous"), confint_nrep = 1000, xlim = c(NA, NA), ylim = c(NA, NA), xlab = NULL, ylab = NULL, main = NULL, axes = TRUE, box = FALSE, col = "darkgray", border = "black", lwd = 1, lty = 1, alpha_min = 0.8, expected_col = 2, expected_pch = 19, expected_lty = 1, expected_lwd = 2, confint_col = "black", confint_lty = 2, confint_lwd = 1.75, ref_col = "black", ref_lty = 1, ref_lwd = 1.25, ... ) ## S3 method for class 'rootogram' autoplot( object, style = NULL, scale = NULL, expected = NULL, ref = NULL, confint = NULL, confint_level = 0.95, confint_type = c("tukey", "pointwise", "simultaneous"), confint_nrep = 1000, xlim = c(NA, NA), ylim = c(NA, NA), xlab = NULL, ylab = NULL, main = NULL, legend = FALSE, theme = NULL, colour = "black", fill = "darkgray", size = 0.5, linetype = 1, alpha = NA, expected_colour = 2, expected_size = 1, expected_linetype = 1, expected_alpha = 1, expected_fill = NA, expected_stroke = 0.5, expected_shape = 19, confint_colour = "black", confint_size = 0.5, confint_linetype = 2, confint_alpha = NA, ref_colour = "black", ref_size = 0.5, ref_linetype = 1, ref_alpha = NA, ... )
## S3 method for class 'rootogram' plot( x, style = NULL, scale = NULL, expected = NULL, ref = NULL, confint = NULL, confint_level = 0.95, confint_type = c("tukey", "pointwise", "simultaneous"), confint_nrep = 1000, xlim = c(NA, NA), ylim = c(NA, NA), xlab = NULL, ylab = NULL, main = NULL, axes = TRUE, box = FALSE, col = "darkgray", border = "black", lwd = 1, lty = 1, alpha_min = 0.8, expected_col = 2, expected_pch = 19, expected_lty = 1, expected_lwd = 2, confint_col = "black", confint_lty = 2, confint_lwd = 1.75, ref_col = "black", ref_lty = 1, ref_lwd = 1.25, ... ) ## S3 method for class 'rootogram' autoplot( object, style = NULL, scale = NULL, expected = NULL, ref = NULL, confint = NULL, confint_level = 0.95, confint_type = c("tukey", "pointwise", "simultaneous"), confint_nrep = 1000, xlim = c(NA, NA), ylim = c(NA, NA), xlab = NULL, ylab = NULL, main = NULL, legend = FALSE, theme = NULL, colour = "black", fill = "darkgray", size = 0.5, linetype = 1, alpha = NA, expected_colour = 2, expected_size = 1, expected_linetype = 1, expected_alpha = 1, expected_fill = NA, expected_stroke = 0.5, expected_shape = 19, confint_colour = "black", confint_size = 0.5, confint_linetype = 2, confint_alpha = NA, ref_colour = "black", ref_size = 0.5, ref_linetype = 1, ref_alpha = NA, ... )
x , object
|
an object of class |
style |
character specifying the syle of rootogram. |
scale |
character specifying whether raw frequencies or their square roots (default) should be drawn. |
expected |
Should the expected (fitted) frequencies be plotted? |
ref |
logical. Should a reference line be plotted? |
confint |
logical. Should confident intervals be drawn? |
confint_level |
numeric. The confidence level required. |
confint_type |
character. Should |
confint_nrep |
numeric. The repetition number of simulation for computing the confidence intervals. |
xlim , ylim , xlab , ylab , main , axes , box
|
graphical parameters. |
col , border , lwd , lty , alpha_min
|
graphical parameters for the histogram style part of the base plot. |
expected_col , expected_pch , expected_lty , expected_lwd , ref_col , ref_lty , ref_lwd , expected_colour , expected_size , expected_linetype , expected_alpha , expected_fill , expected_stroke , expected_shape , ref_colour , ref_size , ref_linetype , ref_alpha , confint_col , confint_lty , confint_lwd , confint_colour , confint_size , confint_linetype , confint_alpha
|
Further graphical parameters for the 'expected' and 'ref' line using either |
... |
further graphical parameters passed to the plotting function. |
legend |
logical. Should a legend be added in the |
theme |
Which 'ggplot2' theme should be used. If not set, |
colour , fill , size , linetype , alpha
|
graphical parameters for the histogram style part in the |
Rootograms graphically compare (square roots) of empirical frequencies with
expected (fitted) frequencies from a probability model. For the observed distribution the histogram is
drawn on a square root scale (hence the name) and superimposed with a line
for the expected frequencies. The histogram can be "standing"
on the
x-axis (as usual), or "hanging"
from the expected (fitted) curve, or a
"suspended"
histogram of deviations can be drawn.
Rootograms are associated with the work of John W. Tukey (see Tukey 1977) and were originally proposed for assessing the goodness of fit of univariate distributions and extended by Kleiber and Zeileis (2016) to regression setups.
As the expected distribution is typically a sum of different conditional
distributions in regression models, the "pointwise"
confidence intervals
for each bin can be computed from mid-quantiles of a Poisson-Binomial distribution
(Wilson and Einbeck 2021). Corresponding "simultaneous"
confidence intervals
for all bins can be obtained via simulation from the Poisson-Binomial distributions.
As the pointwise confidence intervals are typically not substantially different from
the warning limits of Tukey (1972, p. 61), set at +/- 1, these "tukey"
intervals
are used by default.
Note that for computing the exact "pointwise"
intervals from the Poisson-Binomial
distribution, the PoissonBinomial needs to be installed. Otherwise, a warning
is issueed and a normal approximation is used.
Kleiber C, Zeileis A (2016). “Visualizing Count Data Regressions Using Rootograms.” The American Statistician, 70(3), 296–303. doi:10.1080/00031305.2016.1173590
Tukey JW (1972), “Some Graphic and Semigraphic Displays,” in Statistical Papers in Honor of George W. Snedecor, pp.293–316. Bancroft TA (Ed.). Iowa State University Press, Ames. Reprinted in William S. Cleveland (Ed.) (1988). The Collected Works of John W. Tukey, Volume V. Graphics: 1965–1985, Wadsworth & Brooks/Cole, Pacific Grove.
Tukey JW (1977). Exploratory Data Analysis. Addison-Wesley, Reading.
Wilson P, Einbeck J (2021). “A Graphical Tool for Assessing the Suitability of a Count Regression Model”, Austrian Journal of Statistics, 50(1), 1–23. doi:10.17713/ajs.v50i1.921
## speed and stopping distances of cars m1_lm <- lm(dist ~ speed, data = cars) ## compute and plot rootogram rootogram(m1_lm) ## customize colors rootogram(m1_lm, ref_col = "blue", lty = 2, pch = 20) #------------------------------------------------------------------------------- if (require("crch")) { ## precipitation observations and forecasts for Innsbruck data("RainIbk", package = "crch") RainIbk <- sqrt(RainIbk) RainIbk$ensmean <- apply(RainIbk[, grep("^rainfc", names(RainIbk))], 1, mean) RainIbk$enssd <- apply(RainIbk[, grep("^rainfc", names(RainIbk))], 1, sd) RainIbk <- subset(RainIbk, enssd > 0) ## linear model w/ constant variance estimation m2_lm <- lm(rain ~ ensmean, data = RainIbk) ## logistic censored model m2_crch <- crch(rain ~ ensmean | log(enssd), data = RainIbk, left = 0, dist = "logistic") ### compute rootograms FIXME #r2_lm <- rootogram(m2_lm, plot = FALSE) #r2_crch <- rootogram(m2_crch, plot = FALSE) ### plot in single graph #plot(c(r2_lm, r2_crch), col = c(1, 2)) } #------------------------------------------------------------------------------- ## determinants for male satellites to nesting horseshoe crabs data("CrabSatellites", package = "countreg") ## linear poisson model m3_pois <- glm(satellites ~ width + color, data = CrabSatellites, family = poisson) ## compute and plot rootogram as "ggplot2" graphic rootogram(m3_pois, plot = "ggplot2") #------------------------------------------------------------------------------- ## artificial data from negative binomial (mu = 3, theta = 2) ## and Poisson (mu = 3) distribution set.seed(1090) y <- rnbinom(100, mu = 3, size = 2) x <- rpois(100, lambda = 3) ## glm method: fitted values via glm() m4_pois <- glm(y ~ x, family = poisson) ## correctly specified Poisson model fit par(mfrow = c(1, 3)) r4a_pois <- rootogram(m4_pois, style = "standing", ylim = c(-2.2, 4.8), main = "Standing") r4b_pois <- rootogram(m4_pois, style = "hanging", ylim = c(-2.2, 4.8), main = "Hanging") r4c_pois <- rootogram(m4_pois, style = "suspended", ylim = c(-2.2, 4.8), main = "Suspended") par(mfrow = c(1, 1))
## speed and stopping distances of cars m1_lm <- lm(dist ~ speed, data = cars) ## compute and plot rootogram rootogram(m1_lm) ## customize colors rootogram(m1_lm, ref_col = "blue", lty = 2, pch = 20) #------------------------------------------------------------------------------- if (require("crch")) { ## precipitation observations and forecasts for Innsbruck data("RainIbk", package = "crch") RainIbk <- sqrt(RainIbk) RainIbk$ensmean <- apply(RainIbk[, grep("^rainfc", names(RainIbk))], 1, mean) RainIbk$enssd <- apply(RainIbk[, grep("^rainfc", names(RainIbk))], 1, sd) RainIbk <- subset(RainIbk, enssd > 0) ## linear model w/ constant variance estimation m2_lm <- lm(rain ~ ensmean, data = RainIbk) ## logistic censored model m2_crch <- crch(rain ~ ensmean | log(enssd), data = RainIbk, left = 0, dist = "logistic") ### compute rootograms FIXME #r2_lm <- rootogram(m2_lm, plot = FALSE) #r2_crch <- rootogram(m2_crch, plot = FALSE) ### plot in single graph #plot(c(r2_lm, r2_crch), col = c(1, 2)) } #------------------------------------------------------------------------------- ## determinants for male satellites to nesting horseshoe crabs data("CrabSatellites", package = "countreg") ## linear poisson model m3_pois <- glm(satellites ~ width + color, data = CrabSatellites, family = poisson) ## compute and plot rootogram as "ggplot2" graphic rootogram(m3_pois, plot = "ggplot2") #------------------------------------------------------------------------------- ## artificial data from negative binomial (mu = 3, theta = 2) ## and Poisson (mu = 3) distribution set.seed(1090) y <- rnbinom(100, mu = 3, size = 2) x <- rpois(100, lambda = 3) ## glm method: fitted values via glm() m4_pois <- glm(y ~ x, family = poisson) ## correctly specified Poisson model fit par(mfrow = c(1, 3)) r4a_pois <- rootogram(m4_pois, style = "standing", ylim = c(-2.2, 4.8), main = "Standing") r4b_pois <- rootogram(m4_pois, style = "hanging", ylim = c(-2.2, 4.8), main = "Hanging") r4c_pois <- rootogram(m4_pois, style = "suspended", ylim = c(-2.2, 4.8), main = "Suspended") par(mfrow = c(1, 1))
Generic function and methods for computing various kinds of probabilistic forecasts from (regression) models.
procast( object, newdata = NULL, na.action = na.pass, type = "distribution", at = 0.5, drop = FALSE, ... ) ## Default S3 method: procast( object, newdata = NULL, na.action = na.pass, type = c("distribution", "mean", "variance", "quantile", "probability", "density", "loglikelihood", "parameters", "kurtosis", "skewness"), at = 0.5, drop = FALSE, ... ) ## S3 method for class 'lm' procast( object, newdata = NULL, na.action = na.pass, type = "distribution", at = 0.5, drop = FALSE, ..., sigma = "ML" ) ## S3 method for class 'glm' procast( object, newdata = NULL, na.action = na.pass, type = "distribution", at = 0.5, drop = FALSE, ..., dispersion = NULL ) ## S3 method for class 'bamlss' procast( object, newdata = NULL, na.action = na.pass, type = "distribution", at = 0.5, drop = FALSE, ..., distributions3 = FALSE ) ## S3 method for class 'disttree' procast( object, newdata = NULL, na.action = na.pass, type = "distribution", at = 0.5, drop = FALSE, ..., distributions3 = FALSE )
procast( object, newdata = NULL, na.action = na.pass, type = "distribution", at = 0.5, drop = FALSE, ... ) ## Default S3 method: procast( object, newdata = NULL, na.action = na.pass, type = c("distribution", "mean", "variance", "quantile", "probability", "density", "loglikelihood", "parameters", "kurtosis", "skewness"), at = 0.5, drop = FALSE, ... ) ## S3 method for class 'lm' procast( object, newdata = NULL, na.action = na.pass, type = "distribution", at = 0.5, drop = FALSE, ..., sigma = "ML" ) ## S3 method for class 'glm' procast( object, newdata = NULL, na.action = na.pass, type = "distribution", at = 0.5, drop = FALSE, ..., dispersion = NULL ) ## S3 method for class 'bamlss' procast( object, newdata = NULL, na.action = na.pass, type = "distribution", at = 0.5, drop = FALSE, ..., distributions3 = FALSE ) ## S3 method for class 'disttree' procast( object, newdata = NULL, na.action = na.pass, type = "distribution", at = 0.5, drop = FALSE, ..., distributions3 = FALSE )
object |
a fitted model object. For the |
newdata |
optionally, a data frame in which to look for variables with which to predict. If omitted, the original observations are used. |
na.action |
function determining what should be done with missing
values in |
type |
character specifying the type of probabilistic forecast to
compute. The default is to return a |
at |
numeric vector at which the forecasts should be
evaluated if |
drop |
logical. Should forecasts be returned in a data frame (default) or (if possible) dropped to a vector, see return value description below. |
... |
further parameters passed to methods. In particular, this includes
the logical argument |
sigma |
character or numeric or |
dispersion |
character or numeric or |
distributions3 |
logical. If a dedicated distributions3 object
is available (e.g., such as |
The function procast
provides a unified framework for probabilistic
forcasting (or procasting, for short) based on probabilistic (regression)
models, also known as distributional regression approaches. Typical types
of predictions include quantiles, probabilities, (conditional) expectations,
variances, and (log-)densities. Internally, procast
methods typically
compute the predicted parameters for each observation and then compute the
desired outcome for the distributions with the respective parameters.
Some quantities, e.g., the moments of the distribution (like mean or variance),
can be computed directly from the predicted parameters of the
distribution while others require an additional argument at
which the
distribution is evaluated (e.g., the probability of a quantile or an
observation of the response).
The default procast
method leverages the S3 classes and methods for
probability distributions from the distributions3 package. In a first step
the predicted probability distribution object is obtained and, by default
(type = "distribution"
), returned in order to reflect the distributional
nature of the forecast. For all other type
s (e.g., "mean"
,
"quantile"
, or "density"
), the corresponding extractor methods
(e.g., mean
, quantile
, or pdf
) are used to
compute the desired quantity from the distribution objects. The examples
provide some worked illustrations.
Package authors or users, who want to enable procast
for new types
of model objects, only need to provide a suitable prodist
extractor for the predicted probability distribution. Then the default procast
works out of the box. However, if the distributions3 package does not support
the necessary probability distribution, then it may also be necessary to
implement a new distribution objects, see apply_dpqr
.
Either a data.frame
of predictions with the same number of rows
as the newdata
(or the original observations if that is NULL
).
If drop = TRUE
predictions with just a single column are simplified
to a vector and predictions with multiple columns to a matrix.
## load packages library("topmodels") library("distributions3") ## Poisson regression model for FIFA 2018 data: ## number of goals scored by each team in each game, explained by ## predicted ability difference of the competing teams data("FIFA2018", package = "distributions3") m <- glm(goals ~ difference, data = FIFA2018, family = poisson) ## predicted probability distributions for all matches (in sample) head(procast(m)) head(procast(m, drop = TRUE)) ## procasts for new data ## much lower, equal, and much higher ability than opponent nd <- data.frame(difference = c(-1, 0, 1)) ## predicted goal distribution object goals <- procast(m, newdata = nd, drop = TRUE) goals ## predicted densities/probabilities for scoring 0, 1, ..., 5 goals procast(m, newdata = nd, type = "density", at = 0:5) ## by hand pdf(goals, 0:5) ## means and medians procast(m, newdata = nd, type = "mean") procast(m, newdata = nd, type = "quantile", at = 0.5) ## by hand mean(goals) quantile(goals, 0.5) ## evaluate procast elementwise or for all possible combinations ## of distributions from 'nd' and observations in 'at' procast(m, newdata = nd, type = "probability", at = 1:3, elementwise = TRUE) procast(m, newdata = nd, type = "probability", at = 1:3, elementwise = FALSE) ## compute in-sample log-likelihood sum via procast sum(procast(m, type = "density", at = FIFA2018$goals, log = TRUE)) logLik(m)
## load packages library("topmodels") library("distributions3") ## Poisson regression model for FIFA 2018 data: ## number of goals scored by each team in each game, explained by ## predicted ability difference of the competing teams data("FIFA2018", package = "distributions3") m <- glm(goals ~ difference, data = FIFA2018, family = poisson) ## predicted probability distributions for all matches (in sample) head(procast(m)) head(procast(m, drop = TRUE)) ## procasts for new data ## much lower, equal, and much higher ability than opponent nd <- data.frame(difference = c(-1, 0, 1)) ## predicted goal distribution object goals <- procast(m, newdata = nd, drop = TRUE) goals ## predicted densities/probabilities for scoring 0, 1, ..., 5 goals procast(m, newdata = nd, type = "density", at = 0:5) ## by hand pdf(goals, 0:5) ## means and medians procast(m, newdata = nd, type = "mean") procast(m, newdata = nd, type = "quantile", at = 0.5) ## by hand mean(goals) quantile(goals, 0.5) ## evaluate procast elementwise or for all possible combinations ## of distributions from 'nd' and observations in 'at' procast(m, newdata = nd, type = "probability", at = 1:3, elementwise = TRUE) procast(m, newdata = nd, type = "probability", at = 1:3, elementwise = FALSE) ## compute in-sample log-likelihood sum via procast sum(procast(m, type = "density", at = FIFA2018$goals, log = TRUE)) logLik(m)
The function promodel
is a wrapper for dispatching the base
predict
and residuals
methods to
the procast
and proresiduals
functions for probabilistic forecasts and probabilistic residuals, respectively.
promodel(object) ## S3 method for class 'promodel' residuals(object, ...) ## S3 method for class 'promodel' predict(object, ...)
promodel(object) ## S3 method for class 'promodel' residuals(object, ...) ## S3 method for class 'promodel' predict(object, ...)
object |
a fitted model object for which |
... |
further arguments passed on to |
The default methods for procast
and
proresiduals
in this package make a wide range of
different probabilistic forecasts and probabilistic residuals
available for many fitted model object classes. However, it may sometimes be
useful to call these flexible methods via the base
predict
and residuals
methods.
For example, this may be useful in combination with other packages that
rely on the base functions such as marginaleffects.
Therefore, the promodel
wrapper function simply adds an additional
class "promodel"
(probabilistic model) to the original class of an
object. Then the methods for predict
and residuals
then
strip off this class again before calling procast
and proresiduals
,
respectively.
## Poisson regression model for FIFA 2018 data: ## number of goals scored by each team in each game, explained by ## predicted ability difference of the competing teams data("FIFA2018", package = "distributions3") m <- glm(goals ~ difference, data = FIFA2018, family = poisson) ## prediction using a new data set (final of the tournament) final <- tail(FIFA2018, 2) ## base predict method computes linear predictor on link scale (here in logs) predict(m, newdata = final) ## procast-based method computes distribution object by default pm <- promodel(m) predict(pm, newdata = final) ## all other procast types are available as well predict(pm, newdata = final, type = "density", at = 0:4) predict(pm, newdata = final, type = "cdf", at = 0:4) ## the base residuals method defaults to deviance residuals ## but the proresiduals-based method defaults to quantile residuals head(residuals(m)) head(residuals(pm))
## Poisson regression model for FIFA 2018 data: ## number of goals scored by each team in each game, explained by ## predicted ability difference of the competing teams data("FIFA2018", package = "distributions3") m <- glm(goals ~ difference, data = FIFA2018, family = poisson) ## prediction using a new data set (final of the tournament) final <- tail(FIFA2018, 2) ## base predict method computes linear predictor on link scale (here in logs) predict(m, newdata = final) ## procast-based method computes distribution object by default pm <- promodel(m) predict(pm, newdata = final) ## all other procast types are available as well predict(pm, newdata = final, type = "density", at = 0:4) predict(pm, newdata = final, type = "cdf", at = 0:4) ## the base residuals method defaults to deviance residuals ## but the proresiduals-based method defaults to quantile residuals head(residuals(m)) head(residuals(pm))
Generic function and default method for (randomized) quantile residuals, PIT, Pearson, and raw response residuals based on distributions3 support.
proresiduals(object, ...) ## Default S3 method: proresiduals( object, newdata = NULL, type = c("quantile", "pit", "pearson", "response"), nsim = NULL, prob = NULL, delta = NULL, ... )
proresiduals(object, ...) ## Default S3 method: proresiduals( object, newdata = NULL, type = c("quantile", "pit", "pearson", "response"), nsim = NULL, prob = NULL, delta = NULL, ... )
object |
an object for which a |
... |
further parameters passed to methods. |
newdata |
optionally, a data frame in which to look for variables with which to predict. If omitted, the original observations are used. |
type |
character indicating whether quantile (default), PIT, Pearson, or raw response residuals should be computed. |
nsim |
integer. The number of randomly simulated residuals of |
prob |
numeric. Instead of simulating the probabilities (between 0 and 1) for
|
delta |
numeric. The minimal difference to compute the range of
proabilities corresponding to each observation according to get (randomized)
|
The new generic function proresiduals
comes with a powerful default
method that is based on the following idea: newresponse
and prodist
can be used to extract the observed
response and expected distribution for it, respectively. For all model classes
that have methods for these two generic functions, proresiduals
can
compute a range of different type
s of residuals.
The simplest definition of residuals are the so-called "response"
residuals
which simply compute the difference between the observations and the expected means.
The "pearson"
residuals additionally standardize these residuals by the
square root of the expected variance. Thus, these residuals are based only on the
first and on the first two moments, respectively.
To assess the entire distribution and not just the first moments, there are also
residuals based on the probability integral transform (PIT).
For regression models with a continuous response distribution, "pit"
residuals
(see Warton 2007) are simply the expected cumulative distribution (CDF) evaluated at the
observations (Dawid, 1984). For discrete distributions, a uniform random value is drawn
from the range of probabilities between the CDF at the observation and the supremum
of the CDF to the left of it. If the model fits well the PIT residuals should be uniformly
distributed.
In order to obtain normally distributed residuals for well-fitting models (like often
desired in linear regression models), "quantile"
residuals, proposed by Dunn and
Smyth (1996), additionally transform the PIT residuals by the standard normal quantile function.
As quantile residuals and PIT residuals are subject to randomness for discrete distributions
(and also for mixed discrete-continuous distributions), it is sometimes
useful to explore the extent of the random variation. This can be done either by
obtaining multiple replications (via nsim
) or by computing fixed quantiles
of each probability interval such as prob = 0.5
(corresponding to mid-quantile residuals,
see Feng et al. 2020). Another common setting is prob = c(0, 1)
yielding the range of
possible residuals.
A vector or matrix of residuals. A matrix of residuals is returned
if more than one replication of quantile or PIT residuals is computed, i.e., if either
random > 1
or random = FALSE
and length(prob) > 1
.
Dawid AP (1984). “Present Position and Potential Developments: Some Personal Views: Statistical Theory: The Prequential Approach.” Journal of the Royal Statistical Society A, 147(2), 278–292. doi:10.2307/2981683.
Dunn KP, Smyth GK (1996). “Randomized Quantile Residuals.” Journal of Computational and Graphical Statistics, 5(3), 236–244. doi:10.2307/1390802
Feng C, Li L, Sadeghpour A (2020). “A Comparison of Residual Diagnosis Tools for Diagnosing Regression Models for Count Data” BMC Medical Research Methodology, 20(175), 1–21. doi:10.1186/s12874-020-01055-2
Warton DI, Thibaut L, Wang YA (2017) “The PIT-Trap – A ‘Model-Free’ Bootstrap Procedure for Inference about Regression Models with Discrete, Multivariate Responses”. PLOS ONE, 12(7), 1–18. doi:10.1371/journal.pone.0181790.
## Poisson GLM for FIFA 2018 data data("FIFA2018", package = "distributions3") m <- glm(goals ~ difference, data = FIFA2018, family = poisson) ## random quantile residuals (on original data) proresiduals(m) ## various flavors of residuals on small new data nd <- data.frame(goals = c(1, 1, 1), difference = c(-1, 0, 1)) ## quantile residuals: random (1 sample), random (5 samples), mid-quantile (non-random) proresiduals(m, newdata = nd, type = "quantile") proresiduals(m, newdata = nd, type = "quantile", nsim = 5) proresiduals(m, newdata = nd, type = "quantile", prob = 0.5) ## PIT residuals (without transformation to normal): random vs. minimum/maximum quantile proresiduals(m, newdata = nd, type = "pit", nsim = 5) proresiduals(m, newdata = nd, type = "pit", prob = c(0, 1)) ## raw response residuals (observation - expected mean) proresiduals(m, newdata = nd, type = "response") ## standardized Pearson residuals (response residuals divided by standard deviation) proresiduals(m, newdata = nd, type = "pearson") ## compute residuals by manually obtaining distribution and response ## proresiduals(procast(m, newdata = nd, drop = TRUE), nd$goals)
## Poisson GLM for FIFA 2018 data data("FIFA2018", package = "distributions3") m <- glm(goals ~ difference, data = FIFA2018, family = poisson) ## random quantile residuals (on original data) proresiduals(m) ## various flavors of residuals on small new data nd <- data.frame(goals = c(1, 1, 1), difference = c(-1, 0, 1)) ## quantile residuals: random (1 sample), random (5 samples), mid-quantile (non-random) proresiduals(m, newdata = nd, type = "quantile") proresiduals(m, newdata = nd, type = "quantile", nsim = 5) proresiduals(m, newdata = nd, type = "quantile", prob = 0.5) ## PIT residuals (without transformation to normal): random vs. minimum/maximum quantile proresiduals(m, newdata = nd, type = "pit", nsim = 5) proresiduals(m, newdata = nd, type = "pit", prob = c(0, 1)) ## raw response residuals (observation - expected mean) proresiduals(m, newdata = nd, type = "response") ## standardized Pearson residuals (response residuals divided by standard deviation) proresiduals(m, newdata = nd, type = "pearson") ## compute residuals by manually obtaining distribution and response ## proresiduals(procast(m, newdata = nd, drop = TRUE), nd$goals)
Generic function and default method for computing various kinds of scores for fitted or predicted probability distributions from (regression) models.
proscore(object, newdata = NULL, ...) ## Default S3 method: proscore( object, newdata = NULL, na.action = na.pass, type = c("logs", "crps"), aggregate = TRUE, drop = FALSE, ... )
proscore(object, newdata = NULL, ...) ## Default S3 method: proscore( object, newdata = NULL, na.action = na.pass, type = c("logs", "crps"), aggregate = TRUE, drop = FALSE, ... )
object |
a fitted model object. For the |
newdata |
optionally, a data frame in which to look for variables with which to predict and from which to obtain the response variable. If omitted, the original observations are used. |
... |
further parameters passed to the |
na.action |
function determining what should be done with missing
values in |
type |
character specifying the type of score to compute. Avaible types:
|
aggregate |
logical or function to be used for aggregating scores across
observations. Setting |
drop |
logical. Should scores be returned in a data frame (default) or (if possible) dropped to a vector? |
The function proscore
provides a unified framework for scoring
probabilistic forecasts (in-sample or out-of-sample). The following scores
are currently available, using the following notation: is the predicted
random variable with cumulative distribution function
and probability
density function
. The corresponding expectation and variance are denoted
by
and
. The actual observation is
.
Log-score: Also known as logarithmic score. This is the negative log-likelihood where the negative sign has the effect that smaller values indicate a better fit.
Log-likelihood: Also known as log-density. Clearly, this is equivalent to the log-score above but using the conventional sign where bigger values indicate a better fit.
Continuous ranked probability score (CRPS):
where denotes the indicator function.
In case of a discrete rather than a continuous distribution, the ranked probability
score (RPS) is defined analogously using the sum rather than the integral. In other
words it is then the sum of the squared deviations between the predicted cumulative
probabilities and the ideal step function for the actual observation
.
Mean absolute error (MAE):
Mean squared error (MSE):
Dawid-Sebastiani score (DSS):
Internally, the default proscore
method first computes the fitted/predicted
probability distribution object using prodist
(corresponding to above) and then obtains the corresponding observation
using
newresponse
. Subsequently, the scores are evaluated
using either the log_pdf
method,
crps
method, or simply the mean
. Finally,
the resulting individual scores per observation can be returned as a full
data frame, or aggregated (e.g., by using mean
, sum
, or summary
, etc.).
Either a data.frame
of scores (if drop = FALSE
, default) or
a named numeric vector (if drop = TRUE
and the scores are not a matrix).
The names are the type
specified by the user (i.e., are not canonicalized
by partial matching etc.).
## Poisson regression model for FIFA 2018 data: ## number of goals scored by each team in each game, explained by ## predicted ability difference of the competing teams data("FIFA2018", package = "distributions3") m <- glm(goals ~ difference, data = FIFA2018, family = poisson) ## default: in-sample mean log-score and CRPS proscore(m) ## element-wise score using a new data set (final of the tournament) final <- tail(FIFA2018, 2) proscore(m, newdata = final, aggregate = FALSE) ## replicate in-sample log-likelihood proscore(m, type = "loglik", aggregate = sum) logLik(m) ## compute mean of all available scores proscore(m, type = NULL) ## upper vs. lower case spelling is matched internally but preserved in output proscore(m, type = c("logs", "crps")) proscore(m, type = c("Log-score", "CRPS")) ## least-squares regression for speed and breaking distance of cars data("cars", package = "datasets") m <- lm(dist ~ speed, data = cars) ## replicate in-sample log-likelihood and residual sum of squares ## (aka deviance) by taking the sum (rather than the mean) of the ## log-density and squared errors, respectively proscore(m, type = c("loglik", "MSE"), aggregate = sum) logLik(m) deviance(m)
## Poisson regression model for FIFA 2018 data: ## number of goals scored by each team in each game, explained by ## predicted ability difference of the competing teams data("FIFA2018", package = "distributions3") m <- glm(goals ~ difference, data = FIFA2018, family = poisson) ## default: in-sample mean log-score and CRPS proscore(m) ## element-wise score using a new data set (final of the tournament) final <- tail(FIFA2018, 2) proscore(m, newdata = final, aggregate = FALSE) ## replicate in-sample log-likelihood proscore(m, type = "loglik", aggregate = sum) logLik(m) ## compute mean of all available scores proscore(m, type = NULL) ## upper vs. lower case spelling is matched internally but preserved in output proscore(m, type = c("logs", "crps")) proscore(m, type = c("Log-score", "CRPS")) ## least-squares regression for speed and breaking distance of cars data("cars", package = "datasets") m <- lm(dist ~ speed, data = cars) ## replicate in-sample log-likelihood and residual sum of squares ## (aka deviance) by taking the sum (rather than the mean) of the ## log-density and squared errors, respectively proscore(m, type = c("loglik", "MSE"), aggregate = sum) logLik(m) deviance(m)
Visualize goodness of fit of regression models by Quantile-Quantile (Q-Q) plots using quantile
residuals. If plot = TRUE
, the resulting object of class
"qqrplot"
is plotted by plot.qqrplot
or
autoplot.qqrplot
before it is returned, depending on whether the
package ggplot2
is loaded.
qqrplot(object, ...) ## Default S3 method: qqrplot( object, newdata = NULL, plot = TRUE, class = NULL, detrend = FALSE, ref_type = "identity", scale = c("normal", "uniform"), nsim = 1L, delta = NULL, simint = TRUE, simint_level = 0.95, simint_nrep = 250, confint = TRUE, ref = TRUE, xlab = "Theoretical quantiles", ylab = if (!detrend) "Quantile residuals" else "Deviation", main = NULL, ... )
qqrplot(object, ...) ## Default S3 method: qqrplot( object, newdata = NULL, plot = TRUE, class = NULL, detrend = FALSE, ref_type = "identity", scale = c("normal", "uniform"), nsim = 1L, delta = NULL, simint = TRUE, simint_level = 0.95, simint_nrep = 250, confint = TRUE, ref = TRUE, xlab = "Theoretical quantiles", ylab = if (!detrend) "Quantile residuals" else "Deviation", main = NULL, ... )
object |
an object from which probability integral transforms can be
extracted using the generic function |
newdata |
an optional data frame in which to look for variables with which to predict. If omitted, the original observations are used. |
plot |
logical or character. Should the |
class |
should the invisible return value be either a |
detrend |
logical, defaults to |
ref_type |
character specifying that the "identity" line should be used as as a reference or the "quartiles" of the quantile residuals should be used for defining the reference line. Alternatively, also a numeric vector of length two can be used to define the probabilities to be used for defining the reference line. Note, that the reference is also used for detrending the quantile residuals. |
scale |
character. On which scale should the quantile residuals be
shown: on the probability scale ( |
nsim , delta
|
arguments passed to |
simint |
logical. In case of discrete distributions, should the simulation (confidence) interval due to the randomization be visualized? |
simint_level |
numeric. The confidence level required for calculating the simulation (confidence) interval due to the randomization. |
simint_nrep |
numeric (positive; default |
confint |
logical or character describing the style for plotting
confidence intervals. |
ref |
logical, defaults to |
xlab , ylab , main , ...
|
graphical parameters passed to
|
Q-Q residuals plots draw quantile residuals (by default on the standard normal
scale) against theoretical quantiles from the same distribution.
Alternatively, quantile residuals can also be compared on the uniform scale
(scale = "uniform"
) using no transformation. For computation,
qqrplot
leverages the function proresiduals
employing
the procast
generic.
Additional options are offered for models with discrete responses where randomization of quantiles is needed.
In addition to the plot
and autoplot
method for
qqrplot objects, it is also possible to combine two (or more) Q-Q residuals plots by
c
/rbind
, which creates a set of Q-Q residuals plots that can then be
plotted in one go.
An object of class "qqrplot"
inheriting from
"data.frame"
or "tibble"
conditional on the argument class
with the following variables:
observed |
deviations between theoretical and empirical quantiles, |
expected |
theoretical quantiles, |
simint_observed_lwr |
lower bound of the simulated confidence interval, |
simint_observed_upr |
upper bound of the simulated confidence interval, |
simint_expected |
TODO: (ML) Description missing. |
In case of nsim > 1
, a set of nsim
pairs of observed and
expected quantiles are returned (observed_1
, expected_1
, ...
observed_nsim
, observed_nsim
) is returned.
The "qqrplot"
also contains additional attributes
xlab
, ylab
, main
, simint_level
, scale
,
and detrended
used to create the plot.
Dunn KP, Smyth GK (1996). “Randomized Quantile Residuals.” Journal of Computational and Graphical Statistics, 5(3), 236–244. doi:10.2307/1390802
plot.qqrplot
, wormplot
,
proresiduals
, qqnorm
## speed and stopping distances of cars m1_lm <- lm(dist ~ speed, data = cars) ## compute and plot qqrplot qqrplot(m1_lm) #------------------------------------------------------------------------------- ## determinants for male satellites to nesting horseshoe crabs data("CrabSatellites", package = "countreg") ## linear poisson model m1_pois <- glm(satellites ~ width + color, data = CrabSatellites, family = poisson) m2_pois <- glm(satellites ~ color, data = CrabSatellites, family = poisson) ## compute and plot qqrplot as base graphic q1 <- qqrplot(m1_pois, plot = FALSE) q2 <- qqrplot(m2_pois, plot = FALSE) ## plot combined qqrplot as "ggplot2" graphic ggplot2::autoplot(c(q1, q2), single_graph = TRUE, col = c(1, 2), fill = c(1, 2)) ## Use different `scale`s with confidence intervals qqrplot(m1_pois, scale = "uniform") qqrplot(m1_pois, scale = "normal") qqrplot(m1_pois, detrend = TRUE, scale = "uniform", confint = "line") qqrplot(m1_pois, detrend = TRUE, scale = "normal", confint = "line")
## speed and stopping distances of cars m1_lm <- lm(dist ~ speed, data = cars) ## compute and plot qqrplot qqrplot(m1_lm) #------------------------------------------------------------------------------- ## determinants for male satellites to nesting horseshoe crabs data("CrabSatellites", package = "countreg") ## linear poisson model m1_pois <- glm(satellites ~ width + color, data = CrabSatellites, family = poisson) m2_pois <- glm(satellites ~ color, data = CrabSatellites, family = poisson) ## compute and plot qqrplot as base graphic q1 <- qqrplot(m1_pois, plot = FALSE) q2 <- qqrplot(m2_pois, plot = FALSE) ## plot combined qqrplot as "ggplot2" graphic ggplot2::autoplot(c(q1, q2), single_graph = TRUE, col = c(1, 2), fill = c(1, 2)) ## Use different `scale`s with confidence intervals qqrplot(m1_pois, scale = "uniform") qqrplot(m1_pois, scale = "normal") qqrplot(m1_pois, detrend = TRUE, scale = "uniform", confint = "line") qqrplot(m1_pois, detrend = TRUE, scale = "normal", confint = "line")
Reliagram (extended reliability diagram) assess the reliability of a fitted
probabilistic distributional forecast for a binary event. If plot =
TRUE
, the resulting object of class "reliagram"
is plotted by
plot.reliagram
or autoplot.reliagram
before it is
returned, depending on whether the package ggplot2
is loaded.
reliagram(object, ...) ## Default S3 method: reliagram( object, newdata = NULL, plot = TRUE, class = NULL, breaks = seq(0, 1, by = 0.1), quantiles = 0.5, thresholds = NULL, confint = TRUE, confint_level = 0.95, confint_nboot = 250, confint_seed = 1, single_graph = FALSE, xlab = "Forecast probability", ylab = "Observed relative frequency", main = NULL, ... )
reliagram(object, ...) ## Default S3 method: reliagram( object, newdata = NULL, plot = TRUE, class = NULL, breaks = seq(0, 1, by = 0.1), quantiles = 0.5, thresholds = NULL, confint = TRUE, confint_level = 0.95, confint_nboot = 250, confint_seed = 1, single_graph = FALSE, xlab = "Forecast probability", ylab = "Observed relative frequency", main = NULL, ... )
object |
an object from which an extended reliability diagram can be
extracted with |
... |
further graphical parameters. |
newdata |
optionally, a data frame in which to look for variables with which to predict. If omitted, the original observations are used. |
plot |
Should the |
class |
Should the invisible return value be either a |
breaks |
numeric vector passed on to |
quantiles |
numeric vector of quantile probabilities with values in
[0,1] to calculate single or several thresholds. Only used if
|
thresholds |
numeric vector specifying both where to cut the
observations into binary values and at which values the predicted
probabilities should be calculated ( |
confint |
logical. Should confident intervals be calculated and drawn? |
confint_level |
numeric. The confidence level required. |
confint_nboot |
numeric. The number of bootstrap steps. |
confint_seed |
numeric. The seed to be set for the bootstrapping. |
single_graph |
logical. Should all computed extended reliability diagrams be plotted in a single graph? |
xlab , ylab , main
|
graphical parameters. |
Reliagrams evaluate if a probability model is calibrated (reliable) by first
partitioning the predicted probability for a binary event into a certain number
of bins and then plotting (within each bin) the averaged forecast probability
against the observered/empirical relative frequency. For computation,
reliagram
leverages the procast
generic to
forecast the respective predictive probabilities.
For continous probability forecasts, reliability diagrams can be computed either for a pre-specified threshold or for a specific quantile probability of the response values. Per default, reliagrams are computed for the 50%-quantile of the reponse.
In addition to the plot
and autoplot
method for
reliagram objects, it is also possible to combine two (or more) reliability
diagrams by c
/rbind
, which creates a set of reliability diagrams
that can then be plotted in one go.
An object of class "reliagram"
inheriting from
"data.frame"
or "tibble"
conditional on the argument class
with the following variables:
x |
forecast probabilities, |
y |
observered/empirical relative frequencies, |
bin_lwr , bin_upr
|
lower and upper bound of the binned forecast probabilities, |
n_pred |
number of predictions within the binned forecasts probabilites, |
ci_lwr , ci_upr
|
lower and upper confidence interval bound. |
Additionally,
xlab
, ylab
, main
, and treshold
,
confint_level
, as well as the total and the decomposed Brier Score
(bs, rel, res, unc
) are stored as attributes.
Note that there is also a reliability.plot
function in the
verification package. However, it only works for numeric
forecast probabilities and numeric observed relative frequencies, hence a function has been
created here.
Wilks DS (2011) Statistical Methods in the Atmospheric Sciences, 3rd ed., Academic Press, 704 pp.
link{plot.reliagram}
, procast
## speed and stopping distances of cars m1_lm <- lm(dist ~ speed, data = cars) ## compute and plot reliagram reliagram(m1_lm) #------------------------------------------------------------------------------- ## determinants for male satellites to nesting horseshoe crabs data("CrabSatellites", package = "countreg") ## linear poisson model m1_pois <- glm(satellites ~ width + color, data = CrabSatellites, family = poisson) m2_pois <- glm(satellites ~ color, data = CrabSatellites, family = poisson) ## compute and plot reliagram as base graphic r1 <- reliagram(m1_pois, plot = FALSE) r2 <- reliagram(m2_pois, plot = FALSE) ## plot combined reliagram as "ggplot2" graphic ggplot2::autoplot(c(r1, r2), single_graph = TRUE, col = c(1, 2), fill = c(1, 2))
## speed and stopping distances of cars m1_lm <- lm(dist ~ speed, data = cars) ## compute and plot reliagram reliagram(m1_lm) #------------------------------------------------------------------------------- ## determinants for male satellites to nesting horseshoe crabs data("CrabSatellites", package = "countreg") ## linear poisson model m1_pois <- glm(satellites ~ width + color, data = CrabSatellites, family = poisson) m2_pois <- glm(satellites ~ color, data = CrabSatellites, family = poisson) ## compute and plot reliagram as base graphic r1 <- reliagram(m1_pois, plot = FALSE) r2 <- reliagram(m2_pois, plot = FALSE) ## plot combined reliagram as "ggplot2" graphic ggplot2::autoplot(c(r1, r2), single_graph = TRUE, col = c(1, 2), fill = c(1, 2))
Rootograms graphically compare (square roots) of empirical frequencies with
expected (fitted) frequencies from a probabilistic model. If plot = TRUE
, the
resulting object of class "rootogram"
is plotted by
plot.rootogram
or autoplot.rootogram
before it is
returned, depending on whether the package ggplot2
is loaded.
rootogram(object, ...) ## Default S3 method: rootogram( object, newdata = NULL, plot = TRUE, class = NULL, breaks = NULL, width = NULL, style = c("hanging", "standing", "suspended"), scale = c("sqrt", "raw"), expected = TRUE, confint = TRUE, ref = TRUE, xlab = NULL, ylab = NULL, main = NULL, ... )
rootogram(object, ...) ## Default S3 method: rootogram( object, newdata = NULL, plot = TRUE, class = NULL, breaks = NULL, width = NULL, style = c("hanging", "standing", "suspended"), scale = c("sqrt", "raw"), expected = TRUE, confint = TRUE, ref = TRUE, xlab = NULL, ylab = NULL, main = NULL, ... )
object |
an object from which an rootogram can be extracted with
|
... |
further graphical parameters passed to the plotting function. |
newdata |
an optional data frame in which to look for variables with which to predict. If omitted, the original observations are used. |
plot |
logical or character. Should the |
class |
should the invisible return value be either a |
breaks |
|
width |
|
style |
character specifying the syle of rootogram (see 'Details'). |
scale |
character specifying whether |
expected |
logical or character. Should the expected (fitted) frequencies be plotted?
Can be set to |
confint |
logical, defaults to |
ref |
logical, defaults to |
xlab , ylab , main
|
graphical parameters forwarded to
|
Rootograms graphically compare frequencies of empirical distributions and
expected (fitted) probability models. For the observed distribution the histogram is
drawn on a square root scale (hence the name) and superimposed with a line
for the expected frequencies. The histogram can be "hanging"
from the
expected curve (default), "standing"
on the (like bars in barplot),
or drawn as a "suspended"
histogram of deviations.
Rootograms are associated with the work of John W. Tukey (see Tukey 1977) and were originally proposed for assessing the goodness of fit of univariate distributions. See Friendly (2000) for a software implementation, in particular geared towards count data models. Kleiber and Zeileis (2016) extend it to regression models for count data, essentially by replacing the expected frequencies of a univariate distribution by the sum of the expected frequencies from the different conditional distributions for all observations.
The function rootogram
leverages the procast
generic in order to compute all necessary coordinates based on observed and
expected (fitted) frequencies. It is thus not only applicable to count data
regressions but to all (regression) models that are supported by procast
.
In addition to the plot
and autoplot
method for
rootogram objects, it is also possible to combine two (or more) rootograms by
c
/rbind
, which creates a set of rootograms that can then be
plotted in one go.
An object of class "rootogram"
inheriting from
"data.frame"
or "tibble"
conditional on the argument class
with the following variables:
observed |
observed frequencies, |
expected |
expected (fitted) frequencies, |
mid |
histogram interval midpoints on the x-axis, |
width |
widths of the histogram bars, |
confint_lwr , confint_upr
|
lower and upper confidence interval bound. |
Additionally, style
, scale
, expected
, confint
,
ref
, xlab
, ylab
, amd main
are stored as attributes.
Note that there is also a rootogram
function in the
vcd package that is similar to the numeric
method provided
here. However, it is much more limited in scope, hence a function has been
created here.
Friendly M (2000), Visualizing Categorical Data. SAS Institute, Cary.
Kleiber C, Zeileis A (2016). “Visualizing Count Data Regressions Using Rootograms.” The American Statistician, 70(3), 296–303. doi:10.1080/00031305.2016.1173590
Tukey JW (1977). Exploratory Data Analysis. Addison-Wesley, Reading.
## plots and output ## number of deaths by horsekicks in Prussian army (Von Bortkiewicz 1898) deaths <- rep(0:4, c(109, 65, 22, 3, 1)) ## fit glm model m1_pois <- glm(deaths ~ 1, family = poisson) rootogram(m1_pois) ## inspect output (without plotting) r1 <- rootogram(m1_pois, plot = FALSE) r1 ## combine plots plot(c(r1, r1), col = c(1, 2), expected_col = c(1, 2)) #------------------------------------------------------------------------------- ## different styles ## artificial data from negative binomial (mu = 3, theta = 2) ## and Poisson (mu = 3) distribution set.seed(1090) y <- rnbinom(100, mu = 3, size = 2) x <- rpois(100, lambda = 3) ## glm method: fitted values via glm() m2_pois <- glm(y ~ x, family = poisson) ## correctly specified Poisson model fit par(mfrow = c(1, 3)) r1 <- rootogram(m2_pois, style = "standing", ylim = c(-2.2, 4.8), main = "Standing") r2 <- rootogram(m2_pois, style = "hanging", ylim = c(-2.2, 4.8), main = "Hanging") r3 <- rootogram(m2_pois, style = "suspended", ylim = c(-2.2, 4.8), main = "Suspended") par(mfrow = c(1, 1)) #------------------------------------------------------------------------------- ## linear regression with normal/Gaussian response: anorexia data data("anorexia", package = "MASS") m3_gauss <- glm(Postwt ~ Prewt + Treat + offset(Prewt), family = gaussian, data = anorexia) ## plot rootogram as "ggplot2" graphic rootogram(m3_gauss, plot = "ggplot2")
## plots and output ## number of deaths by horsekicks in Prussian army (Von Bortkiewicz 1898) deaths <- rep(0:4, c(109, 65, 22, 3, 1)) ## fit glm model m1_pois <- glm(deaths ~ 1, family = poisson) rootogram(m1_pois) ## inspect output (without plotting) r1 <- rootogram(m1_pois, plot = FALSE) r1 ## combine plots plot(c(r1, r1), col = c(1, 2), expected_col = c(1, 2)) #------------------------------------------------------------------------------- ## different styles ## artificial data from negative binomial (mu = 3, theta = 2) ## and Poisson (mu = 3) distribution set.seed(1090) y <- rnbinom(100, mu = 3, size = 2) x <- rpois(100, lambda = 3) ## glm method: fitted values via glm() m2_pois <- glm(y ~ x, family = poisson) ## correctly specified Poisson model fit par(mfrow = c(1, 3)) r1 <- rootogram(m2_pois, style = "standing", ylim = c(-2.2, 4.8), main = "Standing") r2 <- rootogram(m2_pois, style = "hanging", ylim = c(-2.2, 4.8), main = "Hanging") r3 <- rootogram(m2_pois, style = "suspended", ylim = c(-2.2, 4.8), main = "Suspended") par(mfrow = c(1, 1)) #------------------------------------------------------------------------------- ## linear regression with normal/Gaussian response: anorexia data data("anorexia", package = "MASS") m3_gauss <- glm(Postwt ~ Prewt + Treat + offset(Prewt), family = gaussian, data = anorexia) ## plot rootogram as "ggplot2" graphic rootogram(m3_gauss, plot = "ggplot2")
Sample of 152 serum potassium levels.
data("SerumPotassium", package = "topmodels")
data("SerumPotassium", package = "topmodels")
A numeric vector of 152 serum potassium levels.
The data are taken from Rice (2007) who obtained the data from Martin, Gudzinowicz and Fanger (1975) and reports them rounded to one digit.
Page 350 in Rice (2007).
Rice JA (2007). Mathematical Statistics and Data Analysis, 3rd ed. Duxbury, Belmont, CA.
Martin HF, Gudzinowicz BJ, Fanger H (1975). Normal Values in Clinical Chemistry: A Guide to Statistical Analysis of Laboratory Data. Marcel Dekker, New York.
library("topmodels") data("SerumPotassium", package = "topmodels") ## Figure 9.3a-c from Rice (2007), and actual hanging rootogram ## (note that Rice erroneously refers to suspended rootograms as hanging) sp <- lm(SerumPotassium ~ 1) br <- 32:54/10 - 0.05 rootogram(sp, scale = "raw", style = "standing", breaks = br, col = "transparent") rootogram(sp, scale = "raw", style = "suspended", breaks = br, col = "transparent", ylim = c(2.8, -4)) rootogram(sp, scale = "sqrt", style = "suspended", breaks = br, col = "transparent", ylim = c(1, -1.5)) rootogram(sp, breaks = br)
library("topmodels") data("SerumPotassium", package = "topmodels") ## Figure 9.3a-c from Rice (2007), and actual hanging rootogram ## (note that Rice erroneously refers to suspended rootograms as hanging) sp <- lm(SerumPotassium ~ 1) br <- 32:54/10 - 0.05 rootogram(sp, scale = "raw", style = "standing", breaks = br, col = "transparent") rootogram(sp, scale = "raw", style = "suspended", breaks = br, col = "transparent", ylim = c(2.8, -4)) rootogram(sp, scale = "sqrt", style = "suspended", breaks = br, col = "transparent", ylim = c(1, -1.5)) rootogram(sp, breaks = br)
geom_*
and stat_*
for Producing PIT Histograms with 'ggplot2'Various geom_*
and stat_*
used within
autoplot
for producing PIT histograms.
stat_pithist( mapping = NULL, data = NULL, geom = "pithist", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, freq = FALSE, style = c("bar", "line"), ... ) geom_pithist( mapping = NULL, data = NULL, stat = "pithist", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, freq = FALSE, style = c("bar", "line"), ... ) stat_pithist_expected( mapping = NULL, data = NULL, geom = "pithist_expected", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, scale = c("uniform", "normal"), freq = FALSE, ... ) geom_pithist_expected( mapping = NULL, data = NULL, stat = "pithist_expected", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, scale = c("uniform", "normal"), freq = FALSE, ... ) stat_pithist_confint( mapping = NULL, data = NULL, geom = "pithist_confint", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, scale = c("uniform", "normal"), level = 0.95, type = "approximation", freq = FALSE, style = c("polygon", "line"), ... ) geom_pithist_confint( mapping = NULL, data = NULL, stat = "pithist_confint", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, scale = c("uniform", "normal"), level = 0.95, type = "approximation", freq = FALSE, style = c("polygon", "line"), ... ) stat_pithist_simint( mapping = NULL, data = NULL, geom = "pithist_simint", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, freq = FALSE, ... ) geom_pithist_simint( mapping = NULL, data = NULL, stat = "pithist_simint", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, freq = FALSE, ... )
stat_pithist( mapping = NULL, data = NULL, geom = "pithist", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, freq = FALSE, style = c("bar", "line"), ... ) geom_pithist( mapping = NULL, data = NULL, stat = "pithist", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, freq = FALSE, style = c("bar", "line"), ... ) stat_pithist_expected( mapping = NULL, data = NULL, geom = "pithist_expected", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, scale = c("uniform", "normal"), freq = FALSE, ... ) geom_pithist_expected( mapping = NULL, data = NULL, stat = "pithist_expected", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, scale = c("uniform", "normal"), freq = FALSE, ... ) stat_pithist_confint( mapping = NULL, data = NULL, geom = "pithist_confint", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, scale = c("uniform", "normal"), level = 0.95, type = "approximation", freq = FALSE, style = c("polygon", "line"), ... ) geom_pithist_confint( mapping = NULL, data = NULL, stat = "pithist_confint", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, scale = c("uniform", "normal"), level = 0.95, type = "approximation", freq = FALSE, style = c("polygon", "line"), ... ) stat_pithist_simint( mapping = NULL, data = NULL, geom = "pithist_simint", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, freq = FALSE, ... ) geom_pithist_simint( mapping = NULL, data = NULL, stat = "pithist_simint", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, freq = FALSE, ... )
mapping |
Set of aesthetic mappings created by |
data |
The data to be displayed in this layer. There are three options: If A A |
geom |
The geometric object to use to display the data for this layer.
When using a
|
position |
A position adjustment to use on the data for this layer. This
can be used in various ways, including to prevent overplotting and
improving the display. The
|
na.rm |
If |
show.legend |
logical. Should this layer be included in the legends?
|
inherit.aes |
If |
freq |
logical. If |
style |
character specifying the style of pithist. For |
... |
Other arguments passed on to
|
stat |
The statistical transformation to use on the data for this layer.
When using a
|
scale |
On which scale should the PIT residuals be computed: on the probability scale
( |
level |
numeric. The confidence level required. |
type |
character. Which type of confidence interval should be plotted: '"exact"' or '"approximation"'. According to Agresti and Coull (1998), for interval estimation of binomial proportions an approximation can be better than exact. |
if (require("ggplot2")) { ## Fit model data("CrabSatellites", package = "countreg") m1_pois <- glm(satellites ~ width + color, data = CrabSatellites, family = poisson) m2_pois <- glm(satellites ~ color, data = CrabSatellites, family = poisson) ## Compute pithist p1 <- pithist(m1_pois, type = "random", plot = FALSE) p2 <- pithist(m2_pois, type = "random", plot = FALSE) d <- c(p1, p2) ## Create factor main <- attr(d, "main") main <- make.names(main, unique = TRUE) d$group <- factor(d$group, labels = main) ## Plot bar style PIT histogram gg1 <- ggplot(data = d) + geom_pithist(aes(x = mid, y = observed, width = width, group = group), freq = TRUE) + geom_pithist_simint(aes(x = mid, ymin = simint_lwr, ymax = simint_upr), freq = TRUE) + geom_pithist_confint(aes(x = mid, y = observed, width = width), style = "line", freq = TRUE) + geom_pithist_expected(aes(x = mid, y = observed, width = width), freq = TRUE) + facet_grid(group ~ .) + xlab("PIT") + ylab("Frequency") gg1 gg2 <- ggplot(data = d) + geom_pithist(aes(x = mid, y = observed, width = width, group = group), freq = FALSE) + geom_pithist_simint(aes( x = mid, ymin = simint_lwr, ymax = simint_upr, y = observed, width = width ), freq = FALSE) + geom_pithist_confint(aes(x = mid, y = observed, width = width), style = "line", freq = FALSE) + geom_pithist_expected(aes(x = mid, y = observed, width = width), freq = FALSE) + facet_grid(group ~ .) + xlab("PIT") + ylab("Density") gg2 ## Plot line style PIT histogram gg3 <- ggplot(data = d) + geom_pithist(aes(x = mid, y = observed, width = width, group = group), style = "line") + geom_pithist_confint(aes(x = mid, y = observed, width = width), style = "polygon") + facet_grid(group ~ .) + xlab("PIT") + ylab("Density") gg3 }
if (require("ggplot2")) { ## Fit model data("CrabSatellites", package = "countreg") m1_pois <- glm(satellites ~ width + color, data = CrabSatellites, family = poisson) m2_pois <- glm(satellites ~ color, data = CrabSatellites, family = poisson) ## Compute pithist p1 <- pithist(m1_pois, type = "random", plot = FALSE) p2 <- pithist(m2_pois, type = "random", plot = FALSE) d <- c(p1, p2) ## Create factor main <- attr(d, "main") main <- make.names(main, unique = TRUE) d$group <- factor(d$group, labels = main) ## Plot bar style PIT histogram gg1 <- ggplot(data = d) + geom_pithist(aes(x = mid, y = observed, width = width, group = group), freq = TRUE) + geom_pithist_simint(aes(x = mid, ymin = simint_lwr, ymax = simint_upr), freq = TRUE) + geom_pithist_confint(aes(x = mid, y = observed, width = width), style = "line", freq = TRUE) + geom_pithist_expected(aes(x = mid, y = observed, width = width), freq = TRUE) + facet_grid(group ~ .) + xlab("PIT") + ylab("Frequency") gg1 gg2 <- ggplot(data = d) + geom_pithist(aes(x = mid, y = observed, width = width, group = group), freq = FALSE) + geom_pithist_simint(aes( x = mid, ymin = simint_lwr, ymax = simint_upr, y = observed, width = width ), freq = FALSE) + geom_pithist_confint(aes(x = mid, y = observed, width = width), style = "line", freq = FALSE) + geom_pithist_expected(aes(x = mid, y = observed, width = width), freq = FALSE) + facet_grid(group ~ .) + xlab("PIT") + ylab("Density") gg2 ## Plot line style PIT histogram gg3 <- ggplot(data = d) + geom_pithist(aes(x = mid, y = observed, width = width, group = group), style = "line") + geom_pithist_confint(aes(x = mid, y = observed, width = width), style = "polygon") + facet_grid(group ~ .) + xlab("PIT") + ylab("Density") gg3 }
geom_*
and stat_*
for Producing Rootograms with 'ggplot2'Various geom_*
and stat_*
used within
autoplot
for producing rootograms.
stat_rootogram( mapping = NULL, data = NULL, geom = "rootogram", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, scale = c("sqrt", "raw"), style = c("hanging", "standing", "suspended"), ... ) geom_rootogram( mapping = NULL, data = NULL, stat = "rootogram", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, scale = c("sqrt", "raw"), style = c("hanging", "standing", "suspended"), ... ) stat_rootogram_expected( mapping = NULL, data = NULL, geom = "rootogram_expected", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, scale = c("sqrt", "raw"), ... ) geom_rootogram_expected( mapping = NULL, data = NULL, stat = "rootogram_expected", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, scale = c("sqrt", "raw"), linestyle = c("both", "line", "point"), ... ) GeomRootogramExpected geom_rootogram_ref( mapping = NULL, data = NULL, stat = "identity", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, ... ) stat_rootogram_confint( mapping = NULL, data = NULL, geom = "rootogram_confint", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, level = 0.95, nrep = 1000, type = c("tukey", "pointwise", "simultaneous"), scale = c("sqrt", "raw"), rootogram_style = c("hanging", "standing", "suspended"), ... ) geom_rootogram_confint( mapping = NULL, data = NULL, stat = "rootogram_confint", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, level = 0.95, nrep = 1000, type = c("tukey", "pointwise", "simultaneous"), scale = c("sqrt", "raw"), rootogram_style = c("hanging", "standing", "suspended"), ... )
stat_rootogram( mapping = NULL, data = NULL, geom = "rootogram", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, scale = c("sqrt", "raw"), style = c("hanging", "standing", "suspended"), ... ) geom_rootogram( mapping = NULL, data = NULL, stat = "rootogram", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, scale = c("sqrt", "raw"), style = c("hanging", "standing", "suspended"), ... ) stat_rootogram_expected( mapping = NULL, data = NULL, geom = "rootogram_expected", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, scale = c("sqrt", "raw"), ... ) geom_rootogram_expected( mapping = NULL, data = NULL, stat = "rootogram_expected", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, scale = c("sqrt", "raw"), linestyle = c("both", "line", "point"), ... ) GeomRootogramExpected geom_rootogram_ref( mapping = NULL, data = NULL, stat = "identity", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, ... ) stat_rootogram_confint( mapping = NULL, data = NULL, geom = "rootogram_confint", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, level = 0.95, nrep = 1000, type = c("tukey", "pointwise", "simultaneous"), scale = c("sqrt", "raw"), rootogram_style = c("hanging", "standing", "suspended"), ... ) geom_rootogram_confint( mapping = NULL, data = NULL, stat = "rootogram_confint", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, level = 0.95, nrep = 1000, type = c("tukey", "pointwise", "simultaneous"), scale = c("sqrt", "raw"), rootogram_style = c("hanging", "standing", "suspended"), ... )
mapping |
Set of aesthetic mappings created by |
data |
The data to be displayed in this layer. There are three options: If A A |
geom |
The geometric object to use to display the data for this layer.
When using a
|
position |
A position adjustment to use on the data for this layer. This
can be used in various ways, including to prevent overplotting and
improving the display. The
|
na.rm |
If |
show.legend |
logical. Should this layer be included in the legends?
|
inherit.aes |
If |
scale |
character specifying whether values should be transformed to the square root scale (not checking for original scale, so maybe applied again). |
style |
character specifying the syle of rootogram (see below). |
... |
Other arguments passed on to
|
stat |
The statistical transformation to use on the data for this layer.
When using a
|
linestyle |
Character string defining one of '"both"', '"line"' or '"point"'. |
level |
numeric. The confidence level required. |
nrep |
numeric. The repetition number of simulation for computing the confidence intervals. |
type |
character. Should |
rootogram_style |
character specifying the syle of rootogram. |
An object of class GeomRootogramExpected
(inherits from GeomPath
, Geom
, ggproto
, gg
) of length 3.
if (require("ggplot2")) { ## Fit model data("CrabSatellites", package = "countreg") m1_pois <- glm(satellites ~ width + color, data = CrabSatellites, family = poisson) m2_pois <- glm(satellites ~ color, data = CrabSatellites, family = poisson) ## Compute rootogram (on raw scale) p1 <- rootogram(m1_pois, scale = "raw", plot = FALSE) p2 <- rootogram(m2_pois, scale = "raw", plot = FALSE) d <- c(p1, p2) ## Get label names main <- attr(d, "main") main <- make.names(main, unique = TRUE) d$group <- factor(d$group, labels = main) ## Plot rootograms w/ on default "sqrt" scale gg1 <- ggplot(data = d) + geom_rootogram(aes( observed = observed, expected = expected, mid = mid, width = width, group = group )) + geom_rootogram_expected(aes(expected = expected, mid = mid)) + geom_rootogram_ref() + facet_grid(group ~ .) + xlab("satellites") + ylab("sqrt(Frequency)") gg1 }
if (require("ggplot2")) { ## Fit model data("CrabSatellites", package = "countreg") m1_pois <- glm(satellites ~ width + color, data = CrabSatellites, family = poisson) m2_pois <- glm(satellites ~ color, data = CrabSatellites, family = poisson) ## Compute rootogram (on raw scale) p1 <- rootogram(m1_pois, scale = "raw", plot = FALSE) p2 <- rootogram(m2_pois, scale = "raw", plot = FALSE) d <- c(p1, p2) ## Get label names main <- attr(d, "main") main <- make.names(main, unique = TRUE) d$group <- factor(d$group, labels = main) ## Plot rootograms w/ on default "sqrt" scale gg1 <- ggplot(data = d) + geom_rootogram(aes( observed = observed, expected = expected, mid = mid, width = width, group = group )) + geom_rootogram_expected(aes(expected = expected, mid = mid)) + geom_rootogram_ref() + facet_grid(group ~ .) + xlab("satellites") + ylab("sqrt(Frequency)") gg1 }
A quick overview plot with panels for all graphical evaluation methods provided
for probabilistic (regression) model objects. If plot = TRUE
, the
resulting objects are plotted by plot
or autoplot
before they are returned within a single list, depending on whether the package
ggplot2
is loaded.
topmodels( object, plot = TRUE, class = NULL, newdata = NULL, na.action = na.pass, which = NULL, ask = dev.interactive(), spar = TRUE, single_page = NULL, envir = parent.frame(), ... )
topmodels( object, plot = TRUE, class = NULL, newdata = NULL, na.action = na.pass, which = NULL, ask = dev.interactive(), spar = TRUE, single_page = NULL, envir = parent.frame(), ... )
object |
An object supported by |
plot |
Should the |
class |
Should the invisible return value be either a |
newdata |
optionally, a data frame in which to look for variables with which to predict. If omitted, the original observations are used. |
na.action |
function determining what should be done with missing
values in |
which |
Character or integer, selects the type of plot:
|
ask |
For multiple plots, the user is asked to show the next plot. Argument is
ignored for |
spar |
Should graphical parameters be set? Will be ignored for
|
single_page |
Logical. Should all plots be shown on a single page? Only
choice for |
envir |
environment, default is |
... |
Arguments to be passed to |
Render the diagnostic graphics rootogram
, pithist
, reliagram
qqrplot
, and wormplot
.
A list containing the objects plotted conditional on the arguemnt which
.
rootogram
, pithist
, reliagram
qqrplot
, wormplot
data("CrabSatellites", package = "countreg") CrabSatellites2 <- CrabSatellites[CrabSatellites$satellites <= 1, ] m1 <- glm(satellites ~ width + color, data = CrabSatellites, family = poisson) m2 <- glm(satellites ~ width + color, data = CrabSatellites2, family = binomial) ## ggplot2 graphics topmodels(m1, single_page = TRUE, nsim = 30, plot = "ggplot2") topmodels(m2, single_page = TRUE, nsim = 30, plot = "ggplot2")
data("CrabSatellites", package = "countreg") CrabSatellites2 <- CrabSatellites[CrabSatellites$satellites <= 1, ] m1 <- glm(satellites ~ width + color, data = CrabSatellites, family = poisson) m2 <- glm(satellites ~ width + color, data = CrabSatellites2, family = binomial) ## ggplot2 graphics topmodels(m1, single_page = TRUE, nsim = 30, plot = "ggplot2") topmodels(m2, single_page = TRUE, nsim = 30, plot = "ggplot2")
Heights of 218 volcanos taken from Tukey (1972).
data("VolcanoHeights", package = "topmodels")
data("VolcanoHeights", package = "topmodels")
A numeric vector of 218 volcano heights (in 1000 feet).
The data are taken from Tukey (1972) who obtained them from The World Almanac, 1966 (New York: The New York World-Telegram and The Sun, 1966), pp. 282–283.
Figure 1 in Tukey (1972).
Tukey JW (1972). “Some Graphic and Semigraphic Displays.” In Bancroft TA (ed.), Statistical Papers in Honor of George W. Snedecor, pp. 293–316. Iowa State University Press, Ames, IA. Reprinted in Cleveland WS (ed.): The Collected Works of John W. Tukey, Volume V. Graphics: 1965–1985, Wadsworth & Brooks/Cole, Pacific Grove, CA, 1988.
## Rootograms from Tukey (1972) ## (some 'breaks' don't match exactly) library("topmodels") data("VolcanoHeights", package = "topmodels") ## Figure 16 rootogram(lm(VolcanoHeights ~ 1), style = "standing", breaks = 0:20 - 0.01, expected = FALSE, confint = FALSE) ## Figure 17 rootogram(lm(sqrt(1000 * VolcanoHeights) ~ 1), style = "standing", breaks = 0:17 * 10 - 1.1, expected = FALSE, confint = FALSE) ## Figure 18 rootogram(lm(sqrt(1000 * VolcanoHeights) ~ 1), style = "hanging", breaks = -2:18 * 10 - 1.1, confint = FALSE) ## Figure 19 rootogram(lm(sqrt(1000 * VolcanoHeights) ~ 1), style = "suspended", breaks = -2:18 * 10 - 1.1, ylim = c(6, -2), confint = FALSE) abline(h = c(-1.5, -1, 1, 1.5), lty = c(2, 3, 3, 2))
## Rootograms from Tukey (1972) ## (some 'breaks' don't match exactly) library("topmodels") data("VolcanoHeights", package = "topmodels") ## Figure 16 rootogram(lm(VolcanoHeights ~ 1), style = "standing", breaks = 0:20 - 0.01, expected = FALSE, confint = FALSE) ## Figure 17 rootogram(lm(sqrt(1000 * VolcanoHeights) ~ 1), style = "standing", breaks = 0:17 * 10 - 1.1, expected = FALSE, confint = FALSE) ## Figure 18 rootogram(lm(sqrt(1000 * VolcanoHeights) ~ 1), style = "hanging", breaks = -2:18 * 10 - 1.1, confint = FALSE) ## Figure 19 rootogram(lm(sqrt(1000 * VolcanoHeights) ~ 1), style = "suspended", breaks = -2:18 * 10 - 1.1, ylim = c(6, -2), confint = FALSE) abline(h = c(-1.5, -1, 1, 1.5), lty = c(2, 3, 3, 2))
Visualize goodness of fit of regression models by worm plots using quantile
residuals. If plot = TRUE
, the resulting object of class
"wormplot"
is plotted by plot.qqrplot
or
autoplot.qqrplot
before it is returned, depending on whether the
package ggplot2
is loaded.
wormplot(object, ...) ## Default S3 method: wormplot( object, newdata = NULL, plot = TRUE, class = NULL, detrend = TRUE, scale = c("normal", "uniform"), nsim = 1L, delta = NULL, confint = TRUE, simint = TRUE, simint_level = 0.95, simint_nrep = 250, single_graph = FALSE, xlab = "Theoretical quantiles", ylab = "Deviation", main = NULL, ... )
wormplot(object, ...) ## Default S3 method: wormplot( object, newdata = NULL, plot = TRUE, class = NULL, detrend = TRUE, scale = c("normal", "uniform"), nsim = 1L, delta = NULL, confint = TRUE, simint = TRUE, simint_level = 0.95, simint_nrep = 250, single_graph = FALSE, xlab = "Theoretical quantiles", ylab = "Deviation", main = NULL, ... )
object |
an object from which probability integral transforms can be
extracted using the generic function |
newdata |
optionally, a data frame in which to look for variables with which to predict. If omitted, the original observations are used. |
plot |
Should the |
class |
Should the invisible return value be either a |
detrend |
logical. Should the qqrplot be detrended, i.e, plotted as a 'wormplot()'? |
scale |
On which scale should the quantile residuals be shown: on the probability scale
( |
nsim , delta
|
arguments passed to |
confint |
logical or character string describing the type for plotting 'c("polygon", "line")'. If not set to 'FALSE', the pointwise confidence interval of the (randomized) quantile residuals are visualized. |
simint |
logical. In case of discrete distributions, should the simulation (confidence) interval due to the randomization be visualized? |
simint_level |
numeric. The confidence level required for calculating the simulation (confidence) interval due to the randomization. |
simint_nrep |
numeric. The repetition number of simulated quantiles for calculating the simulation (confidence) interval due to the randomization. |
single_graph |
logical. Should all computed extended reliability diagrams be plotted in a single graph? |
xlab , ylab , main , ...
|
graphical parameters passed to
|
Worm plots (de-trended Q-Q plots) draw deviations of quantile residuals (by
default: transformed to standard normal scale) and theoretical quantiles from
the same distribution against the same theoretical quantiles. For computation,
wormplot
leverages the function proresiduals
employing the procast
.
Additional options are offered for models with discrete responses where randomization of quantiles is needed.
In addition to the plot
and autoplot
method for
wormplot objects, it is also possible to combine two (or more) worm plots by
c
/rbind
, which creates a set of worm plots that can then be
plotted in one go.
An object of class "qqrplot"
inheriting from
"data.frame"
or "tibble"
conditional on the argument class
with the following variables:
x |
theoretical quantiles, |
y |
deviations between theoretical and empirical quantiles. |
In case of
randomized residuals, nsim
different x
and y
values, and
lower and upper confidence interval bounds (x_rg_lwr
, y_rg_lwr
,
x_rg_upr
, y_rg_upr
) can optionally be returned. Additionally,
xlab
, ylab
, main
, and simint_level
, as well as the
the (scale
) and wether a detrended
Q-Q residuals plot
was computed are stored as attributes.
van Buuren S and Fredriks M (2001). “Worm plot: simple diagnostic device for modelling growth reference curves”. Statistics in Medicine, 20, 1259–1277. doi:10.1002/sim.746
qqrplot
, plot.qqrplot
, qqrplot
,
proresiduals
, qqnorm
## speed and stopping distances of cars m1_lm <- lm(dist ~ speed, data = cars) ## compute and plot wormplot wormplot(m1_lm) #------------------------------------------------------------------------------- ## determinants for male satellites to nesting horseshoe crabs data("CrabSatellites", package = "countreg") ## linear poisson model m1_pois <- glm(satellites ~ width + color, data = CrabSatellites, family = poisson) m2_pois <- glm(satellites ~ color, data = CrabSatellites, family = poisson) ## compute and plot wormplot as base graphic w1 <- wormplot(m1_pois, plot = FALSE) w2 <- wormplot(m2_pois, plot = FALSE) ## plot combined wormplot as "ggplot2" graphic ggplot2::autoplot(c(w1, w2), single_graph = TRUE, col = c(1, 2), fill = c(1, 2))
## speed and stopping distances of cars m1_lm <- lm(dist ~ speed, data = cars) ## compute and plot wormplot wormplot(m1_lm) #------------------------------------------------------------------------------- ## determinants for male satellites to nesting horseshoe crabs data("CrabSatellites", package = "countreg") ## linear poisson model m1_pois <- glm(satellites ~ width + color, data = CrabSatellites, family = poisson) m2_pois <- glm(satellites ~ color, data = CrabSatellites, family = poisson) ## compute and plot wormplot as base graphic w1 <- wormplot(m1_pois, plot = FALSE) w2 <- wormplot(m2_pois, plot = FALSE) ## plot combined wormplot as "ggplot2" graphic ggplot2::autoplot(c(w1, w2), single_graph = TRUE, col = c(1, 2), fill = c(1, 2))