This document gives an overview of the parametric count data distributions implemented within countreg.
A distribution is commonly determined by its density function f(y | θ), where y is a realization of a random variable Y and θ is a vector of parameters that allows the location, scale, and shape of the distribution to vary.
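The main motivation for the use of parametric distributions within countreg is to solve regression problems. For maximum likelihood estimation the objective function is the log-likelihood,
$$\ell(\theta \mid y) = \sum_{i=1}^{n} \log f(y_i \mid \theta).$$
To solve this optimization problem numerically, algorithms of the Newton-Raphson type are employed, which require the first and second derivatives of the objective function, i.e., the score function s and the hessian h, respectively,
$$s(\theta \mid y) = \frac{\partial \ell(\theta \mid y)}{\partial \theta}, \qquad h(\theta \mid y) = \frac{\partial^{2} \ell(\theta \mid y)}{\partial \theta \, \partial \theta^{\top}}.$$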
Note that in many cases it is computationally less burdensome to approximate the second derivative numerically than to evaluate an analytical expression.
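As a small illustration of the two routes, the following base R sketch compares a central finite-difference approximation of the second derivative with the analytical expression for a single Poisson observation. The helper names (`loglik`, `num_hess`, `ana_hess`) and the step size `eps` are assumptions made for this example and are not part of countreg.

```r
# Log-likelihood of a single Poisson observation y at rate lambda (base R only).
loglik <- function(lambda, y) dpois(y, lambda = lambda, log = TRUE)

# Central finite-difference approximation of the second derivative wrt lambda.
# The step size eps is an illustrative choice, not a tuned default.
num_hess <- function(lambda, y, eps = 1e-4) {
  (loglik(lambda + eps, y) - 2 * loglik(lambda, y) + loglik(lambda - eps, y)) / eps^2
}

# Analytical second derivative of the Poisson log-likelihood: -y / lambda^2.
ana_hess <- function(lambda, y) -y / lambda^2

y <- 3
lambda <- 2.5
c(numerical = num_hess(lambda, y), analytical = ana_hess(lambda, y))
```

For the Poisson case the analytical form is trivial; the numerical route becomes attractive for distributions whose second derivatives involve many terms.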
For prediction purposes it is convenient to have functions on hand that allow the computation of the expected value and the variance given a set of parameters.
These two points, numerical optimization and prediction, motivate extending the infrastructure of the distributions implemented in countreg.
The standard infrastructure within stats provides four functions for each distribution. The prefixes 'd', 'p', 'q', and 'r' indicate the density, the cumulative distribution function (CDF), the quantile function, and a simulator for random deviates, respectively. The implementation in countreg aims at extending this infrastructure by the score function sxxx, the hessian hxxx, the mean mean_xxx, and the variance var_xxx.
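For reference, the four standard prefixes can be seen at work with the Poisson family from stats. This is a minimal sketch using only base R; the parameter value and the variable names are arbitrary choices for illustration.

```r
lambda <- 2

dens  <- dpois(0:3, lambda = lambda)   # 'd': density at the counts 0, 1, 2, 3
cdf   <- ppois(3, lambda = lambda)     # 'p': P(Y <= 3)
med   <- qpois(0.5, lambda = lambda)   # 'q': smallest y with P(Y <= y) >= 0.5

set.seed(1)                            # make the random draws reproducible
draws <- rpois(5, lambda = lambda)     # 'r': five random deviates

# The CDF at 3 equals the sum of the density over 0, ..., 3.
all.equal(sum(dens), cdf)
```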
The interface of the score function looks as follows:
sxxx(x, theta1, theta2, parameter = c("theta1", "theta2"), drop = TRUE)
The first argument x is a vector of quantiles; theta1 and theta2 are vectors of the parameters specifying the distribution (the names and number of parameters are chosen as an example). The argument parameter takes a character string (or a vector of character strings) indicating with respect to which parameter(s) the score should be computed. The logical drop indicates whether the result should be a matrix or whether the dimension should be dropped. The interface of the hessian hxxx is analogous.
The interface of mean_xxx and var_xxx is:
mean_xxx(theta1, theta2, drop = TRUE)
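To make the argument conventions concrete, here is a minimal sketch of functions that mimic the described interface for a one-parameter Poisson case. The names `score_pois_sketch`, `mean_pois_sketch`, and `var_pois_sketch` are hypothetical stand-ins, not the countreg implementations, and the exact shape of the returned matrix (one named column per parameter) is an assumption.

```r
# Hypothetical stand-in for a score function following the sxxx interface.
score_pois_sketch <- function(x, lambda, parameter = "lambda", drop = TRUE) {
  stopifnot(identical(parameter, "lambda"))   # only one parameter in this sketch
  s <- matrix(x / lambda - 1, ncol = 1, dimnames = list(NULL, "lambda"))
  if (drop) as.vector(s) else s               # drop = TRUE returns a plain vector
}

# Hypothetical stand-ins following the mean_xxx / var_xxx interface.
mean_pois_sketch <- function(lambda, drop = TRUE) {
  m <- matrix(lambda, ncol = 1, dimnames = list(NULL, "mean"))
  if (drop) as.vector(m) else m
}
var_pois_sketch <- function(lambda, drop = TRUE) {
  v <- matrix(lambda, ncol = 1, dimnames = list(NULL, "var"))
  if (drop) as.vector(v) else v
}

x <- 0:3
score_pois_sketch(x, lambda = 2)                 # numeric vector
score_pois_sketch(x, lambda = 2, drop = FALSE)   # 4 x 1 matrix
mean_pois_sketch(c(1, 2.5))
var_pois_sketch(c(1, 2.5))
```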
"xpois"
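The Poisson distribution has the density
$$f(y \mid \lambda) = \frac{\exp(-\lambda)\, \lambda^{y}}{y!}, \qquad y = 0, 1, 2, \dots,$$
with expected value E(Y) = λ and variance VAR(Y) = λ.
The score function is
$$s(\lambda \mid y) = \frac{y}{\lambda} - 1.$$
The hessian is
$$h(\lambda \mid y) = -\frac{y}{\lambda^{2}}.$$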
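The role of the score and hessian in Newton-Raphson optimization can be sketched for the Poisson case, assuming the standard per-observation score y/λ − 1 and hessian −y/λ². The data vector, the starting value, and the stopping rule are arbitrary choices for this illustration; the iteration should converge to the sample mean, which is the maximum likelihood estimate of λ.

```r
y <- c(2, 0, 3, 1, 4)                         # toy sample, chosen for illustration

score <- function(lambda) sum(y / lambda - 1)    # first derivative of the log-likelihood
hess  <- function(lambda) sum(-y / lambda^2)     # second derivative of the log-likelihood

lambda <- 1                                   # arbitrary starting value
for (i in 1:25) {
  step <- score(lambda) / hess(lambda)        # Newton-Raphson step
  lambda <- lambda - step
  if (abs(step) < 1e-10) break                # stop once the update is negligible
}

c(newton = lambda, sample_mean = mean(y))
```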
"xbinom"
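The binomial distribution with size = n and prob = π has the density
$$f(y \mid \pi, n) = \binom{n}{y}\, \pi^{y} (1 - \pi)^{n - y}, \qquad y = 0, 1, \dots, n,$$
with expected value E(Y) = n ⋅ π and variance VAR(Y) = n ⋅ π ⋅ (1 − π).
The score function is
$$s(\pi \mid y, n) = \frac{y}{\pi} - \frac{n - y}{1 - \pi}.$$
The hessian is
$$h(\pi \mid y, n) = -\frac{y}{\pi^{2}} - \frac{n - y}{(1 - \pi)^{2}}.$$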
"xnbinom"
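The negative binomial (type 2) has the density
$$f(y \mid \mu, \theta) = \frac{\Gamma(\theta + y)}{\Gamma(\theta)\, y!} \cdot \frac{\mu^{y}\, \theta^{\theta}}{(\mu + \theta)^{\theta + y}}, \qquad y = 0, 1, 2, \dots,$$
with expected value E(Y) = μ and variance VAR(Y) = μ + μ²/θ.
The score functions are:
$$s_{\mu}(\mu, \theta \mid y) = \frac{y}{\mu} - \frac{y + \theta}{\mu + \theta},$$
$$s_{\theta}(\mu, \theta \mid y) = \psi_{0}(y + \theta) - \psi_{0}(\theta) + \log(\theta) + 1 - \log(\mu + \theta) - \frac{y + \theta}{\mu + \theta},$$
where ψ0 is the digamma function.
The elements of the hessian are
$$h_{\mu\mu}(\mu, \theta \mid y) = -\frac{y}{\mu^{2}} + \frac{y + \theta}{(\mu + \theta)^{2}},$$
$$h_{\mu\theta}(\mu, \theta \mid y) = \frac{y - \mu}{(\mu + \theta)^{2}},$$
$$h_{\theta\theta}(\mu, \theta \mid y) = \psi_{1}(y + \theta) - \psi_{1}(\theta) + \frac{1}{\theta} - \frac{2}{\mu + \theta} + \frac{y + \theta}{(\mu + \theta)^{2}},$$
where ψ1 is the trigamma function.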
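The negative binomial scores can be checked against numerical derivatives of the log-density from stats. The sketch below assumes the standard type 2 parameterization, in which `dnbinom(y, size = theta, mu = mu)` gives the density, and the score expressions y/μ − (y + θ)/(μ + θ) for μ and ψ0(y + θ) − ψ0(θ) + log θ + 1 − log(μ + θ) − (y + θ)/(μ + θ) for θ. The helper names and evaluation point are hypothetical.

```r
# Log-density of the negative binomial (type 2) via base R.
ll <- function(y, mu, theta) dnbinom(y, size = theta, mu = mu, log = TRUE)

# Analytical score functions (illustrative helpers, not the countreg functions).
s_mu <- function(y, mu, theta) y / mu - (y + theta) / (mu + theta)
s_theta <- function(y, mu, theta) {
  digamma(y + theta) - digamma(theta) + log(theta) + 1 -
    log(mu + theta) - (y + theta) / (mu + theta)
}

y <- 4; mu <- 2.5; theta <- 1.5
eps <- 1e-6   # step size for central differences, an illustrative choice

num_mu    <- (ll(y, mu + eps, theta) - ll(y, mu - eps, theta)) / (2 * eps)
num_theta <- (ll(y, mu, theta + eps) - ll(y, mu, theta - eps)) / (2 * eps)

rbind(
  analytical = c(mu = s_mu(y, mu, theta), theta = s_theta(y, mu, theta)),
  numerical  = c(mu = num_mu, theta = num_theta)
)
```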
"xztpois"
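The zero-truncated Poisson has the density
$$f_{\mathrm{ZTPois}}(y \mid \lambda) = \frac{f_{\mathrm{Pois}}(y \mid \lambda)}{1 - f_{\mathrm{Pois}}(0 \mid \lambda)}, \qquad y = 1, 2, \dots,$$
where fPois is the density of the Poisson distribution, so that fPois(0 | λ) = exp(−λ). The zero-truncated distribution has expectation E(Y) = μ = λ/(1 − exp(−λ)) and variance VAR(Y) = μ ⋅ (λ + 1 − μ), where λ is the expectation of the untruncated Poisson distribution. Within countreg both parameterizations, either with λ ("lambda") or μ ("mean"), are implemented. Thus, the score functions can be calculated either with respect to λ ("lambda") or μ ("mean"):
$$s_{\lambda}(\lambda \mid y) = \frac{y}{\lambda} - 1 - \frac{\exp(-\lambda)}{1 - \exp(-\lambda)} = \frac{y - \mu}{\lambda},$$
$$s_{\mu}(\mu \mid y) = s_{\lambda}(\lambda \mid y) \cdot \frac{\partial \lambda}{\partial \mu} = \frac{y - \mu}{\mu\, (\lambda + 1 - \mu)}.$$
The hessian with respect to λ is
$$h_{\lambda\lambda}(\lambda \mid y) = -\frac{y}{\lambda^{2}} + \frac{\exp(-\lambda)}{(1 - \exp(-\lambda))^{2}}.$$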
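The stated moments of the zero-truncated Poisson can be verified by direct summation over the support. This is a minimal base R sketch; the value of `lambda` and the truncation of the infinite sum at 200 are assumptions for illustration (the omitted tail mass is negligible at this rate).

```r
lambda <- 1.7
y <- 1:200                                    # support, truncated for the numerical sum

# Zero-truncated density: Poisson density rescaled by 1 - P(Y = 0).
f <- dpois(y, lambda = lambda) / (1 - exp(-lambda))

mu <- lambda / (1 - exp(-lambda))             # stated expectation
v  <- mu * (lambda + 1 - mu)                  # stated variance

c(total_mass = sum(f),
  mean_by_sum = sum(y * f), mean_formula = mu,
  var_by_sum = sum((y - mu)^2 * f), var_formula = v)

# The score wrt lambda has expectation zero under the truncated density.
s_lambda <- y / lambda - 1 - exp(-lambda) / (1 - exp(-lambda))
sum(s_lambda * f)
```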
"xztnbinom"
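The zero-truncated negative binomial has the density
$$f_{\mathrm{ZTNB}}(y \mid \mu, \theta) = \frac{f_{\mathrm{NB}}(y \mid \mu, \theta)}{1 - f_{\mathrm{NB}}(0 \mid \mu, \theta)}, \qquad y = 1, 2, \dots,$$
where fNB is the density of the negative binomial distribution. With the shorthand
$$f_{0} = f_{\mathrm{NB}}(0 \mid \mu, \theta) = \left(\frac{\theta}{\mu + \theta}\right)^{\theta},$$
the expectation is
$$E(Y) = \frac{\mu}{1 - f_{0}}$$
and the variance is
$$VAR(Y) = E(Y) \cdot \left(1 + \frac{\mu}{\theta} + \mu - E(Y)\right).$$
The score functions are:
$$s_{\mu}(\mu, \theta \mid y) = \frac{y}{\mu} - \frac{y + \theta}{\mu + \theta} - \frac{f_{0}}{1 - f_{0}} \cdot \frac{\theta}{\mu + \theta},$$
$$s_{\theta}(\mu, \theta \mid y) = \psi_{0}(y + \theta) - \psi_{0}(\theta) + \log(\theta) + 1 - \log(\mu + \theta) - \frac{y + \theta}{\mu + \theta} + \frac{f_{0}}{1 - f_{0}} \cdot b,$$
with
$$b = \log\left(\frac{\theta}{\mu + \theta}\right) + \frac{\mu}{\mu + \theta}.$$
The elements of the hessian are:
$$h_{\mu\mu}(\mu, \theta \mid y) = -\frac{y}{\mu^{2}} + \frac{y + \theta}{(\mu + \theta)^{2}} + \frac{f_{0}}{1 - f_{0}} \cdot \frac{\theta}{(\mu + \theta)^{2}} + \frac{f_{0}}{(1 - f_{0})^{2}} \cdot \frac{\theta^{2}}{(\mu + \theta)^{2}},$$
$$h_{\mu\theta}(\mu, \theta \mid y) = \frac{y - \mu}{(\mu + \theta)^{2}} - \frac{f_{0}}{1 - f_{0}} \cdot \frac{\mu}{(\mu + \theta)^{2}} - \frac{f_{0}}{(1 - f_{0})^{2}} \cdot \frac{\theta}{\mu + \theta} \cdot b,$$
and
$$h_{\theta\theta}(\mu, \theta \mid y) = \psi_{1}(y + \theta) - \psi_{1}(\theta) + \frac{1}{\theta} - \frac{2}{\mu + \theta} + \frac{y + \theta}{(\mu + \theta)^{2}} + \frac{f_{0}}{1 - f_{0}} \cdot \frac{\mu^{2}}{\theta\, (\mu + \theta)^{2}} + \frac{f_{0}}{(1 - f_{0})^{2}} \cdot b^{2},$$
where ψ0 and ψ1 are the digamma and trigamma functions.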
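The moments of the zero-truncated negative binomial can be checked by summation in the same way. The sketch assumes that truncation rescales the negative binomial density by 1 − fNB(0 | μ, θ) with fNB(0 | μ, θ) = (θ/(μ + θ))^θ, which gives the expectation μ/(1 − fNB(0)) and the variance E(Y) ⋅ (1 + μ/θ + μ − E(Y)). Parameter values and the summation limit are arbitrary choices.

```r
mu <- 2
theta <- 1.5
y <- 1:2000                                   # support, truncated for the numerical sum

p0 <- (theta / (mu + theta))^theta            # negative binomial probability of a zero
f  <- dnbinom(y, size = theta, mu = mu) / (1 - p0)   # zero-truncated density

m <- mu / (1 - p0)                            # expectation under truncation
v <- m * (1 + mu / theta + mu - m)            # variance under truncation

c(total_mass = sum(f),
  mean_by_sum = sum(y * f), mean_formula = m,
  var_by_sum = sum((y - m)^2 * f), var_formula = v)
```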
"xhpois"
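The hurdle Poisson has the density
$$f_{\mathrm{HPois}}(y \mid \lambda, \pi) = \begin{cases} 1 - \pi & \text{if } y = 0, \\ \pi \cdot \dfrac{f_{\mathrm{Pois}}(y \mid \lambda)}{1 - f_{\mathrm{Pois}}(0 \mid \lambda)} & \text{if } y > 0, \end{cases}$$
where π is the probability of a positive count, with expectation
$$E(Y) = \pi \cdot \frac{\lambda}{1 - \exp(-\lambda)}$$
and variance
$$VAR(Y) = E(Y) \cdot (\lambda + 1 - E(Y)).$$
The score functions are
$$s_{\lambda}(\lambda, \pi \mid y) = (1 - I_{\{0\}}(y)) \cdot \left(\frac{y}{\lambda} - 1 - \frac{\exp(-\lambda)}{1 - \exp(-\lambda)}\right)$$
and
$$s_{\pi}(\lambda, \pi \mid y) = \frac{1 - I_{\{0\}}(y)}{\pi} - \frac{I_{\{0\}}(y)}{1 - \pi},$$
where I{0}(y) is an indicator function which takes the value one if y equals zero, and zero otherwise.
The elements of the hessian are
$$h_{\lambda\lambda}(\lambda, \pi \mid y) = (1 - I_{\{0\}}(y)) \cdot \left(-\frac{y}{\lambda^{2}} + \frac{\exp(-\lambda)}{(1 - \exp(-\lambda))^{2}}\right)$$
and
$$h_{\pi\pi}(\lambda, \pi \mid y) = -\frac{1 - I_{\{0\}}(y)}{\pi^{2}} - \frac{I_{\{0\}}(y)}{(1 - \pi)^{2}},$$
while the cross derivative with respect to λ and π is zero.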
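The hurdle construction can be sketched in base R under the assumption that the second parameter is the probability of a positive count, so that zeros have mass one minus that probability and positive counts follow a rescaled zero-truncated Poisson. The function name `dhurdle_pois_sketch` and its argument `prob_pos` are hypothetical and are not the countreg interface.

```r
# Hypothetical hurdle Poisson density; prob_pos is the assumed probability
# of crossing the hurdle (observing a positive count).
dhurdle_pois_sketch <- function(y, lambda, prob_pos) {
  ifelse(y == 0,
         1 - prob_pos,
         prob_pos * dpois(y, lambda = lambda) / (1 - exp(-lambda)))
}

lambda <- 1.7
prob_pos <- 0.6
y <- 0:200                                    # support, truncated for the numerical sum
f <- dhurdle_pois_sketch(y, lambda, prob_pos)

m <- prob_pos * lambda / (1 - exp(-lambda))   # expectation
v <- m * (lambda + 1 - m)                     # variance

c(total_mass = sum(f),
  mean_by_sum = sum(y * f), mean_formula = m,
  var_by_sum = sum((y - m)^2 * f), var_formula = v)
```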
"xhnbinom"
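The hurdle negative binomial has the density
$$f_{\mathrm{HNB}}(y \mid \mu, \theta, \pi) = \begin{cases} 1 - \pi & \text{if } y = 0, \\ \pi \cdot \dfrac{f_{\mathrm{NB}}(y \mid \mu, \theta)}{1 - f_{\mathrm{NB}}(0 \mid \mu, \theta)} & \text{if } y > 0, \end{cases}$$
where π is the probability of a positive count, with expectation
$$E(Y) = \pi \cdot \frac{\mu}{1 - f_{\mathrm{NB}}(0 \mid \mu, \theta)}$$
and variance
$$VAR(Y) = E(Y) \cdot \left(1 + \frac{\mu}{\theta} + \mu - E(Y)\right).$$
The score functions are
$$s_{\star}(\mu, \theta, \pi \mid y) = (1 - I_{\{0\}}(y)) \cdot s_{\star,\mathrm{ZTNB}}(\mu, \theta \mid y), \qquad \star \in \{\mu, \theta\},$$
and
$$s_{\pi}(\mu, \theta, \pi \mid y) = \frac{1 - I_{\{0\}}(y)}{\pi} - \frac{I_{\{0\}}(y)}{1 - \pi},$$
where s⋆,ZTNB(⋅) are the score functions of the zero-truncated negative binomial and I{0}(y) is an indicator function which takes the value one if y equals zero, and zero otherwise.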
The elements of the hessian