Internals: variance estimation

The aim of this document is to explain how uncertainty is estimated in mrds.

Terms of reference

  • $y$ refers to distance: this could be perpendicular or radial.
  • $w$ is the right truncation distance. We assume the left truncation is at distance 0 unless explicitly noted; results hold for any left truncation, but 0 is used for ease of notation.
  • $g(y, \boldsymbol{z})$ is a general detection function (dropping the $\boldsymbol{z}$ if there are no covariates). We assume that the detection function has some parameters ($\boldsymbol{\theta}$, say), but their form isn’t important here.
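
For concreteness, a standard example (used in the sketches later in this document) is the half-normal detection function

\[g(y; \sigma) = \exp\left(-\frac{y^2}{2\sigma^2}\right)\]

which has no covariates $\boldsymbol{z}$ and a single parameter, $\boldsymbol{\theta} = \sigma$.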

Variance of what?

There are many quantities whose variance we could calculate in a detection function analysis (and even more in an MRDS analysis). The aim here is to build up from the basics to the more complicated quantities.

Parameter estimates

Having estimated the parameters $\hat{\boldsymbol{\theta}}$ of the detection function $g$, we may want to estimate the standard errors of those parameters. The uncertainty attached to these parameter estimates is easy to find, since their variance-covariance matrix can be calculated from the Hessian (the matrix of second partial derivatives of the log-likelihood with respect to the parameters) returned by the optimisation routine used to find the parameters. Denoting the Hessian $H$, the variances of the parameters $\hat{\boldsymbol{\theta}}$ are the diagonal elements of the matrix $(-H)^{-1}$ (and hence the standard errors are the square roots of those elements).
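
As a minimal sketch in R (not mrds code; the simulated data, half-normal model and optimiser choice are assumptions for illustration), those standard errors can be obtained like this:

```r
# Sketch: fit a half-normal detection function by maximum likelihood and
# extract parameter standard errors from the Hessian returned by optim()
set.seed(1)
w <- 25                      # right truncation distance
y <- abs(rnorm(500, 0, 10))  # simulated perpendicular distances
y <- y[y <= w]               # apply right truncation

# negative log-likelihood for g(y) = exp(-y^2/(2*sigma^2)), parametrised
# as theta = log(sigma) so that sigma stays positive
negloglik <- function(theta, y, w){
  sigma <- exp(theta)
  # integral of g from 0 to w (the effective strip width, mu)
  mu <- integrate(function(x) exp(-x^2 / (2 * sigma^2)), 0, w)$value
  -sum(-y^2 / (2 * sigma^2) - log(mu))
}

opt <- optim(log(sd(y)), negloglik, y = y, w = w,
             method = "BFGS", hessian = TRUE)

# optim() minimised the *negative* log-likelihood, so the variance-covariance
# matrix is the inverse of the Hessian as returned (see the note below)
vcov <- solve(opt$hessian)
se <- sqrt(diag(vcov))
```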

On a more technical note, it may be the case that $H$ is not invertible, in which case one must use the Moore-Penrose pseudo-inverse to get the uncertainty of the parameters: take the eigen-decomposition $H=UDU^\text{T}$, form $D^+$ by taking the reciprocal of the non-zero diagonal elements of $D$ (leaving the zero elements as zero), then reconstruct $H^+ = U D^+ U^\text{T}$. In this case it is worth checking that there are no numerical issues with your model.
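
A sketch of that computation in R (the tolerance deciding which eigenvalues count as zero is an assumption; MASS::ginv implements the same idea via a singular value decomposition):

```r
# Sketch: Moore-Penrose pseudo-inverse via eigen-decomposition, for when
# solve() fails because the (negated) Hessian is singular
pseudo_inverse <- function(H, tol = 1e-8){
  eig <- eigen(H, symmetric = TRUE)
  # reciprocate only the eigenvalues that are distinguishable from zero
  d_inv <- ifelse(abs(eig$values) > tol, 1 / eig$values, 0)
  eig$vectors %*% diag(d_inv, nrow = length(d_inv)) %*% t(eig$vectors)
}
```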

Also note that in code you may see simply the inverse of the Hessian being taken. This is usually because, rather than maximising the log-likelihood, the optimisation routine is (equivalently) minimising the negative log-likelihood, so the sign flip is already accounted for.

Estimates of average detectability

(M)CDS models

We first detail how to calculate variance of the average detectability for conventional and multiple covariate distance sampling models.

Obtaining the variance of the average detectability (for a given covariate combination) requires a little more work. First, let’s remind ourselves of its definition

\[\hat{p}(\boldsymbol{z}) = \frac{1}{w} \int_0^w g(x, \boldsymbol{z}; \hat{\boldsymbol{\theta}}) \text{d}x\]

for a line transect detection function $g$ with covariates $\boldsymbol{z}$, left truncation point 0 and right truncation point $w$. We might refer to the integral as

\[\hat{\mu}(\boldsymbol{z}) = \int_0^w g(x, \boldsymbol{z}; \hat{\boldsymbol{\theta}}) \text{d}x\]

and therefore think of $\hat{\mu}$ as a function of $\hat{\boldsymbol{\theta}}$: $\hat{\mu}(\hat{\boldsymbol{\theta}})$. With this in mind we can use the “sandwich estimator” as in Borchers et al (2002) to calculate the variance of a function of the maximum likelihood estimate $\hat{\boldsymbol{\theta}}$:

\[\text{Var}(\hat{p}(\boldsymbol{z})) = \frac{1}{w^2} \text{Var}(\hat{\mu}(\hat{\boldsymbol{\theta}}))\]

Now applying the sandwich estimator:

\[\text{Var}(\hat{p}(\boldsymbol{z})) = \frac{1}{w^2} \left[ \frac{\partial \hat{\mu}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} \Bigg|_{\boldsymbol{\theta}=\hat{\boldsymbol{\theta}}}\right]^\text{T} (-H)^{-1} \left[ \frac{\partial \hat{\mu}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} \Bigg|_{\boldsymbol{\theta}=\hat{\boldsymbol{\theta}}}\right]\]
where $\big\vert_{\boldsymbol{\theta}=\hat{\boldsymbol{\theta}}}$ indicates that the derivatives are evaluated at the maximum likelihood estimates of the parameters. $H$ is the Hessian matrix from the optimisation of the likelihood: the matrix of second derivatives of the log-likelihood with respect to the parameters. Note that Distance for Windows uses a Hessian approximated from first derivatives.

The derivatives of the effective strip width $\hat{\mu}$ can be tricky to calculate analytically, so it may be necessary to use finite differences to approximate them, as in the sketch below.
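
Continuing the half-normal sketch from above (again an illustration rather than mrds internals; the step size eps is an assumption), the sandwich estimator with a central finite difference looks like:

```r
# Sketch: variance of the average detectability p-hat = mu/w via the
# sandwich estimator, approximating d mu / d theta by central differences
mu_fn <- function(theta, w){
  sigma <- exp(theta)
  integrate(function(x) exp(-x^2 / (2 * sigma^2)), 0, w)$value
}

theta_hat <- opt$par
eps <- 1e-6
dmu <- (mu_fn(theta_hat + eps, w) - mu_fn(theta_hat - eps, w)) / (2 * eps)

# opt$hessian is the Hessian of the *negative* log-likelihood, so no sign
# flip is needed before inverting
var_p <- (1 / w^2) * t(dmu) %*% solve(opt$hessian) %*% dmu
se_p <- sqrt(var_p)
```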

MRDS models

For mark-recapture distance sampling models, the likelihood consists of two components: one for the detection functions and one for the mark-recapture model.

Independent observer models (io)

For an io model, the probability of detection is calculated as:

\[\hat{p}(\boldsymbol{z};\boldsymbol{\theta}) = \hat{p}_{MR}(0,\boldsymbol{z};\boldsymbol{\theta}) \hat{p}_\cdot(\boldsymbol{z};\boldsymbol{\theta})\]

where $\hat{p}_\cdot(\boldsymbol{z};\boldsymbol{\theta})$ is as $\hat{p}(\boldsymbol{z}; \boldsymbol{\theta})$ in the (M)CDS section above. The other part of the prediction, $\hat{p}_{MR}(0,\boldsymbol{z};\boldsymbol{\theta})$, is calculated from the GLM part of the model and is given by

\[\hat{p}_{MR}(0,\boldsymbol{z};\boldsymbol{\theta}) = \hat{p}_{GLM}(0,\boldsymbol{z};\boldsymbol{\theta} \vert \text{observer==1}) + \hat{p}_{GLM}(0,\boldsymbol{z};\boldsymbol{\theta} \vert \text{observer==2}) - \hat{p}_{GLM}(0,\boldsymbol{z};\boldsymbol{\theta} \vert \text{observer==1}) \hat{p}_{GLM}(0,\boldsymbol{z};\boldsymbol{\theta} \vert \text{observer==2})\]
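
In code this combination is straightforward. A sketch, where mr_glm and the prediction data z0 (a one-row data frame of covariates at distance 0, with an observer column) are hypothetical stand-ins rather than mrds objects:

```r
# Sketch: probability that at least one of two independent observers detects
# an animal at distance 0, from per-observer mark-recapture GLM predictions.
# mr_glm and z0 are hypothetical stand-ins for illustration.
p1 <- predict(mr_glm, newdata = transform(z0, observer = 1), type = "response")
p2 <- predict(mr_glm, newdata = transform(z0, observer = 2), type = "response")

# inclusion-exclusion: detected by observer 1, by observer 2, or by both
p_MR_0 <- p1 + p2 - p1 * p2
```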

References and further reading

  • Borchers, DL, ST Buckland, and W Zucchini. Estimating Animal Abundance, Springer, 2002.
  • Buckland, ST, DR Anderson, KP Burnham, JL Laake, DL Borchers, and L Thomas. Introduction to Distance Sampling, Oxford University Press, 2001.
  • Buckland, ST, DR Anderson, KP Burnham, JL Laake, DL Borchers, and L Thomas. Advanced Distance Sampling, Oxford University Press, 2004.
  • Innes, S, MP Heide-Jørgensen, and JL Laake. Surveys of Belugas and Narwhals in the Canadian High Arctic in 1996, NAMMCO Scientific Publications, 2002.