# Internals: prediction

**Note: this article is unfinished.**

The aim of the document is to explain how predictions are calculated in `mrds`.

It’s assumed you have some familiarity with `mrds` and a lot of familiarity with R. This is not supposed to be a replacement for the `mrds` documentation, but rather further explanation for those hacking on `mrds`.

## Terms of reference

- $y$ refers to distance: this could be perpendicular or radial.
- $w$ is the right truncation distance.
- $g(y, \mathbf{z};\boldsymbol{\theta})$ is a general detection function, where $\mathbf{z}$ denotes the non-distance covariates (omitted if there are no covariates); we assume that the detection function has some parameters ($\boldsymbol{\theta}$).

## What are “predictions” in `mrds`?

Before explaining what the code does, it’s worth thinking about what a prediction means in the context of distance sampling and `mrds`. Predictions mean different things depending on the model we are thinking about. In general we are thinking about two quantities: probabilities of detection and effective strip width/area.

**Probabilities**:

- *Conventional distance sampling (CDS) detection functions*: for CDS models, we are thinking about the average probability of detection. That is, we have a model for $\mathbb{P}(\text{animal detected } \vert \text{ animal at distance } y)$ (the detection function) and we want to integrate distance out of the model to obtain $\mathbb{P}(\text{animal detected})$.
- *Multiple covariate distance sampling (MCDS) detection functions*: as with CDS models, we wish to integrate distance out of the model; however, the resulting probabilities are conditional on the observed values of the non-distance covariates, since we don’t know their distributions.
- *MRDS full independence detection functions*:
- *MRDS point independence detection functions*:
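The following is a minimal Python sketch of "integrating distance out". It is not the `mrds` code. It assumes line transects and a half-normal detection function $g(y;\sigma)=\exp(-y^2/2\sigma^2)$. For the MCDS case it further assumes the scale is modelled as $\sigma = \exp(\beta_0 + \beta_1 z)$ with made-up coefficients. Composite Simpson's rule stands in for whatever numerical integration `mrds` actually uses, and the function names are hypothetical. The point is the shape of the output: CDS gives one probability, while MCDS gives one probability per observed covariate value.

```python
import math


def half_normal(y, sigma):
    """Half-normal detection function g(y) = exp(-y^2 / (2 sigma^2)) (assumed form)."""
    return math.exp(-y * y / (2.0 * sigma * sigma))


def simpson(f, a, b, n=200):
    """Composite Simpson's rule for the integral of f over [a, b] (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def average_p(sigma, w):
    """P(animal detected) on a line transect: integral of g(y) * pi(y), pi(y) = 1/w."""
    return simpson(lambda y: half_normal(y, sigma) / w, 0.0, w)


w = 2.0  # right truncation distance

# CDS: no covariates, so a single average detection probability
p_cds = average_p(0.8, w)

# MCDS: sigma depends on a covariate z through a log link (hypothetical coefficients),
# so the result is conditional on each observed value of z
beta0, beta1 = math.log(0.8), 0.5
observed_z = [0.0, 1.0, -1.0]
p_mcds = [average_p(math.exp(beta0 + beta1 * z), w) for z in observed_z]
```

Larger covariate values give a larger scale here, so detection probability increases with $z$. That direction is purely a consequence of the invented coefficient $\beta_1 > 0$.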

## Predictions for `mrds` model types

### Predictions from `predict.ds`

Predictions from `predict.ds` can be one of two things, either:

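1) the *predicted average probability of detection for a given set of covariates*. That is:
   $$
   \hat{p}(\mathbf{z}; \boldsymbol{\theta}) = \int_0^w g(y, \mathbf{z}; \boldsymbol{\theta}) \pi(y) \, \text{d}y,
   $$
   where for line transects $\pi(y) = \frac{1}{w}$ and for point transects $\pi(y)=\frac{2y}{w^2}$.
2) the *predicted effective strip width for a given set of covariates*. That is:
   $$
   \hat{\mu}(\mathbf{z}; \boldsymbol{\theta}) = \int_0^w g(y, \mathbf{z}; \boldsymbol{\theta}) \, \text{d}y,
   $$
   which for line transects equals $w \, \hat{p}(\mathbf{z}; \boldsymbol{\theta})$. For point transects the analogous quantity is the effective area of detection, $2\pi\int_0^w y \, g(y, \mathbf{z}; \boldsymbol{\theta}) \, \text{d}y = \pi w^2 \hat{p}(\mathbf{z}; \boldsymbol{\theta})$.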

Giving either of these quantities a subscript $i$ will denote that they are *fitted values* (i.e. predictions at the observed data): $\hat{p}_i(\mathbf{z}; \boldsymbol{\theta})$ or $\hat{\mu}_i(\mathbf{z}; \boldsymbol{\theta})$.
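The hedged Python sketch below (not the `mrds` code) makes the two `predict.ds` quantities concrete. It assumes a half-normal detection function and hypothetical per-observation scale parameters `sigmas`, one per observation's covariates. For each observation it computes fitted average detection probabilities for line and point transects, plus the effective strip width and effective area. It also checks the identities $\hat{\mu} = w\hat{p}$ (lines) and effective area $= \pi w^2 \hat{p}$ (points).

```python
import math


def half_normal(y, sigma):
    """Half-normal detection function g(y) = exp(-y^2 / (2 sigma^2)) (assumed form)."""
    return math.exp(-y * y / (2.0 * sigma * sigma))


def simpson(f, a, b, n=200):
    """Composite Simpson's rule for the integral of f over [a, b] (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def p_hat(sigma, w, point=False):
    """Average detection probability: integral of g(y) * pi(y) over [0, w]."""
    def pi_y(y):
        # pi(y) = 2y/w^2 for point transects, 1/w for line transects
        return 2.0 * y / (w * w) if point else 1.0 / w
    return simpson(lambda y: half_normal(y, sigma) * pi_y(y), 0.0, w)


def esw(sigma, w):
    """Effective strip width (line transects): integral of g(y) over [0, w]."""
    return simpson(lambda y: half_normal(y, sigma), 0.0, w)


def effective_area(sigma, w):
    """Effective area of detection (point transects): 2*pi * integral of y * g(y)."""
    return 2.0 * math.pi * simpson(lambda y: y * half_normal(y, sigma), 0.0, w)


w = 2.0
sigmas = [0.5, 1.0, 1.5]  # hypothetical scales, one per observation i

# "fitted values": one prediction per observation
fitted_p_line = [p_hat(s, w) for s in sigmas]
fitted_p_point = [p_hat(s, w, point=True) for s in sigmas]
fitted_esw = [esw(s, w) for s in sigmas]
fitted_area = [effective_area(s, w) for s in sigmas]
```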

### Predictions from `predict.io.fi`

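For full independence models, we fit a GLM to the detections (see Fitting MRDS), but the probabilities in the model are in fact $\pi_{ij} = \mathbb{P}(\text{animal } i \text{ detected by observer } j \quad\vert \text{ detected by at least one observer})$. To obtain the probabilities we want, we must calculate the conditional probability of an animal being seen by either observer; see Section 6.3.2 of Laake and Borchers (2004), in particular equation 6.22. For brevity we write:
$$
\hat{p}_{MR}(0,\mathbf{z};\hat{\boldsymbol{\theta}}) = \hat{p}_{GLM}(0,\mathbf{z};\hat{\boldsymbol{\theta}} \vert \texttt{observer==1}) + \hat{p}_{GLM}(0,\mathbf{z};\hat{\boldsymbol{\theta}} \vert \texttt{observer==2}) - \hat{p}_{GLM}(0,\mathbf{z};\hat{\boldsymbol{\theta}} \vert \texttt{observer==1}) \, \hat{p}_{GLM}(0,\mathbf{z};\hat{\boldsymbol{\theta}} \vert \texttt{observer==2}).
$$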

When the option `integrate=FALSE` is passed, a three-element list is returned with elements:

- $\hat{p}_{MR}(0,\mathbf{z};\hat{\boldsymbol{\theta}})$
- $\hat{p}_{GLM}(0,\mathbf{z};\hat{\boldsymbol{\theta}} \vert \texttt{observer==1})$
- $\hat{p}_{GLM}(0,\mathbf{z};\hat{\boldsymbol{\theta}} \vert \texttt{observer==2})$

for each animal.
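The Python sketch below gives a rough illustration of the full independence calculation. It assumes a hypothetical mark-recapture GLM with a logit link, $\text{logit}\,p = \beta_0 + \beta_d y + \beta_z z + \beta_o [\texttt{observer==2}]$, with invented coefficients. Each observer's GLM probability is evaluated at distance 0. The two are then combined as $p_1 + p_2 - p_1 p_2$, the probability of being seen by at least one observer when the observers are independent. The dictionary keys mirror the three elements listed above but are not the names `predict.io.fi` uses.

```python
import math


def logistic(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))


def p_glm(y, z, observer, beta):
    """Hypothetical MR GLM: logit p = b0 + b_dist*y + b_z*z + b_obs*[observer == 2]."""
    b0, b_dist, b_z, b_obs = beta
    eta = b0 + b_dist * y + b_z * z + b_obs * (1.0 if observer == 2 else 0.0)
    return logistic(eta)


def predict_fi_no_integrate(z, beta, y=0.0):
    """Per-animal predictions at distance y (0 by default)."""
    p1 = p_glm(y, z, 1, beta)
    p2 = p_glm(y, z, 2, beta)
    # probability of being seen by at least one observer, assuming independence
    p_mr = p1 + p2 - p1 * p2
    return {"p_mr": p_mr, "p_obs1": p1, "p_obs2": p2}


beta = (1.0, -1.5, 0.3, -0.2)  # invented coefficients
animals_z = [0.0, 1.0, -0.5]  # covariate value for each animal
preds = [predict_fi_no_integrate(z, beta) for z in animals_z]
```

Writing $p_1 + p_2 - p_1 p_2$ as $1 - (1 - p_1)(1 - p_2)$ shows why it can never be smaller than either observer's own probability.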

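When the option `integrate=TRUE` is passed, a vector of integrated average detection probabilities, one for each observation, is returned. In this case the logistic function must be integrated over the range of the distances (this is calculated by `pdot.dsr.integrate.logistic`). We therefore calculate:
$$
\hat{p}_{MR}(\mathbf{z};\hat{\boldsymbol{\theta}}) = \int_0^w \hat{p}_{MR}(y,\mathbf{z};\hat{\boldsymbol{\theta}}) \pi(y) \, \text{d}y,
$$
where $\hat{p}_{MR}(y,\mathbf{z};\hat{\boldsymbol{\theta}})$ is the probability of being seen by at least one observer, computed from the two observers' GLM probabilities at distance $y$, each of which is
$$
\hat{p}_{GLM}(y,\mathbf{z};\hat{\boldsymbol{\theta}} \vert \texttt{observer==}j) = \frac{\exp(X\hat{\boldsymbol{\theta}})}{1+\exp(X\hat{\boldsymbol{\theta}})},
$$
where some column of the design matrix $X$ contains the distances. So, during the integration we hold everything else fixed and vary distance.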
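The Python sketch below illustrates the `integrate=TRUE` case under the same kind of invented logistic GLM. Distance enters the linear predictor and the covariate is held fixed. $p_1 + p_2 - p_1 p_2$ is then averaged against $\pi(y)$ over $[0, w]$ using Simpson's rule. This is not how `pdot.dsr.integrate.logistic` is implemented; it only shows the quantity being computed, and all names and coefficients are hypothetical.

```python
import math


def logistic(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))


def p_mr(y, z, beta):
    """P(seen by at least one observer) at distance y, covariate z (hypothetical GLM)."""
    b0, b_dist, b_z, b_obs = beta
    eta1 = b0 + b_dist * y + b_z * z  # observer 1
    eta2 = eta1 + b_obs  # observer 2: extra intercept shift
    p1, p2 = logistic(eta1), logistic(eta2)
    return p1 + p2 - p1 * p2


def simpson(f, a, b, n=200):
    """Composite Simpson's rule for the integral of f over [a, b] (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def integrated_p_mr(z, beta, w, point=False):
    """Average of p_mr(y, z) over distance, weighted by pi(y) (line or point transect)."""
    def pi_y(y):
        return 2.0 * y / (w * w) if point else 1.0 / w
    # z is held fixed; only the distance term of the linear predictor varies
    return simpson(lambda y: p_mr(y, z, beta) * pi_y(y), 0.0, w)


beta = (1.0, -1.5, 0.3, -0.2)  # invented coefficients
w = 2.0
observations_z = [0.0, 1.0, -0.5]
p_line = [integrated_p_mr(z, beta, w) for z in observations_z]
p_point = [integrated_p_mr(z, beta, w, point=True) for z in observations_z]
```

Because the invented distance coefficient is negative, the integrand decreases with distance. The point transect weighting puts more mass at larger distances, so it gives a smaller average than the line transect weighting.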

- what if distance is not in the model?

### Predictions from `predict.io`

For independent observer methods with point independence, we need to calculate $\hat{p}_{MR}(0,\mathbf{z};\boldsymbol{\theta})$ (the intercept or apex) from the mark-recapture part of the model and then $\hat{p}_\cdot(\mathbf{z};\boldsymbol{\theta})$, the average detection probability at covariates $\mathbf{z}$, from the detection function part of the model. We then multiply these two quantities:
$$
\hat{p}_{MR}(0,\mathbf{z};\boldsymbol{\theta}) \, \hat{p}_\cdot(\mathbf{z};\boldsymbol{\theta})
$$
to get the predictions. That is: the predicted probability of detection given covariates and that the object was seen by observer 1 (**NB** predictions will only be made for `newdata$observer==1` in `predict.io`).

So, $\hat{p}_\cdot(\mathbf{z};\boldsymbol{\theta})$ is the same as $\hat{p}(\mathbf{z}; \boldsymbol{\theta})$ in 1) above. The other part of the prediction, $\hat{p}_{MR}(0,\mathbf{z};\boldsymbol{\theta})$, is calculated from the GLM part of the model and is given above.
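The hedged sketch below puts the two parts together for point independence. It multiplies a mark-recapture apex $\hat{p}_{MR}(0,\mathbf{z})$, taken from an invented logistic GLM, by an average detection probability $\hat{p}_\cdot(\mathbf{z})$. That second factor comes from an assumed half-normal detection function on line transects with a log-link scale. All model forms, coefficients and function names are hypothetical; only the final multiplication reflects the calculation described here.

```python
import math


def logistic(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))


def p_mr_apex(z, beta_mr):
    """p_MR(0, z): seen by at least one observer at distance 0 (hypothetical GLM)."""
    b0, b_z, b_obs = beta_mr
    p1 = logistic(b0 + b_z * z)
    p2 = logistic(b0 + b_z * z + b_obs)
    return p1 + p2 - p1 * p2


def p_dot(z, beta_ds, w, n=200):
    """Average of a half-normal g(y, z) with pi(y) = 1/w, via Simpson's rule."""
    sigma = math.exp(beta_ds[0] + beta_ds[1] * z)  # log link for the scale (assumed)

    def g(y):
        return math.exp(-y * y / (2.0 * sigma * sigma))

    h = w / n
    total = g(0.0) + g(w)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(k * h)
    return total * h / 3.0 / w


beta_mr = (1.0, 0.3, -0.2)  # invented MR coefficients (distance fixed at 0)
beta_ds = (math.log(0.8), 0.5)  # invented DS coefficients
w = 2.0
zs = [0.0, 1.0, -0.5]
p_point_indep = [p_mr_apex(z, beta_mr) * p_dot(z, beta_ds, w) for z in zs]
```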

## References

- Borchers, DL, JL Laake, C Southwell, and CGM Paxton. *Accommodating Unmodeled Heterogeneity in Double‐Observer Distance Sampling Surveys.* Biometrics 62, no. 2 (2006): 372–378. doi:10.1111/j.1541-0420.2005.00493.x
- Laake, JL, and DL Borchers. *Methods for Incomplete Detection at Zero Distance.* In Advanced Distance Sampling, edited by ST Buckland, DR Anderson, KP Burnham, JL Laake, DL Borchers, and L Thomas, 48–70. Oxford University Press, 2004.