# 11 Mark-recapture distance sampling of golftees

This document is designed to give you some pointers so that you can perform the Mark-Recapture Distance Sampling practical directly using the mrds package in R, rather than via the Distance graphical interface. I assume you have some knowledge of R, the mrds package, and Distance.

## 11.1 Golf tee survey

The golf tee dataset is provided as part of the mrds package, as well as in the Distance GolfteesExercise project.

Distance 7 instructions Data layers associated with the golftees dataset
• Note particularly the fields in the Observation layer: object, observer, detected.

Open R and load the mrds package and golf tee dataset.

library(knitr)
library(mrds)
data(book.tee.data)
#investigate the structure of the dataset
str(book.tee.data)
List of 4
$book.tee.dataframe:'data.frame': 324 obs. of 7 variables: ..$ object  : num [1:324] 1 1 2 2 3 3 4 4 5 5 ...
..$observer: Factor w/ 2 levels "1","2": 1 2 1 2 1 2 1 2 1 2 ... ..$ detected: num [1:324] 1 0 1 0 1 0 1 0 1 0 ...
..$distance: num [1:324] 2.68 2.68 3.33 3.33 0.34 0.34 2.53 2.53 1.46 1.46 ... ..$ size    : num [1:324] 2 2 2 2 1 1 2 2 2 2 ...
..$sex : num [1:324] 1 1 1 1 0 0 1 1 1 1 ... ..$ exposure: num [1:324] 1 1 0 0 0 0 1 1 0 0 ...
$book.tee.region :'data.frame': 2 obs. of 2 variables: ..$ Region.Label: Factor w/ 2 levels "1","2": 1 2
..$Area : num [1:2] 1040 640$ book.tee.samples  :'data.frame': 11 obs. of  3 variables:
..$Sample.Label: num [1:11] 1 2 3 4 5 6 7 8 9 10 ... ..$ Region.Label: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 2 2 2 2 ...
..$Effort : num [1:11] 10 30 30 27 21 12 23 23 15 12 ...$ book.tee.obs      :'data.frame': 162 obs. of  3 variables:
..$object : int [1:162] 1 2 3 21 22 23 24 59 60 61 ... ..$ Region.Label: int [1:162] 1 1 1 1 1 1 1 1 1 1 ...
..$Sample.Label: int [1:162] 1 1 1 1 1 1 1 1 1 1 ... #extract the list elements from the dataset into easy-to-use objects detections <- book.tee.data$book.tee.dataframe
#make sure sex and exposure are factor variables
detections$sex <- as.factor(detections$sex)
detections$exposure <- as.factor(detections$exposure)
region <- book.tee.data$book.tee.region samples <- book.tee.data$book.tee.samples
obs <- book.tee.data$book.tee.obs We’ll start by fitting the initial full independence model, with only distance as a covariate - just as was done in the “FI - MR dist” model in Distance. Indeed, if you did fit that model in Distance, you can look in the Log tab at the R code Distance generated, and compare it with the code we use here. Feel free to use ? to find out more about any of the functions used – e.g., ?ddf will tell you more about the ddf function. #Fit the model fi.mr.dist <- ddf(method='trial.fi',mrmodel=~glm(link='logit',formula=~distance), data=detections,meta.data=list(width=4)) #Create a set of tables summarizing the double observer data (this is what Distance does) detection.tables <- det.tables(fi.mr.dist) #Print these detection tables detection.tables  Observer 1 detections Detected Missed Detected [0,0.4] 1 25 (0.4,0.8] 2 16 (0.8,1.2] 2 16 (1.2,1.6] 6 22 (1.6,2] 5 9 (2,2.4] 2 10 (2.4,2.8] 6 12 (2.8,3.2] 6 9 (3.2,3.6] 2 3 (3.6,4] 6 2 Observer 2 detections Detected Missed Detected [0,0.4] 4 22 (0.4,0.8] 1 17 (0.8,1.2] 0 18 (1.2,1.6] 2 26 (1.6,2] 1 13 (2,2.4] 2 10 (2.4,2.8] 3 15 (2.8,3.2] 4 11 (3.2,3.6] 2 3 (3.6,4] 1 7 Duplicate detections [0,0.4] (0.4,0.8] (0.8,1.2] (1.2,1.6] (1.6,2] (2,2.4] (2.4,2.8] 21 15 16 20 8 8 9 (2.8,3.2] (3.2,3.6] (3.6,4] 5 1 1 Observer 1 detections of those seen by Observer 2 Missed Detected Prop. detected [0,0.4] 1 21 0.9545455 (0.4,0.8] 2 15 0.8823529 (0.8,1.2] 2 16 0.8888889 (1.2,1.6] 6 20 0.7692308 (1.6,2] 5 8 0.6153846 (2,2.4] 2 8 0.8000000 (2.4,2.8] 6 9 0.6000000 (2.8,3.2] 6 5 0.4545455 (3.2,3.6] 2 1 0.3333333 (3.6,4] 6 1 0.1428571 # They could also be plotted, but I've not done so in the interest of space # plot(detection.tables) #Produce a summary of the fitted detection function object summary(fi.mr.dist)  Summary for trial.fi object Number of observations : 162 Number seen by primary : 124 Number seen by secondary (trials) : 142 Number seen by both (detected trials): 104 AIC : 452.8094 Conditional detection function parameters: estimate se (Intercept) 2.900233 0.4876238 distance -1.058677 0.2235722 Estimate SE CV Average p 0.6423252 0.04069409 0.06335434 Average primary p(0) 0.9478579 0.06109655 0.06445750 N in covered region 193.0486185 15.84826458 0.08209468 #Produce goodness of fit statistics and a qq plot gof.result <- ddf.gof(fi.mr.dist, main="Full independence, trial mode goodness of fit\nGolftee data") chi.distance <- gof.result$chisquare$chi1$chisq
chi.markrecap <- gof.result$chisquare$chi2$chisq chi.total <- gof.result$chisquare$pooled.chi Distance 7 instructions Specification of the full independence model with distance as the only covariate in the mark-recapture model, fitted with a logit transform Having run this model in Distance 7, convince yourself the parameter estimates for sigma and the distance covariate are the same when using the Distance 7 interface as when using mrds() directly. Abbreviated $$\chi^2$$ goodness of fit assessment shows the $$\chi^2$$ contribution from the distance sampling model to be 11.5 and the $$\chi^2$$ contribution from the mark-recapture model to be 3.4. The combination of these elements produces a total $$\chi^2$$ of 14.9 with 17 degrees of freedom, resulting in a P-value of 0.604 #Calculate density estimates using the dht function tee.abund <- dht(fi.mr.dist,region,samples,obs) kable(tee.abund$individuals$summary, digits=2, caption="Survey summary statistics for golftees") Table 11.1: Survey summary statistics for golftees Region Area CoveredArea Effort n ER se.ER cv.ER mean.size se.mean 1 1040 1040 130 229 1.76 0.12 0.07 3.18 0.21 2 640 640 80 152 1.90 0.33 0.18 2.92 0.23 Total 1680 1680 210 381 1.81 0.14 0.08 3.07 0.15 kable(tee.abund$individuals$N, digits=2, caption="Abundance estimates for golftee population with two strata") Table 11.1: Abundance estimates for golftee population with two strata Label Estimate se cv lcl ucl df 1 356.52 32.35 0.09 294.54 431.53 17.13 2 236.64 44.14 0.19 147.33 380.09 5.06 Total 593.16 60.38 0.10 478.32 735.57 16.06 Now, see if you can work out how to change the call to ddf to fit the other models mentioned in the exercise, and then write code to enable you to compare the models and select among them. ## 11.2 Crabeater seal survey This analysis is described in Borchers et al. (2005) Biometrics paper of aerial survey data looking for seals in the Antarctic pack ice. There were four observers in the plane, two on each side (front and back). The data from the survey has been saved in a .csv file. This file is read into R using read.table(). Note that these tables are only needed when estimating abundance by scaling up from the covered region to the study area. library(Distance) crabseal <- read.csv("crabbieMRDS.csv") # Half normal detection function, 700m truncation distance, # logit function for mark-recapture component crab.ddf.io <- ddf(method="io", dsmodel=~cds(key="hn"), mrmodel=~glm(link="logit", formula=~distance), data=crabseal, meta.data=list(width=700)) summary(crab.ddf.io)  Summary for io.fi object Number of observations : 1740 Number seen by primary : 1394 Number seen by secondary : 1471 Number seen by both : 1125 AIC : 3011.463 Conditional detection function parameters: estimate se (Intercept) 2.107762345 0.0994391200 distance -0.003087713 0.0003159216 Estimate SE CV Average primary p(0) 0.8916554 0.009606428 0.010773701 Average secondary p(0) 0.8916554 0.009606428 0.010773701 Average combined p(0) 0.9882614 0.002081614 0.002106339 Summary for ds object Number of observations : 1740 Distance range : 0 - 700 AIC : 22314.4 Detection function: Half-normal key function Detection function parameters Scale coefficient(s): estimate se (Intercept) 5.828703 0.0268578 Estimate SE CV Average p 0.5845871 0.01247837 0.02134562 Summary for io object Total AIC value : 25325.86 Estimate SE CV Average p 0.5777249 0.01239179 0.02144929 N in covered region 3011.8139211 79.84197966 0.02650960 Distance 7 instructions Specification of the point independence model with distance as the only covariate in the mark-recapture model. However under point independence, not only is the a mark-recapture model but also a distance sampling model (shown here) Having run this model in Distance 7, convince yourself the parameter estimates for sigma and the distance covariate are the same when using the Distance 7 interface as when using mrds() directly. Goodness of fit could be examined in the same manner as the golf tees by the use of ddf.gof(crab.ddf.io) but I have not shown this step. Following model criticism and selection, estimation of abundance ensues. The estimates of abundance for the study area are arbitrary because inference of the study was restricted to the covered region. Hence the estimates of abundance here are artificial. For illustration, the checkdata() function produces the region, sample, and observation tables. From these tables Horvitz-Thompson like estimators can be applied to produce estimates of $$\hat{N}$$. The use of convert.units adjusts the units of perpendicular distance measurement (m) to units of transect effort (km). Be sure to perform the conversion correctly or your abundance estimates will be off by orders of magnitude. tables <- Distance:::checkdata(crabseal[crabseal$observer==1,])
crab.ddf.io.abund <- dht(region=tables$region.table, sample=tables$sample.table, obs=tables$obs.table, model=crab.ddf.io, se=TRUE, options=list(convert.units=0.001)) kable(crab.ddf.io.abund$individuals$summary, digits=3, caption="Summary information from crabeater seal aerial survey.") Table 11.2: Summary information from crabeater seal aerial survey. Region Area CoveredArea Effort n ER se.ER cv.ER mean.size se.mean 1 1e+06 8594.082 6138.63 2053 0.334 0.033 0.097 1.18 0.013 kable(crab.ddf.io.abund$individual$N, digits=3, caption="Crabeater seal abundance estimates for study area of arbitrary size.") Table 11.3: Crabeater seal abundance estimates for study area of arbitrary size. Label Estimate se cv lcl ucl df Total 413493.2 41201.49 0.09964248 339670.9 503359.6 128.6257 Distance 7 instructions Specification of the point independence model with distance as the only covariate in the mark-recapture model. However under point independence, not only is the a mark-recapture model but also a distance sampling model (shown here). The abundance estimates are not an exact match between the Distance 7 analysis and the mrds() analysis because of slight differences in the datasets. However, there is general agreement that the seal population size in this arbitrary study area is approximately 400,000 with a CV of ~10%. ### 11.2.1 Extra credit – Crabeater seals with MCDS We can also analyse the crabeater seals data as if it were MCDS data (ignoring $$g(0)$$). Data identical to that avaiable in the Distance project CrabbieMCDSExercise.zip has been ported to crabbieMCDS.csv as if you had entered these data yourself into a spreadsheet. This short exercise guides you through the import of those data into R, and fits a simple half-normal detection function examining the possible improvement of the model by incorporating a side of plane covariate. library(Distance) crab.covariate <- read.csv("crabbieMCDS.csv") head(crab.covariate, n=3)  Study.area Region.Label Area Sample.Label Effort distance size side 1 Nominal_area 1 1000000 99A21 59.72 144.49 1 R 2 Nominal_area 1 1000000 99A21 59.72 125.16 1 L 3 Nominal_area 1 1000000 99A21 59.72 421.40 1 L exp fatigue gscat vis glare ssmi altitude obsname 1 0.0 61.90 1 G N 79 43.05763 YH 2 211.7 62.61 1 G N 79 43.05763 MF 3 0.0 62.86 1 G N 79 43.05763 MH Noting that the data have been read into R appropriately, we are ready to fit a detection function. Note that side of plane is assigned characters (L/R) in our data. Before using side of plane as a covariate, we need to tell R that side of plane takes on discrete values, hence characters are acceptable for use; this happens with the as.factor() function. ds.side <- ds(crab.covariate, key="hn", formula=~as.factor(side), truncation=700) Model contains covariate term(s): no adjustment terms will be included. Fitting half-normal key function AIC= 22304.742 Having fitted a detection function using side of plane as a covariate to the half-normal detection function, we would like to assess the fit of this function to our data. Two visual assessments are provided by the panels below: histogram and function on the left and QQ plot on the right. par(mfrow = c(1, 2)) plot.title <- "Two sets of points\none for each 'side' of plane" plot(ds.side, pch=19, cex=0.5, main = plot.title) gof.result <- ds.gof(ds.side, lwd = 2, lty = 1, pch = ".", cex = 0.5) message <- paste("CVM GOF p-value=", round(gof.result$dsgof$CvM$p, 4))
text(0.6, 0.2, message, cex=0.5)

We could also fit a detection function without the covariate and contrast the two models.

ds.nocov <- ds(crab.covariate, key="hn", formula=~1, truncation=700)
Starting AIC adjustment term selection.
Fitting half-normal key function
Key only model: not constraining for monotonicity.
AIC= 22314.398
Fitting half-normal key function with cosine(2) adjustments
AIC= 22308.645
Fitting half-normal key function with cosine(2,3) adjustments
AIC= 22304.015
Fitting half-normal key function with cosine(2,3,4) adjustments
AIC= 22305.943

Half-normal key function with cosine(2,3) adjustments selected.

AIC score for model without covariate is 22304 and AIC score for model with side as a covariate is 22305 so the model that includes side as a covariate is preferred.

Abundance estimates are provided if you print the contents of summary.covar or summary.nocovar to your console, but we do not provide those results.

### References

Borchers, D. L., J. L. Laake, C. Southwell, and C. G. M. Paxton. 2005. “Accommodating Unmodeled Heterogeneity in Double-Observer Distance Sampling Surveys.” Biometrics 62 (2). Wiley-Blackwell: 372–78. doi:10.1111/j.1541-0420.2005.00493.x.