Mark-recapture distance sampling using R

This document is designed to give you some pointers so that you can perform the Mark-Recapture Distance Sampling practical directly using the mrds package in R, rather than via the Distance visual interface. I assume you have some knowledge of R, the mrds package, and Distance.

Golf tee survey

Luckily for us, the golf tee dataset is provided as part of the mrds package, so we don’t have to worry about obtaining the data from the Distance GolfteesExercise project.

Open R and load the mrds library and golf tee dataset.

library(mrds)
data(book.tee.data)
#investigate the structure of the dataset
str(book.tee.data)

List of 4
 $ book.tee.dataframe:'data.frame': 324 obs. of  7 variables:
  ..$ object  : num [1:324] 1 1 2 2 3 3 4 4 5 5 ...
  ..$ observer: Factor w/ 2 levels "1","2": 1 2 1 2 1 2 1 2 1 2 ...
  ..$ detected: num [1:324] 1 0 1 0 1 0 1 0 1 0 ...
  ..$ distance: num [1:324] 2.68 2.68 3.33 3.33 0.34 0.34 2.53 2.53 1.46 1.46 ...
  ..$ size    : num [1:324] 2 2 2 2 1 1 2 2 2 2 ...
  ..$ sex     : num [1:324] 1 1 1 1 0 0 1 1 1 1 ...
  ..$ exposure: num [1:324] 1 1 0 0 0 0 1 1 0 0 ...
 $ book.tee.region   :'data.frame': 2 obs. of  2 variables:
  ..$ Region.Label: Factor w/ 2 levels "1","2": 1 2
  ..$ Area        : num [1:2] 1040 640
 $ book.tee.samples  :'data.frame': 11 obs. of  3 variables:
  ..$ Sample.Label: num [1:11] 1 2 3 4 5 6 7 8 9 10 ...
  ..$ Region.Label: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 2 2 2 2 ...
  ..$ Effort      : num [1:11] 10 30 30 27 21 12 23 23 15 12 ...
 $ book.tee.obs      :'data.frame': 162 obs. of  3 variables:
  ..$ object      : int [1:162] 1 2 3 21 22 23 24 59 60 61 ...
  ..$ Region.Label: int [1:162] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ Sample.Label: int [1:162] 1 1 1 1 1 1 1 1 1 1 ...

#extract the list elements from the dataset into easy-to-use objects
detections <- book.tee.data$book.tee.dataframe
#make sure sex and exposure are factor variables
detections$sex <- as.factor(detections$sex)
detections$exposure <- as.factor(detections$exposure)
region <- book.tee.data$book.tee.region
samples <- book.tee.data$book.tee.samples
obs <- book.tee.data$book.tee.obs
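The detection data use the double-observer layout expected by mrds: one row per object per observer, with detected recording whether that observer saw the tee. The sketch below pairs the two rows for each object and tabulates capture histories. It uses only base R and the mrds dataset, and it should reproduce the counts that summary() reports later (124 seen by observer 1, 142 by observer 2, 104 by both). The names obs1, obs2, histories and history are illustrative helpers, not part of mrds.

```r
library(mrds)
data(book.tee.data)
detections <- book.tee.data$book.tee.dataframe

# Every object should appear exactly twice: once for each observer
stopifnot(all(table(detections$object) == 2))

# Illustrative helper objects: split by observer, then pair rows by object ID
obs1 <- detections[detections$observer == "1", c("object", "detected")]
obs2 <- detections[detections$observer == "2", c("object", "detected")]
histories <- merge(obs1, obs2, by = "object", suffixes = c(".obs1", ".obs2"))

# Capture history: "10" = observer 1 only, "01" = observer 2 only, "11" = both
history <- paste0(histories$detected.obs1, histories$detected.obs2)
table(history)

# Totals comparable with summary(fi.mr.dist)
sum(obs1$detected)      # seen by observer 1 (primary)
sum(obs2$detected)      # seen by observer 2 (trials)
sum(history == "11")    # seen by both
```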

We’ll start by fitting the initial full independence model, with distance as the only covariate, just as was done in the “FI - MR dist” model in Distance. In the ddf call, method='trial.fi' requests the trial configuration (observer 2 sets up trials for observer 1, the primary observer) under full independence. If you fitted that model in Distance, you can look in the Log tab at the R code Distance generated and compare it with the code we use here.

Feel free to use ? to find out more about any of the functions used; for example, ?ddf will tell you more about the ddf function.

#Fit the model
fi.mr.dist <- ddf(method='trial.fi',mrmodel=~glm(link='logit',formula=~distance),
                data=detections,meta.data=list(width=4))
#Create a set of tables summarizing the double observer data (this is what Distance does)
detection.tables <- det.tables(fi.mr.dist)
#Print these detection tables out
detection.tables


Observer 1 detections
           Detected
            Missed Detected
  [0,0.4]        1       25
  (0.4,0.8]      2       16
  (0.8,1.2]      2       16
  (1.2,1.6]      6       22
  (1.6,2]        5        9
  (2,2.4]        2       10
  (2.4,2.8]      6       12
  (2.8,3.2]      6        9
  (3.2,3.6]      2        3
  (3.6,4]        6        2

Observer 2 detections
           Detected
            Missed Detected
  [0,0.4]        4       22
  (0.4,0.8]      1       17
  (0.8,1.2]      0       18
  (1.2,1.6]      2       26
  (1.6,2]        1       13
  (2,2.4]        2       10
  (2.4,2.8]      3       15
  (2.8,3.2]      4       11
  (3.2,3.6]      2        3
  (3.6,4]        1        7

Duplicate detections

  [0,0.4] (0.4,0.8] (0.8,1.2] (1.2,1.6]   (1.6,2]   (2,2.4] (2.4,2.8] 
       21        15        16        20         8         8         9 
(2.8,3.2] (3.2,3.6]   (3.6,4] 
        5         1         1 

Observer 1 detections of those seen by Observer 2
          Missed Detected Prop. detected
[0,0.4]        1       21         0.9545
(0.4,0.8]      2       15         0.8824
(0.8,1.2]      2       16         0.8889
(1.2,1.6]      6       20         0.7692
(1.6,2]        5        8         0.6154
(2,2.4]        2        8         0.8000
(2.4,2.8]      6        9         0.6000
(2.8,3.2]      6        5         0.4545
(3.2,3.6]      2        1         0.3333
(3.6,4]        6        1         0.1429
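In the last table, Prop. detected is the fraction of observer 2's detections (the trials) in each distance bin that observer 1 also saw, i.e. Detected / (Missed + Detected). These binned proportions are what the conditional detection function smooths over distance. A base-R sketch recomputing the column, with the counts copied from the output above:

```r
# Counts copied from "Observer 1 detections of those seen by Observer 2"
bins <- c("[0,0.4]", "(0.4,0.8]", "(0.8,1.2]", "(1.2,1.6]", "(1.6,2]",
          "(2,2.4]", "(2.4,2.8]", "(2.8,3.2]", "(3.2,3.6]", "(3.6,4]")
missed   <- c(1, 2, 2, 6, 5, 2, 6, 6, 2, 6)
detected <- c(21, 15, 16, 20, 8, 8, 9, 5, 1, 1)

# Trials per bin = observer 2's detections; proportion = successes / trials
trials <- missed + detected
prop.detected <- detected / trials
data.frame(bin = bins, trials = trials, prop.detected = round(prop.detected, 4))
```

The proportions fall from about 0.95 in the first bin to about 0.14 in the last, which is why distance is the natural first covariate for the mark-recapture model.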

# They could also be plotted, but I've not done so in the interest of space
# plot(detection.tables)

#Produce a summary of the fitted detection function object
summary(fi.mr.dist)


Summary for trial.fi object 
Number of observations               :  162 
Number seen by primary               :  124 
Number seen by secondary (trials)    :  142 
Number seen by both (detected trials):  104 
AIC                                  :  452.8 


Conditional detection function parameters:
            estimate     se
(Intercept)    2.900 0.4876
distance      -1.059 0.2236

                     Estimate       SE      CV
Average p              0.6423  0.04069 0.06335
Average primary p(0)   0.9479  0.06110 0.06446
N in covered region  193.0486 15.84826 0.08209
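The conditional detection function here is a logistic regression of observer 1's detections on distance, fitted to the trials set up by observer 2. The sketch below plugs the rounded estimates from the summary into plogis() (base R's inverse logit). Because the coefficients are rounded, p(0) agrees with the reported Average primary p(0) only to about three decimal places. The last lines assume that the reported Average p equals n divided by the covered-region abundance estimate, so dividing n by it should recover N in covered region up to rounding.

```r
# Rounded estimates copied from summary(fi.mr.dist)
beta0 <- 2.900    # (Intercept)
beta1 <- -1.059   # distance

# Probability that observer 1 detects a tee at distance x, given observer 2 saw it
p1 <- function(x) plogis(beta0 + beta1 * x)

p1(0)                   # compare with Average primary p(0) = 0.9479
p1(c(1, 2, 3, 4))       # detection probability declines with distance

# Assumed relation: Average p = n / N.covered, so N.covered = n / Average p
n <- 124
average.p <- 0.6423
n / average.p           # compare with N in covered region = 193.0486
```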

#Produce goodness of fit statistics and a qq plot
ddf.gof(fi.mr.dist, main="Full independence, trial mode goodness of fit\nGolftee data")

[Figure: goodness-of-fit Q-Q plot from ddf.gof, titled “Full independence, trial mode goodness of fit / Golftee data”]


Goodness of fit results for ddf object

Chi-square tests

Distance sampling component:
          [0,0.4] (0.4,0.8] (0.8,1.2] (1.2,1.6] (1.6,2] (2,2.4] (2.4,2.8]
Observed   25.000   16.0000   16.0000    22.000   9.000 10.0000   12.0000
Expected   18.068   17.4790   16.6503    15.527  14.079 12.3271   10.3612
Chisquare   2.659    0.1252    0.0254     2.698   1.833  0.4393    0.2592
          (2.8,3.2] (3.2,3.6] (3.6,4]  Total
Observed    9.00000     3.000   2.000 124.00
Expected    8.33458     6.419   4.753 124.00
Chisquare   0.05313     1.821   1.595  11.51

No degrees of freedom for test

Mark-recapture component:
Capture History 01
          [0,0.4] (0.4,0.8] (0.8,1.2] (1.2,1.6] (1.6,2] (2,2.4] (2.4,2.8]
Observed  1.00000   2.00000    2.0000    6.0000  5.0000  2.0000    6.0000
Expected  1.35639   1.61713    2.5508    5.0243  3.5512  3.7453    6.8191
Chisquare 0.09364   0.09065    0.1189    0.1895  0.5911  0.8133    0.0984
          (2.8,3.2] (3.2,3.6] (3.6,4]  Total
Observed    6.00000 2.0000000   6.000 38.000
Expected    6.18169 1.9747268   5.179 38.000
Chisquare   0.00534 0.0003235   0.130  2.131
Capture History 11
            [0,0.4] (0.4,0.8] (0.8,1.2] (1.2,1.6] (1.6,2] (2,2.4]
Observed  21.000000  15.00000  16.00000  20.00000  8.0000   8.000
Expected  20.643613  15.38287  15.44916  20.97571  9.4488   6.255
Chisquare  0.006153   0.00953   0.01964   0.04539  0.2222   0.487
          (2.4,2.8] (2.8,3.2] (3.2,3.6] (3.6,4]   Total
Observed    9.00000  5.000000  1.000000  1.0000 104.000
Expected    8.18086  4.818310  1.025273  1.8207 104.000
Chisquare   0.08202  0.006851  0.000623  0.3699   1.249


Total chi-square =14.888  P= 0.60351 with 17 degrees of freedom

Distance sampling Kolmogorov-Smirnov test
Test statistic =  0.093983  P =  0.2234 

Distance sampling Cramer-von Mises test(unweighted)
Test statistic =  0.29456  P =  0.14003
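The total chi-square is the sum of the component totals: the distance sampling component plus the two mark-recapture capture histories. A quick base-R check with the printed totals, followed by the upper-tail chi-square probability for the reported statistic and degrees of freedom; small differences come from rounding in the printed components.

```r
# Component totals copied from the ddf.gof output
chisq.ds   <- 11.51   # distance sampling component
chisq.ch01 <- 2.131   # mark-recapture, capture history 01
chisq.ch11 <- 1.249   # mark-recapture, capture history 11

total.chisq <- chisq.ds + chisq.ch01 + chisq.ch11
total.chisq                                  # compare with 14.888

# Upper-tail probability for the reported statistic and df
p.value <- pchisq(14.888, df = 17, lower.tail = FALSE)
p.value                                      # compare with P = 0.60351
```

A P value this large gives no evidence of lack of fit, and neither the Kolmogorov-Smirnov nor the Cramer-von Mises test is significant at conventional levels.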

#Calculate density estimates using the dht function
dht(fi.mr.dist,region,samples,obs)


Summary for clusters

Summary statistics:
  Region Area CoveredArea Effort   n  k     ER   se.ER   cv.ER
1      1 1040        1040    130  72  6 0.5538 0.02927 0.05285
2      2  640         640     80  52  5 0.6500 0.08293 0.12758
3  Total 1680        1680    210 124 11 0.5905 0.03884 0.06578

Abundance:
  Label Estimate     se      cv    lcl   ucl     df
1     1   112.09  9.139 0.08153  94.83 132.5 26.279
2     2    80.96 11.487 0.14189  57.39 114.2  6.108
3 Total   193.05 16.895 0.08752 161.26 231.1 24.989

Density:
  Label Estimate       se      cv     lcl    ucl     df
1     1   0.1078 0.008788 0.08153 0.09118 0.1274 26.279
2     2   0.1265 0.017948 0.14189 0.08968 0.1784  6.108
3 Total   0.1149 0.010056 0.08752 0.09599 0.1376 24.989

Summary for individuals

Summary statistics:
  Region Area CoveredArea Effort   n    ER  se.ER   cv.ER mean.size
1      1 1040        1040    130 229 1.762 0.1166 0.06618     3.181
2      2  640         640     80 152 1.900 0.3342 0.17591     2.923
3  Total 1680        1680    210 381 1.814 0.1391 0.07669     3.073
  se.mean
1  0.2087
2  0.2262
3  0.1537

Abundance:
  Label Estimate    se      cv   lcl   ucl     df
1     1    356.5 32.35 0.09075 294.5 431.5 17.131
2     2    236.6 44.14 0.18655 147.3 380.1  5.056
3 Total    593.2 60.38 0.10180 478.3 735.6 16.058

Density:
  Label Estimate      se      cv    lcl    ucl     df
1     1   0.3428 0.03111 0.09075 0.2832 0.4149 17.131
2     2   0.3698 0.06898 0.18655 0.2302 0.5939  5.056
3 Total   0.3531 0.03594 0.10180 0.2847 0.4378 16.058

Expected cluster size
  Region Expected.S se.Expected.S cv.Expected.S
1      1      3.181        0.2115       0.06649
2      2      2.923        0.1750       0.05988
3  Total      3.073        0.1391       0.04528
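Most of the dht output follows from a few simple relations. Encounter rate is n / Effort. Covered area for line transects is 2 * width * Effort; here it equals the region area, so each region was fully covered. Density is abundance / Area. Abundance of individuals is cluster abundance multiplied by expected cluster size. A base-R check using the printed values (agreement is only up to rounding):

```r
# Values copied from the dht() output; width = 4 as in meta.data
width      <- 4
area       <- c(1040, 640)
effort     <- c(130, 80)
n          <- c(72, 52)          # clusters detected by observer 1, per region
N.clusters <- c(112.09, 80.96)   # cluster abundance estimates
exp.size   <- c(3.181, 2.923)    # Expected.S

encounter.rate <- n / effort                # ER column
covered.area   <- 2 * width * effort        # CoveredArea column
D.clusters     <- N.clusters / area         # cluster density
N.individuals  <- N.clusters * exp.size     # abundance of individuals

data.frame(region = 1:2, encounter.rate, covered.area, D.clusters, N.individuals)
```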

Now, see if you can work out how to change the call to ddf to fit the other models mentioned in the exercise, and then write code to compare the models and select among them.
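One possible starting point, as a sketch: refit ddf with extra covariates in the mrmodel formula and rank the fits by AIC. The covariate sets below (adding sex, then exposure) are illustrative choices drawn from the variables in the dataset, not necessarily the models listed in the exercise, and the model object names are hypothetical. This assumes your mrds version provides an AIC (or logLik) method for ddf objects. Compare AIC only across models fitted with the same method, data and truncation width.

```r
library(mrds)
data(book.tee.data)
detections <- book.tee.data$book.tee.dataframe
detections$sex <- as.factor(detections$sex)
detections$exposure <- as.factor(detections$exposure)

# Illustrative full-independence trial models with increasing covariate sets
fi.mr.dist <- ddf(method = "trial.fi",
                  mrmodel = ~glm(link = "logit", formula = ~distance),
                  data = detections, meta.data = list(width = 4))
fi.mr.dist.sex <- ddf(method = "trial.fi",
                      mrmodel = ~glm(link = "logit", formula = ~distance + sex),
                      data = detections, meta.data = list(width = 4))
fi.mr.dist.sex.exp <- ddf(method = "trial.fi",
                          mrmodel = ~glm(link = "logit",
                                         formula = ~distance + sex + exposure),
                          data = detections, meta.data = list(width = 4))

# AIC table (columns df and AIC), ordered so the preferred model comes first
aic.table <- AIC(fi.mr.dist, fi.mr.dist.sex, fi.mr.dist.sex.exp)
aic.table$model <- c("distance", "distance + sex", "distance + sex + exposure")
aic.table <- aic.table[order(aic.table$AIC), ]
aic.table$delta.AIC <- aic.table$AIC - min(aic.table$AIC)
aic.table
```

Once you have a preferred model, check its fit with ddf.gof() and pass it to dht() exactly as before, e.g. dht(fi.mr.dist.sex.exp, region, samples, obs) if that model has the lowest AIC.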


Eric Rexstad

August 2014

Golf tee survey