This document is designed to give you some pointers so that you can perform the Mark-Recapture Distance Sampling practical directly using the mrds package in R, rather than via the Distance visual interface. I assume you have some knowledge of R, the mrds package, and Distance.
Luckily for us, the golf tee dataset is provided as part of the mrds package, so we don’t have to worry about obtaining the data from the Distance GolfteesExercise project.
Open R and load the mrds library and golf tee dataset.
library(mrds)
data(book.tee.data)
#investigate the structure of the dataset
str(book.tee.data)
List of 4
$ book.tee.dataframe:'data.frame': 324 obs. of 7 variables:
..$ object : num [1:324] 1 1 2 2 3 3 4 4 5 5 ...
..$ observer: Factor w/ 2 levels "1","2": 1 2 1 2 1 2 1 2 1 2 ...
..$ detected: num [1:324] 1 0 1 0 1 0 1 0 1 0 ...
..$ distance: num [1:324] 2.68 2.68 3.33 3.33 0.34 0.34 2.53 2.53 1.46 1.46 ...
..$ size : num [1:324] 2 2 2 2 1 1 2 2 2 2 ...
..$ sex : num [1:324] 1 1 1 1 0 0 1 1 1 1 ...
..$ exposure: num [1:324] 1 1 0 0 0 0 1 1 0 0 ...
$ book.tee.region :'data.frame': 2 obs. of 2 variables:
..$ Region.Label: Factor w/ 2 levels "1","2": 1 2
..$ Area : num [1:2] 1040 640
$ book.tee.samples :'data.frame': 11 obs. of 3 variables:
..$ Sample.Label: num [1:11] 1 2 3 4 5 6 7 8 9 10 ...
..$ Region.Label: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 2 2 2 2 ...
..$ Effort : num [1:11] 10 30 30 27 21 12 23 23 15 12 ...
$ book.tee.obs :'data.frame': 162 obs. of 3 variables:
..$ object : int [1:162] 1 2 3 21 22 23 24 59 60 61 ...
..$ Region.Label: int [1:162] 1 1 1 1 1 1 1 1 1 1 ...
..$ Sample.Label: int [1:162] 1 1 1 1 1 1 1 1 1 1 ...
#extract the list elements from the dataset into easy-to-use objects
detections <- book.tee.data$book.tee.dataframe
region <- book.tee.data$book.tee.region
samples <- book.tee.data$book.tee.samples
obs <- book.tee.data$book.tee.obs
#make sure sex and exposure are factor variables
detections$sex <- as.factor(detections$sex)
detections$exposure <- as.factor(detections$exposure)
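Before fitting anything, it can help to check that the data have the double-observer layout mrds expects, with one row per object per observer. You can also check that the detection counts agree with those reported later by summary(fi.mr.dist). The sketch below is self-contained; the intermediate variable names (rows.per.object, n.by.observer, seen.by.both) are my own and not part of mrds.

```r
library(mrds)
data(book.tee.data)
detections <- book.tee.data$book.tee.dataframe

# Each object should appear exactly twice: one row per observer
rows.per.object <- table(detections$object)
all(rows.per.object == 2)
length(rows.per.object)  # number of distinct objects

# Number of objects detected by each observer (detected is coded 0/1)
n.by.observer <- tapply(detections$detected, detections$observer, sum)
n.by.observer

# Objects detected by both observers, i.e. the duplicates
seen.by.both <- tapply(detections$detected, detections$object, sum) == 2
sum(seen.by.both)
```

Observer 1 is the primary observer here, so its count should match "Number seen by primary" in the summary, and observer 2's count should match "Number seen by secondary (trials)".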
We’ll start by fitting the initial full independence model, with only distance as a covariate, just as was done in the “FI - MR dist” model in Distance. In the ddf call, method='trial.fi' specifies the trial configuration (observer 2 sets up trials for observer 1) under full independence. If you fitted that model in Distance, you can look in the Log tab at the R code Distance generated and compare it with the code we use here.
Feel free to use ? to find out more about any of the functions used. For example, ?ddf will tell you more about the ddf function.
#Fit the model
fi.mr.dist <- ddf(method='trial.fi',mrmodel=~glm(link='logit',formula=~distance),
data=detections,meta.data=list(width=4))
#Create a set of tables summarizing the double observer data (this is what Distance does)
detection.tables <- det.tables(fi.mr.dist)
#Print these detection tables out
detection.tables
Observer 1 detections
Detected
Missed Detected
[0,0.4] 1 25
(0.4,0.8] 2 16
(0.8,1.2] 2 16
(1.2,1.6] 6 22
(1.6,2] 5 9
(2,2.4] 2 10
(2.4,2.8] 6 12
(2.8,3.2] 6 9
(3.2,3.6] 2 3
(3.6,4] 6 2
Observer 2 detections
Detected
Missed Detected
[0,0.4] 4 22
(0.4,0.8] 1 17
(0.8,1.2] 0 18
(1.2,1.6] 2 26
(1.6,2] 1 13
(2,2.4] 2 10
(2.4,2.8] 3 15
(2.8,3.2] 4 11
(3.2,3.6] 2 3
(3.6,4] 1 7
Duplicate detections
[0,0.4] (0.4,0.8] (0.8,1.2] (1.2,1.6] (1.6,2] (2,2.4] (2.4,2.8]
21 15 16 20 8 8 9
(2.8,3.2] (3.2,3.6] (3.6,4]
5 1 1
Observer 1 detections of those seen by Observer 2
Missed Detected Prop. detected
[0,0.4] 1 21 0.9545
(0.4,0.8] 2 15 0.8824
(0.8,1.2] 2 16 0.8889
(1.2,1.6] 6 20 0.7692
(1.6,2] 5 8 0.6154
(2,2.4] 2 8 0.8000
(2.4,2.8] 6 9 0.6000
(2.8,3.2] 6 5 0.4545
(3.2,3.6] 2 1 0.3333
(3.6,4] 6 1 0.1429
# They could also be plotted, but I've not done so in the interest of space
# plot(detection.tables)
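The Prop. detected column in the last table is the number of trials that observer 1 detected, divided by the total number of trials (objects seen by observer 2) in each distance bin. A minimal base-R check, with the counts copied by hand from the printed table:

```r
# Counts copied from "Observer 1 detections of those seen by Observer 2"
missed   <- c(1, 2, 2, 6, 5, 2, 6, 6, 2, 6)
detected <- c(21, 15, 16, 20, 8, 8, 9, 5, 1, 1)
bins <- c("[0,0.4]", "(0.4,0.8]", "(0.8,1.2]", "(1.2,1.6]", "(1.6,2]",
          "(2,2.4]", "(2.4,2.8]", "(2.8,3.2]", "(3.2,3.6]", "(3.6,4]")

# Proportion of observer 2's detections (the trials) that observer 1 also saw
prop.detected <- setNames(detected / (missed + detected), bins)
round(prop.detected, 4)
```

The falling proportions with distance are what the logistic MR model in fi.mr.dist is fitted to describe.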
#Produce a summary of the fitted detection function object
summary(fi.mr.dist)
Summary for trial.fi object
Number of observations : 162
Number seen by primary : 124
Number seen by secondary (trials) : 142
Number seen by both (detected trials): 104
AIC : 452.8
Conditional detection function parameters:
estimate se
(Intercept) 2.900 0.4876
distance -1.059 0.2236
Estimate SE CV
Average p 0.6423 0.04069 0.06335
Average primary p(0) 0.9479 0.06110 0.06446
N in covered region 193.0486 15.84826 0.08209
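Because the MR model uses a logit link, the fitted conditional detection function for the primary observer is p(x) = plogis(2.900 - 1.059 x). The sketch below plugs in the rounded estimates from the summary. Evaluating it at x = 0 should reproduce the reported Average primary p(0). Dividing the number seen by the primary observer by Average p should recover N in covered region, up to rounding. Copying coefficients by hand is only for illustration; in practice you would work from the fitted object.

```r
# Rounded estimates copied from summary(fi.mr.dist)
beta0 <- 2.900
beta1 <- -1.059

# Conditional detection probability for observer 1 (logit link)
p.primary <- function(x) plogis(beta0 + beta1 * x)

# Detection probability at zero distance
p.primary(0)

# Detection probability at the midpoints of the 0.4-wide distance bins
midpoints <- seq(0.2, 3.8, by = 0.4)
round(p.primary(midpoints), 3)

# Average p relates the number detected by the primary to estimated N
n.primary <- 124
average.p <- 0.6423
n.primary / average.p
```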
#Produce goodness of fit statistics and a qq plot
ddf.gof(fi.mr.dist, main="Full independence, trial mode goodness of fit\nGolftee data")
Goodness of fit results for ddf object
Chi-square tests
Distance sampling component:
[0,0.4] (0.4,0.8] (0.8,1.2] (1.2,1.6] (1.6,2] (2,2.4] (2.4,2.8]
Observed 25.000 16.0000 16.0000 22.000 9.000 10.0000 12.0000
Expected 18.068 17.4790 16.6503 15.527 14.079 12.3271 10.3612
Chisquare 2.659 0.1252 0.0254 2.698 1.833 0.4393 0.2592
(2.8,3.2] (3.2,3.6] (3.6,4] Total
Observed 9.00000 3.000 2.000 124.00
Expected 8.33458 6.419 4.753 124.00
Chisquare 0.05313 1.821 1.595 11.51
No degrees of freedom for test
Mark-recapture component:
Capture History 01
[0,0.4] (0.4,0.8] (0.8,1.2] (1.2,1.6] (1.6,2] (2,2.4] (2.4,2.8]
Observed 1.00000 2.00000 2.0000 6.0000 5.0000 2.0000 6.0000
Expected 1.35639 1.61713 2.5508 5.0243 3.5512 3.7453 6.8191
Chisquare 0.09364 0.09065 0.1189 0.1895 0.5911 0.8133 0.0984
(2.8,3.2] (3.2,3.6] (3.6,4] Total
Observed 6.00000 2.0000000 6.000 38.000
Expected 6.18169 1.9747268 5.179 38.000
Chisquare 0.00534 0.0003235 0.130 2.131
Capture History 11
[0,0.4] (0.4,0.8] (0.8,1.2] (1.2,1.6] (1.6,2] (2,2.4]
Observed 21.000000 15.00000 16.00000 20.00000 8.0000 8.000
Expected 20.643613 15.38287 15.44916 20.97571 9.4488 6.255
Chisquare 0.006153 0.00953 0.01964 0.04539 0.2222 0.487
(2.4,2.8] (2.8,3.2] (3.2,3.6] (3.6,4] Total
Observed 9.00000 5.000000 1.000000 1.0000 104.000
Expected 8.18086 4.818310 1.025273 1.8207 104.000
Chisquare 0.08202 0.006851 0.000623 0.3699 1.249
Total chi-square =14.888 P= 0.60351 with 17 degrees of freedom
Distance sampling Kolmogorov-Smirnov test
Test statistic = 0.093983 P = 0.2234
Distance sampling Cramer-von Mises test(unweighted)
Test statistic = 0.29456 P = 0.14003
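The overall chi-square statistic pools the distance sampling component and both capture-history components of the mark-recapture test. The component totals (11.51 + 2.131 + 1.249) add up to the reported 14.888 up to rounding. You can reproduce the reported P value from the statistic and its degrees of freedom with pchisq():

```r
# Component chi-square totals copied from the ddf.gof output
chisq.ds   <- 11.51   # distance sampling component
chisq.mr01 <- 2.131   # capture history 01
chisq.mr11 <- 1.249   # capture history 11
total.chisq <- chisq.ds + chisq.mr01 + chisq.mr11
total.chisq

# Upper-tail probability of the reported statistic with 17 degrees of freedom
p.value <- pchisq(14.888, df = 17, lower.tail = FALSE)
round(p.value, 4)
```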
#Calculate density estimates using the dht function
dht(fi.mr.dist,region,samples,obs)
Summary for clusters
Summary statistics:
Region Area CoveredArea Effort n k ER se.ER cv.ER
1 1 1040 1040 130 72 6 0.5538 0.02927 0.05285
2 2 640 640 80 52 5 0.6500 0.08293 0.12758
3 Total 1680 1680 210 124 11 0.5905 0.03884 0.06578
Abundance:
Label Estimate se cv lcl ucl df
1 1 112.09 9.139 0.08153 94.83 132.5 26.279
2 2 80.96 11.487 0.14189 57.39 114.2 6.108
3 Total 193.05 16.895 0.08752 161.26 231.1 24.989
Density:
Label Estimate se cv lcl ucl df
1 1 0.1078 0.008788 0.08153 0.09118 0.1274 26.279
2 2 0.1265 0.017948 0.14189 0.08968 0.1784 6.108
3 Total 0.1149 0.010056 0.08752 0.09599 0.1376 24.989
Summary for individuals
Summary statistics:
Region Area CoveredArea Effort n ER se.ER cv.ER mean.size
1 1 1040 1040 130 229 1.762 0.1166 0.06618 3.181
2 2 640 640 80 152 1.900 0.3342 0.17591 2.923
3 Total 1680 1680 210 381 1.814 0.1391 0.07669 3.073
se.mean
1 0.2087
2 0.2262
3 0.1537
Abundance:
Label Estimate se cv lcl ucl df
1 1 356.5 32.35 0.09075 294.5 431.5 17.131
2 2 236.6 44.14 0.18655 147.3 380.1 5.056
3 Total 593.2 60.38 0.10180 478.3 735.6 16.058
Density:
Label Estimate se cv lcl ucl df
1 1 0.3428 0.03111 0.09075 0.2832 0.4149 17.131
2 2 0.3698 0.06898 0.18655 0.2302 0.5939 5.056
3 Total 0.3531 0.03594 0.10180 0.2847 0.4378 16.058
Expected cluster size
Region Expected.S se.Expected.S cv.Expected.S
1 1 3.181 0.2115 0.06649
2 2 2.923 0.1750 0.05988
3 Total 3.073 0.1391 0.04528
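Several numbers in the dht() output can be checked by hand. The sketch below copies the rounded values from the printed tables. It assumes line transects with truncation width 4 (the width passed to ddf above) surveyed on both sides of the line, so the covered area is 2 * width * effort. In this dataset, that equals the region area.

```r
# Values copied from the dht() output (regions 1 and 2)
area      <- c(1040, 640)
effort    <- c(130, 80)
n         <- c(72, 52)        # clusters detected by the primary observer
width     <- 4
N.cluster <- c(112.09, 80.96) # cluster abundance
N.indiv   <- c(356.5, 236.6)  # individual abundance

# Covered area for line transects: both sides of the line, out to width
covered.area <- 2 * width * effort
covered.area

# Encounter rate: clusters detected per unit of effort
er <- n / effort
round(er, 4)

# Density is abundance divided by region area
D.cluster <- N.cluster / area
round(D.cluster, 4)

# Ratio of individual to cluster abundance, compare with Expected.S
round(N.indiv / N.cluster, 3)
```

The last ratio agrees with the reported Expected.S only to within rounding, because the inputs were copied from rounded output.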
Now, see if you can work out how to change the call to ddf to fit the other models mentioned in the exercise, and then write code to compare the models and select among them.
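If you want a starting point, here is one sketch of the comparison step. The second model, which adds sex and exposure to the MR model, is only an illustration: substitute whichever models the exercise lists. The sketch also assumes that a fitted ddf object stores its AIC in the criterion element (the value summary() prints as AIC). Check this against your installed version of mrds.

```r
library(mrds)
data(book.tee.data)
detections <- book.tee.data$book.tee.dataframe
detections$sex <- as.factor(detections$sex)
detections$exposure <- as.factor(detections$exposure)

# Distance-only model, as fitted above
fi.mr.dist <- ddf(method = "trial.fi",
                  mrmodel = ~glm(link = "logit", formula = ~distance),
                  data = detections, meta.data = list(width = 4))

# Illustrative alternative: distance, sex and exposure in the MR model
fi.mr.dist.sex.exp <- ddf(method = "trial.fi",
                          mrmodel = ~glm(link = "logit",
                                         formula = ~distance + sex + exposure),
                          data = detections, meta.data = list(width = 4))

# Collect AIC values (assumed to be stored in $criterion) and rank the models
models <- list(dist = fi.mr.dist, dist.sex.exp = fi.mr.dist.sex.exp)
aic <- sapply(models, function(m) m$criterion)
aic.table <- data.frame(model = names(aic), AIC = aic,
                        deltaAIC = aic - min(aic), row.names = NULL)
aic.table[order(aic.table$AIC), ]
```

The model with deltaAIC of 0 has the lowest AIC; adding further models to the models list extends the table without other changes.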