In this example we use a data set of simulated minke whales in the Southern Ocean to examine data collected in two strata.

library(Distance)
whales <- read.csv("minke.csv")
head(whales)
  Region.Label  Area Sample.Label Effort distance
1        South 84734            1  86.75     0.10
2        South 84734            1  86.75     0.22
3        South 84734            1  86.75     0.16
4        South 84734            1  86.75     0.78
5        South 84734            1  86.75     0.21
6        South 84734            1  86.75     0.95

In this dataset, Region.Label takes two values, South and North, corresponding to the two strata.

Model selection: stratum-specific detection functions or not?

For full geographic stratification we make two separate calls to ds(), treating the two strata as if they have nothing in common, which is what the full stratification analysis presumes.

whale.trunc <- 1.5
whale.full.strat1 <- ds(whales[whales$Region.Label=="South",], truncation=whale.trunc, 
                        key="hr",  adjustment=NULL)
whale.full.strat2 <- ds(whales[whales$Region.Label=="North",], truncation=whale.trunc,
                        key="hr",  adjustment=NULL)

The model selection metric for the full stratification analysis is the sum of AIC for the two distinct analyses.

full.aic <- summary(whale.full.strat1)$ds$aic + summary(whale.full.strat2)$ds$aic

AIC scores for the two strata analysed separately: 8.6176 + 37.2772 = 45.8948
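
As a quick check of the arithmetic, the sum can be reproduced from the two reported AIC values. This is a minimal sketch that hard-codes the numbers printed above rather than refitting the models; the variable names stratum.aic and full.aic.check are illustrative and not part of the original analysis.

```r
# AIC values reported above for the two separately fitted strata
stratum.aic <- c(8.6176, 37.2772)

# AIC for the full stratification analysis is the sum of the two
full.aic.check <- sum(stratum.aic)
print(round(full.aic.check, 4))
```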

Compare the estimates 3.4619, 0.6002 with (3.182, 0.5547), and 2.7847, 0.9902 with (2.770, 0.9706). In each comparison, the first pair of estimates was produced by the current R analysis and the pair in parentheses by Program Distance 6.2 (numerical results may be slightly different on your computer).

Using stratum as a covariate

Here we create a new variable, stratum, in our dataset based on Region.Label. The new variable is then used in a formula as a discrete covariate.

whales$stratum <- ifelse(whales$Region.Label=="North", "N", "S")
whale.strat.covariate <- ds(whales, truncation=whale.trunc, quiet=TRUE,
                  formula = ~as.factor(stratum),
                  key="hr",  adjustment=NULL)

AIC score for this model with stratum as a covariate is 43.9582.

For comparison, we also fit a single detection function to the sightings pooled across both strata.

whale.pooledf0 <- ds(whales, truncation=whale.trunc,
                        key="hr",  adjustment=NULL)

AIC score for this model pooling sightings from both strata into a single detection function is 48.6384.

Model selection results

The model selection table for this stratified survey design is:

Model                                     Num. parameters  AIC
Full geographic stratification            4                45.8948
Detection function shared between strata  2                48.6384
Stratum as covariate                      3                43.9582

The pooled analysis (AIC=48.6384) is not preferred to the full geographic stratification analysis (AIC=45.8948), because the model with the smaller AIC is preferable. Introducing stratum as a covariate forms a halfway house between these extremes: one added parameter lets the two detection functions share the same basic shape while detectability falls off more slowly in one stratum than in the other (see following figure). This model has the lowest of the three AIC scores, 43.9582.
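
The comparison can also be laid out as a table sorted by AIC, with each model's difference from the best score. The sketch below is one way to do this in base R, assuming the AIC values and parameter counts are copied by hand from the table above; the data frame name models and the column delta.AIC are illustrative choices, not output of the Distance package.

```r
# AIC values and parameter counts copied from the model selection table
models <- data.frame(
  model = c("Full geographic stratification",
            "Detection function shared between strata",
            "Stratum as covariate"),
  n.par = c(4, 2, 3),
  AIC   = c(45.8948, 48.6384, 43.9582)
)

# Difference from the best (smallest) AIC, then sort best-first
models$delta.AIC <- models$AIC - min(models$AIC)
models <- models[order(models$AIC), ]
print(models, row.names = FALSE)
```

Sorting by AIC puts the stratum-as-covariate model first, followed by full geographic stratification and then the pooled model.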

The detection function that falls off most rapidly is the one for the southern stratum, nearer the Antarctic coast, where observation conditions were understandably poorer.

par(mfrow=c(1,2))
plot(whale.strat.covariate, main="Minke whales, \ndetection function uses stratum as covariate")
covar.fit <- ddf.gof(whale.strat.covariate$ddf)
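# Label the Q-Q plot with the Cramer-von Mises statistic and its p-value
# (named gof.label so that base R's message() function is not masked)
gof.label <- paste("Cramer-von Mises W=", round(covar.fit$dsgof$CvM$W,3), 
                   "\nP=", round(covar.fit$dsgof$CvM$p,3))
text(0.6, 0.1, gof.label, cex=0.8)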

Detection function and qq-plot goodness of fit for minke whales.

par(mfrow=c(1,1))

What remains is to examine the estimated abundance produced by the three models.

Stratum specific abundance estimates

Full geographic stratification

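Estimates for the two strata analysed individually. Each row is labelled Total because each analysis contains only one stratum; judging by the stratum-labelled estimates in the other analyses, the first row appears to correspond to the northern stratum and the second to the southern.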

Label  Estimate  se    cv      lcl   ucl    df
Total  9981      3875  0.3882  4468  22298  13.97
Total  4588      1200  0.2616  2688  7833   21.14

Pooled detection function

Estimates of group abundance when a detection function is fitted to data pooled across strata. These results are incorrect because effort was not allocated equally between the strata. The southern stratum is much smaller (about 84,000 km²) than the northern (about 630,000 km²), but it is more desirable habitat for minke whales because it is closer to the ice edge in Antarctica, and it received much greater survey effort per unit area than the northern stratum. The pooled analysis does not account for this difference.
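
Effort per unit area can be checked directly from a data frame laid out like the minke data. The sketch below is an illustration only: the toy data frame is hypothetical, with invented Effort values and transects (apart from the first South transect shown earlier) and the rounded northern area quoted above. With the real data, the same steps would be applied to whales instead of toy.

```r
# Hypothetical toy data in the same column layout as the minke data frame.
# Effort values and transect labels are invented for illustration.
toy <- data.frame(
  Region.Label = c("South", "South", "South", "North", "North"),
  Area         = c(84734, 84734, 84734, 630000, 630000),
  Sample.Label = c(1, 1, 2, 3, 4),
  Effort       = c(86.75, 86.75, 100, 120, 150)
)

# Keep one row per transect so effort is not counted once per sighting
transects <- unique(toy[, c("Region.Label", "Area", "Sample.Label", "Effort")])

# Total effort per stratum, then effort per unit of stratum area
coverage <- aggregate(Effort ~ Region.Label + Area, data = transects, FUN = sum)
coverage$effort.per.area <- coverage$Effort / coverage$Area
print(coverage)
```

Removing duplicate transect rows before summing matters because each sighting repeats its transect's effort.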

Label  Estimate  se      cv      lcl   ucl    df
North  12182     4638.8  0.3808  5500  26980  12.97
South  3653      910.1   0.2491  2182  6118   17.96
Total  15835     4834.4  0.3053  8389  29892  15.26

Stratum as covariate in detection function

Group abundance estimates when stratum is a covariate in the detection function (this model was preferred in the model selection exercise).

Label  Estimate  se    cv      lcl   ucl    df
North  9863      3760  0.3813  4451  21856  13.03
South  4651      1225  0.2633  2719  7956   22.16
Total  14514     3970  0.2735  8215  25645  16.06
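
To compare the three analyses side by side, the total abundance estimates can be collected in one place. This is a rough sketch that copies the point estimates from the tables above by hand. The total for the full stratification analysis is taken here as the simple sum of the two separate estimates, which is an assumption made for illustration and says nothing about the precision of that sum. The object names are illustrative.

```r
# Point estimates copied from the three abundance tables
full.strat.total <- 9981 + 4588    # two separate analyses, summed
pooled.total     <- 12182 + 3653   # North + South, pooled detection function
covariate.total  <- 9863 + 4651    # North + South, stratum as covariate

totals <- c(full.stratification = full.strat.total,
            pooled              = pooled.total,
            stratum.covariate   = covariate.total)
print(totals)

# Percentage difference of each total from the preferred (covariate) model
print(round(100 * (totals - covariate.total) / covariate.total, 1))
```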