Variance estimation for systematic survey designs solution

Author

Centre for Research into Ecological and Environmental Modelling
University of St Andrews

Modified

November 2024

Solution

Variance estimation for systematic designs

Basic (default) variance estimation

Recall the data for this example, in which we have a strong gradient in animal density across our study region and at the same time we have a difference in the lengths of the transects, such that short transects are in regions of high animal density and long transects are in regions of low animal density.

library(Distance)
data("Systematic_variance_2")
conversion.factor <- convert_units("metre", "kilometre", "square kilometre")
sysvar2.hn <- ds(data=Systematic_variance_2, key="hn", adjustment="cos",
                 convert_units=conversion.factor)
print(sysvar2.hn$dht$individuals$D)
  Label Estimate       se        cv      lcl      ucl       df
1 Total 2044.592 566.3958 0.2770214 1161.012 3600.614 20.74468
print(sysvar2.hn$dht$individuals$N)
  Label Estimate       se        cv     lcl      ucl       df
1 Total 1022.296 283.1979 0.2770214 580.506 1800.307 20.74468

The point estimates are good (\(\hat D = 2,044\) animals per unit area and \(\hat N=1,022\) - note the size of the area) but the precision obtained with the default estimator is poor: estimated abundance ranges from about 580 to 1,800 - a three-fold difference over which we are uncertain. Given that our survey covered 40% of the triangular region and had a good sample size (254 animals on 20 transects), this would be a disappointing result in practice.

Variance estimation with bootstrapping

The function bootdht_Nhat_summarize pulls out the estimates of abundance \((\hat{N_i})\) for all bootstrap replicates \(i = 1, \cdots, N_{boot}\) where \(N_{boot}\) is the number of replicates requested.

The following command performs the bootstrap.

est.boot <- bootdht(model=sysvar2.hn, flatfile=Systematic_variance_2,
                    summary_fun=bootdht_Nhat_summarize, 
                    convert_units=conversion.factor, nboot=100)
summary(est.boot)
Bootstrap results

Boostraps          : 100 
Successes          : 100 
Failures           : 0 

     median    mean     se    lcl    ucl   cv
Nhat 967.21 1024.58 257.47 651.87 1595.4 0.27

The bootstrap results are very similar to the analytical results, as we would expect, because again this process assumed the transects were placed at random.

Post-stratification to improve variance estimation

Systematic_variance_2$Sample.Label <- as.numeric(Systematic_variance_2$Sample.Label)
est.O2 <- dht2(sysvar2.hn, flatfile=Systematic_variance_2, 
               strat_formula=~1, convert_units=conversion.factor, er_est="O2")
print(est.O2, report="density")
Density estimates from distance sampling
Stratification : geographical 
Variance       : O2, n/L 
Multipliers    : none 
Sample fraction : 1 


Summary statistics:
 .Label Area CoveredArea Effort   n  k     ER se.ER cv.ER
  Total  0.5      0.1922   9.61 254 20 26.431 1.459 0.055

Density estimates:
 .Label Estimate      se   cv      LCI      UCI     df
  Total 2044.592 162.914 0.08 1744.988 2395.636 75.871

Component percentages of variance:
 .Label Detection    ER
  Total     52.03 47.97

The precision of the estimated abundance has greatly improved in the post-stratified analysis (Fewster et al., 2009).

It must be remembered that we have not made any change to our data by the post-stratification; we are using getting a better estimate of the variance. In this case, the increase in precision could make a fundamental difference to the utility of the survey: it might make the difference between being able to make a management decision or not. Usually, trends will not be as extreme as they are in this example and post-stratification will not make a great difference. Such an situation is illustrated in the next problem.

Systematic designs where post-stratification is not needed (optional)

These data did not exhibit strong trends across the survey region and, hence, there are no great differences between the CVs and 95% confidence intervals using the two methods.

# Access the data
data("Systematic_variance_1")
Systematic_variance_1$Sample.Label <- as.numeric(Systematic_variance_1$Sample.Label)
sysvar1.hn <- ds(Systematic_variance_1, key="hn", adjustment=NULL, 
                 convert_units=conversion.factor)
print(sysvar1.hn$dht$individuals$D)
  Label Estimate       se         cv      lcl      ucl       df
1 Total 1954.016 160.5554 0.08216688 1657.276 2303.888 50.59541
print(sysvar1.hn$dht$individuals$N)
  Label Estimate       se         cv      lcl      ucl       df
1 Total 977.0078 80.27768 0.08216688 828.6378 1151.944 50.59541
est2.O2 <- dht2(sysvar1.hn, flatfile=Systematic_variance_1, strat_formula=~1,
                convert_units=conversion.factor, er_est="O2")
print(est2.O2, report="density")
Density estimates from distance sampling
Stratification : geographical 
Variance       : O2, n/L 
Multipliers    : none 
Sample fraction : 1 


Summary statistics:
 .Label Area CoveredArea Effort   n  k    ER se.ER cv.ER
  Total  0.5      0.2058  10.29 252 20 24.49 1.594 0.065

Density estimates:
 .Label Estimate      se    cv      LCI      UCI     df
  Total 1954.015 162.491 0.083 1653.804 2308.723 49.172

Component percentages of variance:
 .Label Detection    ER
  Total     38.76 61.24

References

Fewster, R. M., Buckland, S. T., Burnham, K. P., Borchers, D. L., Jupp, P. E., Laake, J. L., & Thomas, L. (2009). Estimating the encounter rate variance in distance sampling. Biometrics, 65(1), 225–236. https://doi.org/10.1111/j.1541-0420.2008.01018.x