Analysis of animals that occur in groups

Author

Centre for Research into Ecological and Environmental Modelling
University of St Andrews

Modified

November 2024

Analysis of animals that occur in groups

Description of survey

The data for this practical is simulated (so we can compare estimates derived from our analysis with truth), but the simulation is based upon a series of surveys conducted in the eastern Atlantic 2010-2019 by the U.S. National Marine Fisheries Service for a programme called Atlantic Marine Assessment Program for Protected Species (Palka et al., 2017). Estimates of abundance of Risso’s dolphins (Grampus griseus) were derived from these survey data and reported in Roberts et al. (2022).

Because the data are simulated, we know the following characteristics of the population we are studying:

Attribute true value
Number of groups 4333
Average group size 6
Number of individuals 26000

You will contrast the estimates you derive from your analyses against these known values.

Read in the data

In contrast to other data sets associated with R practicals, this data set is not included in the Distance R package. The code in the next chunk reads the CSV file directly from the workshop training materials website. Copy the code exactly, alternatively, you can download the data file from this link

rissogithub <- "https://raw.githubusercontent.com/distanceworkshops/async2024-2/main/09-clusters/R-prac/Risso_survey.csv"
risso <- read.csv(rissogithub)
head(risso, n=3)
  Region.Label    Area Sample.Label Effort distance size
1       region 1300000            1   1300    0.129    3
2       region 1300000            1   1300    1.464    4
3       region 1300000            1   1300    0.386    5

Data organisation

From the first few lines of the Risso’s data frame that organising the data for animals detected in groups is very similar to animals detected individually. There remains Region.Label, Area, Sample.Label, Effort and distance fields. To this is added the field size for each detection, representing the size of the detected group.

Focus on group size

Risso’s dolphins (and many other species), travel in aggregations. Explore the distribution of observed group sizes in this data set.

aveobs.size <- round(mean(risso$size),2)
histlabel <- paste0("Observed group sizes of Risso's dolphins\n",
                    "Mean observed size= ", aveobs.size)
hist(risso$size, nc=15, main=histlabel, xlab="Observed group size")

It is important to note that the mean size of detected groups is 7.21, because we know the mean size of groups in the population was 6.

We can explore the possible reason for this difference between mean size in the population and mean size in our sample by creating this diagnostic plot. This plot should be a standard part of any analysis that involves detection of animals in groups.

with(risso,scatter.smooth(distance, size, pch=20,
                          lwd=2, xlab="Detection distance",
                          ylab="Size of detected group",
                          main="Risso's survey\nDiagnostic plot"))
Figure 1: Diagnostic plot for existance of size bias in group size estimates.

Note in Figure 1 the absence of small groups at distances beyond ~2nm. Beyond ~4nm, not groups are detected that are smaller than 6. This pattern is indicative of bias in the estimated size of the average group. We will see the consequence of this bias in average group size when we perform a naive analysis.

Analysis of survey data

Naive analysis

If we apply distance sampling to the perpendicular distances recorded to the centre of the detected groups, we will estimate the abundance of groups (\(\widehat{N_s}\)), corrected for imperfect detectability. To convert abundance of groups (\(\widehat{N_s}\)) to abundance of individuals (\(\widehat{N}\)), we multiply:

\[\widehat{N_s} \times \bar{s} = \widehat{N}\] where \(\bar{s}\) is the average size of groups in the population (Buckland et al., 2015, sec. 6.3.1.3). We do not know the average size of groups in the population, but rather we estimate it by using the average size of our detected groups.

Because there exists a field named size in the risso data frame, the ds software knows observations were of groups. Output from ds will create estimates both of \(\widehat{N_s}\) and \(\widehat{N}\) (if study region Area is also provided), and companion estimates \(\widehat{D_s}\) and \(\widehat{D}\).

Fit the three key function detection models to the data in the usual manner and perform model selection to choose a most appropriate model (also perform absolute goodness of fit, courtesy of `summarize_ds_models).

library(Distance)
naive.uniform <- ds(data=risso, key="unif", adjustment="cos")
naive.hn <- ds(data=risso, key="hn", adjustment = "cos") 
naive.hr <- ds(data=risso, key="hr", adjustment = NULL) # no adjustments for simplicity
knitr::kable(summarize_ds_models(naive.uniform, naive.hn, naive.hr, output="plain"),
             digits=3, caption="Model selection for models not considering size bias.")

All three models fit. It is a close AIC contest between the unadjusted hazard rate and the half normal with one adjustment. For our purposes, let’s focus upon the hazard rate model, although the inference will be virtually identical were we to use the half normal model for our inference.

plot(naive.hr, nc=40, main="Hazard rate model with no adjustments")

A brief look at the data summary coming from the fitted model. Evaluate the number of detections and numbers of replicate transects.

knitr::kable(naive.hr$dht$clusters$summary)

Examine the estimates of abundance of clusters (\(\widehat{N_s}\)) from the hazard rate model. Note how the following code carefully extracts estimates only for clusters. We could do the same for the estimated density of clusters, but omit that here.

knitr::kable(naive.hr$dht$clusters$N)

Compare this estimate against the number of clusters (4333) in the population we simulated.

How well did this model estimate the number of individuals (\(\widehat{N}\)) in the population?

knitr::kable(naive.hr$dht$individuals$N)

Compare this estimate against the number of individuals in the population (26000). The reason for this lies in the estimation of average group size in the population, also estimated in the model object by the average size of detected groups:

knitr::kable(naive.hr$dht$Expected.S)

Compare this estimate against the estimate we produced during our exploratory data analysis plotting the histogram of sizes of observed groups: 7.21.

Analysis adjusting for size bias problem

Because we recognize that the size of clusters influences probability of their inclusion in our sample, we can incorporate this concept in the detection function model we construct. As introduced in Module 8, we can add covariates in addition to perpendicular distance into our detection function models. This is what we will do to perhaps overcome the size bias problem.

Aside: average group size as derived parameter
  • When using group size as a covariate we no longer estimate average group size directly from observed group sizes. Instead, the ds function uses something called Horwitz-Thompson-like estimators to first estimate the abundance of groups. Then, using the same estimation approach, ds estimates the abundance of individuals.
  • After the two estimates \(\widehat{N_s}\) and \(\widehat{N}\) are produced, their ratio is computed. This ratio is a less biased estimate of average group size in the population.
  • See the next-to-last slide in Module 8 lecture
clever.hn <- ds(data=risso, key="hn", formula = ~size) 
clever.hr <- ds(data=risso, key="hr", formula = ~size)

Contrast AIC scores between hazard rate and half normal models with and without the size covariate.

AIC(naive.hn, naive.hr, clever.hn, clever.hr)

Compare relative and absolute fit of hazard rate and half normal key functions that include the size covariate.

knitr::kable(summarize_ds_models(clever.hn, clever.hr, output="plain"), digits=3,
             caption = "Comparison of hazard rate and half normal models incorporating group size as covariate.")

Using the best model based upon AIC, repeat the model output interrogation you conducted (above) with the naive models: examine the estimates of \(\widehat{N_s}\), \(\widehat{N}\) and \(E(s)\) (defined as the expected value of group size in the population):

knitr::kable(clever.hr$dht$clusters$N)

Compare this estimate against the number of clusters (4333) in the population we simulated.

How well did this model estimate the number of individuals (\(\widehat{N}\)) in the population?

knitr::kable(clever.hr$dht$individuals$N)

Compare this estimate against the number of individuals in the population (26000). The reason for this lies in the estimation of average group size in the population, also estimated in the model object by the average size of detected groups:

knitr::kable(clever.hr$dht$Expected.S)

References

Buckland, S. T., Rexstad, E. A., Marques, T. A., & Oedekoven, C. S. (2015). Distance sampling: Methods and applications. Springer.
Palka, D. L., Chavez-Rosales, S., Josephson, E., Cholewiak, D., Haas, H. L., Garrison, L., … Orphanides, C. (2017). Atlantic Marine Assessment Program for Protected Species: 2010-2014; Supplement to Final Report Boem 2017-071; Appendix I. Bureau of Ocean Energy Management.
Roberts, J. J., Yack, T. M., Cañadas, A., Fujioka, E., Halpin, P. N., Barco, S. G., … Zoidis, A. M. (2022). Density model for Risso’s dolphin (Grampus griseus) for the U.S. East Coast, version 5.1, 2023-05-27, and supplementary report. Durham North Carolina: Marine Geospatial Ecology Laboratory, Duke University.