Distance sampling survey design

Author

Centre for Research into Ecological and Environmental Modelling
University of St Andrews

Modified

November 2024

Introduction

We provide two exercises in survey design so you can choose the one you feel is most useful to you.

The first example involves designing a line transect survey to estimate the abundance of porpoise, common dolphins and seals in and around St Andrews Bay.
- It considers how you choose your design based on effort limitations.
- It also compares an aerial survey based on systematic parallel lines with a boat based survey using zigzags.
The second example involves designing a point transect bird survey in Tentsmuir Forest.
- This looks at how to project your study area from latitude and longitude on to a flat plane using R.
- It also involves defining a design for multiple strata with different coverage in each strata.

Systematic parallel line aerial survey of marine mammals in St Andrews bay

Exercise objective

Decide on a systematic line spacing which uses as much of the 250km of available effort as possible without risking not being able to complete the survey. We must remember that some of the 250km will be spent travelling off-effort between transects.

This project involves designing an aerial survey of porpoise, common dolphins and seals in and around St Andrews Bay. (For those who know the area: the nearer St Andrews bay region has been extended in an easterly direction out past Bell Rock, as there are some pockets of deeper water that are of interest with regard to the distribution of cetaceans. The survey region has a chunk missing due to a no-fly zone around Buddon Ness, just south of Carnoustie because of the Royal Marines base there).

This uses the package leaflet to give you a better image of the portion of the North Sea where the aerial survey for marine mammals was proposed, here is a quick figure, showing the Kingdom of Fife extending eastward from the coast of Scotland into the North Sea.

St Andrews Bay, north and east of St Andrews

Load the distance sampling survey design R library.

library(dssd)

Study Region

Set up the study region and plot it. The shapefile for this study area is contained within the dssd R library. This shapefile has already been projected from latitude and longitude on to a flat plane and its units are in metres. The first line of code below returns the path for the shapefile within the R library and may vary on different computers. You will then pass this shapefile pathway to the make.region function to set up the survey region. As this shapefile does not have a projection (.prj) file associated with it we should tell dssd the units (m) when we create the survey region.

#Find the pathway to the file
shapefile.name <- system.file("extdata", "StAndrew.shp", package = "dssd")
#Create the region using this shapefile
region <- make.region(region.name = "St Andrews Bay",
                      units = "m",
                      shape = shapefile.name)
#Plot the region
plot(region)

Coverage

The next step is to set up a coverage grid. A coverage grid is a grid of regularly spaced points and is used to assess whether each point in the survey region is equally likely to sampled. The coverage grid needs to be created before the coverage simulation is run.

# Set up coverage grid
cover <- make.coverage(region, n.grid.points = 500)

Systematic Parallel Design

The small survey plane available can complete a total flight time of around 250km (excluding the flight time to and from the landing strip at Fife Ness). Generally, systematic parallel line designs are preferable for aerial surveys as they allow some rest time for observers as the plane travels between transects and avoids the sharp turns associated with zigzag designs.

Firstly, we will consider the design angle. Often animal density is affected by distance to coast so it is probably wise for this survey to orientate lines approximately perpendicular to the coast. To do this we can select a design angle of 90 degrees. We can therefore expect to spend a little more than 40 km (the height of the survey region) on off-effort transit time and might hope to be able to complete around 200km of transects. dssd lets us specify the desired line length as a design parameter and will then choose an appropriate value for transect spacing. We will choose a minus sampling strategy and set the truncation distance to 2km. Note that as our survey region coordinates are in metres we also need to supply the design parameters in metres.

# Define the design
design.LL200 <- make.design(region = region,
                      transect.type = "line",
                      design = "systematic",
                      line.length = 200000,
                      design.angle = 90,
                      edge.protocol = "minus",
                      truncation = 2000,
                      coverage.grid = cover)

Now we have defined the design we should check it visually by creating a survey (a single set of transects).

# Create a single survey from the design
survey.LL200 <- generate.transects(design.LL200)
# Plot the region and the survey
plot(region, survey.LL200)

Single survey generated from the systematic parallel line design

The survey consists of parallel systematically spaced transects running horizontally across the survey region roughly perpendicular to the coast as we wanted. We can also view the details of the survey which will tell us what spacing dssd used to try and achieve a line length of 200 km.

# Display the survey details
survey.LL200


   Strata St Andrews Bay:
   _______________________
Design:  systematically spaced parallel transects
Spacing:  4937.5
Line length: 199960.8
Trackline length: 248012.3
Cyclic trackline length: 284758.4
Number of samplers:  8
Design angle:  90
Edge protocol:  minus
Covered area:  779476845
Strata coverage: 78.93%
Strata area:  987500079

   Study Area Totals:
   _________________
Line length: 199960.8
Trackline length: 248012.3
Cyclic trackline length: 284758.4
Number of samplers:  8
Covered area:  779476845
Average coverage: 78.93%

The spacing used by dssd was 4938m which gives us 8 samplers and a coverage of just under 80%. In addition, this example survey has a line length of just under 200km and a trackline length of just over 248 km. However, given the random nature of the design and the fact that the width of the study region is not constant everybody should get slightly different values. Although the trackline length was just under 250km it is not sufficient to only look at one survey, we need to know that all surveys under this design will have a trackline length of < 250km.

To assess the design statistics across many surveys we will now run a coverage simulation. This simulation will randomly generate many surveys from our design and record coverage as well as various statistics including line length and trackline length. As we know that coverage for a parallel line design is largely uniform (apart from edge effects due to minus sampling) we do not need to run too many repetitions, 100 should be sufficient to give us an indication of the range of line lengths and trackline lengths for this design.

# Run the coverage simulation
design.LL200 <- run.coverage(design.LL200, reps = 100)
design.LL200

Examine the design statistics. The mean line length should be around 200km (200,000 m). Now look at the maximum trackline length, we need this value to be less than 250km (250,000 m).

Use the results of this simulation to create some new designs based on various spacings to find the maximum line length that can be achieved without risking exceeding the maximum trackline length of 250km (remember to generate a line length of 200km dssd selected a spacing of 4938m). Try transect spacings of 5000m or 5500m.

# Define the design
design.space500 <- make.design(region = region,
                      transect.type = "line",
                      design = "systematic",
                      spacing = 5000,
                      design.angle = 90,
                      edge.protocol = "minus",
                      truncation = 2000,
                      coverage.grid = cover)

Questions:

What spacing would you select for this design?
What is the maximum trackline length for the design you have selected?
What on-effort line length are we likely to achieve?

Zigzag Design

Zigzag designs are often more efficient in their use of effort having less off-effort transit time between transects. For this survey another option would be to complete a boat-based survey. The boat survey will have the same total effort available allowing us a trackline length of 250km.

Define a zigzag design for the same region. For zigzag designs the design angle has a different definition, it describes the angle across which the zigzags are constructed. Set the vertical design angle to 0. Zigzag designs also require an additional argument as zigzags can only be created inside convex shapes. Specify the bounding shape, choose a convex hull as it is more efficient than a minimum bounding rectangle. A convex hull works as if we were stretching an elastic band around the survey region. The code below shows you how to create the zigzag design, you should then create a single realisation of this design and plot it to check it looks acceptable.

# Define the zigzag design
design.zz.4500 <- make.design(region = region,
                      transect.type = "line",
                      design = "eszigzag",
                      spacing = 4500,
                      design.angle = 0,
                      edge.protocol = "minus",
                      bounding.shape = "convex.hull",
                      truncation = 2000,
                      coverage.grid = cover)

survey.zz <- generate.transects(design.zz.4500)
plot(region, survey.zz)

Single survey generated from the equal spaced zigzag design

Run a coverage simulation to verify that we have stayed within the restraints of our survey effort; a total trackline length of < 250km. Run the coverage simulation with more repetitions to also assess the coverage.

Examine the design statistics.

Questions:

Does this design meet our survey effort constraint?
What is the maximum total trackline length for this design?
What line length are we likely to achieve with this design?
Is this higher or lower than the systematic parallel design?

# Run the coverage simulation
design.zz.4500 <- run.coverage(design.zz.4500, reps = 500)
# Display the design statistics
design.zz.4500

Examine the coverage. Sometimes with zigzag surveys generated inside convex hulls produce areas of higher coverage in narrower parts of the survey region at either end of the design axis. One of the easiest ways to assess coverage is visually by plotting the coverage grid.

# Plot the coverage grid
plot(design.zz.4500)

Questions:

Do you think the coverage scores look uniform across the study region?
Given the lack of uniformity, where are they higher/lower?
Why do you think this is?
- Note, revisit one of the parallel line designs and plot the coverage scores to compare (although there are fewer repetitions you can still get an idea of coverage).

Point Transect Bird Survey in Tentsmuir Forest

Exericse objective

Design a point transect survey looking at songbird abundance in Tentsmuir Forest so that separate estimates of abundance can be obtained for the two strata.

Tenstsmuir Forest is a mature pine forest located on the east coast of Scotland ~10km north of St Andrews. The western area of Tentsmuir around Morton Lochs is of special interest in this study as this part of the forest forms part of the Tentsmuir National Nature Reserve. This study region has therefore been divided into two stratum, the first is the main part of the forest and the second is the area of forest adjacent to Morton Lochs.

Load the distance sampling survey design R library and also the spatial library sf. The sf library needs to be loaded in this example as we need to project the survey region.

library(dssd)
library(sf)

Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.3.1; sf_use_s2() is TRUE

Projecting your Study Region

This exercise demonstrates how to deal with unprojected shapefiles. Study areas should always be projected onto a flat plane before you use them to design your survey. This is because in most parts of the world one degree latitude is not the same in distance as one degree longitude. If we didn’t project, our study region and any surveys generated in it, would be distorted possibly leading to non-uniform coverage.

Load the study region and project it onto a flat plane using an Albers Equal Area Conical projection. As we have to project the shapefile we load the shape object separately instead of directly into a region object.

#Load the unprojected shapefile
shapefile.name <- system.file("extdata", "TentsmuirUnproj.shp", package = "dssd")
sf.shape <- read_sf(shapefile.name)
# Check current coordinate reference system
st_crs(sf.shape)

Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["latitude",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["longitude",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]

# Define a European Albers Equal Area projection
proj4string <- "+proj=aea +lat_1=56 +lat_2=62 +lat_0=50 +lon_0=-3 +x_0=0 
                +y_0=0 +ellps=intl +units=m"
# Project the study area on to a flat plane
projected.shape <- st_transform(sf.shape, crs = proj4string)

Create the region object for dssd using the projected shape and plot it to check what it looks like.

# Create the survey region in dssd
region.tm <- make.region(region.name = "Tentsmuir",
                         strata.name = c("Main Area", "Morton Lochs"),
                         shape = projected.shape)
# Plot the survey region
plot(region.tm)

Tentsmuir Forest: showing the main stratum and the Morton Loch stratum.

Coverage

Set up a coverage grid. A coverage grid is a grid of regularly spaced points and is used to assess whether each point in the survey region is equally likely to sampled. The coverage grid needs to be created before the coverage simulation is run.

# Set up coverage grid
cover.tm <- make.coverage(region.tm, n.grid.points = 1000)

Design

Set up a systematic point transect design. We will assume that we have sufficient resources to survey 40 point transects. As the Morton Lochs stratum is of special interest we will give it higher coverage. We will therefore explicitly allocate 25 samplers to the main stratum and 15 to the Morton Lochs stratum (note that the area of the Morton Lochs stratum is much small than the main stratum). If we wanted to allocate the same effort to both stratum we could provide the samplers argument with the single value of 40 and it would divide the effort equally between the strata. We will leave the design angle as 0 and set the truncation distance to 100 m. We will use a minus sampling approach at the edges.

Question:

What are the analysis implications of a design with unequal coverage?

# Set up a multi strata systematic point transect design
design.tm <- make.design(region = region.tm,
                         transect.type = "point",
                         design = "systematic",
                         samplers = c(25,15),
                         design.angle = 0,
                         edge.protocol = "minus",
                         truncation = 100,
                         coverage.grid = cover.tm)

Generate a Survey

You will now generate a single survey from this design and plot it inside the survey region to check what it looks like. If you want to check whether the covered areas of the samplers in the Morton Lochs stratum overlap add the argument covered.area = TRUE to the plot function.

# Create a single survey from the design
survey.tm <- generate.transects(design.tm)
# Plot the region and the survey
plot(region.tm, survey.tm, covered.area = TRUE)

Examine the survey information.

Questions:

What spacing was used in each strata to try and achieve the desired number of samplers?
Did your survey achieve exactly the number of samplers you requested?
How much does coverage differ between the two strata for this realisation?

# Display survey information
survey.tm

Save coordinates to a file

If this survey is to be conducted in the field, you will want the coordinates that you can load into a handheld GPS. The function write.transects() can write waypoints of the survey (in this case the point transect stations) to text, comma-separated value or GPX files.

write.transects(survey.tm,
                dsn = "tentsmuir-points.gpx",
                layer = "points",
                dataset.options = "GPX_USE_EXTENSIONS=yes",
                proj4string=sf::st_crs(sf.shape))

The GPX file can be transferred to a GPS, or viewed using Google Earth.

Realised survey with locations written to GPX file and imported into Google Earth.

Assessing Coverage and Design Statistics

We will now run a coverage simulation to assess how much the number of samplers and average coverage varies between surveys. We will also be able to assess how coverage varies spatially to see if edge effects are of concern.

View the design statistics.

Questions:

What is the minimum number of samplers you will achieve in each strata?
Is this sufficient to complete separate analyses in each stratum?

Plot the coverage scores.

Questions:

Does it appear that there is even coverage within each strata?
- As there is such a difference in the range of coverage scores between strata you may need to plot each strata individually.

# View the design statistics
print(design.tm)


   Strata Main Area:
   __________________
Design:  systematically spaced transects
Spacing:  NA
Number of samplers:  25
Design angle:  0
Edge protocol:  minus

   Strata Morton Lochs:
   _____________________
Design:  systematically spaced transects
Spacing:  NA
Number of samplers:  15
Design angle:  0
Edge protocol:  minus

Strata areas:  14108643, 715265
Region units:  m
Coverage Simulation repetitions:  100

    Number of samplers:
    
        Main Area Morton Lochs Total
Minimum      22.0         13.0  36.0
Mean         25.1         15.1  40.2
Median       25.0         15.0  40.0
Maximum      27.0         18.0  44.0
sd            0.9          1.1   1.3

    Covered area:
    
        Main Area Morton Lochs      Total
Minimum 683662.63    363258.99 1076099.58
Mean    768186.34    420283.45 1188469.79
Median  772479.34    417730.22 1188447.46
Maximum 816669.52    468461.70 1264365.86
sd       25071.81     25394.87   32875.01

    % of region covered:
    
        Main Area Morton Lochs Total
Minimum      4.85        50.79  7.26
Mean         5.44        58.76  8.02
Median       5.48        58.40  8.02
Maximum      5.79        65.49  8.53
sd           0.18         3.55  0.22

    Coverage Score Summary:
    
         Main Area Morton Lochs      Total
Minimum 0.00000000    0.2200000 0.00000000
Mean    0.05483087    0.5750000 0.08094378
Median  0.05000000    0.6450000 0.06000000
Maximum 0.15000000    0.7800000 0.78000000
sd      0.02997747    0.1456197 0.12170445

# Plot the coverage scores
plot(design.tm)

# coverage scores for individual strata could be plotted separately
# plot(design.tm, strata.id = 1)
# plot(design.tm, strata.id = 2)