Aims

By the end of this exercise you should feel confident doing the following:

Preamble

First need to load the requisite R libraries

library(rgdal)
## Loading required package: sp
## rgdal: version: 1.0-4, (SVN revision 548)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 1.11.2, released 2015/02/10
##  Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.2/Resources/library/rgdal/gdal
##  Loaded PROJ.4 runtime: Rel. 4.9.1, 04 March 2015, [PJ_VERSION: 491]
##  Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.2/Resources/library/rgdal/proj
##  Linking to sp version: 1.1-1
library(ggplot2)
library(Distance)
## Loading required package: mrds
## This is mrds 2.1.14
## Built: R 3.2.0; ; 2015-07-30 10:07:19 UTC; unix
library(knitr)

Load the data

The observations are located in a “geodatabase” we created in Arc. We want to pull out the “Sightings” table (called a “layer”) and make it into a data.frame (so that it’s easier for R to manipulate).

distdata <- readOGR("Analysis.gdb", layer="Sightings")
## OGR data source with driver: OpenFileGDB 
## Source: "Analysis.gdb", layer: "Sightings"
## with 137 features
## It has 6 fields
distdata <- as.data.frame(distdata)

We can check it has the correct format using head:

head(distdata)
##    Survey GroupSize SeaState  Distance        SightingTime SightingID
## 1 en04395         2      3.0  246.0173 2004/06/28 10:22:21          1
## 2 en04395         2      2.5 1632.3934 2004/06/28 13:18:14          2
## 3 en04395         1      3.0 2368.9941 2004/06/28 14:13:34          3
## 4 en04395         1      3.5  244.6977 2004/06/28 15:06:01          4
## 5 en04395         1      4.0 2081.3468 2004/06/29 10:48:31          5
## 6 en04395         1      2.4 1149.2632 2004/06/29 14:35:34          6
##   coords.x1 coords.x2
## 1   -65.636    39.576
## 2   -65.648    39.746
## 3   -65.692    39.843
## 4   -65.717    39.967
## 5   -65.820    40.279
## 6   -65.938    40.612

The Distance package expects certain column names to be used. Renaming is much easier to do in R than ArcGIS, so we do it here.

distdata$distance <- distdata$Distance
distdata$object <- distdata$SightingID
distdata$size <- distdata$GroupSize

Let’s see what we did:

head(distdata)
##    Survey GroupSize SeaState  Distance        SightingTime SightingID
## 1 en04395         2      3.0  246.0173 2004/06/28 10:22:21          1
## 2 en04395         2      2.5 1632.3934 2004/06/28 13:18:14          2
## 3 en04395         1      3.0 2368.9941 2004/06/28 14:13:34          3
## 4 en04395         1      3.5  244.6977 2004/06/28 15:06:01          4
## 5 en04395         1      4.0 2081.3468 2004/06/29 10:48:31          5
## 6 en04395         1      2.4 1149.2632 2004/06/29 14:35:34          6
##   coords.x1 coords.x2  distance object size
## 1   -65.636    39.576  246.0173      1    2
## 2   -65.648    39.746 1632.3934      2    2
## 3   -65.692    39.843 2368.9941      3    1
## 4   -65.717    39.967  244.6977      4    1
## 5   -65.820    40.279 2081.3468      5    1
## 6   -65.938    40.612 1149.2632      6    1

We now have four “extra” columns.

Exploratory analysis

Before setting off fitting detection functions, let’s look at the relationship of various variables in the data.

Don’t worry too much about understanding the code that generates these plots at the moment.

Distances

Obviously, the most important covariate in a distance sampling analysis is distance itself. We can plot a histogram of the distances to check that (1) we imported the data correctly and (2) it conforms to the usual shape for line transect data.

hist(distdata$distance, xlab="Distance (m)", main="Distance to sperm whale observations")

Size and distance

We might expect that there will be a relationship between the distance at whcih we see animals and the size of the groups observed (larger groups are easier to see at larger distances), so let’s plot that to help us visualise the relationship.

# plot of size versus distance and sea state vs distance, linear model and LOESS smoother overlay

# put the data into a simple format, only selecting what we need
distplot <- distdata[,c("distance","size","SeaState")]
names(distplot) <- c("Distance", "Size", "Beaufort")
library(reshape2)
# "melt" the data to have only three columns (try head(distplot))
distplot <- melt(distplot, id.vars="Distance", value.name="covariate")

# make the plot
p <- ggplot(distplot, aes(x=covariate, y=Distance)) +
      geom_point() +
      facet_wrap(~variable, scale="free") +
      geom_smooth(method="loess", se=FALSE) +
      geom_smooth(method="lm", se=FALSE) +
      labs(x="Covariate value", y="Distance (m)")
print(p)

Distance and sea state

We might also expect that increaing sea state would result in a drop in observations. We can plot histograms of distance for each sea state level (making the sea state take only values 0,1,2,4,5 for this).

distdata$SeaStateCut <- cut(distdata$SeaState,seq(0,5,by=1), include.lowest=TRUE)
p <- ggplot(distdata) +
      geom_histogram(aes(distance)) +
      facet_wrap(~SeaStateCut) +
      labs(x="Distance (m)", y="Count")
print(p)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

Survey effect

Given we are including data from two different surveys we can also investigate the relationship between survey and distances observed.

p <- ggplot(distdata) +
      geom_histogram(aes(distance)) +
      facet_wrap(~Survey) +
      labs(x="Distance (m)", y="Count")
print(p)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.