R
environmentThe Windows-based GUI software for conducting distance sampling analysis (Distance for Windows, heretofore DistWin) has been the industry-standard for two decades. Our development work continues to incorporate state-of-the-art analytical methods for software users. Most of that software development takes place in the R
language.
The readdst
package was created for statisticans and biologists who have existing DistWin
projects and wish to port them into the R
environment. It was originally intended as an in-house testing tool to enable comparison of analysis results of newly-developed analysis software, with results produced by DistWin
. The purpose of the package is to convert data and analyses stored in DistWin
projects into data objects and function calls for analysis using R
.
This vignette is intended to give a few examples of the capabilities of readdst
. Readers of this vignette should have some familiarity with the organisation of DistWin
projects along with functions in the mrds
library for fitting detection functions.
Note too, readdst
can only work with DistWin
projects created by DistWin
versions 6 or 7.
DistWin
projectDistWin
projects carry not only the data collected in the field, but also survey descriptions, data filters, model definitions and analysis results. All of these facets are stored in Access database files created by DistWin
. Reading the Access files into R
is at the heart of readdst
.
DistWin
.There is a project that comes with the distribution of DistWin
located in My Documents\My Distance Projects\Sample Projects
We will use to demonstrate the use of DistWin
. In addition, we have taken another project that ships with DistWin
, Stratify example
, renamed it Vignette-stratify
and added further analyses to that project for use in this vignette. This third example ships with the readdst
R package and is found in the library folder associated with the readdst
package (see below).
Purists among the readers will recognise that spaces in directory names can cause problems for
R
. If you will be convertingDistWin
projects located in other locations on a Windows machine, or are working under a different operating system where spaces in pathnames are forbidden, then you may need to make some alterations to the example code provided in this vignette.
Synthetic line transect data, units of measure in non-SI units (effort and perpendicular distance in nautical miles). Animals were detected in groups, so cluster size enters into the calculation of animal density. The project contains a multiplier derived from an external estimate of \(\widehat{g(0)}\), but for purposes of this vignette, the use of the multiplier has been removed from the DistWin
analysis.
The study area also contains two types of habitat and the design incorporated habitat information. Consequently, the brief analysis focuses upon whether it is most parsimonious to fit habitat-specific detection functions, or whether a single detection function pooled across habitat types.
This project demonstrates the use of the MRDS (multiple covariate) analysis engine in conjunction with a point transect sampling survey (using conventional SI measurement units). The analyses present in the project represent 16 model results presented in Marques et al. (2007). All analyses employ a data filter specifying an 82.5m truncation distance.
The following code chunk performs some preliminary setup before conducting any conversions. First, if running on a Windows machine, the code checks that the 32-bit version of R is being used. R uses the RODBC
package to read Access files; this package is only supported by 32-bit versions of R. On a Mac OS X or Linux/Unix machine, readdst
makes use of mdb-tools
and the Hmisc
R
package.
Second, a path to the Sample Projects
directory is created for subsequent use when accessing projects that ship with DistWin
.
library(readdst)
if (.Platform$OS.type=="windows" && R.Version()$platform!="i386-w64-mingw32") print("32-bit version of R needed to run RODBC")
home.dir <- path.expand("~")
sample.proj.dir <- paste0(home.dir, "/My Distance Projects/Sample Projects/")
The Amakihi project that ships with DistWin
and resides in My Documents\My Distance Projects\Sample Projects
is converted with these two commands. There are no arguments to the function convert_project()
other than the absolute path to the DistWin
project (without the .dst
file extension).
amakihi.proj.file <- paste0(sample.proj.dir, "Amakihi")
amakihi.proj <- convert_project(amakihi.proj.file)
The Vignette-stratify
project that ships with readdst
and resides wherever R
libraries are stored, is covered with these three commands.
stratify.proj.directory <- system.file("Stratify", package="readdst")
stratify.proj.name <- paste0(stratify.proj.directory, "/Vignette-stratify")
stratify.proj <- convert_project(stratify.proj.name)
DistWin
projectsThe object created by convert_project()
is a named list. There are as many elements in the list as analyses (run or not run) defined in the DistWin
project. The names of the list elements correspond to the names of the analyses:
length(stratify.proj)
[1] 3
names(stratify.proj)
[1] "Half-normal cosine no stratification exact"
[2] "Half-normal cosine strat-specific detfn"
[3] "Half-normal cosine pooled detfn"
Each analysis is itself a named list, with list elements describing such things as data filter used and the equivalent call to ddf()
from the mrds
library to fit the detection function specified in the analysis. Several of these components will be discussed later.
Stored with each analysis are the data that were used in the analysis. Recall that DistWin
allowed seperate specification of model definitions and data filters, which could be coupled to form an analysis. The data (in two forms) as well as the measurement units for effort, perpendicular (radial) distances and area are all stored in the env
element of each analysis list element1.
ls(stratify.proj$'Half-normal cosine no stratification exact'$env)
[1] "data" "obs.table" "region.table" "sample.table"
[5] "units"
Data are stored both in the flatfile
format with all levels of the data hierarchy in a single object,
G0 | G0.SE | Cluster.strat | distance | size | object | Study.Area | Region.Label | Sample.Label |
---|---|---|---|---|---|---|---|---|
0.8367 | 0.1738 | 1 | 0.10 | 1 | 1 | Stratify example | Ideal Habitat | 1 |
0.8367 | 0.1738 | 1 | 0.22 | 1 | 2 | Stratify example | Ideal Habitat | 1 |
0.8367 | 0.1738 | 2 | 0.16 | 2 | 3 | Stratify example | Ideal Habitat | 1 |
or in the multiple object form with a separate data frame for the region.table
, sample.table
and obs.table
.
The convert_project()
function makes the data for a distance sampling analysis available and transforms the syntax of detection function fitting into calls to ddf()
. This makes it possible to:
DistWin
in R
, and/orDistWin
.Recall each analysis from a DistWin
project is represented as a list element of an object created by convert_project()
. The function run_analysis()
takes as an argument an analysis and conducts that analysis in R
using ddf()
to fit a detection function and produce estimates of detection probability and \(\hat{N}\) in the covered region.
library(mrds) # to access GOF function and dht()
This is mrds 2.1.15
Built: R 3.2.3; ; 2016-03-03 21:45:18 UTC; unix
stratify.reanalyse <- run_analysis(stratify.proj$'Half-normal cosine no stratification exact')
plot(stratify.reanalyse, showpoints=FALSE, pl.den=0)
gof.test <- ddf.gof(stratify.reanalyse, qq=FALSE)
text(1.2,0.9, cex=0.6,
paste("K-S GOF D=",round(gof.test$dsgof$ks$Dn, 4), "\nP=", round(gof.test$dsgof$ks$p,3)))
summary(stratify.reanalyse)
Summary for ds object
Number of observations : 90
Distance range : 0 - 1.94
AIC : 63.87982
Detection function:
Half-normal key function
Detection function parameters
Scale coefficient(s):
estimate se
(Intercept) -0.3514199 0.08078866
Estimate SE CV
Average p 0.4519568 0.03470614 0.07679084
N in covered region 199.1340969 21.80150344 0.10948152
Producing estimates of abundance from the fitted ddf()
model requires a call to dht()
.
stratify.abund <- dht(stratify.reanalyse,
region = stratify.proj$`Half-normal cosine no stratification exact`$env$region.table,
sample = stratify.proj$`Half-normal cosine no stratification exact`$env$sample.table,
obs = stratify.proj$`Half-normal cosine no stratification exact`$env$obs.table)
Region | Area | CoveredArea | Effort | n | ER | se.ER | cv.ER | mean.size | se.mean |
---|---|---|---|---|---|---|---|---|---|
Ideal Habitat | 85000 | 1470.52 | 379 | 84 | 0.2216359 | 0.0668354 | 0.3015550 | 2.153846 | 0.2905123 |
Marginal Habitat | 600000 | 3918.80 | 1010 | 123 | 0.1217822 | 0.0523166 | 0.4295915 | 2.411765 | 0.3477394 |
Total | 685000 | 5389.32 | 1389 | 207 | 0.1490281 | 0.0412544 | 0.2768229 | 2.300000 | 0.2330121 |
Label | Estimate | se | cv | lcl | ucl |
---|---|---|---|---|---|
Ideal Habitat | 10743.12 | 3343.031 | 0.3111788 | 5212.731 | 22140.91 |
Marginal Habitat | 41668.36 | 18184.107 | 0.4364009 | 16071.287 | 108034.41 |
Total | 52411.48 | 18631.076 | 0.3554770 | 24130.762 | 113836.55 |
The call to dht()
shows the use of the hierarchial data type (region, sample and observation tables) available within each analysis.
It would be possible to perform new analyses of data present in DistWin
projects, such as fitting a hazard-rate key function to the stratify
data set that had only been fitted with half-normal detection functions.
stratify.hazard <- ddf(dsmodel=~cds(key="hr", formula=~1, adj.series="cos", adj.order=2),
meta.data=list(width=1.94000005722046,left=0),
control=list(mono=TRUE, mono.strict=TRUE), method="ds",
data=stratify.proj$'Half-normal cosine no stratification exact'$env$data)
hazard.aic <- stratify.hazard$criterion
halfnorm.aic <- stratify.reanalyse$criterion
This hazard rate detection function can also be fitted by an equivalent call to the ds()
function found in the Distance
package available on CRAN. ds()
serves as a wrapper for ddf()
. We show both the ddf()
(created by run_analysis()
in readdst
) and ds()
so readers can select the approach they desire.
stratify.hazard <- ds(key="hr", formula=~1, adj.series="cos", adj.order=2,
truncation=1.94000005722046,
data=stratify.proj$'Half-normal cosine no stratification exact'$env$data)
The hazard rate model (with a cosine adjustment) had an AIC of 60.5 while the unadjusted half-normal model had an AIC of 63.9 suggesting we were on firm footing when fitting the half-normal model to the stratify
dataset.
DistWin
and R analysesDistWin
projects contain not only the analyses but also the results of those analyses. Analysis results are stored in DistWin
projects according to a coding scheme to identify the statistics or parameter estimates. Values of statistic and estimates can be extracted from a DistWin
project using get_stats()
; however this function is not intended for use by users.
Instead, get_stats()
is usually called from within the function test_stats()
. The test_stats()
function takes as its first argument a single analysis from a converted DistWin
project. The tasks performed by test_stats()
are:
R
,DistWin
-generated results from the DistWin
project andDistWin
and R
results.Statistic | Distance_value | mrds_value | Difference | Pass |
---|---|---|---|---|
n | 90.0000000 | 90.0000000 | 0.0000000 | ✓ |
parameters | 1.0000000 | 1.0000000 | 0.0000000 | ✓ |
AIC | 63.8801003 | 63.8798192 | 0.0000044 | ✓ |
Chi^2 p | 0.1707298 | 0.9906514 | 4.8024520 | |
P_a | 0.4519590 | 0.4519568 | 0.0000050 | ✓ |
CV(P_a) | 0.0767919 | 0.0767908 | 0.0000136 | ✓ |
log-likelihood | -30.9400501 | -30.9399096 | 0.0000000 | ✓ |
K-S p | 0.6884868 | 0.6884929 | 0.0000089 | ✓ |
C-vM p | 1.0000000 | 0.9000000 | 0.1000000 | |
density | 0.0505962 | 0.0573726 | 0.1339330 | |
CV(density) | 0.2693232 | 0.2818972 | 0.0466873 | |
individuals | 34658.0000000 | 39300.2621443 | 0.1339449 | |
CV(individuals) | 0.2693232 | 0.2818972 | 0.0466873 |
The column labeled Difference shows the ratio of the mrds
value over the DistWin
value. When called from the R
console, the final Pass column will possess ticks if the ratio is less than 0.05 to quickly assess the agreement between DistWin
and mrds
result components.
Note test_stats()
can take only a single analysis as an argument, rather than a complete set of analyses that may be embedded within a DistWin
project. To test multiple analyses, lapply()
could be wrapped around the call to test_stats()
.
There are a large number of reasons for disagreement between DistWin
and R
results. Reasons may include differences in:
readdst
readdst
is a work in progress. As noted, it was developed for in-house testing of developing code against known solutions produced by DistWin
. There are many nuances of analyses that can be performed by DistWin
that have not been included in readdst
conversion feature. The following list is incomplete, but gives a general idea of what is not included in readdst
capabilities.
dsm
, mads
or DSsim
enginesFor this reason we recommend using the latest version of readdst
from GitHub using the following commands:
install.packages("devtools")
devtools::install_github("DistanceDevelopment/readdst")
Note that environments in R have rather different behaviour to other R objects – if you modify them within a function, you modify the object everywhere (so called “reference semantics”). This can have unexpected consequences when you pass them to your own functions. See http://adv-r.had.co.nz/Environments.html for more information.↩