Performs a bootstrap for simple distance sampling models using the same data
structures as dht. Note that only geographical stratification as supported in
dht is allowed.
Usage
bootdht(
  model,
  flatfile,
  resample_strata = FALSE,
  resample_obs = FALSE,
  resample_transects = TRUE,
  nboot = 100,
  summary_fun = bootdht_Nhat_summarize,
  convert_units = 1,
  select_adjustments = FALSE,
  sample_fraction = 1,
  multipliers = NULL,
  progress_bar = "base",
  cores = 1,
  convert.units = NULL
)
Arguments
- model
a model fitted by ds or a list of models
- flatfile
Data provided in the flatfile format. See flatfile for details. Please note that it is a current limitation of bootdht that all Sample.Label identifiers must be unique across all strata, i.e., transect IDs must not be re-used from one stratum to another. An easy way to achieve this is to paste together the stratum names and transect IDs.
- resample_strata
should resampling happen at the stratum (Region.Label) level? (Default FALSE)
- resample_obs
should resampling happen at the observation (object) level? (Default FALSE)
- resample_transects
should resampling happen at the transect (Sample.Label) level? (Default TRUE)
- nboot
number of bootstrap replicates
- summary_fun
function that is used to obtain summary statistics from the bootstrap, see "Summary Functions" below. By default bootdht_Nhat_summarize is used, which just extracts abundance estimates.
- convert_units
conversion between units for abundance estimation, see "Units", below. (Defaults to 1, implying all of the units are "correct" already.) This takes precedence over any unit conversion stored in model.
- select_adjustments
select the number of adjustments in each bootstrap. When FALSE, the exact detection function specified in model is fitted to each replicate. Setting this option to TRUE can significantly increase the runtime for the bootstrap. Note that for this to work, model must have been fitted with adjustment != NULL.
- sample_fraction
what proportion of the transects was covered (e.g., 0.5 for one-sided line transects).
- multipliers
list of multipliers. See "Multipliers" below.
- progress_bar
which progress bar should be used? Default "base" uses txtProgressBar, "none" suppresses output, "progress" uses the progress package, if installed.
- cores
number of CPU cores to use to compute the estimates. See "Parallelization" below.
- convert.units
deprecated, see same argument with underscore, above.
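The flatfile argument requires Sample.Label values to be unique across strata. A minimal sketch of the pasting approach suggested there is shown below, using base R only. The small data frame and its values are made up for illustration; only the Region.Label and Sample.Label column names come from the text above.

```r
# Hypothetical flatfile fragment: transect IDs 1 and 2 are re-used in both
# strata, which the flatfile argument description says is not allowed.
flat <- data.frame(
  Region.Label = c("North", "North", "South", "South"),
  Sample.Label = c(1, 2, 1, 2)
)

# Paste the stratum name onto the transect ID so every ID is unique.
flat$Sample.Label <- paste(flat$Region.Label, flat$Sample.Label, sep = "_")

print(flat$Sample.Label)
```

The separator is arbitrary; any scheme that keeps the combined labels distinct across strata should serve the same purpose.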
Summary Functions
The function summary_fun allows the user to specify what summary statistics
should be recorded from each bootstrap. The function should take two
arguments, ests and fit. The former is the output from dht2, giving tables of
estimates. The latter is the fitted detection function object. The function
is called once fitting and estimation have been performed and should return a
data.frame. Those data.frames are then concatenated using rbind. One can make
these functions return any information within those objects, for example
abundance or density estimates or the AIC for each model. See Examples below.
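As a rough illustration of the contract described above, the sketch below defines a summary function and applies it to two mock replicates, then combines the results with rbind. The mock_ests helper and its numbers are hypothetical stand-ins for real estimate tables. The Estimate column follows the accessor used in the Examples section, while the Label column is an assumption about the table layout.

```r
# Hypothetical summary function: one row per stratum, keeping the label next
# to the abundance estimate. Assumes ests$individuals$N has Label and
# Estimate columns (Label is an assumption, not confirmed by this page).
Nhat_by_label <- function(ests, fit) {
  data.frame(Label = ests$individuals$N$Label,
             Nhat  = ests$individuals$N$Estimate)
}

# Mock stand-in for the estimates from one replicate (made-up numbers).
mock_ests <- function(values) {
  list(individuals = list(
    N = data.frame(Label = c("North", "South", "Total"), Estimate = values)
  ))
}

# Each replicate yields one data.frame; they are stacked with rbind.
rep1 <- Nhat_by_label(mock_ests(c(100, 200, 300)), fit = NULL)
rep2 <- Nhat_by_label(mock_ests(c(110, 190, 300)), fit = NULL)
combined <- rbind(rep1, rep2)
print(combined)
```

Because the per-replicate results are stacked row-wise, every call needs to return the same columns.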
Multipliers
It is often the case that we cannot measure distances to individuals or groups directly, but instead need to estimate distances to something they produce (e.g., for whales, their blows; for elephants their dung) – this is referred to as indirect sampling. We may need to use estimates of production rate and decay rate for these estimates (in the case of dung or nests) or just production rates (in the case of songbird calls or whale blows). We refer to these conversions between "number of cues" and "number of animals" as "multipliers".
The multipliers argument is a list, with two possible elements (creation and decay), each of which is either:
- a data.frame, which must have at least a column named rate, which abundance estimates will be divided by (the term "multiplier" is a misnomer, but kept for compatibility with Distance for Windows). Additional columns can be added to give the standard error and degrees of freedom for the rate, if known, as SE and df, respectively. You can use a multirow data.frame to have different rates for different geographical areas (for example). In this case the rows need to have a column (or columns) to merge with the data (for example Region.Label).
- a function which will return a single estimate of the relevant multiplier. See make_activity_fn for a helper function for use with the activity package.
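The data.frame form described above might be assembled as in the following sketch. All numeric values are invented for illustration and do not come from any survey. The region names are hypothetical, and whether bootdht accepts this exact combination is not verified here.

```r
# Illustrative values only; none of these numbers come from a real survey.
multipliers <- list(
  # single production rate with its standard error
  creation = data.frame(rate = 25, SE = 2.5),
  # decay rate that differs by stratum; Region.Label is the merge column
  decay = data.frame(
    Region.Label = c("North", "South"),
    rate = c(160, 140),
    SE = c(12, 15),
    df = c(20, 18)
  )
)
str(multipliers)
```

Only the rate column is required in each element; SE and df are optional extras, as noted above.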
Model selection
Model selection can be performed on a per-replicate basis within the bootstrap. This has three variations:
- when select_adjustments is TRUE then adjustment terms are selected by AIC within each bootstrap replicate (provided that model had the order and adjustment options set to non-NULL).
- if model is a list of fitted detection functions, each of these is fitted to each replicate and results are generated from the one with the lowest AIC.
- when select_adjustments is TRUE and model is a list of fitted detection functions, each model is fitted to each replicate and the number of adjustments is selected via AIC. This last option can be extremely time consuming.
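The general idea of choosing among candidate models within each replicate can be sketched in base R. This is not bootdht's internal code: ordinary linear models stand in for detection functions, and the data, candidate formulas and object names are all invented for illustration.

```r
set.seed(1)

# Made-up data; lm() fits stand in for detection function fits.
dat <- data.frame(x = runif(50))
dat$y <- 1 + 2 * dat$x + rnorm(50, sd = 0.3)

# Candidate models, analogous to passing a list of models.
candidates <- list(linear = y ~ x, quadratic = y ~ x + I(x^2))

nboot <- 20
chosen <- character(nboot)
for (b in seq_len(nboot)) {
  # resample rows with replacement to form one replicate
  resampled <- dat[sample(nrow(dat), replace = TRUE), ]
  # refit every candidate to the replicate and keep the lowest AIC
  fits <- lapply(candidates, lm, data = resampled)
  aics <- vapply(fits, AIC, numeric(1))
  chosen[b] <- names(aics)[which.min(aics)]
}
table(chosen)
```

Refitting every candidate in every replicate multiplies the number of model fits, which is consistent with the warning above about runtime.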
Parallelization
If cores>1 then the parallel/doParallel/foreach/doRNG packages will be used
to run the computation over multiple cores of the computer. To use this
component you need to install those packages using:
install.packages(c("foreach", "doParallel", "doRNG"))
It is advised that you do not set cores to be greater than one less than the
number of cores on your machine. The doRNG package is required to make
analyses reproducible (set.seed can be used to ensure the same answers).
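One way to follow the advice on the number of cores is to derive the value from the machine, as in this sketch. It uses parallel::detectCores() from the parallel package shipped with R; the variable names are illustrative, and the fallback to a single core is an assumption about sensible behaviour rather than something this page specifies.

```r
# Number of cores reported by the machine; detectCores() can return NA.
n_available <- parallel::detectCores()
if (is.na(n_available)) n_available <- 1

# Leave one core free, but never go below one.
cores <- max(1, n_available - 1)
print(cores)
```

The resulting value could then be supplied as the cores argument.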
It is also hard to debug any issues in summary_fun, so it is best to run a
small number of bootstraps first in parallel to check that things work. On
Windows systems summary_fun does not have access to the global environment
when running in parallel, so all computations must be made using only its
ests and fit arguments (i.e., you cannot use R objects from elsewhere in that
function, even if they are available to you from the console).
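The difference between a summary function that leans on the global environment and one that is self-contained can be sketched as follows. The function names, the scale_factor constant and the mock estimates are hypothetical. The runs_without_globals check is only a rough local approximation (it hides everything except base R from the function), not a reproduction of how parallel workers are set up.

```r
# Fragile: relies on scale_factor being found outside the function.
scale_factor <- 1000
fragile_summary <- function(ests, fit) {
  data.frame(Nhat_thousands = ests$individuals$N$Estimate / scale_factor)
}

# Self-contained: everything it needs is defined inside the function body.
safe_summary <- function(ests, fit) {
  scale_factor <- 1000
  data.frame(Nhat_thousands = ests$individuals$N$Estimate / scale_factor)
}

# Rough local check (an assumption, not how bootdht runs its workers):
# evaluate the function with only base R visible and report whether it ran.
runs_without_globals <- function(f, ests) {
  environment(f) <- baseenv()
  !inherits(try(f(ests, NULL), silent = TRUE), "try-error")
}

# Mock estimates with made-up numbers.
mock_ests <- list(individuals = list(N = data.frame(Estimate = c(1500, 2500))))

fragile_ok <- runs_without_globals(fragile_summary, mock_ests)
safe_ok <- runs_without_globals(safe_summary, mock_ests)
print(c(fragile = fragile_ok, safe = safe_ok))
```

A check like this can flag an accidental dependency on console objects before a longer parallel run is started.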
Another consequence of the global environment being unavailable inside
parallel bootstraps is that any starting values in the model object passed in
to bootdht must be hard coded (otherwise you get back 0 successful
bootstraps). For a worked example showing this, see the camera trap distance
sampling online example at
https://examples.distancesampling.org/Distance-cameratraps/camera-distill.html.
See also
summary.dht_bootstrap for how to summarize the results,
bootdht_Nhat_summarize and bootdht_Dhat_summarize for examples of summary
functions.
Examples
if (FALSE) { # \dontrun{
  library(Distance)
  # fit a model to the minke data
  data(minke)
  mod1 <- ds(minke)
  # summary function to save the abundance estimate
  Nhat_summarize <- function(ests, fit) {
    return(data.frame(Nhat=ests$individuals$N$Estimate))
  }
  # perform 5 bootstraps
  bootout <- bootdht(mod1, flatfile=minke, summary_fun=Nhat_summarize, nboot=5)
  # obtain basic summary information
  summary(bootout)
} # }