Distance Detection Function Fitting

Generic function for fitting detection functions for distance sampling with single and double observer configurations. Independent observer, trial and dependent observer (removal) configurations are included. It is a generic function that does little more than validate the calling arguments and methods and then call the appropriate method-specific function to do the analysis.

Usage

ddf(
  dsmodel = call(),
  mrmodel = call(),
  data,
  method = "ds",
  meta.data = list(),
  control = list(),
  call = NULL
)

Arguments

dsmodel: distance sampling model specification
mrmodel: mark-recapture model specification
data: dataframe containing data to be analyzed
method: analysis method
meta.data: list containing settings controlling data structure
control: list containing settings controlling model fitting
call: not implemented for the top-level ddf function; it is set by ddf as it is passed to the other ddf generics.

Value

a fitted model object of class c(method, "ddf")

Details

The fitting code has certain expectations about data. It should be a dataframe with at least the following fields named and defined as follows:

`object`	object number
`observer`	observer number (1 or 2) for double observer; only 1 if single observer
`detected`	1 if detected by the observer and 0 if missed; always 1 for single observer
`distance`	perpendicular distance

If the data are for clustered objects, the dataframe should also contain a field named size that gives the observed number in the cluster. If the data are for a double observer survey, then there are two records for each observation and each should have the same object number. The code assumes the observations are listed in the same order for each observer, such that if the data are subsetted by observer there will be the same number of records in each subset and each subset will be in the same object order. In addition to these predefined and pre-named fields, the dataframe can have any number and type of fields that are used as covariates in the dsmodel and mrmodel. At present, discrepancies between observations in distance, size and any user-specified covariates cannot be assimilated into the uncertainty of the estimate. The code presumes the values for those fields are the same for both records (observer=1 and observer=2) and it uses the value from observer 1. Thus it makes sense to make the values the same for both records in each pair, even when both observers detect the object; otherwise, when observer 1 does not detect the object, the values would have to be taken from observer 2 and would not be consistent.
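The layout rules above can be checked before fitting. The following is a minimal base-R sketch on made-up values for three objects (object 2 is missed by observer 1, object 3 by observer 2); the variable names `dd`, `obs1`, `obs2`, `same_order` and `consistent` are illustrative only and not part of mrds.

```r
# Made-up double observer data: two records per object, same object number,
# identical distance and size within each pair.
dd <- data.frame(
  object   = rep(1:3, each = 2),
  observer = rep(1:2, times = 3),
  detected = c(1, 1,  0, 1,  1, 0),
  distance = rep(c(0.4, 1.7, 2.9), each = 2),
  size     = rep(c(2, 1, 5), each = 2)
)

# Ordering assumption: subsetting by observer gives the same number of
# records in each subset, in the same object order.
obs1 <- dd[dd$observer == 1, ]
obs2 <- dd[dd$observer == 2, ]
same_order <- nrow(obs1) == nrow(obs2) && identical(obs1$object, obs2$object)

# Consistency: distance and size agree within each pair, because only the
# observer 1 values are used.
consistent <- all(obs1$distance == obs2$distance) && all(obs1$size == obs2$size)

# Every object in the data should have been seen by at least one observer.
seen <- tapply(dd$detected, dd$object, max)

same_order
consistent
```

Checks of this kind are cheap and catch the most common cause of silently wrong double observer fits: records that are out of step between the two observers.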

Seven different fitting methods are currently available, and these in turn determine whether dsmodel and mrmodel need to be specified.

Method	Single/Double	`dsmodel`	`mrmodel`
`ds`	Single	yes	no
`io`	Double	yes	yes
`io.fi`	Double	no	yes
`trial`	Double	yes	yes
`trial.fi`	Double	no	yes
`rem`	Double	yes	yes
`rem.fi`	Double	no	yes

Methods with the suffix ".fi" use the assumption of full independence and do not use the distance sampling portion of the likelihood, which is why a dsmodel is not needed. An mrmodel is only needed for double observer surveys and thus is not needed for method ds.
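The method table above, together with the validate-then-dispatch role described at the top of this page, can be sketched as a small lookup. This is a hypothetical helper (`model_requirements` and `check_ddf_call` are not mrds functions, and it uses NULL where ddf's defaults are empty calls); it only mirrors which model specifications each method needs and the class a fitted object would carry.

```r
# Hypothetical lookup mirroring the method table (not part of mrds).
model_requirements <- data.frame(
  method  = c("ds", "io", "io.fi", "trial", "trial.fi", "rem", "rem.fi"),
  dsmodel = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
  mrmodel = c(FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Validate a method/model combination and return the class vector a fitted
# object of that method would have, i.e. c(method, "ddf").
check_ddf_call <- function(method, dsmodel = NULL, mrmodel = NULL) {
  row <- model_requirements[model_requirements$method == method, ]
  if (nrow(row) != 1) stop("unknown method: ", method)
  if (row$dsmodel && is.null(dsmodel)) stop("method ", method, " needs a dsmodel")
  if (row$mrmodel && is.null(mrmodel)) stop("method ", method, " needs an mrmodel")
  c(method, "ddf")
}

check_ddf_call("io.fi", mrmodel = ~glm(~distance))
check_ddf_call("ds", dsmodel = ~cds(key = "hn"))
```

The formulas are never evaluated here, so `cds` and `glm` only need to appear as unevaluated calls; the real ddf passes them on to the method-specific fitting functions.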

The dsmodel specifies the detection function g(y) for the distance sampling data, and the models restrict g(0)=1. For single observer data, g(y) is the detection function for the single observer; for a double observer survey, it is the relative detection function (assuming g(0)=1) of both observers as a team (the unique observations from both observers). In double observer surveys, the detection function is p(y)=p(0)g(y), so that p(0) is not constrained to 1. The detection function g(y) is specified by dsmodel, and p(0) is estimated from the conditional detection functions (see mrmodel below).

The value of dsmodel is specified using a hybrid formula/function notation. The model definition is prefixed with a ~ and the remainder is a function definition with specified arguments. At present there are two different functions, cds and mcds, for conventional distance sampling and multi-covariate distance sampling, respectively. Both functions have the same required arguments (key, formula). The first specifies the key function, which can be half-normal ("hn"), hazard-rate ("hr"), gamma ("gamma") or uniform ("unif"). The argument formula specifies the formula for the log of the scale parameter of the key function (e.g., the equivalent of the standard deviation in the half-normal). The variable distance should not be included in the formula because the scale is for distance. See Marques and Buckland (2004) for more details on the representation of the scale formula. For the hazard-rate and gamma functions, an additional shape.formula can be specified for the model of the shape parameter; the default is ~1.

Adjustment terms can be specified by setting adj.series, which can take the values "none", "cos" (cosine), "poly" (polynomials) and "herm" (Hermite polynomials). One must also specify a vector of orders for the adjustment terms (adj.order) and a scaling (adj.scale), which may be "width" or "scale" (for scaling by the scale parameter). Note that the uniform key can only be used with adjustments (usually cosine adjustments for a Fourier-type analysis).
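To make the role of the scale formula concrete, here is a base-R sketch of the half-normal and hazard-rate keys in their usual textbook forms, exp(-y^2/(2 sigma^2)) and 1 - exp(-(y/sigma)^(-b)), with log(sigma) linear in the covariates, plus an order-1 cosine series on a uniform key rescaled so that g(0)=1. The coefficients are made up and the adjustment form is assumed; mrds's internal parameterisation and scaling of adjustment terms may differ in detail.

```r
# Key functions (textbook forms, assumed here for illustration).
hn_key <- function(y, sigma) exp(-y^2 / (2 * sigma^2))
hr_key <- function(y, sigma, b) 1 - exp(-(y / sigma)^(-b))

# formula = ~size models log(sigma): sigma = exp(X %*% beta).
# Illustrative coefficients, not estimates.
covs  <- data.frame(size = c(1, 4))
X     <- model.matrix(~size, covs)
beta  <- c(0.2, 0.1)
sigma <- exp(drop(X %*% beta))   # larger clusters get a wider scale here

# Uniform key with an order-1 cosine adjustment scaled by the width w,
# divided by its value at y = 0 so that g(0) = 1.
unif_cos <- function(y, w, a) (1 + a * cos(pi * y / w)) / (1 + a)

y <- c(0, 0.5, 1, 2)
hn_key(y, sigma[1])
hr_key(y, sigma[1], b = 2.5)
unif_cos(y, w = 4, a = 0.8)
```

Note that distance enters only through y, never through the scale formula, which is why distance should not appear in formula.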

The mrmodel specifies the form of the conditional detection functions (i.e., the probability that an object is seen by observer j given that it was seen by observer 3-j) for each observer (j=1,2) in a double observer survey. The value is specified using the same mix of formula/function notation, but in this case the functions are glm and gam. The arguments for the functions are formula and link. At present, only glm is allowed and it is restricted to link=logit. Thus, currently the only form for the conditional detection functions is logistic, as expressed in eq. 6.32 of Laake and Borchers (2004). In contrast to dsmodel, the argument formula will typically include distance and all other covariates that affect detection probability. For example, mrmodel=~glm(formula=~distance+size+sex) constructs a conditional detection function based on the logistic form with additive effects of distance, size and sex. As another example, mrmodel=~glm(formula=~distance*size+sex) constructs the same model with an added interaction between distance and size.
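The logistic conditional detection function can be sketched in base R: `model.matrix` shows the columns that ~distance*size+sex expands to (including the distance:size interaction), and `plogis` gives the inverse-logit probabilities. The data and coefficients are made up, and the second observer's coefficients are a hypothetical shift of the first. The last line assumes the two observers detect independently, in which case the probability that at least one sees the object is p1 + p2 - p1 p2; as we understand the point independence models, this combination is evaluated at distance zero to give p(0) in p(y)=p(0)g(y).

```r
# Made-up covariates for four detections.
d <- data.frame(
  distance = c(0, 0.5, 1.5, 3),
  size     = c(1, 3, 2, 1),
  sex      = factor(c("F", "M", "F", "M"))
)

# Columns implied by formula = ~distance*size+sex.
X <- model.matrix(~distance * size + sex, d)
colnames(X)

# Logistic conditional detection for observer 1 (illustrative coefficients,
# in the column order of X).
beta1 <- c(1.5, -0.8, 0.2, 0.3, 0.05)
p1 <- plogis(drop(X %*% beta1))

# Hypothetical observer 2: same slopes, lower intercept.
beta2 <- beta1 - c(0.5, 0, 0, 0, 0)
p2 <- plogis(drop(X %*% beta2))

# Probability that at least one observer detects, if detections are independent.
p_either <- p1 + p2 - p1 * p2
```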

The argument meta.data is a list that enables various options about the data to be set. These options include:

point: if TRUE the data are from point counts and FALSE (default) implies line transect data
width: distance specifying half-width of the transect
left: distance specifying inner truncation value
binned: TRUE or FALSE to specify whether distances should be binned for analysis
breaks: if binned=TRUE, this is a required sequence of break points that are used for plotting and goodness-of-fit tests. They should match the distbegin and distend values if bins are fixed.
int.range: an integration range for detection probability; either a vector of 2 or matrix with 2 columns
mono: constrain the detection function to be weakly monotonically decreasing (only applicable when there are no covariates in the detection function)
mono.strict: when TRUE constrain the detection function to be strictly monotonically decreasing (again, only applicable when there are no covariates in the detection function)

Using meta.data=list(int.range=c(1,10)) is the same as meta.data=list(left=1,width=10). If meta.data=list(binned=TRUE) is used, the dataframe needs to contain the fields distbegin and distend for each observation which specify the left and right hand end points of the distance interval containing the observation. This is a general data structure that allows the intervals to change rather than being fixed as in the standard distance analysis tools. Typically, if the intervals are changing so is the integration range. For example, assume that distance bins are generated using fixed angular measurements from an aircraft in which the altitude is varying. Because all analyses are truncated (i.e., the last interval does not go to infinity), the transect width (and the left truncation point if there is a blindspot below the aircraft) can potentially change for each observation. The argument int.range can also be entered as a matrix with 2 columns (left and width) and a row for each observation.
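Both binned layouts described above can be built with base R. The first part derives distbegin and distend from fixed breaks in the same way as the examples at the end of this page; the second follows the aircraft scenario, where fixed angle cut points map to distance intervals (distance = altitude * tan(angle)) that change with each observation's altitude. All angles, altitudes and bin assignments are hypothetical, and the resulting columns would still need to be added to the survey dataframe.

```r
# Fixed bins: each distance falls in (breaks[k], breaks[k + 1]].
breaks   <- 10 * (0:10)
distance <- c(3.2, 47.9, 10, 99.5)
bin      <- as.integer(cut(distance, breaks))
distbegin_fixed <- breaks[bin]
distend_fixed   <- breaks[bin + 1]

# Varying bins from fixed angles (degrees from vertical) and varying altitude.
angles    <- c(10, 25, 40, 55, 70)     # hypothetical angle cut points
altitude  <- c(100, 120, 90, 110)      # hypothetical altitudes
angle_bin <- c(1, 3, 2, 4)             # angle interval of each sighting

to_dist <- function(h, deg) h * tan(deg * pi / 180)

distbegin <- to_dist(altitude, angles[angle_bin])
distend   <- to_dist(altitude, angles[angle_bin + 1])

# int.range as a 2-column matrix (left, width), one row per observation:
# left is the blind spot below the aircraft, width the outermost distance.
int.range <- cbind(to_dist(altitude, angles[1]),
                   to_dist(altitude, angles[length(angles)]))
```

With fixed bins a single vector of breaks describes every observation; with varying bins each row carries its own end points and its own integration range.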

The argument control is a list that enables various analysis options to be set. It is not necessary to set any of these for most analyses. They were provided so the user can optionally see intermediate fitting output and control fitting if the algorithm doesn't converge, which happens infrequently. The list values include:

showit: Integer (0-3, default 0) controlling the (increasing) amount of information printed during fitting: 0 - none, >=1 - information about refitting and bound changes is printed, >=2 - information about adjustment term fitting is printed, ==3 - per-iteration parameter estimates and log-likelihood are printed.
estimate: if FALSE fits model but doesn't estimate predicted probabilities
refit: if TRUE the algorithm will attempt multiple optimizations at different starting values if it doesn't converge
nrefits: number of refitting attempts
initial: a named list of starting values for the dsmodel parameters (e.g. $scale, $shape, $adjustment)
lowerbounds: a vector of lowerbounds for the dsmodel parameters in the order the ds parameters will appear in the par element of the ddf object, i.e. fit.ddf$par where fit.ddf is a fitted ddf model.
upperbounds: a vector of upperbounds for the dsmodel parameters in the order the ds parameters will appear in the par element of the ddf object, i.e. fit.ddf$par where fit.ddf is a fitted ddf model.
limit: if TRUE restrict analysis to observations with detected=1
debug: if TRUE, if fitting fails, return an object with fitting information
nofit: if TRUE don't fit a model, but use the starting values and generate an object based on those values
optimx.method: one (or a vector of) string(s) giving the optimisation method to use. If more than one is supplied, the results from one are used as the starting values for the next. See optimx
optimx.maxit: maximum number of iterations to use in the optimisation.
mono.random.start: By default, when monotonicity constraints are enforced, a grid of starting values is tested. Instead, random starting values can be used (uniformly distributed between the upper and lower bounds). Set TRUE for random starts; FALSE (default) uses the grid method.
mono.method: The optimiser method to be used when (strict) monotonicity is enforced. Can be either slsqp or solnp. Default slsqp.
mono.startvals: Controls whether the mono.optimiser should find better starting values by first fitting a key function without adjustments, and then use those start values for the key function parameters when fitting the key + adjustment series detection function. Defaults to FALSE.
mono.outer.iter: Number of outer iterations to be used by solnp when fitting a monotonic model and solnp is selected. Default 200.
silent: silences warnings within ds fitting method (helpful for running many times without generating many warning/error messages).
optimizer: By default this is set to 'both' for single observer analyses and 'R' for double observer analyses. For single observer analyses where optimizer = 'both', the R optimizer will be used and if present the MCDS optimizer will also be used. The result with the best likelihood value will be selected. To run only a specified optimizer set this value to either 'R' or 'MCDS'. The MCDS optimizer cannot currently be used for detection function fitting with double observer analyses. See mcds_dot_exe for more information.
winebin: Location of the wine binary used to run MCDS.exe. See mcds_dot_exe for more information.
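A control list is an ordinary named list, so a quick name check against the options documented above catches misspelt names before fitting. The sketch below uses illustrative values only; the starting value in `initial$scale` is assumed to be on the log scale because formula models the log of the scale parameter. The fitting call reuses the half-normal example from this page and is wrapped in `if (FALSE)`, as in the examples, since it needs the mrds package.

```r
# Illustrative control settings using documented option names.
ctrl <- list(
  showit       = 1,      # report refitting and bound changes
  refit        = TRUE,   # retry from other starting values if not converged
  nrefits      = 10,
  optimx.maxit = 500,
  optimizer    = "R",    # use only the R optimizer (no MCDS.exe)
  initial      = list(scale = log(1.5))  # assumed log-scale start value
)

# Option names documented on this page.
documented <- c("showit", "estimate", "refit", "nrefits", "initial",
                "lowerbounds", "upperbounds", "limit", "debug", "nofit",
                "optimx.method", "optimx.maxit", "mono.random.start",
                "mono.method", "mono.startvals", "mono.outer.iter",
                "silent", "optimizer", "winebin")

# Any name listed here does not match a documented option.
unknown <- setdiff(names(ctrl), documented)
unknown

if (FALSE) { # not run: requires the mrds package
  library(mrds)
  data(book.tee.data)
  egdata <- book.tee.data$book.tee.dataframe
  fit <- ddf(dsmodel = ~mcds(key = "hn", formula = ~1), data = egdata,
             method = "ds", meta.data = list(width = 4), control = ctrl)
  fit$par  # parameter order to follow when setting lowerbounds/upperbounds
}
```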

Examples of distance sampling analyses are available at https://distancesampling.org/resources/vignettes.html.

Hints and tips on fitting (particularly optimisation issues) are on the mrds_opt manual page.

References

Laake JL, Borchers DL (2004). “Methods for incomplete detection at distance zero.” In Advanced Distance Sampling: Estimating Abundance of Biological Populations. Oxford University Press.

Marques FFC, Buckland ST (2004). “Covariate models for the detection function.” In Advanced Distance Sampling: Estimating Abundance of Biological Populations, 31-47. Oxford University Press.

Author

Jeff Laake

Examples

# load data
data(book.tee.data)
region <- book.tee.data$book.tee.region
egdata <- book.tee.data$book.tee.dataframe
samples <- book.tee.data$book.tee.samples
obs <- book.tee.data$book.tee.obs

# fit a half-normal detection function
result <- ddf(dsmodel=~mcds(key="hn", formula=~1), data=egdata, method="ds",
              meta.data=list(width=4))

# fit an independent observer model with full independence
result.io.fi <- ddf(mrmodel=~glm(~distance), data=egdata, method="io.fi",
                    meta.data=list(width = 4))

# fit an independent observer model with point independence
result.io <- ddf(dsmodel=~cds(key = "hn"), mrmodel=~glm(~distance),
                 data=egdata, method="io", meta.data=list(width=4))
if (FALSE) { # \dontrun{

# simulated single observer point count data (see ?ptdata.single)
data(ptdata.single)
# assign each distance to a 10-unit bin and record the bin's end points
ptdata.single$distbegin <- (as.numeric(cut(ptdata.single$distance,
                            10*(0:10)))-1)*10
ptdata.single$distend <- (as.numeric(cut(ptdata.single$distance,
                          10*(0:10))))*10
model <- ddf(data=ptdata.single, dsmodel=~cds(key="hn"),
             meta.data=list(point=TRUE,binned=TRUE,breaks=10*(0:10)))

summary(model)

plot(model,main="Single observer binned point data - half normal")

model <- ddf(data=ptdata.single, dsmodel=~cds(key="hr"),
             meta.data=list(point=TRUE, binned=TRUE, breaks=10*(0:10)))

summary(model)

plot(model, main="Single observer binned point data - hazard rate")

dev.new()

# simulated double observer point count data (see ?ptdata.dual)
# setup data
data(ptdata.dual)
# assign each distance to a 10-unit bin and record the bin's end points
ptdata.dual$distbegin <- (as.numeric(cut(ptdata.dual$distance,
                          10*(0:10)))-1)*10
ptdata.dual$distend <- (as.numeric(cut(ptdata.dual$distance,
                        10*(0:10))))*10

model <- ddf(method="io", data=ptdata.dual, dsmodel=~cds(key="hn"),
             mrmodel=~glm(formula=~distance*observer),
             meta.data=list(point=TRUE, binned=TRUE, breaks=10*(0:10)))

summary(model)

plot(model, main="Dual observer binned point data", new=FALSE, pages=1)

model <- ddf(method="io", data=ptdata.dual,
             dsmodel=~cds(key="unif", adj.series="cos", adj.order=1),
             mrmodel=~glm(formula=~distance*observer),
             meta.data=list(point=TRUE, binned=TRUE, breaks=10*(0:10)))

summary(model)

par(mfrow=c(2,3))
plot(model,main="Dual observer binned point data",new=FALSE)

} # }