Further optimisation improvement details

Posted 01 August 2021 by Dave Miller

The newest version of our R package mrds (2.2.5), used for fitting detection functions by Distance, has some improvements to its optimisation system. Here are some more details of recent changes and what you might need to watch out for.

We left our last blog entry on the question of “Should I re-run previous analyses?”. I thought I’d elaborate on that a little more here, so folks have an idea of which models these changes might be expected to have an effect on.

These changes were made possible thanks to funding from the University of St Andrews’ Knowledge Exchange and Impact Fund.

Changes across the board

The last version of mrds had some major changes to the optimization system that both mrds and Distance use to estimate model parameters. These changes affect all models fitted by Distance and all models that include a ds component in mrds (i.e., there are no changes to the optimization of mr-only models that just address the double observer (g(0)) part of the model).

Changes big to small

For models where monotonicity isn’t enforced

Models without adjustment terms (key-function-only models, with or without covariates) or models with monotonicity turned off should see the biggest potential shifts in parameter estimates, due to changes in how the overall optimization system works.

These changes include refining the stopping rules for deciding when the model has converged, which starting values are used during refits, and when bounds on parameters can be safely expanded. For key-plus-adjustment models, the optimizer first fits a key function only; then, with those key parameters fixed, it tries to find adjustment parameters that improve the model fit. Finally, both sets of parameters are optimized simultaneously to get the best values. We’ve refined the way this is done to make it easier for the optimizer to see where the optimum is: the optimization is now carried out over the relevant subset of the parameters, rather than over the full set with some parameters held fixed.
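The three-stage idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the mrds code: `objective` is a made-up two-parameter function standing in for a negative log-likelihood, `golden_min` is a basic golden-section line search, and the final joint step uses alternating line searches as a stand-in for a real multivariate optimizer. The point to notice is stage 2, where the search routine is handed a function of the adjustment parameter alone, so it never sees the key parameter at all.

```python
import math


def objective(key, adj):
    """Hypothetical stand-in for a negative log-likelihood (minimum at 2, 0.5)."""
    dk, da = key - 2.0, adj - 0.5
    return dk * dk + da * da + 0.8 * dk * da


def golden_min(f, lo, hi, tol=1e-10):
    """Golden-section search for the minimum of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if f(c) < f(d):
            hi = d  # the minimum lies in [lo, d]
        else:
            lo = c  # the minimum lies in [c, hi]
    return (lo + hi) / 2.0


def staged_fit(lo=-10.0, hi=10.0, sweeps=50):
    # Stage 1: key function only, i.e. the adjustment term is switched off.
    key = golden_min(lambda k: objective(k, 0.0), lo, hi)
    stage1_key = key

    # Stage 2: search over the adjustment parameter alone. The key value is
    # baked into the function being minimised, so the search is over a
    # one-parameter subset rather than a full set with one entry frozen.
    adj = golden_min(lambda a: objective(key, a), lo, hi)
    stage2_adj = adj

    # Stage 3: refine both parameters, starting from the staged values.
    # Alternating line searches are an assumption made to keep this short.
    for _ in range(sweeps):
        key = golden_min(lambda k: objective(k, adj), lo, hi)
        adj = golden_min(lambda a: objective(key, a), lo, hi)
    return stage1_key, stage2_adj, (key, adj)
```

Calling `staged_fit()` returns the stage 1 key estimate, the stage 2 adjustment estimate, and the final pair; in this toy problem the first two are close to, but not at, the joint optimum, which is why the last stage is needed.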

There can often be issues when fitting models with covariates whose observed values are on very different scales (e.g., a model containing the number of minutes after sunrise at which a detection happens, which might run into the hundreds, and observer ID, which will be coded 0/1 as a factor). In this case it’s hard for the optimizer to know which direction is best to go in, due to the difference in scaling. Finding a relative rescaling between the covariates solves this problem and is something we have included in mrds for a while. However, there was a bug in this code, which scaled things inversely, so the problem was amplified rather than fixed by rescaling! This is now fixed, and we’ve made it so that the scaling will kick in at smaller scale differences, which should help in more marginal situations.
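To make the idea of a relative rescaling concrete, here is a minimal Python sketch. It is not how mrds computes its scaling: the function names, the use of the largest absolute value as the "size" of a covariate, and the default threshold of 3 are all assumptions made for illustration. The bug described above corresponds to getting the direction wrong, that is, multiplying by the scale where the code below divides.

```python
def relative_scales(columns, threshold=3.0):
    """Return one divisor per covariate, relative to the smallest covariate.

    `columns` maps a covariate name to its list of numeric values.
    If the covariates already differ by less than `threshold`, no
    rescaling is applied (all divisors are 1).
    """
    # Use the largest absolute value as a crude measure of magnitude;
    # an all-zero column is treated as having magnitude 1.
    magnitudes = {
        name: (max(abs(v) for v in values) or 1.0)
        for name, values in columns.items()
    }
    smallest = min(magnitudes.values())
    if max(magnitudes.values()) / smallest < threshold:
        return {name: 1.0 for name in columns}
    return {name: m / smallest for name, m in magnitudes.items()}


def rescale(columns, scales):
    """Divide each covariate by its scale so all are of comparable size."""
    return {
        name: [v / scales[name] for v in values]
        for name, values in columns.items()
    }


# Made-up example data: minutes after sunrise and a 0/1 observer code.
covariates = {
    "minutes": [12, 95, 240, 310],
    "observer": [0, 1, 1, 0],
}
scales = relative_scales(covariates)
scaled = rescale(covariates, scales)
```

With these made-up data, `minutes` is divided by 310 and `observer` is left alone, so both covariates end up with values between 0 and 1.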

These changes will mainly affect models with covariates where the covariates are measured on different numerical scales (mostly where they differ by a factor of 3 or more). Key-plus-adjustment models will see some changes, but only if monotonicity is not enforced.

Models using Hermite polynomial adjustments

One of the options for adjustments is Hermite polynomials. It turns out that there are multiple definitions of these polynomials. Distance for Windows uses the “probabilists’” definition, which is more numerically stable, but previously mrds used a very naïve definition, which was prone to numerical issues. That has been replaced, which should massively improve performance for models using the Hermite adjustment. This change should affect any models using Hermite polynomials, so we strongly recommend refitting any such models for improved results.

Integration is hard

While finding the best detection function parameters, we need to integrate the detection function between its truncation points (for line transects this is $\mu$; for points we integrate the detection function multiplied by $2\pi$ times distance, giving $\nu$). This integration can give strange results if the parameters are particularly wacky (which sometimes happens on the way to “good” values). Previously, when these “bad” values occurred, we set them to very small numbers (leading to worse likelihood values), but this could lead the optimizer down a bad path. Instead, we now leave the bad values in and the optimizer knows to steer around them. If the integral is negative (which doesn’t make sense) we still set a “bad” likelihood value (of zero), and then check once we get to the final values that the detection function is greater than zero everywhere (as it should be). These changes most likely affect models that were “hard” to fit in previous versions, for example those which produced errors about “Detection function integral <=0”.
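As a concrete picture of the two integrals, the Python sketch below computes $\mu$ and $\nu$ numerically for a half-normal detection function with no adjustment terms. Several things here are assumptions for illustration only: the half-normal key, integration from 0 to a right truncation distance `w` (no left truncation), the use of Simpson’s rule, and all of the function names. None of this is the integration code used by mrds.

```python
import math


def half_normal(x, sigma):
    """Half-normal detection function g(x) = exp(-x^2 / (2 sigma^2))."""
    return math.exp(-x * x / (2.0 * sigma * sigma))


def simpson(f, a, b, n=200):
    """Composite Simpson's rule on [a, b] with an even number of intervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def mu_line(sigma, w):
    """Line transects: integral of g(x) from 0 to w."""
    return simpson(lambda x: half_normal(x, sigma), 0.0, w)


def nu_point(sigma, w):
    """Point transects: integral of 2 * pi * r * g(r) from 0 to w."""
    return simpson(lambda r: 2.0 * math.pi * r * half_normal(r, sigma), 0.0, w)
```

For the half-normal both integrals have closed forms, $\mu = \sigma\sqrt{\pi/2}\,\mathrm{erf}(w/(\sigma\sqrt{2}))$ and $\nu = 2\pi\sigma^2(1 - \exp(-w^2/(2\sigma^2)))$, which makes this a handy check on the numerical result. Once adjustment terms are added there is generally no such shortcut, and badly chosen parameters can make the integrand, and hence the integral, go negative.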

Minor improvements to the constrained optimizer

Constrained optimisations should also see some benefits: from the Hermite definition and the integration improvements, as well as from a better system for getting starting values. Though these improvements are less drastic for models that don’t use Hermite polynomial adjustments, there may be some differences in fits, especially for models that failed to fit in previous versions of Distance or mrds.

That’s all folks

I hope that’s been a useful description of the changes and has illustrated the places where improvements might be found.

As I said last time: if you do re-run any models and notice that the likelihood (or equivalently AIC) is worse than with a previous version of Distance or mrds on the same data, please let us know!