Multivariate smoothing, model selection

Recap

  • How GAMs work
  • How to include detection info
  • Simple spatial-only models
  • How to check those models

Univariate models are fun, but...

Ecology is not univariate

  • Many variables affect distribution
  • Want to model the right ones
  • Select between possible models
    • Smooth term selection
    • Response distribution
  • Large literature on model selection

Models with multiple smooths

Adding smooths

  • Already know that + is our friend
  • Add everything then remove smooth terms?
dsm_all <- dsm(count~s(x, y) +
                        s(Depth) +
                        s(DistToCAS) +
                        s(SST) +
                        s(EKE) +
                        s(NPP),
                  ddf.obj=df_hr,
                  segment.data=segs, observation.data=obs,
                  family=tw())

Now we have a huge model, what do we do?

Smooth term selection

  • Classically, two main approaches
  • Both have problems
  • Usually use \( p \)-values

Stepwise selection - path dependence

All possible subsets - computationally expensive (fishing?)

p-values

  • \( p \)-values can calculate
  • Test for zero effect of a smooth
  • They are approximate for GAMs (but useful)
  • Reported in summary

p-values example

summary(dsm_all)

Family: Tweedie(p=1.25) 
Link function: log 

Formula:
count ~ s(x, y) + s(Depth) + s(DistToCAS) + s(SST) + s(EKE) + 
    s(NPP) + offset(off.set)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -20.6369     0.2752     -75   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
               edf Ref.df     F p-value    
s(x,y)       5.236  7.169 1.233  0.2928    
s(Depth)     3.568  4.439 6.640 1.6e-05 ***
s(DistToCAS) 1.000  1.000 1.503  0.2205    
s(SST)       5.927  6.987 2.067  0.0407 *  
s(EKE)       1.763  2.225 2.577  0.0696 .  
s(NPP)       2.393  3.068 0.855  0.4680    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.224   Deviance explained = 44.8%
-REML = 368.97  Scale est. = 3.9334    n = 949

Shrinkage or extra penalties

  • Use penalty to remove terms during fitting
  • Two methods
  • Basis s(..., bs="ts") - thin plate splines with shrinkage
    • nullspace should be shrunk less than the wiggly part
  • dsm(..., select=TRUE) - extra penalty
    • no assumption of how much to shrink the nullspace

Shrinkage example

dsm_ts_all <- dsm(count~s(x, y, bs="ts") +
                        s(Depth, bs="ts") +
                        s(DistToCAS, bs="ts") +
                        s(SST, bs="ts") +
                        s(EKE, bs="ts") +
                        s(NPP, bs="ts"),
                  ddf.obj=df_hr,
                  segment.data=segs, observation.data=obs,
                  family=tw())

Shrinkage example

summary(dsm_ts_all)

Family: Tweedie(p=1.277) 
Link function: log 

Formula:
count ~ s(x, y, bs = "ts") + s(Depth, bs = "ts") + s(DistToCAS, 
    bs = "ts") + s(SST, bs = "ts") + s(EKE, bs = "ts") + s(NPP, 
    bs = "ts") + offset(off.set)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -20.260      0.234  -86.59   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
                   edf Ref.df     F  p-value    
s(x,y)       1.888e+00     29 0.705 3.56e-06 ***
s(Depth)     3.679e+00      9 4.811 2.15e-10 ***
s(DistToCAS) 9.339e-05      9 0.000   0.6797    
s(SST)       3.827e-01      9 0.063   0.2160    
s(EKE)       8.196e-01      9 0.499   0.0178 *  
s(NPP)       3.570e-04      9 0.000   0.8359    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =   0.11   Deviance explained =   35%
-REML = 385.04  Scale est. = 4.5486    n = 949

Extra penalty example

dsm_sel <- dsm(count~s(x, y) +
                     s(Depth) +
                     s(DistToCAS) +
                     s(SST) +
                     s(EKE) +
                     s(NPP),
                  ddf.obj=df_hr,
                  segment.data=segs, observation.data=obs,
                  family=tw(), select=TRUE)

Extra penalty example

summary(dsm_sel)

Family: Tweedie(p=1.266) 
Link function: log 

Formula:
count ~ s(x, y) + s(Depth) + s(DistToCAS) + s(SST) + s(EKE) + 
    s(NPP) + offset(off.set)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -20.4285     0.2454  -83.23   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
                   edf Ref.df     F  p-value    
s(x,y)       7.694e+00     29 1.272 2.67e-07 ***
s(Depth)     3.645e+00      9 4.005 3.24e-10 ***
s(DistToCAS) 1.944e-05      9 0.000   0.7038    
s(SST)       2.010e-04      9 0.000   0.8216    
s(EKE)       1.417e+00      9 0.630   0.0127 *  
s(NPP)       2.318e-04      9 0.000   0.5152    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =   0.16   Deviance explained = 40.8%
-REML =    378  Scale est. = 4.2252    n = 949

EDF comparison

             allterms select     ts
s(x,y)          5.236 7.6936 1.8875
s(Depth)        3.568 3.6449 3.6794
s(DistToCAS)    1.000 0.0000 0.0001
s(SST)          5.927 0.0002 0.3827
s(EKE)          1.763 1.4174 0.8196
s(NPP)          2.393 0.0002 0.0004

Double penalty can be slow

  • Lots of smoothing parameters to estimate
length(dsm_ts_all$sp)
[1] 6
length(dsm_sel$sp)
[1] 12

Let's employ a mixture of these techniques

How do we select smooth terms?

  1. Look at EDF
    • Terms with EDF<1 may not be useful
    • These can usually be removed
  2. Remove non-significant terms by \( p \)-value
    • Decide on a significance level and use that as a rule

(In some sense leaving “shrunk” terms in is more “consistent”, but can be computationally annoying)

Example of selection

Selecting smooth terms


Family: Tweedie(p=1.277) 
Link function: log 

Formula:
count ~ s(x, y, bs = "ts") + s(Depth, bs = "ts") + s(DistToCAS, 
    bs = "ts") + s(SST, bs = "ts") + s(EKE, bs = "ts") + s(NPP, 
    bs = "ts") + offset(off.set)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -20.260      0.234  -86.59   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
                   edf Ref.df     F  p-value    
s(x,y)       1.888e+00     29 0.705 3.56e-06 ***
s(Depth)     3.679e+00      9 4.811 2.15e-10 ***
s(DistToCAS) 9.339e-05      9 0.000   0.6797    
s(SST)       3.827e-01      9 0.063   0.2160    
s(EKE)       8.196e-01      9 0.499   0.0178 *  
s(NPP)       3.570e-04      9 0.000   0.8359    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =   0.11   Deviance explained =   35%
-REML = 385.04  Scale est. = 4.5486    n = 949

Shrinkage in action

plot of chunk smooth-shrinkage

Same model with no shrinkage