*“perhaps the most important part of applied statistical modelling”*

Simon Wood

- Checking \( \neq \) validation!
- As with detection function, checking is important
- Want to know the model conforms to assumptions
- What assumptions should we check?

- Convergence
- Basis size
- Residuals

- Fitting the GAM involves an optimization
- By default this is REstricted Maximum Likelihood (REML) score
- Sometimes this can go wrong
- R will warn you!

```
gam.check(dsm_tw_xy_depth)
```

```
Method: REML Optimizer: outer newton
full convergence after 7 iterations.
Gradient range [-3.468176e-05,1.090937e-05]
(score 374.7249 & scale 4.172176).
Hessian positive definite, eigenvalue range [1.179219,301.267].
Model rank = 39 / 39
Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.
k' edf k-index p-value
s(x,y) 29.00 11.11 0.65 <2e-16 ***
s(Depth) 9.00 3.84 0.81 0.33
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```
Error in while (mean(ldxx/(ldxx + ldss)) > 0.4) { :
missing value where TRUE/FALSE needed
In addition: Warning message:
In sqrt(w) : NaNs produced
Error in while (mean(ldxx/(ldxx + ldss)) > 0.4) { :
missing value where TRUE/FALSE needed
```

This is **rare**

“most statistical computational problems are due not to the algorithm being used but rather the model itself”

Andrew Gelman

- Set
`k`

per term - e.g.
`s(x, k=10)`

or`s(x, y, k=100)`

- Penalty removes “extra” wigglyness
*up to a point!*

- (But computation is slower with bigger
`k`

)

```
gam.check(dsm_x_tw)
```

```
Method: REML Optimizer: outer newton
full convergence after 7 iterations.
Gradient range [-3.08755e-06,4.928064e-07]
(score 409.936 & scale 6.041307).
Hessian positive definite, eigenvalue range [0.7645492,302.127].
Model rank = 10 / 10
Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.
k' edf k-index p-value
s(x) 9.00 4.96 0.76 0.44
```

```
dsm_x_tw_k <- dsm(count~s(x, k=20), ddf.obj=df,
segment.data=segs, observation.data=obs,
family=tw())
gam.check(dsm_x_tw_k)
```

```
Method: REML Optimizer: outer newton
full convergence after 7 iterations.
Gradient range [-2.301238e-08,3.930667e-09]
(score 409.9245 & scale 6.033913).
Hessian positive definite, eigenvalue range [0.7678456,302.0336].
Model rank = 20 / 20
Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.
k' edf k-index p-value
s(x) 19.00 5.25 0.76 0.39
```

- Generally, double
`k`

and see what happens - Didn't increase the EDF much here
- Other things can cause low “
`p-value`

” and “`k-index`

” - Increasing
`k`

can cause problems (nullspace)

- (Usually) Don't need to worry about things being too wiggly
`k`

gives the maximum complexity- Penalty deals with the rest

- Generally residuals = observed value - fitted value
- BUT hard to see patterns in these “raw” residuals
- Need to standardise \( \Rightarrow \)
**deviance residuals** - Residual sum of squares \( \Rightarrow \) linear model
- deviance \( \Rightarrow \) GAM

- Expect these residuals \( \sim N(0,1) \)