The key function has two parameters, the scale and the shape, so the gradient is two-dimensional. The current implementation assumes that the scaled distance is x/scale, not x/width.
Matrix of derivatives of the hazard rate key function with respect to the scale parameter and the shape parameter.
d key / d scale = shape * exp(-1 / (x/scale)^shape) / ((x/scale)^shape * scale)
d key / d shape = -log(x/scale) * exp(-1 / (x/scale)^shape) / (x/scale)^shape
When distance = 0, both gradients are zero in the limit. However, evaluating the expressions directly produces NaN or (-)Inf because of operations such as log(0) or division by zero. We correct for this in line 33.
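
As an illustration only, the two derivatives and the zero-distance correction can be sketched in Python. This is not the package's implementation. The function names `hazard_rate_key` and `hazard_rate_key_gradient` are hypothetical, and the sketch assumes the key function is 1 - exp(-(x/scale)^(-shape)) with scaled distance x/scale, which is the form the derivatives above imply.

```python
import math


def hazard_rate_key(distance, scale, shape):
    """Assumed form of the key: 1 - exp(-(x/scale)^(-shape)), for distance > 0."""
    return 1.0 - math.exp(-((distance / scale) ** (-shape)))


def hazard_rate_key_gradient(distance, scale, shape):
    """Return (d key / d scale, d key / d shape) at a single distance.

    Hypothetical helper for illustration; assumes scale > 0 and shape > 0.
    """
    if distance == 0:
        # Limit as distance -> 0. Direct evaluation would hit log(0)
        # and a division by zero.
        return (0.0, 0.0)

    scaled = distance / scale      # scaled distance is x/scale, not x/width
    power = scaled ** shape        # (x/scale)^shape
    if power == 0.0:
        # The power underflowed for a tiny distance; use the same limit.
        return (0.0, 0.0)

    decay = math.exp(-1.0 / power)                 # exp(-1 / (x/scale)^shape)
    d_scale = shape * decay / (power * scale)
    d_shape = -math.log(scaled) * decay / power
    return (d_scale, d_shape)
```

One way to check the formulas is to compare the result with a central finite difference of `hazard_rate_key` at a point away from zero, perturbing scale and shape in turn. The sketch does not guard against overflow of (x/scale)^shape for very large scaled distances.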