13. Nonlinear models

With GP we can define a Gaussian process, with GP.addlintransf() we can represent finite linear transformations of the process, and with GP.addx() we can take derivatives. In other words, we can only apply linear operations to the process before feeding in the data.

A common non-linear operation is putting a boundary on the possible data values. Gaussian distributions don’t play nicely with boundaries—they are defined on \((-\infty,\infty)\)—so it is necessary to map the Gaussian process space to an interval with a nonlinear function.

lsqfitgp is designed to work with the general purpose fitting module lsqfit (from which it takes its name) for this kind of situation. If you want to know more, lsqfit has good documentation.

Let’s see how to fit some data that is constrained to (-1, 1). To map the Gaussian process space to the data space, we’ll use a hyperbolic tangent, which has the properties \(\tanh(\pm\infty) = \pm 1\), \(\tanh'(0) = 1\), and \(\tanh(-x) = -\tanh(x)\).
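As a quick aside (not part of the example itself), these properties are easy to check numerically with numpy:

import numpy as np

print(np.tanh(50.0), np.tanh(-50.0))              # saturates at +1 and -1
print((np.tanh(1e-6) - np.tanh(-1e-6)) / 2e-6)    # slope ~ 1 at the origin
print(np.isclose(np.tanh(-0.7), -np.tanh(0.7)))   # odd function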

We first define a Gaussian process as usual and take a sample from it as fake data.

import lsqfitgp as lgp
import numpy as np
import gvar

gp = lgp.GP(lgp.ExpQuad())

x = np.arange(15)
gp.addx(x, 'data')

data_gp = gvar.sample(gp.prior('data'))

Then we map it to (-1, 1):

data = np.tanh(data_gp)

Note that we first sampled the latent Gaussian process, obtaining data_gp, and then passed it through our nonlinear function to obtain the fake data. A potentially serious mistake would be to do the converse: first passing the prior through the nonlinear function, with np.tanh(gp.prior('data')), and then sampling from it. Intuitively, gvar propagates the nonlinear function only to first order, so the transformed prior is still effectively a Gaussian distribution, and fake data sampled from it would not satisfy the requirement of being bounded within (-1, 1).
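To make the distinction concrete, here is a short sketch of the wrong ordering (the variable names are illustrative, not part of the example):

wrong_prior = np.tanh(gp.prior('data'))   # first-order (linearized) propagation
wrong_data = gvar.sample(wrong_prior)     # samples may fall outside (-1, 1)
print(np.any(np.abs(wrong_data) > 1))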

Now we’ll add errors to the data. If the data had no errors, there would not really be a problem to start with: you could map the data to \((-\infty, \infty)\) with np.arctanh(), do the Gaussian process fit, take some samples, and map the samples back with np.tanh().
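Concretely, that no-error workflow would look roughly like the following sketch, which uses a separate GP object and illustrative names, with np.tanh(data_gp) playing the role of noiseless data:

gp2 = lgp.GP(lgp.ExpQuad())
gp2.addx(x, 'data')
xfine = np.linspace(-10, 25, 200)       # prediction grid (illustrative)
gp2.addx(xfine, 'plot')

data_noerr = np.tanh(data_gp)           # noiseless data in (-1, 1)
latent = np.arctanh(data_noerr)         # map the data to (-inf, inf)
post = gp2.predfromdata({'data': latent}, 'plot')
sample = np.tanh(gvar.sample(post))     # sample, then map back to (-1, 1)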

You could do that even with errors, either at first order using gvar.arctanh(), or by propagating the errors through the transformation yourself. However, that way you would be doing a fit with Gaussian errors on the transformed data, while what we will do is a fit with Gaussian errors on the data itself. Another case we won't explore, in which just transforming the data before the fit is not sufficient, is when the mapping between the Gaussian process and the data depends on a fit parameter.

err = 0.1
data += err * np.random.randn(len(data))
data = gvar.gvar(data, np.full_like(data, err))

Then as usual we add a finer grid of points where we will compute the prediction:

xplot = np.linspace(-10, 25, 200)
gp.addx(xplot, 'plot')

Now we define the prior and model function following the requirements of lsqfit.nonlinear_fit and run the fit:

import lsqfit

prior = {
    'gproc': gp.prior('data')
}

def fcn(params):
    return gvar.tanh(params['gproc'])

fit = lsqfit.nonlinear_fit(data=data, fcn=fcn, prior=prior)
print(fit.format(maxline=True))

Output:

Least Square Fit:
  chi2/dof [dof] = 1.1 [15]    Q = 0.37    logGBF = -7.9817

Parameters:
        gproc 0   -0.97 (22)     [  0.0 (1.0) ]
              1   -0.50 (12)     [  0.0 (1.0) ]
              2    0.37 (11)     [  0.0 (1.0) ]
              3    0.31 (11)     [  0.0 (1.0) ]
              4   -1.05 (24)     [  0.0 (1.0) ]  *
              5   -1.97 (56)     [  0.0 (1.0) ]  *
              6   -1.59 (47)     [  0.0 (1.0) ]  *
              7   -0.79 (17)     [  0.0 (1.0) ]
              8    0.28 (11)     [  0.0 (1.0) ]
              9    0.37 (11)     [  0.0 (1.0) ]
             10    0.20 (10)     [  0.0 (1.0) ]
             11   -0.56 (13)     [  0.0 (1.0) ]
             12   -1.47 (37)     [  0.0 (1.0) ]  *
             13   -1.26 (32)     [  0.0 (1.0) ]  *
             14   -1.09 (26)     [  0.0 (1.0) ]  *

Fit:
      key        y[key]      f(p)[key]
--------------------------------------
        0    -0.77 (10)    -0.749 (95)
        1    -0.46 (10)    -0.459 (97)
        2     0.36 (10)     0.352 (98)
        3     0.30 (10)     0.296 (98)
        4    -0.80 (10)    -0.783 (93)
        5    -1.20 (10)    -0.962 (42)  **
        6    -0.90 (10)    -0.921 (71)
        7    -0.69 (10)    -0.657 (96)
        8     0.29 (10)     0.268 (98)
        9     0.34 (10)     0.350 (97)
       10     0.20 (10)     0.198 (98)
       11    -0.51 (10)    -0.512 (97)
       12    -1.00 (10)    -0.899 (71)
       13    -0.83 (10)    -0.851 (87)
       14    -0.83 (10)    -0.798 (95)

Settings:
  svdcut/n = 1e-12/0    tol = (1e-08,1e-10,1e-10*)    (itns/time = 29/0.2)
  fitter = scipy_least_squares    method = trf

Let’s plot everything. First we compute the posterior on the xplot points:

gpplot = gp.predfromfit({'data': fit.p['gproc']}, 'plot')

This time we use GP.predfromfit() instead of the usual GP.predfromdata(). This method takes into account that the distribution represented by fit.p['gproc'] is not the uncertainty of some datapoints, but is already the posterior distribution of some points of our process. We want to “extend” fit.p['gproc'] to new points, not condition on it as if it were data.
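For contrast, a hypothetical variant (not what we do here): conditioning with GP.predfromdata() would treat fit.p['gproc'] as noisy observations of the process rather than as its posterior.

# Hypothetical contrast, not run in this example:
# gpplot_wrong = gp.predfromdata({'data': fit.p['gproc']}, 'plot')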

Then we inject the extended posterior into a copy of the fit result dictionary:

fitp = dict(fit.p)     # dict() makes a copy of fit.p
fitp['gproc'] = gpplot

(This copy-and-replace step is a bit redundant here; it matters when there are other parameters besides the Gaussian process, and copying avoids modifying the fit result dictionary, which is good bookkeeping.) Then we plot both the data space and the Gaussian process space.

from matplotlib import pyplot as plt

fig, axs = plt.subplots(2, 1, sharex=True, num='lsqfitgp example', figsize=[6.4, 7])

for sample in gvar.raniter(fitp, 2):
    axs[0].plot(xplot, fcn(sample), color='red', alpha=0.5)
    axs[1].plot(xplot, sample['gproc'], color='red', alpha=0.5)

ax = axs[0]
ax.set_title('data space')
for boundary in 1, -1:
    ax.axhline(boundary, color='gray', linestyle=':')
ax.errorbar(x, gvar.mean(data), yerr=gvar.sdev(data), fmt='.k', capsize=4)

ax = axs[1]
ax.set_title('Gaussian process space')
ax.plot(x, data_gp, '.k', label='true')
ax.plot(x, np.arctanh(gvar.mean(data)), 'xk', label='with errors, mapped back')
ax.legend()

fig.tight_layout()
fig.savefig('nonlinear1.png')
[Figure nonlinear1.png: top panel, data space; bottom panel, Gaussian process space]

The thing to notice here is that, in the Gaussian process space, the samples can get quite far from the latent points. This is because the nonlinear mapping stretches the space near the boundaries, so large excursions in process space correspond to small changes in data space. In the data space, however, the samples look fine: this is because we did the fit in the data space.
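A small numerical aside illustrates the stretching: points that are far apart in the Gaussian process space can be nearly indistinguishable after the tanh mapping.

print(np.tanh(2.0), np.tanh(4.0))   # ~0.964 and ~0.9993: far apart in process
                                    # space, almost equal in data space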