1. Kernels reference¶
This is a list of all the specific kernels implemented in lsqfitgp.
Kernels are reported with a simplified signature where the positional arguments are r or r2 if the kernel is isotropic, delta if it is stationary, or x, y for generic kernels, and with only the keyword arguments specific to the kernel. All kernels also understand the general keyword arguments of Kernel (or their specific superclass), while there are no positional arguments when instantiating the kernel and the call signature of instances is always x, y.
Example: the kernel GammaExp is listed as GammaExp(r, gamma=1). This means you could use it this way:
import lsqfitgp as lgp
import numpy as np
kernel = lgp.GammaExp(loc=0.3, scale=2, gamma=1.4)
x = np.random.randn(100)
covmat = kernel(x[:, None], x[None, :])
On multidimensional input, isotropic kernels compute the Euclidean distance. In general, non-isotropic kernels act separately on each dimension, i.e., \(k(x_1,y_1,x_2,y_2) = k(x_1,y_1) k(x_2,y_2)\), apart from kernels defined in terms of the dot product.
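As an illustration of this difference, here is a plain-NumPy sketch (it does not use lsqfitgp itself) comparing, in 2D, the Expon kernel, which acts separately on each dimension, with the isotropic Matern12 kernel, using the formulas quoted later in this reference:

```python
import numpy as np

# Coordinate differences along the two dimensions.
dx, dy = 0.3, 0.4

# Expon acts separately on each dimension: the 2D kernel is the product
# of the 1D kernels, exp(-|dx|) * exp(-|dy|) = exp(-(|dx| + |dy|)).
expon_2d = np.exp(-abs(dx)) * np.exp(-abs(dy))

# Matern12 is isotropic: it uses the Euclidean distance sqrt(dx^2 + dy^2).
matern12_2d = np.exp(-np.hypot(dx, dy))

print(expon_2d)     # exp(-0.7), about 0.497
print(matern12_2d)  # exp(-0.5), about 0.607
```

In 1D the two kernels coincide; the example shows that in 2D they do not.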
For all isotropic and stationary (i.e., depending only on \(x - y\)) kernels \(k(x, x) = 1\), and the typical lengthscale is approximately 1 for default values of the keyword parameters, apart from some specific cases like Constant.
Warning
Taking second or higher order derivatives may give problems with isotropic kernels whose signature parameter is r, while those with signature parameter r2 won't have any issue.
1.1. Index¶
1.1.1. Isotropic kernels¶
1.1.2. Stationary kernels¶
1.1.3. Other kernels¶
1.2. Documentation¶
- lsqfitgp.BagOfWords(x, y)¶
Bag of words kernel.
\[\begin{split}k(x, y) &= \sum_{w \in \text{words}} c_w(x) c_w(y), \\ c_w(x) &= \text{number of times word $w$ appears in $x$}\end{split}\]The words are defined as non-empty substrings delimited by spaces or one of the following punctuation characters: ! « » ” “ ” ‘ ’ / ( ) ‘ ? ¡ ¿ „ ‚ < > , ; . : - – —.
- lsqfitgp.BrownianBridge(x, y)¶
Brownian bridge kernel.
\[k(x, y) = \min(x, y) - xy, \quad x, y \in [0, 1]\]It is a Wiener process conditioned on being zero at x = 1.
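A quick plain-NumPy check of the formula above (not using lsqfitgp itself): the covariance vanishes at both endpoints of \([0, 1]\), and the resulting matrix is a valid covariance matrix.

```python
import numpy as np

# Standalone implementation of the Brownian bridge formula quoted above.
def brownian_bridge(x, y):
    return np.minimum(x, y) - x * y

t = np.linspace(0, 1, 50)
cov = brownian_bridge(t[:, None], t[None, :])

# The process is pinned to zero at both endpoints of [0, 1].
print(cov[0, 0], cov[-1, -1])  # -> 0.0 0.0

# The matrix is symmetric positive semidefinite, as a covariance must be.
eigs = np.linalg.eigvalsh(cov)
print(eigs.min() >= -1e-10)  # -> True
```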
- lsqfitgp.Categorical(x, y, cov=None)¶
Categorical kernel.
\[k(x, y) = \texttt{cov}[x, y]\]A kernel over integers from 0 to N-1. The parameter cov is the covariance matrix of the values.
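The lookup the formula describes can be sketched in plain NumPy (this illustrates the array indexing, not lsqfitgp's actual implementation; cov should be a symmetric positive semidefinite matrix):

```python
import numpy as np

# Covariance between 3 categories, indexed 0..2.
cov = np.array([[1.0, 0.5, 0.0],
                [0.5, 1.0, 0.0],
                [0.0, 0.0, 2.0]])

x = np.array([0, 1, 2, 1])        # integer category labels
k = cov[x[:, None], x[None, :]]   # k(x, y) = cov[x, y]
print(k[0, 1])  # covariance between categories 0 and 1 -> 0.5
```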
- lsqfitgp.Celerite(delta, gamma=1, B=0)¶
Celerite kernel.
\[k(x, y) = \exp(-\gamma|x - y|) \big( \cos(x - y) + B \sin(|x - y|) \big)\]This is the covariance function of an AR(2) process with complex roots. The parameters must satisfy the condition \(|B| \le \gamma\). For \(B = \gamma\) it is equivalent to the Harmonic kernel with \(\eta Q = 1/B, Q > 1\), and it is derivable.
Reference: Daniel Foreman-Mackey, Eric Agol, Sivaram Ambikasaran, and Ruth Angus: Fast and Scalable Gaussian Process Modeling With Applications To Astronomical Time Series.
- lsqfitgp.Constant(r2)¶
Constant kernel.
\[k(r) = 1\]This means that all points are completely correlated, thus it is equivalent to fitting with a horizontal line. This can also be seen by observing that 1 = 1 × 1.
- lsqfitgp.Cos(delta)¶
Cosine kernel.
\[k(x, y) = \cos(x - y) = \cos x \cos y + \sin x \sin y\]Samples from this kernel are harmonic functions. It can be multiplied with another kernel to introduce anticorrelations.
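The angle-difference identity used above can be verified numerically in plain NumPy (not using lsqfitgp itself):

```python
import numpy as np

# Check that cos(x - y) = cos x cos y + sin x sin y on random inputs.
rng = np.random.default_rng(0)
x = rng.uniform(-10, 10, 100)
y = rng.uniform(-10, 10, 100)
lhs = np.cos(x - y)
rhs = np.cos(x) * np.cos(y) + np.sin(x) * np.sin(y)
print(np.allclose(lhs, rhs))  # -> True
```

The second form makes explicit that the kernel is an (infinite-dimensional analogue of a) dot product, so it is a valid covariance function.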
- lsqfitgp.ExpQuad(r2)¶
Exponential quadratic kernel.
\[k(r) = \exp \left( -\frac 12 r^2 \right)\]It is smooth and has a strict typical lengthscale, i.e., oscillations are strongly suppressed under a certain wavelength, and correlations are strongly suppressed over a certain distance.
- lsqfitgp.Expon(delta)¶
Exponential kernel.
\[k(x, y) = \exp(-|x - y|)\]In 1D it is equivalent to the Matérn 1/2 kernel; however, in more dimensions it acts separately on each dimension, while the Matérn kernel is isotropic.
- lsqfitgp.Fourier(delta, n=2)¶
Fourier kernel.
\[\begin{split}k(x, y) &= \frac1{\zeta(2n)} \sum_{k=1}^\infty \frac {\cos(2\pi kx)}{k^n} \frac {\cos(2\pi ky)}{k^n} + \frac1{\zeta(2n)} \sum_{k=1}^\infty \frac {\sin(2\pi kx)}{k^n} \frac {\sin(2\pi ky)}{k^n} = \\ &= \frac1{\zeta(2n)} \sum_{k=1}^\infty \frac {\cos(2\pi k(x-y))} {k^{2n}} = \\ &= (-1)^{n+1} \frac1{\zeta(2n)} \frac {(2\pi)^{2n}} {2(2n)!} B_{2n}(x-y \bmod 1),\end{split}\]where \(B_s(x)\) is a Bernoulli polynomial. It is equivalent to fitting with a Fourier series of period 1 with independent priors on the coefficients with mean zero and variance \(1/(\zeta(2n)k^{2n})\). The process is \(n - 1\) times derivable.
Note that the \(k = 0\) term is not included in the summation, so the mean of the process over one period is forced to be zero.
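The equality between the series and the Bernoulli-polynomial closed form can be checked numerically in plain NumPy (not using lsqfitgp itself), here for n = 1, where \(B_2(t) = t^2 - t + 1/6\) and \(\zeta(2) = \pi^2/6\):

```python
import numpy as np

n = 1
zeta2n = np.pi**2 / 6  # zeta(2)
t = 0.3                # (x - y) mod 1

# Truncated series: sum_{k=1}^K cos(2 pi k t) / k^{2n}, divided by zeta(2n).
k = np.arange(1, 100_000)
series = np.sum(np.cos(2 * np.pi * k * t) / k**(2 * n)) / zeta2n

# Closed form: (-1)^{n+1} (2 pi)^{2n} / (2 (2n)! zeta(2n)) * B_{2n}(t),
# which for n = 1 is (2 pi)^2 / (4 zeta(2)) * (t^2 - t + 1/6).
closed = (2 * np.pi)**2 / (2 * 2 * zeta2n) * (t**2 - t + 1/6)
print(abs(series - closed) < 1e-4)  # -> True
```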
- lsqfitgp.FracBrownian(x, y, H=0.5)¶
Fractional Brownian motion kernel.
\[k(x, y) = \frac 12 (x^{2H} + y^{2H} - |x-y|^{2H}), \quad H \in (0, 1), \quad x, y \ge 0\]For H = 1/2 (default) it is the Wiener kernel. For H in (0, 1/2) the increments are anticorrelated (strong oscillation), for H in (1/2, 1) the increments are correlated (tends to keep a slope).
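The reduction to the Wiener kernel for H = 1/2 follows from \(\frac12(x + y - |x - y|) = \min(x, y)\), which can be checked in plain NumPy (not using lsqfitgp itself):

```python
import numpy as np

# Standalone implementation of the fractional Brownian motion formula above.
def frac_brownian(x, y, H=0.5):
    return 0.5 * (x**(2 * H) + y**(2 * H) - np.abs(x - y)**(2 * H))

x = np.linspace(0, 5, 20)
fbm = frac_brownian(x[:, None], x[None, :], H=0.5)
wiener = np.minimum(x[:, None], x[None, :])  # the Wiener kernel min(x, y)
print(np.allclose(fbm, wiener))  # -> True
```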
- lsqfitgp.GammaExp(r, gamma=1)¶
Gamma exponential kernel.
\[k(r) = \exp(-r^\texttt{gamma}), \quad \texttt{gamma} \in [0, 2]\]For gamma = 2 it is the Gaussian kernel, for gamma = 1 it is the Matérn 1/2 kernel, for gamma = 0 it is the constant kernel. The process is differentiable only for gamma = 2, however as gamma gets closer to 2 the variance of the non-derivable component goes to zero.
- lsqfitgp.Gibbs(x, y, scalefun=<function <lambda>>)¶
Gibbs kernel.
\[k(x, y) = \sqrt{ \frac {2 s(x) s(y)} {s(x)^2 + s(y)^2} } \exp \left( -\frac {(x - y)^2} {s(x)^2 + s(y)^2} \right), \quad s = \texttt{scalefun}.\]Kernel which in some sense is like a Gaussian kernel where the scale changes at every point. The scale is computed by the parameter scalefun which must be a callable taking the x array and returning a scale for each point. By default scalefun returns 1 so it is a Gaussian kernel.
Consider that the default parameter scale acts before scalefun, so for example if scalefun(x) = x then scale has no effect. You should include all rescalings in scalefun to avoid surprises.
- lsqfitgp.Harmonic(delta, Q=1)¶
Damped stochastically driven harmonic oscillator kernel.
\[\begin{split}k(x, y) = \exp\left( -\frac {\tau} {Q} \right) \begin{cases} \cosh(\eta\tau) + \sinh(\eta\tau) / (\eta Q) & 0 < Q < 1 \\ 1 + \tau & Q = 1 \\ \cos(\eta\tau) + \sin(\eta\tau) / (\eta Q) & Q > 1, \end{cases}\end{split}\]where \(\tau = |x - y|\) and \(\eta = \sqrt{|1 - 1/Q^2|}\).
The process is the solution to the stochastic differential equation
\[f''(x) + 2/Q f'(x) + f(x) = w(x),\]where w is white noise.
The parameter Q is the quality factor, i.e., the ratio between the energy stored in the oscillator and the energy lost in each cycle due to damping. The angular frequency is 1, i.e., the period is 2π. The process is derivable one time.
In 1D, for Q = 1 (default) and scale = sqrt(1/3), it is the Matérn 3/2 kernel.
Reference: Daniel Foreman-Mackey, Eric Agol, Sivaram Ambikasaran, and Ruth Angus: Fast and Scalable Gaussian Process Modeling With Applications To Astronomical Time Series.
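The three branches of the piecewise formula join continuously at Q = 1, as this plain-NumPy sketch of the formula above shows (it does not use lsqfitgp itself):

```python
import numpy as np

# Standalone implementation of the Harmonic formula with tau = |x - y|.
def harmonic(tau, Q):
    eta = np.sqrt(abs(1 - 1 / Q**2))
    if Q < 1:
        core = np.cosh(eta * tau) + np.sinh(eta * tau) / (eta * Q)
    elif Q == 1:
        core = 1 + tau
    else:
        core = np.cos(eta * tau) + np.sin(eta * tau) / (eta * Q)
    return np.exp(-tau / Q) * core

tau = 0.7
below, at, above = harmonic(tau, 0.9999), harmonic(tau, 1), harmonic(tau, 1.0001)
print(abs(below - at) < 1e-3 and abs(above - at) < 1e-3)  # -> True
```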
- lsqfitgp.Linear(x, y)¶
Dot product kernel.
\[k(x, y) = x \cdot y = \sum_i x_i y_i\]In 1D it is equivalent to fitting with a line passing through the origin.
- lsqfitgp.Matern(r, nu=None)¶
Matérn kernel of real order.
\[k(r) = \frac {2^{1-\nu}} {\Gamma(\nu)} x^\nu K_\nu(x), \quad \nu = \texttt{nu} > 0, \quad x = \sqrt{2\nu} r\]The nearest integer below nu indicates how many times the Gaussian process is derivable: for nu < 1 it is continuous but not derivable, for 1 <= nu < 2 it is derivable but has no second derivative, etc. The half-integer case (nu = 1/2, 3/2, …) internally uses a simpler formula, so you should prefer it. Also, taking derivatives of the process is supported only for half-integer nu.
- lsqfitgp.Matern12(r)¶
Matérn kernel of order 1/2 (continuous, not derivable).
\[k(r) = \exp(-r)\]
- lsqfitgp.Matern32(r)¶
Matérn kernel of order 3/2 (derivable one time).
\[k(r) = (1 + x) \exp(-x), \quad x = \sqrt3 r\]
- lsqfitgp.Matern52(r)¶
Matérn kernel of order 5/2 (derivable two times).
\[k(r) = (1 + x + x^2/3) \exp(-x), \quad x = \sqrt5 r\]
- lsqfitgp.NNKernel(x, y, sigma0=1)¶
Neural network kernel.
\[k(x, y) = \frac 2 \pi \arcsin \left( \frac { 2 (q + x \cdot y) }{ \sqrt{ (1 + 2 (q + x \cdot x)) (1 + 2 (q + y \cdot y)) } } \right), \quad q = \texttt{sigma0}^2\]This kernel is equivalent to a neural network with one infinite hidden layer with Gaussian priors on the weights and error function response. In other words, you can think of the process as a superposition of sigmoids, where sigma0 sets the dispersion of the centers of the sigmoids.
- lsqfitgp.OrnsteinUhlenbeck(x, y)¶
Ornstein-Uhlenbeck process kernel.
\[k(x, y) = \exp(-|x - y|) - \exp(-(x + y)), \quad x, y \ge 0\]It is a random walk plus a negative feedback term that keeps the asymptotic variance constant. It is asymptotically stationary; often the name “Ornstein-Uhlenbeck” is given to the stationary part only, which here is provided as Expon.
- lsqfitgp.PPKernel(r, q=0, D=1)¶
Piecewise polynomial kernel.
\[\begin{split}k(r) = \text{polynomial}_{q,D}(r) \begin{cases} 1 - r & r \in [0, 1) \\ 0 & \text{otherwise} \end{cases}\end{split}\]An isotropic kernel with finite support. The covariance is nonzero only when the distance between the points is less than 1. Parameter q in (0, 1, 2, 3) sets the differentiability, while parameter D sets the maximum dimensionality the kernel can be used with. Default is q = 0 (non derivable), D = 1 (can be used only in 1D).
- lsqfitgp.Periodic(delta, outerscale=1)¶
Periodic Gaussian kernel.
\[k(x, y) = \exp \left( -2 \left( \frac {\sin((x - y) / 2)} {\texttt{outerscale}} \right)^2 \right)\]A Gaussian kernel over a transformed periodic space. It represents a periodic process. The usual scale parameter sets the period, with the default scale = 1 giving a period of 2π, while the outerscale parameter sets the length scale of the correlations.
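The 2π period with the default scale can be checked from the formula in plain NumPy (not using lsqfitgp itself):

```python
import numpy as np

# Standalone implementation of the periodic Gaussian formula above.
def periodic(x, y, outerscale=1):
    return np.exp(-2 * (np.sin((x - y) / 2) / outerscale)**2)

# Shifting one argument by 2*pi leaves the kernel unchanged, because
# sin((x - y)/2 + pi) = -sin((x - y)/2) and the sine is squared.
x = np.linspace(0, 10, 50)
print(np.allclose(periodic(x, 0.0), periodic(x + 2 * np.pi, 0.0)))  # -> True
```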
- lsqfitgp.RatQuad(r2, alpha=2)¶
Rational quadratic kernel.
\[k(r) = \left( 1 + \frac {r^2} {2 \alpha} \right)^{-\alpha}, \quad \alpha = \texttt{alpha}\]It is equivalent to a lengthscale mixture of Gaussian kernels where the scale distribution is a gamma with shape parameter alpha. For alpha -> infinity, it becomes the Gaussian kernel. It is smooth.
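The limit for large alpha follows from \((1 + u/\alpha)^{-\alpha} \to e^{-u}\) with \(u = r^2/2\); a plain-NumPy check of the formula above (not using lsqfitgp itself):

```python
import numpy as np

r = np.linspace(0, 3, 50)
alpha = 1e6  # large alpha approximates the limit

ratquad = (1 + r**2 / (2 * alpha))**(-alpha)  # rational quadratic formula
expquad = np.exp(-0.5 * r**2)                 # exponential quadratic kernel

print(np.max(np.abs(ratquad - expquad)) < 1e-4)  # -> True
```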
- lsqfitgp.Rescaling(x, y, stdfun=None)¶
Outer product kernel.
\[k(x, y) = \texttt{stdfun}(x) \texttt{stdfun}(y)\]A totally correlated kernel with arbitrary variance. Parameter stdfun must be a function that takes x or y and computes the standard deviation at the point. It can yield negative values; points where stdfun has the same sign will be totally correlated, points where it has different signs will be totally anticorrelated. Use this kernel to modulate the variance of other kernels. By default stdfun returns a constant, so it is equivalent to Constant.
- lsqfitgp.Taylor(x, y)¶
Exponential-like power series kernel.
\[k(x, y) = \sum_{k=0}^\infty \frac {x^k}{k!} \frac {y^k}{k!} = I_0(2 \sqrt{xy})\]It is equivalent to fitting with a Taylor series expansion around zero with independent priors on the coefficients, the coefficient of \(x^k\) having mean zero and standard deviation \(1/k!\).
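The equality of the series with the modified Bessel function \(I_0\) can be checked numerically in plain NumPy (not using lsqfitgp itself):

```python
import numpy as np
from math import factorial

# Truncated power series vs. NumPy's modified Bessel function of order 0.
x, y = 0.7, 1.3
series = sum(x**k * y**k / factorial(k)**2 for k in range(30))
bessel = np.i0(2 * np.sqrt(x * y))
print(abs(series - bessel) < 1e-6)  # -> True
```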
- lsqfitgp.White(r2)¶
White noise kernel.
\[\begin{split}k(x, y) = \begin{cases} 1 & x = y \\ 0 & x \neq y \end{cases}\end{split}\]
- lsqfitgp.Wiener(x, y)¶
Wiener kernel.
\[k(x, y) = \min(x, y), \quad x, y > 0\]A kernel representing a non-differentiable random walk starting at 0.
- lsqfitgp.WienerIntegral(x, y)¶
Kernel for a process whose derivative is a Wiener process.
\[\begin{split}k(x, y) = \frac 12 \begin{cases} x^2 (y - x/3) & x < y, \\ y^2 (x - y/3) & y \le x \end{cases}\end{split}\]
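Since the derivative is a Wiener process, the formula should equal the double integral of the Wiener kernel, \(k(x, y) = \int_0^x \mathrm{d}s \int_0^y \mathrm{d}t \, \min(s, t)\); a plain-NumPy numerical check (not using lsqfitgp itself):

```python
import numpy as np

# Midpoint-rule double integral of min(s, t) over [0, x] x [0, y],
# compared with the closed-form branch for x < y quoted above.
x, y = 0.6, 1.0
h = 1e-3
s = np.arange(h / 2, x, h)  # midpoint grid on [0, x]
t = np.arange(h / 2, y, h)  # midpoint grid on [0, y]
numeric = np.sum(np.minimum(s[:, None], t[None, :])) * h * h
exact = 0.5 * x**2 * (y - x / 3)  # branch for x < y
print(abs(numeric - exact) < 1e-4)  # -> True
```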