BART

The bayestree submodule contains classes to set up a Gaussian process regression with the BART kernel. See the bart, barteasy and acic examples.

The original version of these models is implemented in the R packages BART, dbarts, and bcf. Pros of the GP version provided here:

  • No MCMC, the result is computed quite directly.

  • Fast inference on the hyperparameters.

  • Allows a wider region of hyperparameter space.

Cons:

  • Does not scale to large datasets.

  • Worse predictive performance at fixed hyperparameters.

  • Slower at fixed hyperparameters.

  • May use more memory.

Overall, this version should typically be a better choice than the R packages if you have at most 10000 datapoints.

Index

bart, bcf

Classes

class lsqfitgp.bayestree.bart(x_train, y_train, *, weights=None, fitkw={}, kernelkw={}, marginalize_mean=True)[source]

Nonparametric Bayesian regression with a GP version of BART.

Evaluate a Gaussian process regression with a kernel which accurately approximates the infinite trees limit of BART. The hyperparameters are optimized to their marginal MAP.

Parameters:
x_train(n, p) array or dataframe

Observed covariates.

y_train(n,) array

Observed outcomes.

weights(n,) array

Weights used to rescale the error variance (as 1 / weight).

fitkwdict

Additional arguments passed to empbayes_fit, overrides the defaults.

kernelkwdict

Additional arguments passed to BART, overrides the defaults.

marginalize_meanbool

If True (default), marginalize the intercept of the model.

See also

lsqfitgp.BART

Notes

The regression model is:

\[\begin{split}y_i &= \mu + \lambda f(\mathbf x_i) + \varepsilon_i, \\ \varepsilon_i &\overset{\mathrm{i.i.d.}}{\sim} N(0, \sigma^2 / w_i), \\ \mu &\sim N( (\max(\mathbf y) + \min(\mathbf y)) / 2, (\max(\mathbf y) - \min(\mathbf y))^2 / 4 ), \\ \log \sigma^2 &\sim N( \log(\overline{w(y - \bar y)^2}), 4 ), \\ \log \lambda &\sim N( \log ((\max(\mathbf y) - \min(\mathbf y)) / 4), 4 ), \\ f &\sim \mathrm{GP}( 0, \mathrm{BART}(\alpha,\beta) ), \\ \alpha &\sim \mathrm{B}(2, 1), \\ \beta &\sim \mathrm{IG}(1, 1).\end{split}\]

To make the inference, \((f, \boldsymbol\varepsilon, \mu)\) are marginalized analytically, and the marginal posterior mode of \((\sigma, \lambda, \alpha, \beta)\) is found by numerical minimization, after transforming them to express their prior as a Gaussian copula. Their marginal posterior covariance matrix is estimated with an approximation of the hessian inverse. See empbayes_fit and use the parameter fitkw to customize this procedure.

The tree splitting grid of the BART kernel is set using quantiles of the observed covariates. This corresponds to settings usequants=True, numcut=inf in the R packages BayesTree and BART. Use the kernelkw parameter to customize the grid.

Attributes:
meangvar

The prior mean \(\mu\).

sigmafloat or gvar

The error term standard deviation \(\sigma\). If there are weights, the sdev for each unit is obtained dividing sigma by sqrt(weight).

alphagvar

The numerator of the tree spawn probability \(\alpha\) (named base in BayesTree and BART).

betagvar

The depth exponent of the tree spawn probability \(\beta\) (named power in BayesTree and BART).

meansdevgvar

The prior standard deviation \(\lambda\) of the latent regression function.

fitempbayes_fit

The hyperparameters fit object.

Methods

gp(*[, hp, x_test, weights, rng])

Create a Gaussian process with the fitted hyperparameters.

data(*[, hp, rng])

Get the data to be passed to GP.pred on a GP object returned by gp.

pred(*[, hp, error, format, x_test, ...])

Predict the outcome at given locations.

data(*, hp='map', rng=None)[source]

Get the data to be passed to GP.pred on a GP object returned by gp.

Parameters:
hpstr or dict

The hyperparameters to use. If 'map', use the marginal maximum a posteriori. If 'sample', sample hyperparameters from the posterior. If a dict, use the given hyperparameters.

rngnumpy.random.Generator, optional

Random number generator, used if hp == 'sample'.

Returns:
datadict

A dictionary representing y_train in the format required by the GP.pred method.

gp(*, hp='map', x_test=None, weights=None, rng=None)[source]

Create a Gaussian process with the fitted hyperparameters.

Parameters:
hpstr or dict

The hyperparameters to use. If 'map', use the marginal maximum a posteriori. If 'sample', sample hyperparameters from the posterior. If a dict, use the given hyperparameters.

x_testarray or dataframe, optional

Additional covariates for “test points”.

weightsarray, optional

Weights for the error variance on the test points.

rngnumpy.random.Generator, optional

Random number generator, used if hp == 'sample'.

Returns:
gpGP

A centered Gaussian process object. To add the mean, use the mean attribute of the bart object. The keys of the GP are ‘Xmean’, ‘Xnoise’, and ‘X’, where the “X” stands either for ‘train’ or ‘test’, and X = Xmean + Xnoise.

pred(*, hp='map', error=False, format='matrices', x_test=None, weights=None, rng=None)[source]

Predict the outcome at given locations.

Parameters:
hpstr or dict

The hyperparameters to use. If 'map', use the marginal maximum a posteriori. If 'sample', sample hyperparameters from the posterior. If a dict, use the given hyperparameters.

errorbool

If False (default), make a prediction for the latent mean. If True, add the error term.

format{‘matrices’, ‘gvar’}

If ‘matrices’ (default), return the mean and covariance matrix separately. If ‘gvar’, return an array of gvars.

x_testarray or dataframe, optional

Covariates for the locations where the prediction is computed. If not specified, predict at the data covariates.

weightsarray, optional

Weights for the error variance on the test points.

rngnumpy.random.Generator, optional

Random number generator, used if hp == 'sample'.

Returns:
If format is ‘matrices’ (default):
mean, covarrays

The mean and covariance matrix of the Normal posterior distribution over the regression function at the specified locations.

If format is ‘gvar’:
outarray of GVar

The same distribution represented as an array of GVar objects.


class lsqfitgp.bayestree.bcf(*, y, z, x_mu, x_tau=None, pihat, include_pi='mu', weights=None, fitkw={}, kernelkw_mu={}, kernelkw_tau={}, marginalize_mean=True, gpaux=None, x_aux=None, otherhp={}, transf='standardize')[source]

Nonparametric Bayesian regression with a GP version of BCF.

BCF (Bayesian Causal Forests) is a regression method for observational causal inference studies introduced in [1] based on a pair of BART models.

This class evaluates a Gaussian process regression with a kernel which accurately approximates BCF in the infinite trees limit of each BART model. The hyperparameters are optimized to their marginal MAP.

The model is (loosely, see notes below) \(y = \mu(x) + z\tau(x)\), so \(\tau(x)\) is the expected causal effect of \(z\) on \(y\) at location \(x\).

Parameters:
y(n,) array, series or dataframe

Outcome.

z(n,) array, series or dataframe

Binary treatment status: 0 control group, 1 treatment group.

x_mu(n, p) array, series or dataframe

Covariates for the \(\mu\) model.

x_tau(n, q) array, series or dataframe, optional

Covariates for the \(\tau\) model. If not specified, use x_mu.

pihat(n,) array, series or dataframe

Estimated propensity score, i.e., P(Z=1|X).

include_pi{‘mu’, ‘tau’, ‘both’}, optional

Whether to include the propensity score in the \(\mu\) model, the \(\tau\) model, or both. Default is 'mu'.

weights(n,) array, series or dataframe

Weights used to rescale the error variance (as 1 / weight).

fitkwdict

Additional arguments passed to empbayes_fit, overrides the defaults.

kernelkw_mu, kernelkw_taudict

Additional arguments passed to BART for each model, overrides the defaults.

marginalize_meanbool

If True (default), marginalize the intercept of the model.

gpauxcallable, optional

If specified, this function is called with a pair (hp, gp), where hp is a dictionary of hyperparameters, and gp is a GP object under construction, and is expected to return a modified gp with a new process named 'aux' defined with defproc or similar. The process is added to the regression model. The input to the process is a structured array with fields:

‘train’bool

Indicates whether the data is training set (the one passed on initialization) or test set (the one passed to pred or gp).

‘i’int

Index of the flattened array.

‘z’int

Treatment status.

‘mu’, ‘tau’structured

The values in x_mu and x_tau, converted to indices according to the BART grids. Where pihat has been added, there are two subfields: 'x' which contains the covariates, and 'pihat', the latter expressed in indices as well.

‘pihat’float

The pihat argument. Contrary to the subfield included under 'mu' and/or 'tau', this field contains the original values.

‘aux’structured

The values in x_aux, if specified.

x_aux(n, k) array, series or dataframe, optional

Additional covariates for the 'aux' process.

otherhpdictionary of gvar

A dictionary with the prior of arbitrary additional hyperpameters, intended to be used by gpaux or transf.

transf(list of) str or pair of callable

Data transformation. Either a string indicating a pre-defined transformation, or a pair (from_data, to_data), two functions with signatures from_data(hp, y) -> eta and to_data(hp, eta) -> y, where eta is the value to which the model is fit, and hp is the dictionary of hyperparameters. The functions must be ufuncs and one the inverse of the other w.r.t. the second parameter. from_data must be derivable with jax w.r.t. y.

If a list of such specifications is provided, the transformations are applied in order, with the first one being the outermost, i.e., the one applied first to the data.

If a transformation uses additional hyperparameters, either predefined automatically or passed by the user through otherhp, they are inferred with the rest of the hyperparameters.

The pre-defined transformations are:

‘standardize’ (default)

eta = (y - mean(train_y)) / sdev(train_y)

‘yeojohnson’

The Yeo-Johnson transformation [2] to reduce skewness. The \(\lambda\) parameter is bounded in \((0, 2)\) for implementation convenience, this restriction may be lifted in future versions.

See also

lsqfitgp.BART

Notes

The regression model is:

\[\begin{split}\eta_i = g(y_i; \ldots) &= m + {} \\ &\phantom{{}={}} + \lambda_\mu \mu(\mathbf x^\mu_i, \hat\pi_i?) + {} \\ &\phantom{{}={}} + \lambda_\tau \tau(\mathbf x^\tau_i, \hat\pi_i?) (z_i - z_0) + {} \\ &\phantom{{}={}} + \mathrm{aux}(i, z_i, \mathbf x^\mu_i, \mathbf x^\tau_i, \hat\pi_i, \mathbf x^\text{aux}_i) + {} \\ &\phantom{{}={}} + \varepsilon_i, \\ \varepsilon_i &\sim N(0, \sigma^2 / w_i), \\ m &\sim N(0, 1), \\ \log \sigma^2 &\sim N(\log\bar w, 4), \\ \lambda_\mu &\sim \mathrm{HalfCauchy}(2), \\ \lambda_\tau &\sim \mathrm{HalfNormal}(1.48), \\ \mu &\sim \mathrm{GP}(0, \mathrm{BART}(\alpha_\mu, \beta_\mu) ), \\ \tau &\sim \mathrm{GP}(0, \mathrm{BART}(\alpha_\tau, \beta_\tau) ), \\ \mathrm{aux} & \sim \mathrm{GP}(0, \text{<user defined>}), \\ \alpha_\mu, \alpha_\tau &\sim \mathrm{B}(2, 1), \\ \beta_\mu, \beta_\tau &\sim \mathrm{IG}(1, 1), \\ z_0 &\sim U(0, 1),\end{split}\]

To make the inference, \((\mu, \tau, \boldsymbol\varepsilon, m, \mathrm{aux})\) are marginalized analytically, and the marginal posterior mode of \((\sigma, \lambda_*, \alpha_*, \beta_*, z_0, \ldots)\) is found by numerical minimization, after transforming them to express their prior as a Gaussian copula. Their marginal posterior covariance matrix is estimated with an approximation of the hessian inverse. See empbayes_fit and use the parameter fitkw to customize this procedure.

The tree splitting grid of the BART kernel is set using quantiles of the observed covariates. This corresponds to settings usequants=True, numcut=inf in the R packages BayesTree and BART. Use the parameters kernelkw_mu and kernelkw_tau to customize the grids.

The difference between the regression model evaluated at \(Z=1\) vs. \(Z=0\) can be interpreted as the causal effect \(Z \rightarrow Y\) if the unconfoundedness assumption is made:

\[\{Y(Z=0), Y(Z=1)\} \perp\!\!\!\perp Z \mid X.\]

In practical terms, this holds when:

  1. \(X\) are pre-treatment variables, i.e., they represent quantities causally upstream of \(Z\).

  2. \(X\) are sufficient to adjust for all common causes of \(Z\) and \(Y\), such that the only remaining difference is the causal effect and not just a correlation.

Here \(X\) consists in x_tau, x_mu and x_aux. However these arrays may also be used to pass “technical” values used to set up the model, that do not satisfy the uncounfoundedness assumption, if you know what you are doing.

References

[1]

P. Richard Hahn, Jared S. Murray, Carlos M. Carvalho “Bayesian Regression Tree Models for Causal Inference: Regularization, Confounding, and Heterogeneous Effects (with Discussion),” Bayesian Analysis 15(3), 965-1056, September 2020, https://doi.org/10.1214/19-BA1195

[2]

Yeo, In-Kwon; Johnson, Richard A. (2000). “A New Family of Power Transformations to Improve Normality or Symmetry”. Biometrika. 87 (4): 954–959. https://doi.org/10.1093/biomet/87.4.954

Attributes:
mfloat or gvar

The prior mean \(m\).

sigmagvar

The error term standard deviation \(\sigma\). If there are weights, the sdev for each unit is obtained dividing sigma by sqrt(weight).

alpha_mu, alpha_taugvar

The numerator of the tree spawn probability \(\alpha_*\) (named base in R bcf).

beta_mu, beta_taugvar

The depth exponent of the tree spawn probability \(\beta_*\) (named power in R bcf).

lambda_mu, lambda_taugvar

The prior standard deviation \(\lambda_*\).

z_0gvar

The treatment coding parameter.

fitempbayes_fit

The hyperparameters fit object.

Methods

gp(*[, hp, z, x_mu, x_tau, pihat, x_aux, ...])

Create a Gaussian process with the fitted hyperparameters.

data(*[, hp, rng])

Get the data to be passed to GP.pred on a GP object returned by gp.

pred(*[, hp, error, z, x_mu, x_tau, pihat, ...])

Predict the transformed outcome at given locations.

from_data(y, *[, hp, rng])

Transforms outcomes \(y\) to the regression variable \(\eta\).

to_data(eta, *[, hp, rng])

Convert the regression variable \(\eta\) to outcomes \(y\).

data(*, hp='map', rng=None)[source]

Get the data to be passed to GP.pred on a GP object returned by gp.

Parameters:
hpstr or dict

The hyperparameters to use. If 'map' (default), use the marginal maximum a posteriori. If 'sample', sample hyperparameters from the posterior. If a dict, use the given hyperparameters.

rngnumpy.random.Generator, optional

Random number generator, used if hp == 'sample'.

Returns:
datadict

A dictionary representing eta in the format required by the GP.pred method.

from_data(y, *, hp='map', rng=None)[source]

Transforms outcomes \(y\) to the regression variable \(\eta\).

Parameters:
y(n,) array

Outcomes.

hpstr or dict

The hyperparameters to use. If 'map' (default), use the marginal maximum a posteriori. If 'sample', sample hyperparameters from the posterior. If a dict, use the given hyperparameters.

rngnumpy.random.Generator, optional

Random number generator, used if hp == 'sample'.

Returns:
eta(n,) array

Transformed outcomes.

gp(*, hp='map', z=None, x_mu=None, x_tau=None, pihat=None, x_aux=None, weights=None, rng=None)[source]

Create a Gaussian process with the fitted hyperparameters.

Parameters:
hpstr or dict

The hyperparameters to use. If 'map' (default), use the marginal maximum a posteriori. If 'sample', sample hyperparameters from the posterior. If a dict, use the given hyperparameters.

z(m,) array, series or dataframe, optional

Treatment status at test points. If specified, also x_mu, pihat, x_tau and x_aux (the latter two if and only also specified at initialization) must be specified.

x_mu(m, p) array, series or dataframe, optional

Control model covariates at test points.

x_tau(m, q) array, series or dataframe, optional

Moderating model covariates at test points.

pihat(m,) array, series or dataframe, optional

Estimated propensity score at test points.

x_aux(m, k) array, series or dataframe, optional

Additional covariates for the 'aux' process.

weights(m,) array, series or dataframe, optional

Weights for the error variance on the test points.

rngnumpy.random.Generator, optional

Random number generator, used if hp == 'sample'.

Returns:
gpGP

A centered Gaussian process object. To add the mean, use the m attribute of the bcf object. The keys of the GP are 'Xmean', 'Xnoise', and 'X', where the “X” stands either for ‘train’ or ‘test’, and X = Xmean + Xnoise.

This Gaussian process is defined on the transformed data eta.

pred(*, hp='map', error=False, z=None, x_mu=None, x_tau=None, pihat=None, x_aux=None, weights=None, transformed=True, samples=None, gvars=False, rng=None)[source]

Predict the transformed outcome at given locations.

Parameters:
hpstr or dict

The hyperparameters to use. If 'map' (default), use the marginal maximum a posteriori. If 'sample', sample hyperparameters from the posterior. If a dict, use the given hyperparameters.

errorbool, default False

If False, make a prediction for the latent mean. If True, add the error term.

z(m,) array, series or dataframe, optional

Treatment status at test points. If specified, also x_mu, pihat, x_tau and x_aux (the latter two if and only also specified at initialization) must be specified.

x_mu(m, p) array, series or dataframe, optional

\(\mu\) model covariates at test points.

x_tau(m, q) array, series or dataframe, optional

\(\tau\) model covariates at test points.

pihat(m,) array, series or dataframe, optional

Estimated propensity score at test points.

x_aux(m, k) array, series or dataframe, optional

Additional covariates for the 'aux' process at test points.

weights(m,) array, series or dataframe, optional

Weights for the error variance on the test points.

transformedbool, default True

If True, return the prediction on the transformed outcome \(\eta\), else the observable outcome \(y\).

samplesint, optional

If specified, indicates the number of samples to take from the posterior. If not, return the mean and covariance matrix of the posterior.

gvarsbool, default False

If True, return the mean and covariance matrix of the posterior as an array of GVar variables.

rngnumpy.random.Generator, optional

Random number generator, used if hp == 'sample' or samples is not None.

Returns:
If samples is None and gvars is False (default):
mean, cov(m,) and (m, m) arrays

The mean and covariance matrix of the Normal posterior distribution over the regression function or \(\eta\) at the specified locations.

If samples is None and gvars is True:
out(m,) array of gvars

The same distribution represented as an array of GVar objects.

If samples is an integer:
sample(samples, m) array

Posterior samples over either the regression function, \(\eta\), or \(y\).

to_data(eta, *, hp='map', rng=None)[source]

Convert the regression variable \(\eta\) to outcomes \(y\).

Parameters:
eta(n,) array

Transformed outcomes.

hpstr or dict

The hyperparameters to use. If 'map' (default), use the marginal maximum a posteriori. If 'sample', sample hyperparameters from the posterior. If a dict, use the given hyperparameters.

rngnumpy.random.Generator, optional

Random number generator, used if hp == 'sample'.

Returns:
y(n,) array

Outcomes.