BART¶
The bayestree
submodule contains classes to set up a Gaussian process
regression with the BART
kernel. See the bart,
barteasy and bcf examples.
The original version of these models is implemented in the R packages BART, dbarts, and bcf. Pros of the GP version provided here:
No MCMC, the result is computed quite directly.
Fast inference on the hyperparameters.
Allows a wider region of hyperparameter space.
Cons:
Does not scale to large datasets.
Worse predictive performance at fixed hyperparameters.
Slower at fixed hyperparameters.
May use more memory.
Overall, this version should typically be a better choice than the R packages if you have at most 10000 datapoints.
Index¶
Classes¶
- class lsqfitgp.bayestree.bart(x_train, y_train, *, weights=None, fitkw={}, kernelkw={}, marginalize_mean=True)[source]¶
Nonparametric Bayesian regression with a GP version of BART.
Evaluate a Gaussian process regression with a kernel which accurately approximates the infinite trees limit of BART. The hyperparameters are optimized to their marginal MAP.
- Parameters:
- x_train(n, p) array or dataframe
Observed covariates.
- y_train(n,) array
Observed outcomes.
- weights(n,) array
Weights used to rescale the error variance (as 1 / weight).
- fitkwdict
Additional arguments passed to
empbayes_fit
, overrides the defaults.- kernelkwdict
Additional arguments passed to
BART
, overrides the defaults.- marginalize_meanbool
If True (default), marginalize the intercept of the model.
- Attributes:
- meangvar
The prior mean \(\mu\).
- sigmafloat or gvar
The error term standard deviation \(\sigma\). If there are weights, the sdev for each unit is obtained dividing
sigma
by sqrt(weight).- alphagvar
The numerator of the tree spawn probability \(\alpha\) (named
base
in BayesTree and BART).- betagvar
The depth exponent of the tree spawn probability \(\beta\) (named
power
in BayesTree and BART).- meansdevgvar
The prior standard deviation \(\lambda\) of the latent regression function.
- fitempbayes_fit
The hyperparameters fit object.
Methods
gp
(*[, hp, x_test, weights, rng])Create a Gaussian process with the fitted hyperparameters.
data
(*[, hp, rng])Get the data to be passed to
GP.pred
on a GP object returned bygp
.pred
(*[, hp, error, format, x_test, ...])Predict the outcome at given locations.
See also
Notes
The regression model is:
\[\begin{split}y_i &= \mu + \lambda f(\mathbf x_i) + \varepsilon_i, \\ \varepsilon_i &\overset{\mathrm{i.i.d.}}{\sim} N(0, \sigma^2 / w_i), \\ \mu &\sim N( (\max(\mathbf y) + \min(\mathbf y)) / 2, (\max(\mathbf y) - \min(\mathbf y))^2 / 4 ), \\ \log \sigma^2 &\sim N( \log(\overline{w(y - \bar y)^2}), 4 ), \\ \log \lambda &\sim N( \log ((\max(\mathbf y) - \min(\mathbf y)) / 4), 4 ), \\ f &\sim \mathrm{GP}( 0, \mathrm{BART}(\alpha,\beta) ), \\ \alpha &\sim \mathrm{B}(2, 1), \\ \beta &\sim \mathrm{IG}(1, 1).\end{split}\]To make the inference, \((f, \boldsymbol\varepsilon, \mu)\) are marginalized analytically, and the marginal posterior mode of \((\sigma, \lambda, \alpha, \beta)\) is found by numerical minimization, after transforming them to express their prior as a Gaussian copula. Their marginal posterior covariance matrix is estimated with an approximation of the hessian inverse. See
empbayes_fit
and use the parameterfitkw
to customize this procedure.The tree splitting grid of the BART kernel is set using quantiles of the observed covariates. This corresponds to settings
usequants=True
,numcut=inf
in the R packages BayesTree and BART. Use thekernelkw
parameter to customize the grid.- data(*, hp='map', rng=None)[source]¶
Get the data to be passed to
GP.pred
on a GP object returned bygp
.- Parameters:
- hpstr or dict
The hyperparameters to use. If
'map'
, use the marginal maximum a posteriori. If'sample'
, sample hyperparameters from the posterior. If a dict, use the given hyperparameters.- rngnumpy.random.Generator, optional
Random number generator, used if
hp == 'sample'
.
- Returns:
- datadict
A dictionary representing
y_train
in the format required by theGP.pred
method.
- gp(*, hp='map', x_test=None, weights=None, rng=None)[source]¶
Create a Gaussian process with the fitted hyperparameters.
- Parameters:
- hpstr or dict
The hyperparameters to use. If
'map'
, use the marginal maximum a posteriori. If'sample'
, sample hyperparameters from the posterior. If a dict, use the given hyperparameters.- x_testarray or dataframe, optional
Additional covariates for “test points”.
- weightsarray, optional
Weights for the error variance on the test points.
- rngnumpy.random.Generator, optional
Random number generator, used if
hp == 'sample'
.
- Returns:
- gpGP
A centered Gaussian process object. To add the mean, use the
mean
attribute of thebart
object. The keys of the GP are ‘Xmean’, ‘Xnoise’, and ‘X’, where the “X” stands either for ‘train’ or ‘test’, and X = Xmean + Xnoise.
- pred(*, hp='map', error=False, format='matrices', x_test=None, weights=None, rng=None)[source]¶
Predict the outcome at given locations.
- Parameters:
- hpstr or dict
The hyperparameters to use. If
'map'
, use the marginal maximum a posteriori. If'sample'
, sample hyperparameters from the posterior. If a dict, use the given hyperparameters.- errorbool
If
False
(default), make a prediction for the latent mean. IfTrue
, add the error term.- format{‘matrices’, ‘gvar’}
If ‘matrices’ (default), return the mean and covariance matrix separately. If ‘gvar’, return an array of gvars.
- x_testarray or dataframe, optional
Covariates for the locations where the prediction is computed. If not specified, predict at the data covariates.
- weightsarray, optional
Weights for the error variance on the test points.
- rngnumpy.random.Generator, optional
Random number generator, used if
hp == 'sample'
.
- Returns:
- If
format
is ‘matrices’ (default): - mean, covarrays
The mean and covariance matrix of the Normal posterior distribution over the regression function at the specified locations.
- If
format
is ‘gvar’: - outarray of
GVar
The same distribution represented as an array of
GVar
objects.
- If
- class lsqfitgp.bayestree.bcf(*, y, z, x_mu, x_tau=None, pihat, include_pi='mu', weights=None, fitkw={}, kernelkw_mu={}, kernelkw_tau={}, marginalize_mean=True, gpaux=None, x_aux=None, otherhp={}, transf='standardize')[source]¶
Nonparametric Bayesian regression with a GP version of BCF.
BCF (Bayesian Causal Forests) is a regression method for observational causal inference studies introduced in [1] based on a pair of BART models.
This class evaluates a Gaussian process regression with a kernel which accurately approximates BCF in the infinite trees limit of each BART model. The hyperparameters are optimized to their marginal MAP.
The model is (loosely, see notes below) \(y = \mu(x) + z\tau(x)\), so \(\tau(x)\) is the expected causal effect of \(z\) on \(y\) at location \(x\).
- Parameters:
- y(n,) array, series or dataframe
Outcome.
- z(n,) array, series or dataframe
Binary treatment status: 0 control group, 1 treatment group.
- x_mu(n, p) array, series or dataframe
Covariates for the \(\mu\) model.
- x_tau(n, q) array, series or dataframe, optional
Covariates for the \(\tau\) model. If not specified, use
x_mu
.- pihat(n,) array, series or dataframe
Estimated propensity score, i.e., P(Z=1|X).
- include_pi{‘mu’, ‘tau’, ‘both’}, optional
Whether to include the propensity score in the \(\mu\) model, the \(\tau\) model, or both. Default is
'mu'
.- weights(n,) array, series or dataframe
Weights used to rescale the error variance (as 1 / weight).
- fitkwdict
Additional arguments passed to
empbayes_fit
, overrides the defaults.- kernelkw_mu, kernelkw_taudict
Additional arguments passed to
BART
for each model, overrides the defaults.- marginalize_meanbool
If True (default), marginalize the intercept of the model.
- gpauxcallable, optional
If specified, this function is called with a pair
(hp, gp)
, wherehp
is a dictionary of hyperparameters, andgp
is aGP
object under construction, and is expected to return a modifiedgp
with a new process named'aux'
defined withdefproc
or similar. The process is added to the regression model. The input to the process is a structured array with fields:- ‘train’bool
Indicates whether the data is training set (the one passed on initialization) or test set (the one passed to
pred
orgp
).- ‘i’int
Index of the flattened array.
- ‘z’int
Treatment status.
- ‘mu’, ‘tau’structured
The values in
x_mu
andx_tau
, converted to indices according to the BART grids. Wherepihat
has been added, there are two subfields:'x'
which contains the covariates, and'pihat'
, the latter expressed in indices as well.- ‘pihat’float
The
pihat
argument. Contrary to the subfield included under'mu'
and/or'tau'
, this field contains the original values.- ‘aux’structured
The values in
x_aux
, if specified.
- x_aux(n, k) array, series or dataframe, optional
Additional covariates for the
'aux'
process.- otherhpdictionary of gvar
A dictionary with the prior of arbitrary additional hyperpameters, intended to be used by
gpaux
ortransf
.- transf(list of) str or pair of callable
Data transformation. Either a string indicating a pre-defined transformation, or a pair
(from_data, to_data)
, two functions with signaturesfrom_data(hp, y) -> eta
andto_data(hp, eta) -> y
, whereeta
is the value to which the model is fit, andhp
is the dictionary of hyperparameters. The functions must be ufuncs and one the inverse of the other w.r.t. the second parameter.from_data
must be derivable withjax
w.r.t.y
.If a list of such specifications is provided, the transformations are applied in order, with the first one being the outermost, i.e., the one applied first to the data.
If a transformation uses additional hyperparameters, either predefined automatically or passed by the user through
otherhp
, they are inferred with the rest of the hyperparameters.The pre-defined transformations are:
- ‘standardize’ (default)
eta = (y - mean(train_y)) / sdev(train_y)
- ‘yeojohnson’
The Yeo-Johnson transformation [2] to reduce skewness. The \(\lambda\) parameter is bounded in \((0, 2)\) for implementation convenience, this restriction may be lifted in future versions.
- Attributes:
- mfloat or gvar
The prior mean \(m\).
- sigmagvar
The error term standard deviation \(\sigma\). If there are weights, the sdev for each unit is obtained dividing
sigma
by sqrt(weight).- alpha_mu, alpha_taugvar
The numerator of the tree spawn probability \(\alpha_*\) (named
base
in R bcf).- beta_mu, beta_taugvar
The depth exponent of the tree spawn probability \(\beta_*\) (named
power
in R bcf).- lambda_mu, lambda_taugvar
The prior standard deviation \(\lambda_*\).
- z_0gvar
The treatment coding parameter.
- fitempbayes_fit
The hyperparameters fit object.
Methods
gp
(*[, hp, z, x_mu, x_tau, pihat, x_aux, ...])Create a Gaussian process with the fitted hyperparameters.
data
(*[, hp, rng])Get the data to be passed to
GP.pred
on a GP object returned bygp
.pred
(*[, hp, error, z, x_mu, x_tau, pihat, ...])Predict the transformed outcome at given locations.
from_data
(y, *[, hp, rng])Transforms outcomes \(y\) to the regression variable \(\eta\).
to_data
(eta, *[, hp, rng])Convert the regression variable \(\eta\) to outcomes \(y\).
See also
Notes
The regression model is:
\[\begin{split}\eta_i = g(y_i; \ldots) &= m + {} \\ &\phantom{{}={}} + \lambda_\mu \mu(\mathbf x^\mu_i, \hat\pi_i?) + {} \\ &\phantom{{}={}} + \lambda_\tau \tau(\mathbf x^\tau_i, \hat\pi_i?) (z_i - z_0) + {} \\ &\phantom{{}={}} + \mathrm{aux}(i, z_i, \mathbf x^\mu_i, \mathbf x^\tau_i, \hat\pi_i, \mathbf x^\text{aux}_i) + {} \\ &\phantom{{}={}} + \varepsilon_i, \\ \varepsilon_i &\sim N(0, \sigma^2 / w_i), \\ m &\sim N(0, 1), \\ \log \sigma^2 &\sim N(\log\bar w, 4), \\ \lambda_\mu &\sim \mathrm{HalfCauchy}(2), \\ \lambda_\tau &\sim \mathrm{HalfNormal}(1.48), \\ \mu &\sim \mathrm{GP}(0, \mathrm{BART}(\alpha_\mu, \beta_\mu) ), \\ \tau &\sim \mathrm{GP}(0, \mathrm{BART}(\alpha_\tau, \beta_\tau) ), \\ \mathrm{aux} & \sim \mathrm{GP}(0, \text{<user defined>}), \\ \alpha_\mu, \alpha_\tau &\sim \mathrm{Beta}(2, 1), \\ \beta_\mu, \beta_\tau &\sim \mathrm{InvGamma}(1, 1), \\ z_0 &\sim U(0, 1),\end{split}\]To make the inference, \((\mu, \tau, \boldsymbol\varepsilon, m, \mathrm{aux})\) are marginalized analytically, and the marginal posterior mode of \((\sigma, \lambda_*, \alpha_*, \beta_*, z_0, \ldots)\) is found by numerical minimization, after transforming them to express their prior as a Gaussian copula. Their marginal posterior covariance matrix is estimated with an approximation of the hessian inverse. See
empbayes_fit
and use the parameterfitkw
to customize this procedure.The tree splitting grid of the BART kernel is set using quantiles of the observed covariates. This corresponds to settings
usequants=True
,numcut=inf
in the R packages BayesTree and BART. Use the parameterskernelkw_mu
andkernelkw_tau
to customize the grids.The difference between the regression model evaluated at \(Z=1\) vs. \(Z=0\) can be interpreted as the causal effect \(Z \rightarrow Y\) if the unconfoundedness assumption is made:
\[\{Y(Z=0), Y(Z=1)\} \perp\!\!\!\perp Z \mid X.\]In practical terms, this holds when:
\(X\) are pre-treatment variables, i.e., they represent quantities causally upstream of \(Z\).
\(X\) are sufficient to adjust for all common causes of \(Z\) and \(Y\), such that the only remaining difference is the causal effect and not just a correlation.
Here \(X\) consists in
x_tau
,x_mu
andx_aux
. However these arrays may also be used to pass “technical” values used to set up the model, that do not satisfy the uncounfoundedness assumption, if you know what you are doing.References
[1]P. Richard Hahn, Jared S. Murray, Carlos M. Carvalho “Bayesian Regression Tree Models for Causal Inference: Regularization, Confounding, and Heterogeneous Effects (with Discussion),” Bayesian Analysis 15(3), 965-1056, September 2020, https://doi.org/10.1214/19-BA1195
[2]Yeo, In-Kwon; Johnson, Richard A. (2000). “A New Family of Power Transformations to Improve Normality or Symmetry”. Biometrika. 87 (4): 954–959. https://doi.org/10.1093/biomet/87.4.954
- data(*, hp='map', rng=None)[source]¶
Get the data to be passed to
GP.pred
on a GP object returned bygp
.- Parameters:
- hpstr or dict
The hyperparameters to use. If
'map'
(default), use the marginal maximum a posteriori. If'sample'
, sample hyperparameters from the posterior. If a dict, use the given hyperparameters.- rngnumpy.random.Generator, optional
Random number generator, used if
hp == 'sample'
.
- Returns:
- datadict
A dictionary representing
eta
in the format required by theGP.pred
method.
- from_data(y, *, hp='map', rng=None)[source]¶
Transforms outcomes \(y\) to the regression variable \(\eta\).
- Parameters:
- y(n,) array
Outcomes.
- hpstr or dict
The hyperparameters to use. If
'map'
(default), use the marginal maximum a posteriori. If'sample'
, sample hyperparameters from the posterior. If a dict, use the given hyperparameters.- rngnumpy.random.Generator, optional
Random number generator, used if
hp == 'sample'
.
- Returns:
- eta(n,) array
Transformed outcomes.
- gp(*, hp='map', z=None, x_mu=None, x_tau=None, pihat=None, x_aux=None, weights=None, rng=None)[source]¶
Create a Gaussian process with the fitted hyperparameters.
- Parameters:
- hpstr or dict
The hyperparameters to use. If
'map'
(default), use the marginal maximum a posteriori. If'sample'
, sample hyperparameters from the posterior. If a dict, use the given hyperparameters.- z(m,) array, series or dataframe, optional
Treatment status at test points. If specified, also
x_mu
,pihat
,x_tau
andx_aux
(the latter two if and only also specified at initialization) must be specified.- x_mu(m, p) array, series or dataframe, optional
Control model covariates at test points.
- x_tau(m, q) array, series or dataframe, optional
Moderating model covariates at test points.
- pihat(m,) array, series or dataframe, optional
Estimated propensity score at test points.
- x_aux(m, k) array, series or dataframe, optional
Additional covariates for the
'aux'
process.- weights(m,) array, series or dataframe, optional
Weights for the error variance on the test points.
- rngnumpy.random.Generator, optional
Random number generator, used if
hp == 'sample'
.
- Returns:
- gpGP
A centered Gaussian process object. To add the mean, use the
m
attribute of thebcf
object. The keys of the GP are'@mean'
,'@noise'
, and'@'
, where the “@” stands either for ‘train’ or ‘test’, and @ = @mean + @noise.This Gaussian process is defined on the transformed data
eta
.
- pred(*, hp='map', error=False, z=None, x_mu=None, x_tau=None, pihat=None, x_aux=None, weights=None, transformed=True, samples=None, gvars=False, rng=None)[source]¶
Predict the transformed outcome at given locations.
- Parameters:
- hpstr or dict
The hyperparameters to use. If
'map'
(default), use the marginal maximum a posteriori. If'sample'
, sample hyperparameters from the posterior. If a dict, use the given hyperparameters.- errorbool, default False
If
False
, make a prediction for the latent mean. IfTrue
, add the error term.- z(m,) array, series or dataframe, optional
Treatment status at test points. If specified, also
x_mu
,pihat
,x_tau
andx_aux
(the latter two if and only also specified at initialization) must be specified.- x_mu(m, p) array, series or dataframe, optional
\(\mu\) model covariates at test points.
- x_tau(m, q) array, series or dataframe, optional
\(\tau\) model covariates at test points.
- pihat(m,) array, series or dataframe, optional
Estimated propensity score at test points.
- x_aux(m, k) array, series or dataframe, optional
Additional covariates for the
'aux'
process at test points.- weights(m,) array, series or dataframe, optional
Weights for the error variance on the test points.
- transformedbool, default True
If
True
, return the prediction on the transformed outcome \(\eta\), else the observable outcome \(y\).- samplesint, optional
If specified, indicates the number of samples to take from the posterior. If not, return the mean and covariance matrix of the posterior.
- gvarsbool, default False
If
True
, return the mean and covariance matrix of the posterior as an array ofGVar
variables.- rngnumpy.random.Generator, optional
Random number generator, used if
hp == 'sample'
orsamples
is notNone
.
- Returns:
- If
samples
isNone
andgvars
isFalse
(default): - mean, cov(m,) and (m, m) arrays
The mean and covariance matrix of the Normal posterior distribution over the regression function or \(\eta\) at the specified locations.
- If
samples
isNone
andgvars
isTrue
: - out(m,) array of gvars
The same distribution represented as an array of
GVar
objects.- If
samples
is an integer: - sample(samples, m) array
Posterior samples over either the regression function, \(\eta\), or \(y\).
- If
- to_data(eta, *, hp='map', rng=None)[source]¶
Convert the regression variable \(\eta\) to outcomes \(y\).
- Parameters:
- eta(n,) array
Transformed outcomes.
- hpstr or dict
The hyperparameters to use. If
'map'
(default), use the marginal maximum a posteriori. If'sample'
, sample hyperparameters from the posterior. If a dict, use the given hyperparameters.- rngnumpy.random.Generator, optional
Random number generator, used if
hp == 'sample'
.
- Returns:
- y(n,) array
Outcomes.