11. BART¶
The bayestree
submodule contains classes to set up a Gaussian process
regression with the BART
kernel. See the bart,
barteasy and acic examples.
The original version of these models is implemented in the R packages BART, dbarts, and bcf. Pros of the GP version provided here:
No MCMC, the result is computed quite directly.
Fast inference on the hyperparameters.
Allows a wider region of hyperparameter space.
Cons:
Does not scale to large datasets.
Worse predictive performance at fixed hyperparameters.
Slower at fixed hyperparameters.
May use more memory.
Overall, this version should tipycally be a better choice than the R packages if you have at most 10000 datapoints.
11.1. Index¶
11.2. Documentation¶
- class lsqfitgp.bayestree.bart(x_train, y_train, *, weights=None, fitkw={}, kernelkw={}, marginalize_mean=True)[source]¶
Nonparametric Bayesian regression with a GP version of BART.
Evaluate a Gaussian process regression with a kernel which accurately approximates the infinite trees limit of BART. The hyperparameters are optimized to their marginal MAP.
- Parameters:
- x_train(n, p) array or dataframe
Observed covariates.
- y_train(n,) array
Observed outcomes.
- weights(n,) array
Weights used to rescale the error variance (as 1 / weight).
- fitkwdict
Additional arguments passed to
empbayes_fit
, overrides the defaults.- kernelkwdict
Additional arguments passed to
BART
, overrides the defaults.- marginalize_meanbool
If True (default), marginalize the intercept of the model.
See also
Notes
The regression model is:
\[\begin{split}y_i &= \mu + \lambda f(\mathbf x_i) + \varepsilon_i, \\ \varepsilon_i &\overset{\mathrm{i.i.d.}}{\sim} N(0, \sigma^2 / w_i), \\ \mu &\sim N( (\max(\mathbf y) + \min(\mathbf y)) / 2, (\max(\mathbf y) - \min(\mathbf y))^2 / 4 ), \\ \log \sigma^2 &\sim N( \log(\overline{w(y - \bar y)^2}), 4 ), \\ \log \lambda &\sim N( \log ((\max(\mathbf y) - \min(\mathbf y)) / 4), 4 ), \\ f &\sim \mathrm{GP}( 0, \mathrm{BART}(\alpha,\beta) ), \\ \alpha &\sim \mathrm{B}(2, 1), \\ \beta &\sim \mathrm{IG}(1, 1).\end{split}\]To make the inference, \((f, \boldsymbol\varepsilon, \mu)\) are marginalized analytically, and the marginal posterior mode of \((\sigma, \lambda, \alpha, \beta)\) is found by numerical minimization, after transforming them to express their prior as a Gaussian copula. Their marginal posterior covariance matrix is estimated with an approximation of the hessian inverse. See
empbayes_fit
and use the parameterfitkw
to customize this procedure.The tree splitting grid of the BART kernel is set using quantiles of the observed covariates. This corresponds to settings
usequants=True
,numcut=inf
in the R packages BayesTree and BART. Use thekernelkw
parameter to customize the grid.- Attributes:
- meangvar
The prior mean \(\mu\).
- sigmafloat or gvar
The error term standard deviation \(\sigma\). If there are weights, the sdev for each unit is obtained dividing
sigma
by sqrt(weight).- alphagvar
The numerator of the tree spawn probability \(\alpha\) (named
base
in BayesTree and BART).- betagvar
The depth exponent of the tree spawn probability \(\beta\) (named
power
in BayesTree and BART).- meansdevgvar
The prior standard deviation \(\lambda\) of the latent regression function.
- fitempbayes_fit
The hyperparameters fit object.
Methods
gp
(*[, hp, x_test, weights, rng])Create a Gaussian process with the fitted hyperparameters.
data
(*[, hp, rng])Get the data to be passed to
GP.pred
on a GP object returned bygp
.pred
(*[, hp, error, format, x_test, ...])Predict the outcome at given locations.
- data(*, hp='map', rng=None)[source]¶
Get the data to be passed to
GP.pred
on a GP object returned bygp
.- Parameters:
- hpstr or dict
The hyperparameters to use. If
'map'
, use the marginal maximum a posteriori. If'sample'
, sample hyperparameters from the posterior. If a dict, use the given hyperparameters.- rngnumpy.random.Generator, optional
Random number generator, used if
hp == 'sample'
.
- Returns:
- datadict
A dictionary representing
y_train
in the format required by theGP.pred
method.
- gp(*, hp='map', x_test=None, weights=None, rng=None)[source]¶
Create a Gaussian process with the fitted hyperparameters.
- Parameters:
- hpstr or dict
The hyperparameters to use. If
'map'
, use the marginal maximum a posteriori. If'sample'
, sample hyperparameters from the posterior. If a dict, use the given hyperparameters.- x_testarray or dataframe, optional
Additional covariates for “test points”.
- weightsarray, optional
Weights for the error variance on the test points.
- rngnumpy.random.Generator, optional
Random number generator, used if
hp == 'sample'
.
- Returns:
- gpGP
A centered Gaussian process object. To add the mean, use the
mean
attribute of thebart
object. The keys of the GP are ‘Xmean’, ‘Xnoise’, and ‘X’, where the “X” stands either for ‘train’ or ‘test’, and X = Xmean + Xnoise.
- pred(*, hp='map', error=False, format='matrices', x_test=None, weights=None, rng=None)[source]¶
Predict the outcome at given locations.
- Parameters:
- hpstr or dict
The hyperparameters to use. If
'map'
, use the marginal maximum a posteriori. If'sample'
, sample hyperparameters from the posterior. If a dict, use the given hyperparameters.- errorbool
If
False
(default), make a prediction for the latent mean. IfTrue
, add the error term.- format{‘matrices’, ‘gvar’}
If ‘matrices’ (default), return the mean and covariance matrix separately. If ‘gvar’, return an array of gvars.
- x_testarray or dataframe, optional
Covariates for the locations where the prediction is computed. If not specified, predict at the data covariates.
- weightsarray, optional
Weights for the error variance on the test points.
- rngnumpy.random.Generator, optional
Random number generator, used if
hp == 'sample'
.
- Returns:
- If
format
is ‘matrices’ (default): - mean, covarrays
The mean and covariance matrix of the Normal posterior distribution over the regression function at the specified locations.
- If
format
is ‘gvar’: - outarray of
GVar
The same distribution represented as an array of
GVar
objects.
- If
- class lsqfitgp.bayestree.bcf(*, y, z, x_mu, x_tau=None, pihat, include_pi='mu', weights=None, fitkw={}, kernelkw_mu={}, kernelkw_tau={}, marginalize_mean=True, gpaux=None, x_aux=None, otherhp={})[source]¶
Nonparametric Bayesian regression with a GP version of BCF.
BCF (Bayesian Causal Forests) is a regression method for observational causal inference studies introduced in [1] based on a pair of BART models.
This class evaluates a Gaussian process regression with a kernel which accurately approximates BCF in the infinite trees limit of each BART model. The hyperparameters are optimized to their marginal MAP.
The model is (loosely, see notes below) \(y = \mu(x) + z\tau(x)\), so \(\tau(x)\) is the expected causal effect of \(z\) on \(y\) at location \(x\).
- Parameters:
- y(n,) array, series or dataframe
Outcome.
- z(n,) array, series or dataframe
Binary treatment status: 0 control group, 1 treatment group.
- x_mu(n, p) array, series or dataframe
Covariates for the \(\mu\) model.
- x_tau(n, q) array, series or dataframe, optional
Covariates for the \(\tau\) model. If not specified, use
x_mu
.- pihat(n,) array, series or dataframe
Estimated propensity score, i.e., P(Z=1|X).
- include_pi{‘mu’, ‘tau’, ‘both’}, optional
Whether to include the propensity score in the \(\mu\) model, the \(\tau\) model, or both. Default is
'mu'
.- weights(n,) array, series or dataframe
Weights used to rescale the error variance (as 1 / weight).
- fitkwdict
Additional arguments passed to
empbayes_fit
, overrides the defaults.- kernelkw_mu, kernelkw_taudict
Additional arguments passed to
BART
for each model, overrides the defaults.- marginalize_meanbool
If True (default), marginalize the intercept of the model.
- gpauxcallable, optional
If specified, this function is called with a pair
(hp, gp)
, wherehp
is a dictionary of hyperparameters, andgp
is aGP
object under construction, and is expected to define a new process named'aux'
withaddproc
or similar. The process is added to the regression model. The input to the process is a structured array with fields:- ‘train’bool
Indicates wether the data is training set (the one passed on initialization) or test set (the one passed to
pred
orgp
).- ‘i’int
Index of the flattened array.
- ‘z’int
Treatment status.
- ‘mu’, ‘tau’structured
The values in
x_mu
andx_tau
, converted to indices according to the BART grids. Wherepihat
has been added, there are two subfields:'x'
which contains the covariates, and'pihat'
, the latter expressed in indices as well.- ‘pihat’float
The
pihat
argument. Contrary to the subfield included under'mu'
and/or'tau'
, this field contains the original values.- ‘aux’structured
The values in
x_aux
, if specified.
- x_aux(n, k) array, series or dataframe, optional
Additional covariates for the
'aux'
process.- otherhpdictionary of gvar
A dictionary with the prior of additional hyperpameters, intended to be used by
gpaux
.
See also
Notes
The regression model is:
\[\begin{split}y_i &= m + {} \\ &\phantom{{}={}} + \lambda_\mu\,\mathrm{std}(\mathbf y) \mu(\mathbf x^\mu_i, \hat\pi_i?) + {} \\ &\phantom{{}={}} + \lambda_\tau\,\mathrm{std}(\mathbf y) \tau(\mathbf x^\tau_i, \hat\pi_i?) (z_i - z_0) + {} \\ &\phantom{{}={}} + \mathrm{aux}(i, z_i, \mathbf x^\mu_i, \mathbf x^\tau_i, \hat\pi_i, \mathbf x^\text{aux}_i) + {} \\ &\phantom{{}={}} + \varepsilon_i, \\ \varepsilon_i &\sim N(0, \sigma^2 / w_i), \\ m &\sim N( (\max(\mathbf y) + \min(\mathbf y)) / 2, (\max(\mathbf y) - \min(\mathbf y))^2 / 4 ), \\ \log \sigma^2 &\sim N( \log(\overline{w(y - \bar y)^2}), 4 ), \\ \lambda_\mu &\sim \mathrm{HalfCauchy}(2), \\ \lambda_\tau &\sim \mathrm{HalfNormal}(1.48), \\ \mu &\sim \mathrm{GP}(0, \mathrm{BART}(\alpha_\mu, \beta_\mu) ), \\ \tau &\sim \mathrm{GP}(0, \mathrm{BART}(\alpha_\tau, \beta_\tau) ), \\ \mathrm{aux} & \sim \mathrm{GP}(0, \text{<user defined>}), \\ \alpha_\mu, \alpha_\tau &\sim \mathrm{B}(2, 1), \\ \beta_\mu, \beta_\tau &\sim \mathrm{IG}(1, 1), \\ z_0 &\sim U(0, 1),\end{split}\]To make the inference, \((\mu, \tau, \boldsymbol\varepsilon, m, \mathrm{aux})\) are marginalized analytically, and the marginal posterior mode of \((\sigma, \lambda_*, \alpha_*, \beta_*, z_0)\) is found by numerical minimization, after transforming them to express their prior as a Gaussian copula. Their marginal posterior covariance matrix is estimated with an approximation of the hessian inverse. See
empbayes_fit
and use the parameterfitkw
to customize this procedure.The tree splitting grid of the BART kernel is set using quantiles of the observed covariates. This corresponds to settings
usequants=True
,numcut=inf
in the R packages BayesTree and BART. Use the parameterskernelkw_mu
andkernelkw_tau
to customize the grids.The difference between the regression model evaluated at \(Z=1\) vs. \(Z=0\) can be interpreted as the causal effect \(Z \rightarrow Y\) if the unconfoundedness assumption is made:
\[\{Y(Z=0), Y(Z=1)\} \perp\!\!\!\perp Z \mid X.\]In practical terms, this holds when:
\(X\) are pre-treatment variables, i.e., they represent quantities causally upstream of \(Z\).
\(X\) are sufficient to adjust for all common causes of \(Z\) and \(Y\), such that the only remaining difference is the causal effect and not just a correlation.
Here \(X\) consists in
x_tau
,x_mu
andx_aux
. However these arrays may also be used to pass “technical” values used to set up the model, that do not satisfy the uncounfoundedness assumption, if you know what you are doing.References
[1]P. Richard Hahn, Jared S. Murray, Carlos M. Carvalho “Bayesian Regression Tree Models for Causal Inference: Regularization, Confounding, and Heterogeneous Effects (with Discussion),” Bayesian Analysis 15(3), 965-1056, September 2020, https://doi.org/10.1214/19-BA1195
- Attributes:
- mfloat or gvar
The prior mean \(m\).
- sigmagvar
The error term standard deviation \(\sigma\). If there are weights, the sdev for each unit is obtained dividing
sigma
by sqrt(weight).- alpha_mu, alpha_taugvar
The numerator of the tree spawn probability \(\alpha_*\) (named
base
in R bcf).- beta_mu, beta_taugvar
The depth exponent of the tree spawn probability \(\beta_*\) (named
power
in R bcf).- sdev_mu, sdev_taugvar
The prior standard deviation \(\lambda_* \mathrm{std}(\mathbf y)\).
- z_0gvar
The treatment coding parameter.
- fitempbayes_fit
The hyperparameters fit object.
Methods
gp
(*[, hp, z, x_mu, x_tau, pihat, x_aux, ...])Create a Gaussian process with the fitted hyperparameters.
data
(*[, hp, rng])Get the data to be passed to
GP.pred
on a GP object returned bygp
.pred
(*[, hp, error, format, z, x_mu, x_tau, ...])Predict the outcome at given locations.
- data(*, hp='map', rng=None)[source]¶
Get the data to be passed to
GP.pred
on a GP object returned bygp
.- Parameters:
- hpstr or dict
The hyperparameters to use. If
'map'
(default), use the marginal maximum a posteriori. If'sample'
, sample hyperparameters from the posterior. If a dict, use the given hyperparameters.- rngnumpy.random.Generator, optional
Random number generator, used if
hp == 'sample'
.
- Returns:
- datadict
A dictionary representing
y
in the format required by theGP.pred
method.
- gp(*, hp='map', z=None, x_mu=None, x_tau=None, pihat=None, x_aux=None, weights=None, rng=None)[source]¶
Create a Gaussian process with the fitted hyperparameters.
- Parameters:
- hpstr or dict
The hyperparameters to use. If
'map'
(default), use the marginal maximum a posteriori. If'sample'
, sample hyperparameters from the posterior. If a dict, use the given hyperparameters.- z(m,) array, series or dataframe, optional
Treatment status at test points. If specified, also
x_mu
,pihat
,x_tau
andx_aux
(the latter two if and only also specified at initialization) must be specified.- x_mu(m, p) array, series or dataframe, optional
Control model covariates at test points.
- x_tau(m, q) array, series or dataframe, optional
Moderating model covariates at test points.
- pihat(m,) array, series or dataframe, optional
Estimated propensity score at test points.
- x_aux(m, k) array, series or dataframe, optional
Additional covariates for the
'aux'
process.- weights(m,) array, series or dataframe, optional
Weights for the error variance on the test points.
- rngnumpy.random.Generator, optional
Random number generator, used if
hp == 'sample'
.
- Returns:
- gpGP
A centered Gaussian process object. To add the mean, use the
m
attribute of thebcf
object. The keys of the GP are'Xmean'
,'Xnoise'
, and'X'
, where the “X” stands either for ‘train’ or ‘test’, and X = Xmean + Xnoise.
- pred(*, hp='map', error=False, format='matrices', z=None, x_mu=None, x_tau=None, pihat=None, x_aux=None, weights=None, rng=None)[source]¶
Predict the outcome at given locations.
- Parameters:
- hpstr or dict
The hyperparameters to use. If
'map'
(default), use the marginal maximum a posteriori. If'sample'
, sample hyperparameters from the posterior. If a dict, use the given hyperparameters.- errorbool
If
False
(default), make a prediction for the latent mean. IfTrue
, add the error term.- format{‘matrices’, ‘gvar’}
If
'matrices'
(default), return the mean and covariance matrix separately. If'gvar'
, return an array ofGVar
.- z(m,) array, series or dataframe, optional
Treatment status at test points. If specified, also
x_mu
,pihat
,x_tau
andx_aux
(the latter two if and only also specified at initialization) must be specified.- x_mu(m, p) array, series or dataframe, optional
\(\mu\) model covariates at test points.
- x_tau(m, q) array, series or dataframe, optional
\(\tau\) model covariates at test points.
- pihat(m,) array, series or dataframe, optional
Estimated propensity score at test points.
- x_aux(m, k) array, series or dataframe, optional
Additional covariates for the
'aux'
process at test points.- weights(m,) array, series or dataframe, optional
Weights for the error variance on the test points.
- rngnumpy.random.Generator, optional
Random number generator, used if
hp == 'sample'
.
- Returns:
- If
format
is'matrices'
(default): - mean, covarrays
The mean and covariance matrix of the Normal posterior distribution over the regression function at the specified locations.
- If
format
is'gvar'
: - outarray of gvar
The same distribution represented as an array of
GVar
objects.
- If