BART
The bayestree submodule contains classes to set up a Gaussian process
regression with the BART kernel. See the bart,
barteasy and bcf examples.
The original versions of these models are implemented in the R packages BART, dbarts, and bcf. Pros of the GP version provided here:
- No MCMC; the result is computed quite directly.
- Fast inference on the hyperparameters.
- Allows a wider region of hyperparameter space.
Cons:
- Does not scale to large datasets.
- Worse predictive performance at fixed hyperparameters.
- Slower at fixed hyperparameters.
- May use more memory.
Overall, this version should typically be a better choice than the R packages if you have at most 10000 datapoints.
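For orientation, here is a minimal usage sketch of the bart class documented below. The synthetic data is illustrative; the class and method signatures follow the reference entries on this page.

```python
import numpy as np
import lsqfitgp as lgp

# illustrative synthetic data: 100 units, 3 covariates
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
y = X[:, 0] ** 2 + 0.1 * rng.standard_normal(100)

# the hyperparameters are fit at construction (marginal MAP)
model = lgp.bayestree.bart(X, y)

# posterior mean and covariance of the latent regression function
mean, cov = model.pred(x_test=X)
```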
Index
Classes
- class lsqfitgp.bayestree.bart(x_train, y_train, *, weights=None, fitkw={}, kernelkw={}, marginalize_mean=True)
- Nonparametric Bayesian regression with a GP version of BART.
- Evaluate a Gaussian process regression with a kernel which accurately approximates the infinite-trees limit of BART. The hyperparameters are optimized to their marginal MAP.
- Parameters:
- x_train : (n, p) array or dataframe
- Observed covariates.
- y_train : (n,) array
- Observed outcomes.
- weights : (n,) array
- Weights used to rescale the error variance (as 1 / weight).
- fitkw : dict
- Additional arguments passed to empbayes_fit; overrides the defaults.
- kernelkw : dict
- Additional arguments passed to BART; overrides the defaults.
- marginalize_mean : bool
- If True (default), marginalize the intercept of the model.
 
- Attributes:
- mean : gvar
- The prior mean \(\mu\).
- sigma : float or gvar
- The error term standard deviation \(\sigma\). If there are weights, the sdev for each unit is obtained by dividing sigma by sqrt(weight).
- alpha : gvar
- The numerator of the tree spawn probability \(\alpha\) (named base in BayesTree and BART).
- beta : gvar
- The depth exponent of the tree spawn probability \(\beta\) (named power in BayesTree and BART).
- meansdev : gvar
- The prior standard deviation \(\lambda\) of the latent regression function.
- fit : empbayes_fit
- The hyperparameters fit object.
 
Methods

- gp(*[, hp, x_test, weights, rng])
- Create a Gaussian process with the fitted hyperparameters.
- data(*[, hp, rng])
- Get the data to be passed to GP.pred on a GP object returned by gp.
- pred(*[, hp, error, format, x_test, ...])
- Predict the outcome at given locations.

See also

Notes

The regression model is:

\[\begin{split}y_i &= \mu + \lambda f(\mathbf x_i) + \varepsilon_i, \\ \varepsilon_i &\overset{\mathrm{i.i.d.}}{\sim} N(0, \sigma^2 / w_i), \\ \mu &\sim N( (\max(\mathbf y) + \min(\mathbf y)) / 2, (\max(\mathbf y) - \min(\mathbf y))^2 / 4 ), \\ \log \sigma^2 &\sim N( \log(\overline{w(y - \bar y)^2}), 4 ), \\ \log \lambda &\sim N( \log ((\max(\mathbf y) - \min(\mathbf y)) / 4), 4 ), \\ f &\sim \mathrm{GP}( 0, \mathrm{BART}(\alpha,\beta) ), \\ \alpha &\sim \mathrm{B}(2, 1), \\ \beta &\sim \mathrm{IG}(1, 1).\end{split}\]

To make the inference, \((f, \boldsymbol\varepsilon, \mu)\) are marginalized analytically, and the marginal posterior mode of \((\sigma, \lambda, \alpha, \beta)\) is found by numerical minimization, after transforming them to express their prior as a Gaussian copula. Their marginal posterior covariance matrix is estimated with an approximation of the inverse Hessian. See empbayes_fit and use the parameter fitkw to customize this procedure.

The tree splitting grid of the BART kernel is set using quantiles of the observed covariates. This corresponds to the settings usequants=True, numcut=inf in the R packages BayesTree and BART. Use the kernelkw parameter to customize the grid.

- data(*, hp='map', rng=None)
- Get the data to be passed to GP.pred on a GP object returned by gp.
- Parameters:
- hp : str or dict
- The hyperparameters to use. If 'map' (default), use the marginal maximum a posteriori. If 'sample', sample hyperparameters from the posterior. If a dict, use the given hyperparameters.
- rng : numpy.random.Generator, optional
- Random number generator, used if hp == 'sample'.
 
- Returns:
- data : dict
- A dictionary representing y_train in the format required by the GP.pred method.
 
 
- gp(*, hp='map', x_test=None, weights=None, rng=None)
- Create a Gaussian process with the fitted hyperparameters.
- Parameters:
- hp : str or dict
- The hyperparameters to use. If 'map' (default), use the marginal maximum a posteriori. If 'sample', sample hyperparameters from the posterior. If a dict, use the given hyperparameters.
- x_test : array or dataframe, optional
- Additional covariates for "test points".
- weights : array, optional
- Weights for the error variance on the test points.
- rng : numpy.random.Generator, optional
- Random number generator, used if hp == 'sample'.
 
- Returns:
- gp : GP
- A centered Gaussian process object. To add the mean, use the mean attribute of the bart object. The keys of the GP are 'Xmean', 'Xnoise', and 'X', where 'X' stands for either 'train' or 'test', and X = Xmean + Xnoise.
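For example, a hedged sketch of combining gp and data for a manual prediction. X_new is a hypothetical array of test covariates, and the GP.pred call shown, including its fromdata argument, follows lsqfitgp's GP class rather than this page, so treat it as an assumption:

```python
# build a centered GP at the fitted MAP hyperparameters
gp = model.gp(x_test=X_new)   # keys: 'trainmean', 'trainnoise', 'train', and the 'test' variants
given = model.data()          # y_train, formatted for GP.pred

# assumed GP.pred usage: posterior on the latent mean at the test points;
# add model.mean back afterwards, since the GP is centered
post = gp.pred(given, 'testmean', fromdata=True)
```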
 
 
- pred(*, hp='map', error=False, format='matrices', x_test=None, weights=None, rng=None)
- Predict the outcome at given locations.
- Parameters:
- hp : str or dict
- The hyperparameters to use. If 'map' (default), use the marginal maximum a posteriori. If 'sample', sample hyperparameters from the posterior. If a dict, use the given hyperparameters.
- error : bool
- If False (default), make a prediction for the latent mean. If True, add the error term.
- format : {'matrices', 'gvar'}
- If 'matrices' (default), return the mean and covariance matrix separately. If 'gvar', return an array of gvars.
- x_test : array or dataframe, optional
- Covariates for the locations where the prediction is computed. If not specified, predict at the data covariates.
- weights : array, optional
- Weights for the error variance on the test points.
- rng : numpy.random.Generator, optional
- Random number generator, used if hp == 'sample'.
 
- Returns:
- If format is 'matrices' (default):
- mean, cov : arrays
- The mean and covariance matrix of the Normal posterior distribution over the regression function at the specified locations.
- If format is 'gvar':
- out : array of GVar
- The same distribution represented as an array of GVar objects.
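A short usage sketch; X_new is a hypothetical array of test covariates, and gvar.mean and gvar.sdev come from the gvar package that lsqfitgp uses for uncertain values:

```python
import gvar

# posterior including the error term, as an array of gvars
out = model.pred(error=True, format='gvar', x_test=X_new)
print(gvar.mean(out), gvar.sdev(out))  # pointwise means and standard deviations
```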
 
 
- class lsqfitgp.bayestree.bcf(*, y, z, x_mu, x_tau=None, pihat, include_pi='mu', weights=None, fitkw={}, kernelkw_mu={}, kernelkw_tau={}, marginalize_mean=True, gpaux=None, x_aux=None, otherhp={}, transf='standardize')
- Nonparametric Bayesian regression with a GP version of BCF.
- BCF (Bayesian Causal Forests) is a regression method for observational causal inference studies, introduced in [1], based on a pair of BART models.
- This class evaluates a Gaussian process regression with a kernel which accurately approximates BCF in the infinite-trees limit of each BART model. The hyperparameters are optimized to their marginal MAP.
- The model is (loosely, see the notes below) \(y = \mu(x) + z\tau(x)\), so \(\tau(x)\) is the expected causal effect of \(z\) on \(y\) at location \(x\).
- Parameters:
- y : (n,) array, series or dataframe
- Outcome.
- z : (n,) array, series or dataframe
- Binary treatment status: 0 control group, 1 treatment group.
- x_mu : (n, p) array, series or dataframe
- Covariates for the \(\mu\) model.
- x_tau : (n, q) array, series or dataframe, optional
- Covariates for the \(\tau\) model. If not specified, use x_mu.
- pihat : (n,) array, series or dataframe
- Estimated propensity score, i.e., P(Z=1|X).
- include_pi : {'mu', 'tau', 'both'}, optional
- Whether to include the propensity score in the \(\mu\) model, the \(\tau\) model, or both. Default is 'mu'.
- weights : (n,) array, series or dataframe
- Weights used to rescale the error variance (as 1 / weight).
- fitkw : dict
- Additional arguments passed to empbayes_fit; overrides the defaults.
- kernelkw_mu, kernelkw_tau : dict
- Additional arguments passed to BART for each model; overrides the defaults.
- marginalize_mean : bool
- If True (default), marginalize the intercept of the model.
- gpaux : callable, optional
- If specified, this function is called with a pair (hp, gp), where hp is a dictionary of hyperparameters and gp is a GP object under construction, and is expected to return a modified gp with a new process named 'aux' defined with defproc or similar. The process is added to the regression model. The input to the process is a structured array with fields:
- 'train' : bool
- Indicates whether the data belongs to the training set (the one passed on initialization) or the test set (the one passed to pred or gp).
- 'i' : int
- Index of the flattened array.
- 'z' : int
- Treatment status.
- 'mu', 'tau' : structured
- The values in x_mu and x_tau, converted to indices according to the BART grids. Where pihat has been added, there are two subfields: 'x', which contains the covariates, and 'pihat', the latter expressed in indices as well.
- 'pihat' : float
- The pihat argument. Unlike the subfield included under 'mu' and/or 'tau', this field contains the original values.
- 'aux' : structured
- The values in x_aux, if specified.
 
- x_aux : (n, k) array, series or dataframe, optional
- Additional covariates for the 'aux' process.
- otherhp : dictionary of gvar
- A dictionary with the priors of arbitrary additional hyperparameters, intended to be used by gpaux or transf.
- transf : (list of) str or pair of callable
- Data transformation. Either a string indicating a pre-defined transformation, or a pair (from_data, to_data) of functions with signatures from_data(hp, y) -> eta and to_data(hp, eta) -> y, where eta is the value to which the model is fit and hp is the dictionary of hyperparameters. The functions must be ufuncs, each the inverse of the other w.r.t. the second parameter, and from_data must be differentiable with jax w.r.t. y. (A sketch of a custom pair follows this list.)
- If a list of such specifications is provided, the transformations are applied in order, with the first one being the outermost, i.e., the one applied first to the data.
- If a transformation uses additional hyperparameters, either predefined automatically or passed by the user through otherhp, they are inferred with the rest of the hyperparameters.
- The pre-defined transformations are:
- 'standardize' (default)
- eta = (y - mean(train_y)) / sdev(train_y)
- 'yeojohnson'
- The Yeo-Johnson transformation [2] to reduce skewness. The \(\lambda\) parameter is bounded in \((0, 2)\) for implementation convenience; this restriction may be lifted in future versions.
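To illustrate the (from_data, to_data) contract, here is a minimal sketch of a custom pair, assuming strictly positive outcomes. The log transform is an illustrative choice, not a pre-defined one, and y, z, X, pihat are assumed to be in scope:

```python
import jax.numpy as jnp
import lsqfitgp as lgp

# from_data must be a ufunc, differentiable with jax w.r.t. y;
# to_data must be its inverse w.r.t. the second argument.
# hp (the hyperparameter dict) is unused by this fixed transformation.
def from_data(hp, y):
    return jnp.log(y)

def to_data(hp, eta):
    return jnp.exp(eta)

model = lgp.bayestree.bcf(
    y=y, z=z, x_mu=X, pihat=pihat,
    # the first entry is the outermost: applied first to the data
    transf=[(from_data, to_data), 'standardize'],
)
```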
 
 
- Attributes:
- m : float or gvar
- The prior mean \(m\).
- sigma : gvar
- The error term standard deviation \(\sigma\). If there are weights, the sdev for each unit is obtained by dividing sigma by sqrt(weight).
- alpha_mu, alpha_tau : gvar
- The numerator of the tree spawn probability \(\alpha_*\) (named base in R bcf).
- beta_mu, beta_tau : gvar
- The depth exponent of the tree spawn probability \(\beta_*\) (named power in R bcf).
- lambda_mu, lambda_tau : gvar
- The prior standard deviation \(\lambda_*\).
- z_0 : gvar
- The treatment coding parameter.
- fit : empbayes_fit
- The hyperparameters fit object.
 
Methods

- gp(*[, hp, z, x_mu, x_tau, pihat, x_aux, ...])
- Create a Gaussian process with the fitted hyperparameters.
- data(*[, hp, rng])
- Get the data to be passed to GP.pred on a GP object returned by gp.
- pred(*[, hp, error, z, x_mu, x_tau, pihat, ...])
- Predict the transformed outcome at given locations.
- from_data(y, *[, hp, rng])
- Transform outcomes \(y\) to the regression variable \(\eta\).
- to_data(eta, *[, hp, rng])
- Convert the regression variable \(\eta\) to outcomes \(y\).

See also

Notes

The regression model is:

\[\begin{split}\eta_i = g(y_i; \ldots) &= m + {} \\ &\phantom{{}={}} + \lambda_\mu \mu(\mathbf x^\mu_i, \hat\pi_i?) + {} \\ &\phantom{{}={}} + \lambda_\tau \tau(\mathbf x^\tau_i, \hat\pi_i?) (z_i - z_0) + {} \\ &\phantom{{}={}} + \mathrm{aux}(i, z_i, \mathbf x^\mu_i, \mathbf x^\tau_i, \hat\pi_i, \mathbf x^\text{aux}_i) + {} \\ &\phantom{{}={}} + \varepsilon_i, \\ \varepsilon_i &\sim N(0, \sigma^2 / w_i), \\ m &\sim N(0, 1), \\ \log \sigma^2 &\sim N(\log\bar w, 4), \\ \lambda_\mu &\sim \mathrm{HalfCauchy}(2), \\ \lambda_\tau &\sim \mathrm{HalfNormal}(1.48), \\ \mu &\sim \mathrm{GP}(0, \mathrm{BART}(\alpha_\mu, \beta_\mu) ), \\ \tau &\sim \mathrm{GP}(0, \mathrm{BART}(\alpha_\tau, \beta_\tau) ), \\ \mathrm{aux} & \sim \mathrm{GP}(0, \text{<user defined>}), \\ \alpha_\mu, \alpha_\tau &\sim \mathrm{Beta}(2, 1), \\ \beta_\mu, \beta_\tau &\sim \mathrm{InvGamma}(1, 1), \\ z_0 &\sim U(0, 1).\end{split}\]

To make the inference, \((\mu, \tau, \boldsymbol\varepsilon, m, \mathrm{aux})\) are marginalized analytically, and the marginal posterior mode of \((\sigma, \lambda_*, \alpha_*, \beta_*, z_0, \ldots)\) is found by numerical minimization, after transforming them to express their prior as a Gaussian copula. Their marginal posterior covariance matrix is estimated with an approximation of the inverse Hessian. See empbayes_fit and use the parameter fitkw to customize this procedure.

The tree splitting grid of the BART kernel is set using quantiles of the observed covariates. This corresponds to the settings usequants=True, numcut=inf in the R packages BayesTree and BART. Use the parameters kernelkw_mu and kernelkw_tau to customize the grids.

The difference between the regression model evaluated at \(Z=1\) vs. \(Z=0\) can be interpreted as the causal effect \(Z \rightarrow Y\) if the unconfoundedness assumption is made:

\[\{Y(Z=0), Y(Z=1)\} \perp\!\!\!\perp Z \mid X.\]

In practical terms, this holds when:

- \(X\) are pre-treatment variables, i.e., they represent quantities causally upstream of \(Z\).
- \(X\) are sufficient to adjust for all common causes of \(Z\) and \(Y\), such that the only remaining difference is the causal effect and not just a correlation. 
Here \(X\) consists of x_tau, x_mu and x_aux. However, these arrays may also be used to pass "technical" values that set up the model and do not satisfy the unconfoundedness assumption, if you know what you are doing.

References

[1] P. Richard Hahn, Jared S. Murray, Carlos M. Carvalho, "Bayesian Regression Tree Models for Causal Inference: Regularization, Confounding, and Heterogeneous Effects (with Discussion)," Bayesian Analysis 15(3), 965-1056, September 2020. https://doi.org/10.1214/19-BA1195

[2] Yeo, In-Kwon; Johnson, Richard A. (2000). "A New Family of Power Transformations to Improve Normality or Symmetry." Biometrika 87(4): 954-959. https://doi.org/10.1093/biomet/87.4.954

- data(*, hp='map', rng=None)
- Get the data to be passed to GP.pred on a GP object returned by gp.
- Parameters:
- hp : str or dict
- The hyperparameters to use. If 'map' (default), use the marginal maximum a posteriori. If 'sample', sample hyperparameters from the posterior. If a dict, use the given hyperparameters.
- rng : numpy.random.Generator, optional
- Random number generator, used if hp == 'sample'.
 
- Returns:
- data : dict
- A dictionary representing eta in the format required by the GP.pred method.
 
 
- from_data(y, *, hp='map', rng=None)
- Transform outcomes \(y\) to the regression variable \(\eta\).
- Parameters:
- y : (n,) array
- Outcomes.
- hp : str or dict
- The hyperparameters to use. If 'map' (default), use the marginal maximum a posteriori. If 'sample', sample hyperparameters from the posterior. If a dict, use the given hyperparameters.
- rng : numpy.random.Generator, optional
- Random number generator, used if hp == 'sample'.
 
- Returns:
- eta : (n,) array
- Transformed outcomes. 
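For instance, a round trip through the fitted transformation (sketch; y is the outcomes array as at initialization):

```python
eta = model.from_data(y)     # transformed outcomes at the MAP hyperparameters
y_back = model.to_data(eta)  # inverse transformation, recovers y
```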
 
 
- gp(*, hp='map', z=None, x_mu=None, x_tau=None, pihat=None, x_aux=None, weights=None, rng=None)
- Create a Gaussian process with the fitted hyperparameters.
- Parameters:
- hp : str or dict
- The hyperparameters to use. If 'map' (default), use the marginal maximum a posteriori. If 'sample', sample hyperparameters from the posterior. If a dict, use the given hyperparameters.
- z : (m,) array, series or dataframe, optional
- Treatment status at test points. If specified, x_mu, pihat, x_tau and x_aux (the latter two if and only if also specified at initialization) must be specified as well.
- x_mu : (m, p) array, series or dataframe, optional
- Control model covariates at test points.
- x_tau : (m, q) array, series or dataframe, optional
- Moderating model covariates at test points.
- pihat : (m,) array, series or dataframe, optional
- Estimated propensity score at test points.
- x_aux : (m, k) array, series or dataframe, optional
- Additional covariates for the 'aux' process.
- weights : (m,) array, series or dataframe, optional
- Weights for the error variance on the test points.
- rng : numpy.random.Generator, optional
- Random number generator, used if hp == 'sample'.
 
- Returns:
- gp : GP
- A centered Gaussian process object. To add the mean, use the m attribute of the bcf object. The keys of the GP are '@mean', '@noise', and '@', where '@' stands for either 'train' or 'test', and @ = @mean + @noise. This Gaussian process is defined on the transformed data eta.
 
 
- pred(*, hp='map', error=False, z=None, x_mu=None, x_tau=None, pihat=None, x_aux=None, weights=None, transformed=True, samples=None, gvars=False, rng=None)
- Predict the transformed outcome at given locations.
- Parameters:
- hp : str or dict
- The hyperparameters to use. If 'map' (default), use the marginal maximum a posteriori. If 'sample', sample hyperparameters from the posterior. If a dict, use the given hyperparameters.
- error : bool, default False
- If False, make a prediction for the latent mean. If True, add the error term.
- z : (m,) array, series or dataframe, optional
- Treatment status at test points. If specified, x_mu, pihat, x_tau and x_aux (the latter two if and only if also specified at initialization) must be specified as well.
- x_mu : (m, p) array, series or dataframe, optional
- \(\mu\) model covariates at test points.
- x_tau : (m, q) array, series or dataframe, optional
- \(\tau\) model covariates at test points.
- pihat : (m,) array, series or dataframe, optional
- Estimated propensity score at test points.
- x_aux : (m, k) array, series or dataframe, optional
- Additional covariates for the 'aux' process at test points.
- weights : (m,) array, series or dataframe, optional
- Weights for the error variance on the test points.
- transformed : bool, default True
- If True, return the prediction for the transformed outcome \(\eta\); otherwise, for the observable outcome \(y\).
- samples : int, optional
- If specified, indicates the number of samples to take from the posterior. If not, return the mean and covariance matrix of the posterior.
- gvars : bool, default False
- If True, return the posterior as an array of GVar variables rather than separate mean and covariance arrays.
- rng : numpy.random.Generator, optional
- Random number generator, used if hp == 'sample' or samples is not None.
 
- Returns:
- If samples is None and gvars is False (default):
- mean, cov : (m,) and (m, m) arrays
- The mean and covariance matrix of the Normal posterior distribution over the regression function or \(\eta\) at the specified locations.
- If samples is None and gvars is True:
- out : (m,) array of gvars
- The same distribution represented as an array of GVar objects.
- If samples is an integer:
- sample : (samples, m) array
- Posterior samples over either the regression function, \(\eta\), or \(y\).
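As noted in the class notes, the difference between predictions at \(Z=1\) and \(Z=0\) can be read as the causal effect; here is a hedged sketch of a sample average treatment effect on the transformed scale (covariate arrays as at initialization, no x_tau or x_aux passed to the constructor):

```python
import numpy as np

n = len(y)
m1, _ = model.pred(z=np.ones(n), x_mu=X, pihat=pihat)   # everyone treated
m0, _ = model.pred(z=np.zeros(n), x_mu=X, pihat=pihat)  # everyone control
ate = np.mean(m1 - m0)  # average treatment effect estimate (eta scale)
```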
 
- to_data(eta, *, hp='map', rng=None)
- Convert the regression variable \(\eta\) to outcomes \(y\).
- Parameters:
- eta : (n,) array
- Transformed outcomes.
- hp : str or dict
- The hyperparameters to use. If 'map' (default), use the marginal maximum a posteriori. If 'sample', sample hyperparameters from the posterior. If a dict, use the given hyperparameters.
- rng : numpy.random.Generator, optional
- Random number generator, used if hp == 'sample'.
 
- Returns:
- y : (n,) array
- Outcomes.