BART vectoriZed¶
A branchless vectorized implementation of Bayesian Additive Regression Trees (BART) in JAX.
BART is a nonparametric Bayesian regression technique: given predictors \(X\) and responses \(y\), it infers a function to predict \(y\) from \(X\). The result of the inference is a posterior sample of possible functions, representing the remaining uncertainty about the true function.
This Python module provides an implementation of BART that runs on GPU, to process large datasets faster; it also performs well on CPU. Most other implementations of BART are R packages that run only on CPU.
On CPU, bartz matches the speed of dbarts (the fastest implementation I know of) while using half the memory. On GPU, the speed premium depends on the sample size: with 50,000 datapoints and 5,000 trees, it runs 12 times faster on an Nvidia Tesla V100 GPU than on a single Apple M1 CPU core, and this factor grows linearly with the number of datapoints.
The maximum speedup realizable in practice is currently 200x.
If you don’t have your own GPU, use this Colab notebook as a starting point.
Links¶
Other BART packages¶
stochtree C++ library with R and Python bindings, tailored to researchers who want to build their own BART variants
bnptools Feature-rich R packages for BART and some variants
dbarts Fast R package
bartMachine Fast R package; supports imputation of missing predictors
SoftBART R package with a smooth version of BART
bcf R package for a version of BART for causal inference
flexBART Fast R package, supports categorical predictors
flexBCF R package, version of bcf optimized for large datasets
XBART R/Python package implementing XBART, a faster variant of BART
BART R package, BART warm-started with XBART
BayesTree R package, original BART implementation
bartCause R package, pre-made BART-based workflows for causal inference
lsqfitgp Python package implementing the infinite-trees limit of BART
BART-BMA (superseded by bartBMAnew)