Changelog¶
0.7.0 Every woman knows the pain of deciding which predictors to throw away when her design matrix is full to the brim and the last season brought out new lagged features. Our 100% money-back guarantee Bayesian variable selection Dirichlet prior will pick out the best predictors for you automatically while the MCMC is running! (2025-07-07)¶
The highlight of this release is the implementation of variable selection.
Changes apparent through the gbart interface:
Parameters sparse, theta, rho, a, b to activate and configure variable selection
The MCMC logging shows a fill metric which is how much the tree arrays are filled, to check the trees are not being constrained by the maximum depth limit
Parameter xinfo to set manually the grid cutpoints for decision rules
Fixed a stochastic bug with binary regression that would become likely with >1000 datapoints
Parameter rm_const to decide how to handle “blocked” predictors that have no possible decision rules
The defaults of parameters ntree and keepevery now depend on whether the regression is continuous or binary, as in the R package BART
New attributes of gbart objects, matching those of R’s BART::gbart:
prob_train, prob_test, prob_train_mean, prob_test_mean (for binary regression)
sigma includes burn-in samples, and first_sigma is gone
sigma_mean (the mean is only over kept samples)
varcount, varcount_mean
varprob, varprob_mean
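As a rough illustration of how a Dirichlet prior can drive variable selection, the sketch below draws the split probabilities of the predictors conditional on how often each predictor is used in the trees. It is not bartz's implementation: the function names are hypothetical, the counts are made up, and it assumes the standard conjugate update in which a Dirichlet(theta/p, ..., theta/p) prior combined with usage counts gives a Dirichlet(theta/p + count) conditional. The names varcount, varprob and theta are borrowed from the items above only as labels.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_j, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def update_split_probabilities(varcount, theta, rng):
    """Conditional draw of the split probabilities given predictor usage counts.

    Assumes a symmetric Dirichlet(theta/p) prior, so the conditional
    distribution is Dirichlet(theta/p + varcount_j).
    """
    p = len(varcount)
    alpha = [theta / p + count for count in varcount]
    return sample_dirichlet(alpha, rng)


rng = random.Random(0)
varcount = [40, 0, 3, 0, 57]  # hypothetical number of decision rules per predictor
varprob = update_split_probabilities(varcount, theta=1.0, rng=rng)
print(varprob)
```

Predictors that are rarely used get a small probability of being proposed in later splits, which is the feedback loop that concentrates the trees on a few predictors.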
The bartz.debug submodule is now officially public, the main functionality is:
The class debug_gbart that adds some debugging methods to gbart
trees_BART_to_bartz to read trees in the format of R’s BART package
sample_prior to sample from the BART prior
Changes to internals:
More typing in general
Changes to run_mcmc:
MCMC traces are dataclasses instead of dictionaries
Switch back to using only one callback instead of two
I realized that jax.lax.cond makes the additional callback pointless. I previously had a cached heuristic to not use lax.cond because it’s not efficiently vmappable, but for all practical uses in the MCMC it would be.
The callback accepts a jax random key, useful to implement additional MCMC steps
Simplified main/burn-in distinction for custom trace extractors
Changes to the MCMC internals:
min_points_per_decision_node intended as replacement to min_points_per_leaf
min_points_per_leaf is still available to allow the gbart interface to mimic R’s BART
This different constraint is easier to take into account exactly in the Metropolis-Hastings ratio, while min_points_per_leaf leads to a deviation from the stated distribution
The tree structure MH should now match exactly the distributions written on paper, if min_points_per_leaf is not set
The tree structure MH never proposes zero-probability states, if min_points_per_leaf is not set
Valid usage should not produce infs or nans internally any more, so jax.debug_infs and jax.debug_nans can be used
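The difference between the two node-size constraints listed above can be sketched as follows. This is only an illustration with hypothetical helper functions and made-up data, not the bartz internals; in particular, the convention that points with x <= cutpoint go to the left child is an assumption. The point it tries to show is that a per-leaf minimum depends on the specific cutpoint being proposed, while a per-decision-node minimum depends only on how many points reach the node, which is one plausible reading of why the latter is simpler to account for in the Metropolis-Hastings ratio.

```python
def split_allowed_min_per_leaf(x_values, cutpoint, min_points_per_leaf):
    """Per-leaf constraint: both children must keep enough points.

    Whether the split is allowed depends on the proposed cutpoint.
    """
    left = sum(1 for x in x_values if x <= cutpoint)  # assumed convention
    right = len(x_values) - left
    return left >= min_points_per_leaf and right >= min_points_per_leaf


def split_allowed_min_per_decision_node(x_values, min_points_per_decision_node):
    """Per-node constraint: only the number of points in the node matters."""
    return len(x_values) >= min_points_per_decision_node


# Hypothetical predictor values of the 10 datapoints that fall in one leaf
x_in_node = [0.1, 0.2, 0.3, 0.4, 0.9, 1.5, 2.0, 2.5, 3.0, 3.5]

# Same node, two candidate cutpoints: the per-leaf answer changes
allowed_at_half = split_allowed_min_per_leaf(x_in_node, 0.5, 5)
allowed_at_one = split_allowed_min_per_leaf(x_in_node, 1.0, 5)

# The per-node answer is the same whatever the cutpoint
allowed_node = split_allowed_min_per_decision_node(x_in_node, 10)
```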
0.6.0 bruv bernoulli got gauss beat any time (2025-05-29)¶
binary regression with probit link
allow interrupting the MCMC with ^C
logging shows how much the tree heaps are filled; if it’s above 50% you should definitely increase the number of trees and/or the maximum depth
BART.gbart(..., run_mcmc_kw=dict(...)) allows passing additional arguments to mcmcloop.run_mcmc
option to disable logging
refactor internals
set offset in the MCMC state to avoid centering the responses
set leaf variance in the MCMC state to avoid rescaling responses
immutable dataclasses instead of dicts
improvements to mcmcloop.run_mcmc
simpler to use signature with only three required parameters
the callback is allowed to carry a state and to modify the chain state (opt-in)
custom functions to change what is extracted from the state and put into the traces
two distinct callbacks, one invoked under jit and one out of it
more sophisticated default logging callback
complete type hints
improved documentation
jaxext.split is a less error-prone alternative to jax.random.split
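For readers unfamiliar with the probit link used for binary regression, here is a minimal sketch under the usual probit BART formulation, where the probability of a positive outcome is the standard normal CDF applied to the latent sum-of-trees value. The latent draws below are made up, and the code does not call bartz.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical MCMC draws of the latent function f(x) at a single point x
latent_samples = [-0.3, 0.1, 0.4, 0.2]

# One probability per posterior draw, then the posterior mean probability
prob_samples = [normal_cdf(f) for f in latent_samples]
prob_mean = sum(prob_samples) / len(prob_samples)
print(prob_mean)
```

Note that the mean is taken over the per-draw probabilities, which in general differs from applying the CDF to the mean of the latent draws.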
0.5.0 Our work promotes diversity in error variances by following heteroscedasticity best-practices such as multiplying the variance parameter by different factors (2025-05-16)¶
Heteroskedasticity with fixed weights.
The internal MCMC functions now follow the jax convention of having the key parameter first in the signature.
Fixed a bug where the MCMC callback would hang indefinitely.
The library is now routinely tested with the oldest supported versions of its dependencies.
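A small sketch of what heteroskedasticity with fixed weights means, assuming the convention of R's BART in which a known weight w_i multiplies the error standard deviation of datapoint i, so that y_i = f(x_i) + w_i * sigma * eps_i with standard normal eps_i. The function names are hypothetical and the code is independent of bartz.

```python
import math
import random


def simulate(f_values, w, sigma, rng):
    """Simulate responses whose error standard deviation is w_i * sigma."""
    return [f + wi * sigma * rng.gauss(0.0, 1.0) for f, wi in zip(f_values, w)]


def estimate_sigma(y, f_values, w):
    """Estimate sigma from residuals rescaled by the known weights."""
    scaled = [((yi - fi) / wi) ** 2 for yi, fi, wi in zip(y, f_values, w)]
    return math.sqrt(sum(scaled) / len(scaled))


# Deterministic check: each residual equals 2 * w_i in absolute value
sigma_hat = estimate_sigma([2.0, -4.0, 8.0], [0.0, 0.0, 0.0], [1.0, 2.0, 4.0])
print(sigma_hat)  # 2.0

# Simulated data with hypothetical weights
rng = random.Random(0)
w = [1.0, 2.0, 4.0, 1.0, 2.0]
y = simulate([0.0] * 5, w, sigma=0.5, rng=rng)
```

Dividing each residual by its weight puts all datapoints on the same scale, so a single sigma parameter remains to be inferred.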
0.4.1 Told it was, nigh the end of times, version numbers of dependencies all would rise (2025-04-23)¶
Somehow 1 year went by before I had some time to spend on this software.
Somehow 1 year was sufficient to break most of my dependency specifications, despite having only 4 dependencies.
This release provides no new features; it’s only a quick fix to make bartz work with the latest jax and numpy versions.
0.4.0 The real treasure was the Markov chain samples we made along the way (2024-04-16)¶
2x faster on GPU, due to parallelizing better the tree sampling step.
Uses less memory, it can now handle \(n=100\,000\) with \(10\,000\) trees on a V100. The excess memory usage was mostly due to an excessively large batch size for counting datapoints per leaf.
The Metropolis-Hastings ratio is saved only for the proposed move.
The grow and prune moves are merged into one object.
0.3.0¶
2-3x faster on CPU.
Uses less memory.
Add initkw argument to BART.gbart for advanced configuration of the MCMC initialization.
Modified the automatic determination of sigest in BART.gbart to match the one of the R package.
Add usequants=False option to BART.gbart, which is now the default.
New function prepcovars.uniform_splits_from_matrix.
Add sum_trees=False option to grove.evaluate_forest to evaluate separately each tree.
Support non-batched arguments in jaxext.autobatch.
Fix a bug with empty arrays in jaxext.autobatch.
New option in mcmcstep.init to save the acceptance ratios.
Separate batching options for residuals and counts.
Sweeping changes to the tree move internals, more computations are parallel across trees.
Added support for dbarts in the unit tests.
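To give an idea of what the usequants option above controls, the sketch below contrasts two ways of choosing candidate cutpoints for one predictor: an evenly spaced grid between the minimum and maximum, and a grid that follows the observed values. Both functions are hypothetical illustrations, not the code of prepcovars, and the exact grids used by bartz or by R's BART may differ in the details.

```python
def uniform_cutpoints(values, numcut):
    """numcut evenly spaced cutpoints strictly between min and max."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (numcut + 1)
    return [lo + step * (i + 1) for i in range(numcut)]


def quantile_cutpoints(values, numcut):
    """At most numcut cutpoints taken among midpoints of distinct sorted values."""
    distinct = sorted(set(values))
    midpoints = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    m = len(midpoints)
    if m <= numcut:
        return midpoints
    # Pick midpoints at evenly spaced ranks
    return [midpoints[(2 * i + 1) * m // (2 * numcut)] for i in range(numcut)]


# Hypothetical skewed predictor: most values are small, one is very large
values = [0, 1, 2, 3, 4, 5, 6, 7, 8, 100]

uniform_grid = uniform_cutpoints(values, 3)
quantile_grid = quantile_cutpoints(values, 3)
print(uniform_grid)   # [25.0, 50.0, 75.0]
print(quantile_grid)  # [1.5, 4.5, 7.5]
```

With skewed data, the evenly spaced grid places every cutpoint where there is almost no data, while the quantile-style grid splits the bulk of the observations.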
0.2.1¶
Fix a bug that prevented using bart in a compiled function.
0.2.0¶
Rename bartz.BART to bartz.BART.gbart.
Expose submodule bartz.jaxext with auxiliary functions for jax.
Shorter compilation time if no burnin or saved samples are drawn. This is useful when using the interface only to create the initial MCMC state, or when saving all samples to inspect the warm-up phase.
20x faster on GPU.
2x faster on CPU.
Use less temporary memory to quantilize covariates, avoiding out-of-memory errors on GPU.
0.1.0¶
Optimize the MCMC step to only traverse each tree once.
Now bartz runs at the same speed as the R package BART (tested at \(p=10\), \(n=100\ldots 10000\)).
The MCMC functions are heavily changed, but the interface is the same.
0.0.1¶
BART has attributes maxdepth, sigest.
Fix errors with scaling of noise variance prior.
Fix iteration report when keepevery is not 1.
Lower required versions of dependencies to allow running on Colab.
0.0¶
First release.