Data processing

bartz.prepcovars.quantilized_splits_from_matrix(X, max_bins)

Determine bins that make the distribution of each predictor uniform.

Parameters:
X : array (p, n)

A matrix with p predictors and n observations.

max_bins : int

The maximum number of bins to produce.

Returns:
splits : array (p, m)

A matrix containing, for each predictor, the boundaries between bins. m is min(max_bins, n) - 1, which is an upper bound on the number of splits. Each predictor may have a different number of splits; unused values at the end of each row are filled with the maximum value representable in the type of X.

max_split : array (p,)

The number of values actually used in each row of splits; entries beyond this count are padding.
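The padded output layout can be sketched for a single predictor as follows. This is a minimal NumPy illustration, not the bartz implementation: the helper name `quantile_splits_row`, the use of midpoints between distinct values, and the evenly-spaced-in-rank subsampling rule are all assumptions made for the example. Only the row length `min(max_bins, n) - 1`, the padding with the maximum representable value, and the returned count come from the description above.

```python
import numpy as np

def quantile_splits_row(x, max_bins):
    """Hypothetical helper: padded splits for one float-valued predictor."""
    m = min(max_bins, x.size) - 1        # documented upper bound on the number of splits
    u = np.unique(x)                     # sorted distinct values
    mid = (u[:-1] + u[1:]) / 2           # candidate boundaries (assumed: midpoints)
    if mid.size > m:
        # Too many candidates: keep m of them, evenly spaced in rank (assumed rule).
        idx = np.linspace(0, mid.size - 1, m).round().astype(int)
        mid = mid[idx]
    pad = np.finfo(x.dtype).max          # padding value; float input assumed here
    splits = np.full(m, pad, dtype=x.dtype)
    splits[: mid.size] = mid
    return splits, mid.size              # padded row and number of used entries

x = np.array([0.0, 0.0, 1.0, 1.0, 3.0], dtype=np.float32)
splits, used = quantile_splits_row(x, max_bins=4)
# Three slots are allocated, but only two distinct boundaries exist,
# so the last slot holds the float32 maximum and `used` is 2.
```

Because every row has the same length `m`, the rows for all predictors can be stacked into a single (p, m) matrix, with `max_split` recording how many entries of each row are real.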

bartz.prepcovars.uniform_splits_from_matrix(X, num_bins)

Make an evenly spaced binning grid.

Parameters:
X : array (p, n)

A matrix with p predictors and n observations.

num_bins : int

The number of bins to produce.

Returns:
splits : array (p, num_bins - 1)

A matrix containing, for each predictor, the boundaries between bins. The boundaries are the interior points of an evenly spaced grid whose endpoints, which are excluded, are the minimum and maximum value in each row of X.

max_split : array (p,)

The number of cutpoints in each row of splits, i.e., num_bins - 1.
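The evenly spaced grid described above can be reproduced in a few lines of NumPy. This is a sketch of the documented behaviour under the assumption that the grid is a plain linear interpolation between the row minimum and maximum; the function name `uniform_splits` is hypothetical and the library's own code may differ in dtype handling.

```python
import numpy as np

def uniform_splits(X, num_bins):
    """Hypothetical helper: interior edges of an even grid over each row's range."""
    lo = X.min(axis=1, keepdims=True)
    hi = X.max(axis=1, keepdims=True)
    # num_bins + 1 evenly spaced edges on [0, 1]; drop the two endpoints.
    t = np.linspace(0.0, 1.0, num_bins + 1)[1:-1]
    splits = lo + (hi - lo) * t                      # shape (p, num_bins - 1)
    max_split = np.full(X.shape[0], num_bins - 1)    # every row is fully used
    return splits, max_split

X = np.array([[0.0, 4.0, 2.0],
              [10.0, 30.0, 20.0]])
splits, max_split = uniform_splits(X, num_bins=4)
# splits    -> [[ 1.,  2.,  3.], [15., 20., 25.]]
# max_split -> [3, 3]
```

Unlike the quantile-based splits, no padding is needed here, since each row always has exactly `num_bins - 1` cutpoints.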

bartz.prepcovars.bin_predictors(X, splits, **kw)

Bin the predictors according to the given splits.

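A value x is mapped to bin i iff splits[i - 1] < x <= splits[i], where splits is the row of boundaries for that predictor. Values at or below the first boundary fall into bin 0.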

Parameters:
X : array (p, n)

A matrix with p predictors and n observations.

splits : array (p, m)

A matrix containing, for each predictor, the boundaries between bins. m is the maximum number of splits; each row may have shorter actual length, marked by padding unused locations at the end of the row with the maximum value allowed by the type.

**kw : dict

Additional arguments are passed to jax.numpy.searchsorted.

Returns:
X_binned : int array (p, n)

A matrix with p predictors and n observations, where each predictor has been replaced by the index of the bin it falls into.
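The binning rule can be illustrated with a row-by-row search. The sketch below uses `numpy.searchsorted` as a stand-in for `jax.numpy.searchsorted` (both default to `side='left'`, which gives the `splits[i - 1] < x <= splits[i]` rule stated above); the helper name `bin_rows` and the explicit Python loop are assumptions for the example, not the library's implementation.

```python
import numpy as np

def bin_rows(X, splits, **kw):
    """Hypothetical helper: bin each row of X against the matching row of splits."""
    return np.stack([np.searchsorted(s, x, **kw) for x, s in zip(X, splits)])

pad = np.finfo(np.float64).max           # padding marks unused split slots
X = np.array([[0.2, 1.0, 5.0],
              [7.0, 2.0, 2.5]])
splits = np.array([[1.0, 3.0],           # two real boundaries
                   [2.5, pad]])          # one real boundary, one padded slot
binned = bin_rows(X, splits)
# binned -> [[0, 0, 2], [1, 0, 0]]
```

A value equal to a boundary (1.0 in the first row, 2.5 in the second) lands in the lower bin. Because the padding is the largest representable value, no finite observation is placed past a padded slot, so the second row only ever produces bins 0 and 1.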