Data processing
- bartz.prepcovars.quantilized_splits_from_matrix(X, max_bins)
Determine bins that make the distribution of each predictor uniform.
- Parameters:
  - X : array (p, n)
    A matrix with p predictors and n observations.
  - max_bins : int
    The maximum number of bins to produce.
- Returns:
  - splits : array (p, m)
    A matrix containing, for each predictor, the boundaries between bins. m is min(max_bins, n) - 1, which is an upper bound on the number of splits. Each predictor may have a different number of splits; unused values at the end of each row are filled with the maximum value representable in the type of X.
  - max_split : array (p,)
    The number of actually used values in each row of splits.
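The padded return format can be illustrated with a small NumPy sketch. This is not the bartz implementation: the helper name quantile_splits_sketch is hypothetical, floating-point input is assumed, and the rule for choosing boundaries (midpoints between neighbouring distinct values, thinned to at most m) is an assumption made only to produce output of the documented shape.

```python
import numpy as np

def quantile_splits_sketch(X, max_bins):
    """Hypothetical NumPy stand-in for illustration; float input assumed."""
    p, n = X.shape
    m = min(max_bins, n) - 1                  # upper bound on splits per row
    fill = np.finfo(X.dtype).max              # padding value for unused slots
    splits = np.full((p, m), fill, dtype=X.dtype)
    max_split = np.zeros(p, dtype=np.int32)
    for j in range(p):
        u = np.unique(X[j])                   # sorted distinct values of predictor j
        mid = (u[:-1] + u[1:]) / 2            # assumed rule: midpoints between neighbours
        if mid.size > m:                      # too many candidates: keep m evenly spaced ones
            keep = np.linspace(0, mid.size - 1, m).round().astype(int)
            mid = mid[keep]
        splits[j, :mid.size] = mid            # used values first, padding stays at the end
        max_split[j] = mid.size
    return splits, max_split

X = np.array([[1.0, 2.0, 3.0, 4.0],
              [5.0, 5.0, 5.0, 7.0]])
splits, max_split = quantile_splits_sketch(X, max_bins=10)
print(splits.shape)   # (2, 3), since m = min(10, 4) - 1
print(max_split)      # [3 1]
```

The second predictor has only two distinct values, so its row holds a single real boundary followed by two padding entries, and max_split records that only the first entry is meaningful.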
- bartz.prepcovars.uniform_splits_from_matrix(X, num_bins)
Make an evenly spaced binning grid.
- Parameters:
  - X : array (p, n)
    A matrix with p predictors and n observations.
  - num_bins : int
    The number of bins to produce.
- Returns:
  - splits : array (p, num_bins - 1)
    A matrix containing, for each predictor, the boundaries between bins. The excluded endpoints are the minimum and maximum value in each row of X.
  - max_split : array (p,)
    The number of cutpoints in each row of splits, i.e., num_bins - 1.
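As a rough illustration of an evenly spaced grid with excluded endpoints, the following NumPy sketch reproduces the documented output shapes. The helper name uniform_splits_sketch is hypothetical and the code is a plain NumPy stand-in, not the bartz implementation.

```python
import numpy as np

def uniform_splits_sketch(X, num_bins):
    """Hypothetical NumPy stand-in for illustration."""
    lo = X.min(axis=1, keepdims=True)         # per-predictor minimum, shape (p, 1)
    hi = X.max(axis=1, keepdims=True)         # per-predictor maximum, shape (p, 1)
    # num_bins + 1 evenly spaced edges on [0, 1], with the two endpoints dropped
    t = np.linspace(0.0, 1.0, num_bins + 1)[1:-1]
    splits = lo + (hi - lo) * t               # shape (p, num_bins - 1)
    max_split = np.full(X.shape[0], num_bins - 1, dtype=np.int32)
    return splits, max_split

X = np.array([[0.0, 4.0, 2.0],
              [10.0, 20.0, 15.0]])
splits, max_split = uniform_splits_sketch(X, num_bins=4)
print(splits)      # rows: [1. 2. 3.] and [12.5 15. 17.5]
print(max_split)   # [3 3]
```

Every row uses all num_bins - 1 entries here, so no padding is needed and max_split is constant across predictors.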
- bartz.prepcovars.bin_predictors(X, splits, **kw)
Bin the predictors according to the given splits.
A value x is mapped to bin i iff splits[i - 1] < x <= splits[i].
- Parameters:
  - X : array (p, n)
    A matrix with p predictors and n observations.
  - splits : array (p, m)
    A matrix containing, for each predictor, the boundaries between bins. m is the maximum number of splits; a row may contain fewer actual splits, in which case the unused locations at the end of the row are padded with the maximum value allowed by the type.
  - **kw : dict
    Additional arguments are passed to jax.numpy.searchsorted.
- Returns:
  - X_binned : int array (p, n)
    A matrix with p predictors and n observations, where each predictor has been replaced by the index of the bin it falls into.
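The binning rule splits[i - 1] < x <= splits[i] corresponds to a left-sided sorted search applied row by row. The sketch below uses numpy.searchsorted in place of jax.numpy.searchsorted so that it runs without JAX; the helper name bin_predictors_sketch is hypothetical and this is not the bartz implementation.

```python
import numpy as np

def bin_predictors_sketch(X, splits):
    """Hypothetical NumPy stand-in: one left-sided sorted search per predictor."""
    rows = [np.searchsorted(s, x, side="left") for x, s in zip(X, splits)]
    return np.stack(rows).astype(np.int32)

big = np.finfo(np.float64).max                # padding marker for unused splits
X = np.array([[0.5, 1.5, 2.5, 3.5],
              [5.0, 6.0, 7.0, 6.5]])
splits = np.array([[1.0, 2.0, 3.0],
                   [6.0, big, big]])          # second row has one real split
X_binned = bin_predictors_sketch(X, splits)
print(X_binned)
# [[0 1 2 3]
#  [0 0 1 1]]
```

A value equal to a split falls into the lower bin (6.0 maps to bin 0 in the second row), and the padding entries never create extra bins because ordinary values are below the maximum representable value.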