Tree manipulation

Functions to create and manipulate binary trees.

A tree is represented with arrays as a heap. The root node is at index 1. The children nodes of a node at index \(i\) are at indices \(2i\) (left child) and \(2i + 1\) (right child). The array element at index 0 is unused.

A decision tree is represented by tree arrays: ‘leaf’, ‘var’, and ‘split’.

The ‘leaf’ array contains the values in the leaves.

The ‘var’ array contains the axes along which the decision nodes operate.

The ‘split’ array contains the decision boundaries. The boundaries are open on the right, i.e., a point belongs to the left child iff x < split. Whether a node is a leaf is indicated by the corresponding ‘split’ element being 0.

Since the nodes at the bottom can only be leaves and not decision nodes, the ‘var’ and ‘split’ arrays have half the length of the ‘leaf’ array.

bartz.grove.make_tree(depth, dtype)[source]

Make an array to represent a binary tree.

Parameters:
depthint

The maximum depth of the tree. Depth 1 means that there is only a root node.

dtypedtype

The dtype of the array.

Returns:
treearray

An array of zeroes with shape (2 ** depth,).

bartz.grove.tree_depth(tree)[source]

Return the maximum depth of a tree.

Parameters:
treearray

A tree created by make_tree. If the array is ND, the tree structure is assumed to be along the last axis.

Returns:
depthint

The maximum depth of the tree.

bartz.grove.traverse_tree(x, var_tree, split_tree)[source]

Find the leaf where a point falls into.

Parameters:
xarray (p,)

The coordinates to evaluate the tree at.

var_treearray (2 ** (d - 1),)

The decision axes of the tree.

split_treearray (2 ** (d - 1),)

The decision boundaries of the tree.

Returns:
indexint

The index of the leaf.

bartz.grove.traverse_forest(X, var_trees, split_trees)[source]

Find the leaves where points fall into.

Parameters:
Xarray (p, n)

The coordinates to evaluate the trees at.

var_treesarray (m, 2 ** (d - 1))

The decision axes of the trees.

split_treesarray (m, 2 ** (d - 1))

The decision boundaries of the trees.

Returns:
indicesarray (m, n)

The indices of the leaves.

bartz.grove.evaluate_forest(X, leaf_trees, var_trees, split_trees, dtype=None, sum_trees=True)[source]

Evaluate a ensemble of trees at an array of points.

Parameters:
Xarray (p, n)

The coordinates to evaluate the trees at.

leaf_treesarray (m, 2 ** d)

The leaf values of the tree or forest. If the input is a forest, the first axis is the tree index, and the values are summed.

var_treesarray (m, 2 ** (d - 1))

The decision axes of the trees.

split_treesarray (m, 2 ** (d - 1))

The decision boundaries of the trees.

dtypedtype, optional

The dtype of the output. Ignored if sum_trees is False.

sum_treesbool, default True

Whether to sum the values across trees.

Returns:
outarray (n,) or (m, n)

The (sum of) the values of the trees at the points in X.

bartz.grove.is_actual_leaf(split_tree, *, add_bottom_level=False)[source]

Return a mask indicating the leaf nodes in a tree.

Parameters:
split_treeint array (2 ** (d - 1),)

The splitting points of the tree.

add_bottom_levelbool, default False

If True, the bottom level of the tree is also considered.

Returns:
is_actual_leafbool array (2 ** (d - 1) or 2 ** d,)

The mask indicating the leaf nodes. The length is doubled if add_bottom_level is True.

bartz.grove.is_leaves_parent(split_tree)[source]

Return a mask indicating the nodes with leaf (and only leaf) children.

Parameters:
split_treeint array (2 ** (d - 1),)

The decision boundaries of the tree.

Returns:
is_leaves_parentbool array (2 ** (d - 1),)

The mask indicating which nodes have leaf children.

bartz.grove.tree_depths(tree_length)[source]

Return the depth of each node in a binary tree.

Parameters:
tree_lengthint

The length of the tree array, i.e., 2 ** d.

Returns:
deptharray (tree_length,)

The depth of each node. The root node (index 1) has depth 0. The depth is the position of the most significant non-zero bit in the index. The first element (the unused node) is marked as depth 0.