Tree manipulation¶
Functions to create and manipulate binary trees.
A tree is represented with arrays as a heap. The root node is at index 1. The children nodes of a node at index \(i\) are at indices \(2i\) (left child) and \(2i + 1\) (right child). The array element at index 0 is unused.
A decision tree is represented by tree arrays: ‘leaf’, ‘var’, and ‘split’.
The ‘leaf’ array contains the values in the leaves.
The ‘var’ array contains the axes along which the decision nodes operate.
The ‘split’ array contains the decision boundaries. The boundaries are open on the right, i.e., a point belongs to the left child iff x < split. Whether a node is a leaf is indicated by the corresponding ‘split’ element being 0. Unused nodes also have split set to 0.
Since the nodes at the bottom can only be leaves and not decision nodes, the ‘var’ and ‘split’ arrays have half the length of the ‘leaf’ array.
- bartz.grove.make_tree(depth, dtype)[source]¶
Make an array to represent a binary tree.
- Parameters:
depth (int) – The maximum depth of the tree. Depth 1 means that there is only a root node.
dtype (dtype) – The dtype of the array.
- Returns:
tree (array) – An array of zeroes with shape (2 ** depth,).
- bartz.grove.tree_depth(tree)[source]¶
Return the maximum depth of a tree.
- Parameters:
tree (array) – A tree created by
make_tree
. If the array is ND, the tree structure is assumed to be along the last axis.- Returns:
depth (int) – The maximum depth of the tree.
- bartz.grove.traverse_tree(x, var_tree, split_tree)[source]¶
Find the leaf where a point falls into.
- Parameters:
x (array (p,)) – The coordinates to evaluate the tree at.
var_tree (array (2 ** (d - 1),)) – The decision axes of the tree.
split_tree (array (2 ** (d - 1),)) – The decision boundaries of the tree.
- Returns:
index (int) – The index of the leaf.
- bartz.grove.traverse_forest(X, var_trees, split_trees)[source]¶
Find the leaves where points fall into.
- Parameters:
X (array (p, n)) – The coordinates to evaluate the trees at.
var_trees (array (m, 2 ** (d - 1))) – The decision axes of the trees.
split_trees (array (m, 2 ** (d - 1))) – The decision boundaries of the trees.
- Returns:
indices (array (m, n)) – The indices of the leaves.
- bartz.grove.evaluate_forest(X, leaf_trees, var_trees, split_trees, dtype=None, sum_trees=True)[source]¶
Evaluate a ensemble of trees at an array of points.
- Parameters:
X (array (p, n)) – The coordinates to evaluate the trees at.
leaf_trees (array (m, 2 ** d)) – The leaf values of the tree or forest. If the input is a forest, the first axis is the tree index, and the values are summed.
var_trees (array (m, 2 ** (d - 1))) – The decision axes of the trees.
split_trees (array (m, 2 ** (d - 1))) – The decision boundaries of the trees.
dtype (dtype, optional) – The dtype of the output. Ignored if
sum_trees
isFalse
.sum_trees (bool, default True) – Whether to sum the values across trees.
- Returns:
out (array (n,) or (m, n)) – The (sum of) the values of the trees at the points in
X
.
- bartz.grove.is_actual_leaf(split_tree, *, add_bottom_level=False)[source]¶
Return a mask indicating the leaf nodes in a tree.
- Parameters:
split_tree (int array (2 ** (d - 1),)) – The splitting points of the tree.
add_bottom_level (bool, default False) – If True, the bottom level of the tree is also considered.
- Returns:
is_actual_leaf (bool array (2 * (d - 1) or 2 ** d,)*) – The mask indicating the leaf nodes. The length is doubled if
add_bottom_level
is True.
- bartz.grove.is_leaves_parent(split_tree)[source]¶
Return a mask indicating the nodes with leaf (and only leaf) children.
- Parameters:
split_tree (int array (2 ** (d - 1),)) – The decision boundaries of the tree.
- Returns:
is_leaves_parent (bool array (2 * (d - 1),)*) – The mask indicating which nodes have leaf children.
- bartz.grove.tree_depths(tree_length)[source]¶
Return the depth of each node in a binary tree.
- Parameters:
tree_length (int) – The length of the tree array, i.e., 2 ** d.
- Returns:
depth (array (tree_length,)) – The depth of each node. The root node (index 1) has depth 0. The depth is the position of the most significant non-zero bit in the index. The first element (the unused node) is marked as depth 0.
- bartz.grove.is_used(split_tree)[source]¶
Return a mask indicating the used nodes in a tree.
- Parameters:
split_tree (int array (2 ** (d - 1),)) – The decision boundaries of the tree.
- Returns:
is_used (bool array (2 * d,)*) – A mask indicating which nodes are actually used.
- bartz.grove.forest_fill(split_trees)[source]¶
Return the fraction of used nodes in a set of trees.
- Parameters:
split_trees (array (m, 2 ** (d - 1),)) – The decision boundaries of the trees.
- Returns:
fill (float) – The number of tree nodes in the forest over the maximum number that could be stored in the arrays.