I'm using hierarchy.to_tree from SciPy, and I'd like to print out all root-to-leaf paths, formatted like this:
10.8.3
10.8.5
10.2.2
I've given it a try:
linkage_matrix = [[2, 3, 0.06571365, 2], [0, 10, 0.07951425, 2], [5, 6, 0.09405724, 2], [11, 13, 0.10182075, 3], [1, 12, 0.12900146, 3], [14, 15, 0.13498948, 5], [8, 9, 0.16806049, 2], [7, 16, 0.1887918, 4], [17, 19, 0.2236683, 9], [18, 20, 0.29471335, 11], [4, 21, 0.45878, 12]]
from scipy.cluster import hierarchy
a = hierarchy.to_tree(linkage_matrix)
def parse_tree(tree, path):
    path = path
    if path == []:
        path.append(str(tree.get_id()))
    if tree.is_leaf() is False:
        left = tree.get_left()
        left_id = str(left.get_id())
        if left.is_leaf() is False:
            path.append(left_id)
            parse_tree(left, path)
            path.pop()
        else:
            parse_tree(left, path)
        right = tree.get_right()
        right_id = str(right.get_id())
        if right.is_leaf() is False:
            path.append(right_id)
        parse_tree(right, path)
    else:
        path.append(str(tree.get_id()))
        print(('.').join(path))
        path.pop()

parse_tree(a, [])
But obviously my logic is wrong somewhere; specifically, it breaks down when the left node is not a leaf (22.21.20.17.15.19.7 should be 22.21.20.19.7). I'm looking for approaches I have not considered.
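One direction that might be worth considering, as a minimal sketch rather than a definitive fix: instead of appending to and popping from one shared list, build a new path for every recursive call, so there is nothing left over when the recursion unwinds. In the attempt above, the stale ids appear to come from the right-hand branch, where the id appended before recursing into the right child is never popped. The names root_to_leaf_paths and prefix below are made up for illustration; only to_tree, is_leaf, get_left, get_right and get_id come from SciPy.

```python
import numpy as np
from scipy.cluster import hierarchy


def root_to_leaf_paths(node, prefix=()):
    """Yield every root-to-leaf path below `node` as a tuple of node ids.

    `root_to_leaf_paths` and `prefix` are hypothetical names for this sketch.
    """
    # A fresh tuple per call: siblings never see each other's ids,
    # so no pop() bookkeeping is needed.
    path = prefix + (node.get_id(),)
    if node.is_leaf():
        yield path
        return
    # Non-leaf nodes produced by to_tree have both a left and a right child.
    yield from root_to_leaf_paths(node.get_left(), path)
    yield from root_to_leaf_paths(node.get_right(), path)


# The linkage matrix from the question, as a float array.
linkage_matrix = np.array([
    [2, 3, 0.06571365, 2], [0, 10, 0.07951425, 2], [5, 6, 0.09405724, 2],
    [11, 13, 0.10182075, 3], [1, 12, 0.12900146, 3], [14, 15, 0.13498948, 5],
    [8, 9, 0.16806049, 2], [7, 16, 0.1887918, 4], [17, 19, 0.2236683, 9],
    [18, 20, 0.29471335, 11], [4, 21, 0.45878, 12],
], dtype=float)

root = hierarchy.to_tree(linkage_matrix)
paths = [".".join(str(i) for i in p) for p in root_to_leaf_paths(root)]
print("\n".join(paths))
```

With this matrix there are 12 leaves, so 12 paths are printed, and the path to leaf 7 comes out as 22.21.20.19.7. Returning tuples instead of printing inside the recursion also makes it easy to reuse the paths elsewhere.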
For the example tree below, all root-to-leaf paths are: