In a BST, all values descending on the left side of a node are less than (or equal to, see later) the node itself. Similarly, all values descending on the right side of a node are greater than (or equal to) that node value(a).
Some BSTs may choose to allow duplicate values, hence the "or equal to" qualifiers above. The following example may clarify:
14
/ \
13 22
/ / \
1 16 29
/ \
28 29
This shows a BST that allows duplicates(b) - you can see that to find a value, you start at the root node and go down the left or right subtree depending on whether your search value is less than or greater than the node value.
This can be done recursively with something like:
def hasVal (node, srchval):
if node == NULL:
return false
if node.val == srchval:
return true
if node.val > srchval:
return hasVal (node.left, srchval)
return hasVal (node.right, srchval)
and calling it with:
foundIt = hasVal (rootNode, valToLookFor)
Duplicates add a little complexity since you may need to keep searching once you've found your value, for other nodes of the same value. Obviously that doesn't matter for hasVal
since it doesn't matter how many there are, just whether at least one exists. It will however matter for things like countVal
, since it needs to know how many there are.
(a) You could actually sort them in the opposite direction should you so wish provided you adjust how you search for a specific key. A BST need only maintain some sorted order, whether that's ascending or descending (or even some weird multi-layer-sort method like all odd numbers ascending, then all even numbers descending) is not relevant.
(b) Interestingly, if your sorting key uses the entire value stored at a node (so that nodes containing the same key have no other extra information to distinguish them), there can be performance gains from adding a count to each node, rather than allowing duplicate nodes.
The main benefit is that adding or removing a duplicate will simply modify the count rather than inserting or deleting a new node (an action that may require re-balancing the tree).
So, to add an item, you first check if it already exists. If so, just increment the count and exit. If not, you need to insert a new node with a count of one then rebalance.
To remove an item, you find it then decrement the count - only if the resultant count is zero do you then remove the actual node from the tree and rebalance.
Searches are also quicker given there are fewer nodes but that may not be a large impact.
For example, the following two trees (non-counting on the left, and counting on the right) would be equivalent (in the counting tree, i.c
means c
copies of item i
):
__14__ ___22.2___
/ \ / \
14 22 7.1 29.1
/ \ / \ / \ / \
1 14 22 29 1.1 14.3 28.1 30.1
\ / \
7 28 30
Removing the leaf-node 22
from the left tree would involve rebalancing (since it now has a height differential of two) the resulting 22-29-28-30
subtree such as below (this is one option, there are others that also satisfy the "height differential must be zero or one" rule):
\ \
22 29
\ / \
29 --> 28 30
/ \ /
28 30 22
Doing the same operation on the right tree is a simple modification of the root node from 22.2
to 22.1
(with no rebalancing required).