2
votes

Here is an exercise from SICP (Structure and Interpretation of Computer Programs):

Exercise 2.63: Each of the following two procedures converts a binary tree to a list.

(define (tree->list-1 tree)
  (if (null? tree)
      '()
      (append 
       (tree->list-1 
        (left-branch tree))
       (cons (entry tree)
             (tree->list-1 
              (right-branch tree))))))

(define (tree->list-2 tree)
  (define (copy-to-list tree result-list)
    (if (null? tree)
        result-list
        (copy-to-list 
         (left-branch tree)
         (cons (entry tree)
               (copy-to-list 
                (right-branch tree)
                result-list)))))
  (copy-to-list tree '()))

... 2. Do the two procedures have the same order of growth in the number of steps required to convert a balanced tree with n elements to a list? If not, which one grows more slowly?

Looking at both procedures and without doing any computations for the order of growth, all elementary operations of tree->list-2 are constant, while one of the operation in tree->list-1, append, is not. So it is very clear that tree->list-2 grows more slowly than tree->list-1.

Now, although the exercise did not specifically asks us to do so, I would like to find the order of growth in the number of steps of tree->list-1. The following is my attempt.

The append procedure is:

(define (append list1 list2)
  (if (null? list1)
      list2
      (cons (car list1) 
            (append (cdr list1) 
                    list2))))

From the definition, the order of growth in the number of steps grows as theta(l1) where l1 is the number of elements in the first list. If two lists has the same length, then the order of growth grows as theta(n/2) where n is the sum of the number of elements of both lists. Based on these, I try to calculate the order of growth of tree->list-1 like this:

Suppose append takes constant time (just for initial), then we will find that the order of growth of tree->list-1 grows as theta(n). Since append is the only procedure that is not constant, I believe I can safely ignore other elementary operations. With the real running time of append, I got the following observations.

Note: The trees in my observations are balanced. So I experimented with the running times by doubling the number of nodes (or incrementing the height of the tree).

0 nodes - constant
1 node - constant
3 nodes - 1 recursive call to append
7 nodes - 5 recursive calls to append (first 2 are from subtrees (above), 3 are from the left branch)
15 nodes - 17 recursive calls to append (10 are from subtrees (above), 7 are from the left branch)
31 nodes - 49 recursive calls to append (34 are from subtrees (above), 17 are from the left branch)
63 nodes - 129 recursive calls to append (98 are from subtrees (above), 31 are from the left branch) ...
n nodes - 2t+(n/2) where t is the number of steps of the subtrees and n is the number of nodes in the tree

My additional observations are: In a completely unbalanced binary tree: If all nodes are in the right branch, the number of steps grows as theta(n). If all nodes are in the left branch, the number of steps grows as something like (1+2+3+4+...+n-1) probably like n(n-1)/2 (I found this formula somewhere). Based on my data, the number of steps in a balanced binary tree grows somewhere in between.

Now, the sequence of number of steps as the number of nodes doubles is: (1, 5, 17, 49, 129). And they grow as (4, 12, 32, 80).

It seems that we get the number of steps of a balanced binary tree with n elements by: 2t+(n/2) where t is the number of steps of two subtrees and n is the number of nodes.

Now for the life of me, I cannot find the classification this order of growth belongs to (E.G. linear, logarithmic, quadratic, exponential) though I have confirmed that this order of growth grows faster than linear and grows slower than quadratic.

Is my data correct? Have I found the order of growth of tree->list-1 as n increases even though I cannot classify it?

2
Depends on tree balance. A fully unbalanced left-leaning tree will provoke quadratic behavior: every list construction step will scan the entire list built so far, and then add just one node (the list created for the entry, and its empty right subtree). If the tree is completely unbalanced but right-leaning, you have the optimal case: O(N). The left subtree at every node is empty, and so the append operations look like (append '() (cons entry ...)). Any reasonable implementation of append will efficiently return the right argument. The interesting case is then the balanced one in between. - Kaz
@Kaz Yeah, it is easy in the best and worst case. the one I am evaluating is the balanced case. I tried to determine the order of growth for a balanced tree and documented my attempt here and I would like to know if it is correct. - lightning_missile
@Kaz the second code will be linear no matter the tree's un/balancing, yes? - Will Ness
@WillNess I believe so. It perpetrates a O(N) traversal, over which it maintains an accumulator onto which it conses the nodes. If the tree right leaning, then the actions resemble those of a right-associative reduce. - Kaz

2 Answers

4
votes

If you use the definition from the book (1.2.3) then no, neither function has the same order of growth. The book requires a single function which, with the right scaling factors, can act as both an upper and lower limit for the procedure (for sufficiently large values of n).

I believe the order of growth for tree->list-1 is n*log(n).

If we treat your t as function giving the number of append steps your formular becomes

t(n) = (n-1)/2 + 2*t((n-1)/2)

using n/2 instead of (n-1)/2 for simplicity you formular approximates to:

t(n) = n/2 + 2*t(n/2)

using this simplified formular to calculate t(n/2) we get:

t(n/2) = (n/2)/2 + 2*t((n/2)/2)
       = n/4 + 2*t(n/4)

substituting this into our calculation of t(n):

t(n) = n/2 + 2*t(n/2)
     = n/2 + 2*(n/4 + 2*t(n/4))
     = n/2 + n/2 + 4*t(n/4)

repeating we get:

t(n) = n/2 + 2*t(n/2)
     = n/2 + n/2 + 4*t(n/4)
     = n/2 + n/2 + n/2 + 8*t(n/8)
     = n/2 + n/2 + n/2 + n/2 + 16*t(n/16)

I.e., we get a series containing n/2 repeated approximately log2(n) times, (the depth of the tree). that is n/2*log2(n) which has the same order as n*log(n).

This isn't a very good estimate when n is small but it does appear to work as n grows. The last column shows the error, as a proportion of the actual value, converging to zero (which I think is an equivalent definition).

depth   items           steps           n/2*log2(n)     |act-est|/act
1       1               0   
2       3               1               2               1.377
3       7               5               10              0.965
4       15              17              29              0.724
5       31              49              77              0.567
6       63              129             188             0.460
7       127             321             444             0.382
8       255             769             1,019           0.325
9       511             1,793           2,299           0.282
10      1,023           4,097           5,114           0.248
11      2,047           9,217           11,258          0.221
12      4,095           20,481          24,569          0.200
13      8,191           45,057          53,241          0.182
14      16,383          98,305          114,680         0.167
15      32,767          212,993         245,752         0.154
16      65,535          458,753         524,279         0.143
17      131,071         983,041         1,114,103       0.133
18      262,143         2,097,153       2,359,286       0.125
19      524,287         4,456,449       4,980,726       0.118
20      1,048,575       9,437,185       10,485,749      0.111
21      2,097,151       19,922,945      22,020,085      0.105
22      4,194,303       41,943,041      46,137,332      0.100
23      8,388,607       88,080,385      96,468,980      0.095
24      16,777,215      184,549,377     201,326,579     0.091
25      33,554,431      385,875,969     419,430,387     0.087
26      67,108,863      805,306,369     872,415,218     0.083
27      134,217,727     1,677,721,601   1,811,939,314   0.080
28      268,435,455     3,489,660,929   3,758,096,369   0.077
29      536,870,911     7,247,757,313   7,784,628,209   0.074
30      1,073,741,823   15,032,385,537  16,106,127,344  0.071
31      2,147,483,647   31,138,512,897  33,285,996,528  0.069
32      4,294,967,295   64,424,509,441  68,719,476,719  0.067
1
votes

My guess would be quadratic time worst case(1) / linear time best case(2) for the first code; linear time always, for the second.

(1) = left-degenerate tree (with all right branches of some bounded depth, like, no more than 1, or 2, say, so the tree is skewed to the left and all appends are re-traced over in a "triangular" fashion);

(2) = right-degenerate tree (like list; with all left branches of some bounded depth, so all appends take constant time).

The second code will be fastest for (1) and about twice slower, for (2), because it will need to grow the stack first and then unwind it, whereas in the (1) case the stack will be of constant depth.

So your analysis is correct (except that it's not theta(n) for the first code on balanced trees, but n log n as the other answer shows; otherwise known as "linearithmic" time). And n(n-1)/2 is still considered quadratic, because constants and lower order terms can be ignored.