12.4 ★ Randomly built binary search trees

We have shown that all the basic operations on a binary search tree run in O(h) time, where h is the height of the tree. The height of a binary search tree varies, however, as items are inserted and deleted. If, for example, the items are inserted in strictly increasing order, the tree will be a chain with height n - 1. On the other hand, Exercise B.5-4 shows that h≥ ⌊lg n⌋. As with quicksort, we can show that the behavior of the average case is much closer to the best case than the worst case.

Unfortunately, little is known about the average height of a binary search tree when both insertion and deletion are used to create it. When the tree is created by insertion alone, the analysis becomes more tractable. Let us therefore define a randomly built binary search tree on n keys as one that arises from inserting the keys in random order into an initially empty tree, where each of the n! permutations of the input keys is equally likely. (Exercise 12.4-3 asks you to show that this notion is different from assuming that every binary search tree on n keys is equally likely.) In this section, we shall show that the expected height of a randomly built binary search tree on n keys is O(lg n). We assume that all keys are distinct.

We start by defining three random variables that help measure the height of a randomly built binary search tree. We denote the height of a randomly built binary search on n keys by X_n, and we define the exponential height . When we build a binary search tree on n keys, we choose one key as that of the root, and we let R_n denote the random variable that holds this key's rank within the set of n keys. The value of R_n is equally likely to be any element of the set {1, 2, ..., n}. If R_n = i, then the left subtree of the root is a randomly built binary search tree on i - 1 keys, and the right subtree is a randomly built binary search tree on n - i keys. Because the height of a binary tree is one more than the larger of the heights of the two subtrees of the root, the exponential height of a binary tree is twice the larger of the exponential heights of the two subtrees of the root. If we know that R_n = i, we therefore have that

Y_n = 2 · max(Y_i-1, Y_n-i).

As base cases, we have Y₁ = 1, because the exponential height of a tree with 1 node is 2⁰ = 1 and, for convenience, we define Y₀ = 0.

Next we define indicator random variables Z_n,1, Z_n,2, ..., Z_n,n, where

Z_n,i = I{R_n = i}.

Because R_n is equally likely to be any element of {1, 2, ..., n}, we have that Pr{R_n = i} = 1/n for i = 1,2, ..., n, and hence, by Lemma 5.1,

(12.1)

for i = 1, 2, ..., n. Because exactly one value of Z_n,i is 1 and all others are 0, we also have

We will show that E[Y_n] is polynomial in n, which will ultimately imply that E[X_n] = O(lg n).

The indicator random variable Z_n,i = I{R_n = i} is independent of the values of Y_i-1 and Y_n-i. Having chosen R_n = i, the left subtree, whose exponential height is Y_i-1, is randomly built on the i - 1 keys whose ranks are less than i. This subtree is just like any other randomly built binary search tree on i - 1 keys. Other than the number of keys it contains, this subtree's structure is not affected at all by the choice of R_n = i; hence the random variables Y_i-1 and Z_n,i are independent. Likewise, the right subtree, whose exponential height is Y_n-i, is randomly built on the n - i keys whose ranks are greater than i. Its structure is independent of the value of R_n, and so the random variables Y_n-i and Z_n,i are independent. Hence,

Each term E[Y₀], E[Y₁], ..., E[Y_n-1] appears twice in the last summation, once as E[Y_i-1] and once as E[Y_n-i], and so we have the recurrence

(12.2)

Using the substitution method, we will show that for all positive integers n, the recurrence (12.2) has the solution

In doing so, we will use the identity

(12.3)

(Exercise 12.4-1 asks you to prove this identity.)

For the base case, we verify that the bound

holds. For the substitution, we have that

We have bounded E[Y_n], but our ultimate goal is to bound E[X_n]. As Exercise 12.4-4 asks you to show, the function f(x) = 2^x is convex (see page 1109). Therefore, we can apply Jensen's inequality (C.25), which says that

to derive that

Taking logarithms of both sides gives E[X_n] = O(lg n). Thus, we have proven the following:

Theorem 12.4

The expected height of a randomly built binary search tree on n keys is O(lg n).

Exercises 12.4-1

Prove equation (12.3).

Exercises 12.4-2

Describe a binary search tree on n nodes such that the average depth of a node in the tree is Θ(lg n) but the height of the tree is w(lg n). Give an asymptotic upper bound on the height of an n-node binary search tree in which the average depth of a node is Θ(lg n).

Exercises 12.4-3

Show that the notion of a randomly chosen binary search tree on n keys, where each binary search tree of n keys is equally likely to be chosen, is different from the notion of a randomly built binary search tree given in this section. (Hint: List the possibilities when n = 3.)

Exercises 12.4-4

Show that the function f(x) = 2^x is convex.

Exercises 12.4-5: ★

Consider RANDOMIZED-QUICKSORT operating on a sequence of n input numbers. Prove that for any constant k > 0, all but O(1/n^k) of the n! input permutations yield an O(n lg n) running time.

Problems 12-1: Binary search trees with equal keys

Equal keys pose a problem for the implementation of binary search trees.

What is the asymptotic performance of TREE-INSERT when used to insert n items with identical keys into an initially empty binary search tree?

We propose to improve TREE-INSERT by testing before line 5 whether or not key[z] = key[x] and by testing before line 11 whether or not key[z] = key[y].If equality holds, we implement one of the following strategies. For each strategy, find the asymptotic performance of inserting n items with identical keys into an initially empty binary search tree. (The strategies are described for line 5, in which we compare the keys of z and x. Substitute y for x to arrive at the strategies for line 11.)

Keep a boolean flag b[x] at node x, and set x to either left[x] or right[x] based on the value of b[x], which alternates between FALSE and TRUE each time x is visited during insertion of a node with the same key as x.
Keep a list of nodes with equal keys at x, and insert z into the list.
Randomly set x to either left[x] or right[x]. (Give the worst-case performance

and informally derive the average-case performance.)

Problems 12-2: Radix trees

Given two strings a = a₀a₁...a_p and b = b₀b₁...b_q, where each a_i and each b_j is in some ordered set of characters, we say that string a is lexicographically less than string b if either

there exists an integer j, where 0 ≤ j ≤ min(p, q), such that a_i = b_i for all i = 0, 1, ..., j - 1 and a_j < b_j, or
p < q and a_i = b_i for all i = 0, 1, ..., p.

For example, if a and b are bit strings, then 10100 < 10110 by rule 1 (letting j = 3) and 10100 < 101000 by rule 2. This is similar to the ordering used in English-language dictionaries.

The radix tree data structure shown in Figure 12.5 stores the bit strings 1011, 10, 011, 100, and 0. When searching for a key a = a₀a₁...a_p, we go left at a node of depth i if a_i = 0 and right if a_i = 1. Let S be a set of distinct binary strings whose lengths sum to n. Show how to use a radix tree to sort S lexicographically in Θ(n) time. For the example in Figure 12.5, the output of the sort should be the sequence 0, 011, 10, 100, 1011.

Figure 12.5: A radix tree storing the bit strings 1011, 10, 011, 100, and 0. Each node's key can be determined by traversing the path from the root to that node. There is no need, therefore, to store the keys in the nodes; the keys are shown here for illustrative purposes only. Nodes are heavily shaded if the keys corresponding to them are not in the tree; such nodes are present only to establish a path to other nodes.

Problems 12-3: Average node depth in a randomly built binary search tree

In this problem, we prove that the average depth of a node in a randomly built binary search tree with n nodes is O(lg n). Although this result is weaker than that of Theorem 12.4, the technique we shall use reveals a surprising similarity between the building of a binary search tree and the running of RANDOMIZED-QUICKSORT from Section 7.3.

We define the total path length P(T) of a binary tree T as the sum, over all nodes x in T , of the depth of node x, which we denote by d(x, T).

Argue that the average depth of a node in T is

Thus, we wish to show that the expected value of P(T) is O(n lg n).

Let T_L and T_R denote the left and right subtrees of tree T, respectively. Argue that if T has n nodes, then

P(T) = P(T_L) + P(T_R) + n - 1.
Let P(n) denote the average total path length of a randomly built binary search tree with n nodes. Show that
Show that P(n) can be rewritten as
Recalling the alternative analysis of the randomized version of quicksort given in Problem 7-2, conclude that P(n) = O(n lg n).

At each recursive invocation of quicksort, we choose a random pivot element to partition the set of elements being sorted. Each node of a binary search tree partitions the set of elements that fall into the subtree rooted at that node.

Describe an implementation of quicksort in which the comparisons to sort a set of elements are exactly the same as the comparisons to insert the elements into a binary search tree. (The order in which comparisons are made may differ, but the same comparisons must be made.)

Problems 12-4: Number of different binary trees

Let b_n denote the number of different binary trees with n nodes. In this problem, you will find a formula for b_n, as well as an asymptotic estimate.

Show that b₀ = 1 and that, for n ≥ 1,
Referring to Problem 4-5 for the definition of a generating function, let B(x) be the generating function

Show that B(x) = xB(x)² + 1, and hence one way to express B(x) in closed form is

The Taylor expansion of f(x) around the point x = a is given by

where f^(k)(x) is the kth derivative of f evaluated at x.

Show that

(the nth Catalan number) by using the Taylor expansion of around x = 0. (If you wish, instead of using the Taylor expansion, you may use the generalization of the binomial expansion (C.4) to nonintegral exponents n, where for any real number n and for any integer k, we interpret to be n(n - 1) (n - k + 1)/k! if k ≥ 0, and 0 otherwise.)
Show that