The probability of having at least, or at most, k successes in n Bernoulli trials, each with probability p of success, is often of more interest than the probability of having exactly k successes. In this section, we investigate the tails of the binomial distribution: the two regions of the distribution b(k; n, p) that are far from the mean np. We shall prove several important bounds on (the sum of all terms in) a tail.
We first provide a bound on the right tail of the distribution b(k; n, p). Bounds on the left tail can be determined by inverting the roles of successes and failures.
Theorem C.2
Consider a sequence of n Bernoulli trials, where success occurs with probability p. Let X be the random variable denoting the total number of successes. Then for 0 ≤ k ≤ n, the probability of at least k successes is

Pr {X ≥ k} ≤ (n choose k) p^k.
Proof For S ⊆ {1, 2,..., n}, we let A_S denote the event that the ith trial is a success for every i ∈ S. Clearly Pr {A_S} = p^k if |S| = k. We have

Pr {X ≥ k} = Pr {there exists S ⊆ {1, 2,..., n} such that |S| = k and A_S}
           ≤ Σ_{S ⊆ {1, 2,..., n} : |S| = k} Pr {A_S}
           = (n choose k) p^k,
where the inequality follows from Boole's inequality (C.18).
The following corollary restates the theorem for the left tail of the binomial distribution. In general, we shall leave it to the reader to adapt the proofs from one tail to the other.

Corollary C.3
Consider a sequence of n Bernoulli trials, where success occurs with probability p and failure with probability q = 1 - p. Let X be the random variable denoting the total number of successes. Then for 0 ≤ k ≤ n, the probability of at most k successes is

Pr {X ≤ k} ≤ (n choose n - k) q^(n - k) = (n choose k) q^(n - k).
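As an informal numerical illustration, the following minimal Python sketch compares the exact right tail Pr {X ≥ k} with the bound (n choose k) p^k of the theorem above; the particular n, p, and k are arbitrary values chosen only for this example.

from math import comb

n, p, k = 20, 0.3, 10                              # illustrative parameters only

def b(i, n, p):
    # binomial distribution b(i; n, p) = (n choose i) p^i (1 - p)^(n - i)
    return comb(n, i) * p**i * (1 - p)**(n - i)

tail = sum(b(i, n, p) for i in range(k, n + 1))    # exact Pr {X >= k}
bound = comb(n, k) * p**k                          # right-tail bound of the theorem
print(tail, bound, tail <= bound)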
Our next bound concerns the left tail of the binomial distribution. Its corollary shows that, far from the mean, the left tail diminishes exponentially.
Theorem C.4
Consider a sequence of n Bernoulli trials, where success occurs with probability p and failure with probability q = 1 - p. Let X be the random variable denoting the total number of successes. Then for 0 < k < np, the probability of fewer than k successes is

Pr {X < k} = Σ_{i=0}^{k-1} b(i; n, p) < (kq / (np - k)) b(k; n, p).
Proof We bound the series Σ_{i=0}^{k-1} b(i; n, p) by a geometric series using the technique from Section A.2, page 1064. For i = 1, 2,..., k, we have from equation (C.40),

b(i - 1; n, p) / b(i; n, p) = (i / (n - i + 1)) (q / p)
                            < (i / (n - i)) (q / p)
                            ≤ (k / (n - k)) (q / p).
If we let

x = (k / (n - k)) (q / p) = kq / ((n - k) p) < 1,

which holds because k < np implies kq < (n - k) p,
it follows that
b(i - 1; n, p) < x b(i; n, p)
for 0 < i ≤ k. Iteratively applying this inequality k - i times, we obtain
b(i; n, p) < x^(k - i) b(k; n, p)
for 0 ≤ i < k, and hence

Σ_{i=0}^{k-1} b(i; n, p) < b(k; n, p) Σ_{i=0}^{k-1} x^(k - i)
                         < b(k; n, p) Σ_{i=1}^{∞} x^i
                         = (x / (1 - x)) b(k; n, p)
                         = (kq / (np - k)) b(k; n, p).
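For concreteness, here is a minimal Python sketch (with arbitrarily chosen n and p, purely as an illustration) that checks this left-tail bound numerically for every k in the range 0 < k < np.

from math import comb

n, p = 50, 0.4                        # illustrative parameters only
q = 1 - p

def b(i, n, p):
    return comb(n, i) * p**i * (1 - p)**(n - i)

for k in range(1, int(n * p)):        # 0 < k < np
    left_tail = sum(b(i, n, p) for i in range(k))      # Pr {X < k}
    bound = (k * q / (n * p - k)) * b(k, n, p)          # bound of the theorem
    assert left_tail < bound

print("left-tail bound verified for all tested k")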
Corollary C.5
Consider a sequence of n Bernoulli trials, where success occurs with probability p and failure with probability q = 1 - p. Then for 0 < k ≤ np/2, the probability of fewer than k successes is less than one half of the probability of fewer than k + 1 successes.
Proof Because k ≤ np/2, we have

kq / (np - k) ≤ ((np/2) q) / (np - np/2) = q ≤ 1,
since q ≤ 1. Letting X be the random variable denoting the number of successes, Theorem C.4 implies that the probability of fewer than k successes is

Pr {X < k} = Σ_{i=0}^{k-1} b(i; n, p) < (kq / (np - k)) b(k; n, p) ≤ b(k; n, p).

Thus,

Pr {X < k} / Pr {X < k + 1} = (Σ_{i=0}^{k-1} b(i; n, p)) / (Σ_{i=0}^{k} b(i; n, p))
                            = (Σ_{i=0}^{k-1} b(i; n, p)) / (Σ_{i=0}^{k-1} b(i; n, p) + b(k; n, p))
                            < 1/2,
since Σ_{i=0}^{k-1} b(i; n, p) < b(k; n, p).
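A minimal Python check of this halving property, again with arbitrarily chosen illustrative parameters:

from math import comb

n, p = 60, 0.5                                  # illustrative parameters only

def b(i, n, p):
    return comb(n, i) * p**i * (1 - p)**(n - i)

def prob_fewer_than(k):
    return sum(b(i, n, p) for i in range(k))    # Pr {X < k}

for k in range(1, int(n * p / 2) + 1):          # 0 < k <= np/2
    assert prob_fewer_than(k) < 0.5 * prob_fewer_than(k + 1)

print("halving property verified for all tested k")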
Bounds on the right tail can be determined similarly. Their proofs are left as Exercise C.5-2.
Corollary C.6
Consider a sequence of n Bernoulli trials, where success occurs with probability p. Let X be the random variable denoting the total number of successes. Then for np < k < n, the probability of more than k successes is

Pr {X > k} = Σ_{i=k+1}^{n} b(i; n, p) < ((n - k) p / (k - np)) b(k; n, p).
Corollary C.7
Consider a sequence of n Bernoulli trials, where success occurs with probability p and failure with probability q = 1 - p. Then for (np + n)/2 < k < n, the probability of more than k successes is less than one half of the probability of more than k - 1 successes.
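These right-tail statements can be sanity-checked numerically as well. The sketch below (illustrative parameters only; it is not a substitute for the proofs asked for in Exercise C.5-2) tests the bound of the first of the two corollaries for every k with np < k < n.

from math import comb

n, p = 40, 0.25                       # illustrative parameters only

def b(i, n, p):
    return comb(n, i) * p**i * (1 - p)**(n - i)

for k in range(int(n * p) + 1, n):    # np < k < n
    right_tail = sum(b(i, n, p) for i in range(k + 1, n + 1))   # Pr {X > k}
    bound = ((n - k) * p / (k - n * p)) * b(k, n, p)
    assert right_tail < bound

print("right-tail bound verified for all tested k")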
The next theorem considers n Bernoulli trials, each with a probability p_i of success, for i = 1, 2,..., n. As the subsequent corollary shows, we can use the theorem to provide a bound on the right tail of the binomial distribution by setting p_i = p for each trial.
Theorem C.8
Consider a sequence of n Bernoulli trials, where in the ith trial, for i = 1, 2,..., n, success occurs with probability p_i and failure occurs with probability q_i = 1 - p_i. Let X be the random variable describing the total number of successes, and let μ = E [X]. Then for r > μ,

Pr {X - μ ≥ r} ≤ (μe / r)^r.
Proof Since for any α > 0, the function e^(αx) is strictly increasing in x,

Pr {X - μ ≥ r} = Pr {e^(α(X - μ)) ≥ e^(αr)},    (C.41)
where α will be determined later. Using Markov's inequality (C.29), we obtain

Pr {e^(α(X - μ)) ≥ e^(αr)} ≤ E [e^(α(X - μ))] e^(-αr).    (C.42)
The bulk of the proof consists of bounding E [e^(α(X - μ))] and substituting a suitable value for α in inequality (C.42). First, we evaluate E [e^(α(X - μ))]. Using the notation of Section 5.2, let X_i = I {the ith Bernoulli trial is a success} for i = 1, 2,..., n; that is, X_i is the random variable that is 1 if the ith Bernoulli trial is a success and 0 if it is a failure. Thus,

X = Σ_{i=1}^{n} X_i,
and by linearity of expectation,

μ = E [X] = E [Σ_{i=1}^{n} X_i] = Σ_{i=1}^{n} E [X_i] = Σ_{i=1}^{n} p_i,
which implies

X - μ = Σ_{i=1}^{n} (X_i - p_i).
To evaluate E [e^(α(X - μ))], we substitute for X - μ, obtaining

E [e^(α(X - μ))] = E [e^(α Σ_{i=1}^{n} (X_i - p_i))]
                 = E [Π_{i=1}^{n} e^(α(X_i - p_i))]
                 = Π_{i=1}^{n} E [e^(α(X_i - p_i))],
which follows from (C.23), since the mutual independence of the random variables X_i implies the mutual independence of the random variables e^(α(X_i - p_i)) (see Exercise C.3-5). By the definition of expectation,

E [e^(α(X_i - p_i))] = e^(α(1 - p_i)) p_i + e^(α(0 - p_i)) q_i
                     = p_i e^(αq_i) + q_i e^(-αp_i)
                     ≤ p_i e^α + 1                        (C.43)
                     ≤ exp(p_i e^α),
where exp(x) denotes the exponential function: exp(x) = e^x. (Inequality (C.43) follows from the inequalities α > 0, q_i ≤ 1, e^(αq_i) ≤ e^α, and e^(-αp_i) ≤ 1, and the last line follows from inequality (3.11), e^x ≥ 1 + x.) Consequently,

E [e^(α(X - μ))] = Π_{i=1}^{n} E [e^(α(X_i - p_i))]
                 ≤ Π_{i=1}^{n} exp(p_i e^α)
                 = exp(Σ_{i=1}^{n} p_i e^α)
                 = exp(μe^α),    (C.44)
since μ = Σ_{i=1}^{n} p_i. Therefore, from equation (C.41) and inequalities (C.42) and (C.44), it follows that

Pr {X - μ ≥ r} ≤ exp(μe^α - αr).
Choosing α = ln(r/μ) (see Exercise C.5-7), we obtain

Pr {X - μ ≥ r} ≤ exp(μe^(ln(r/μ)) - r ln(r/μ))
              = exp(r - r ln(r/μ))
              = e^r (μ/r)^r
              = (μe / r)^r.
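To illustrate the theorem with unequal success probabilities, here is a minimal Python sketch that computes the exact distribution of X by dynamic programming and compares Pr {X - μ ≥ r} against (μe/r)^r; the p_i values and the choices of r are arbitrary assumptions made purely for the example.

from math import e

p = [0.1, 0.3, 0.5, 0.7, 0.2, 0.4, 0.6, 0.25]   # illustrative p_i values only
mu = sum(p)                                     # μ = Σ p_i

# dist[j] = Pr {X = j}; add one Bernoulli trial at a time
dist = [1.0]
for pi in p:
    new = [0.0] * (len(dist) + 1)
    for j, prob in enumerate(dist):
        new[j] += prob * (1 - pi)               # ith trial fails
        new[j + 1] += prob * pi                 # ith trial succeeds
    dist = new

for r in (mu + 0.5, mu + 1.0, mu + 2.0):        # each r > μ
    tail = sum(prob for j, prob in enumerate(dist) if j - mu >= r)   # Pr {X - μ >= r}
    bound = (mu * e / r) ** r
    assert tail <= bound

print("Chernoff-style bound verified for the tested r values")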
When applied to Bernoulli trials in which each trial has the same probability of success, Theorem C.8 yields the following corollary bounding the right tail of a binomial distribution.
Corollary C.9
Consider a sequence of n Bernoulli trials, where in each trial success occurs with probability p and failure occurs with probability q = 1 - p. Then for r > np,

Pr {X - np ≥ r} ≤ (npe / r)^r.
Proof By equation (C.36), we have μ = E [X] = np, and so the bound follows immediately from Theorem C.8.
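A similar minimal check for the equal-probability case (again with arbitrarily chosen n, p, and r values) compares the exact binomial tail with the corollary's bound:

from math import comb, e

n, p = 100, 0.1                        # illustrative parameters only
mu = n * p                             # μ = np

def b(i, n, p):
    return comb(n, i) * p**i * (1 - p)**(n - i)

for r in (12, 15, 20, 30):             # each r > np = 10
    tail = sum(b(i, n, p) for i in range(n + 1) if i - mu >= r)   # Pr {X - np >= r}
    bound = (mu * e / r) ** r
    assert tail <= bound

print("corollary's bound verified for the tested r values")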
Which is less likely: obtaining no heads when you flip a fair coin n times, or obtaining fewer than n heads when you flip the coin 4n times?
Show that the conditions of Theorem C.8 imply that

Pr {μ - X ≥ r} ≤ ((n - μ)e / r)^r.
Similarly, show that the conditions of Corollary C.9 imply that

Pr {np - X ≥ r} ≤ (nqe / r)^r.
Consider a sequence of n Bernoulli trials, where in the ith trial, for i = 1, 2,..., n, success occurs with probability p_i and failure occurs with probability q_i = 1 - p_i. Let X be the random variable describing the total number of successes, and let μ = E [X]. Show that for r ≥ 0,

Pr {X - μ ≥ r} ≤ e^(-2r²/n).
(Hint: Prove that p_i e^(αq_i) + q_i e^(-αp_i) ≤ e^(α²/8). Then follow the outline of the proof of Theorem C.8, using this inequality in place of inequality (C.43).)
In this problem, we investigate the effect of various assumptions on the number of ways of placing n balls into b distinct bins.
a. Suppose that the n balls are distinct and that their order within a bin does not matter. Argue that the number of ways of placing the balls in the bins is b^n.
b. Suppose that the balls are distinct and that the balls in each bin are ordered. Prove that there are exactly (b + n - 1)!/(b - 1)! ways to place the balls in the bins. (Hint: Consider the number of ways of arranging n distinct balls and b - 1 indistinguishable sticks in a row.)
c. Suppose that the balls are identical, and hence their order within a bin does not matter. Show that the number of ways of placing the balls in the bins is (b + n - 1 choose n). (Hint: Of the arrangements in part (b), how many are repeated if the balls are made identical?)
d. Suppose that the balls are identical and that no bin may contain more than one ball. Show that the number of ways of placing the balls is (b choose n).
e. Suppose that the balls are identical and that no bin may be left empty. Show that the number of ways of placing the balls is (n - 1 choose b - 1).
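For intuition, a small, informal brute-force sketch in Python that checks the counts in parts (a) through (e) for tiny illustrative values of n and b (the sizes and helper names are arbitrary choices for the example):

from math import comb, factorial
from itertools import product

n, b = 4, 3                                        # tiny illustrative sizes

# (a) distinct balls, order within a bin irrelevant: each ball independently picks a bin
count_a = sum(1 for _ in product(range(b), repeat=n))
assert count_a == b ** n

# (b) distinct balls, ordered within bins: place balls one at a time; ball i
#     (0-indexed) can go into any of b + i insertion slots, giving a rising factorial
count_b = 1
for i in range(n):
    count_b *= b + i
assert count_b == factorial(b + n - 1) // factorial(b - 1)

# (c) identical balls: only the vector of bin loads matters
loads = {tuple(assignment.count(j) for j in range(b))
         for assignment in product(range(b), repeat=n)}
assert len(loads) == comb(b + n - 1, n)

# (d) identical balls, at most one ball per bin
assert sum(1 for load in loads if max(load) <= 1) == comb(b, n)

# (e) identical balls, no bin left empty
assert sum(1 for load in loads if min(load) >= 1) == comb(n - 1, b - 1)

print("all ball-and-bin counts match for n =", n, "and b =", b)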