< Day Day Up > |
We shall let Σ* (read "sigma-star") denote the set of all finite-length strings formed using characters from the alphabet Σ. In this chapter, we consider only finite-length strings. The zero-length empty string, denoted ε, also belongs to Σ*. The length of a string x is denoted |x|. The concatenation of two strings x and y, denoted xy, has length |x| + |y| and consists of the characters from x followed by the characters from y.
We say that a string w is a prefix of a string x, denoted w ⊏ x, if x = wy for some string y ∈ Σ*. Note that if w ⊏ x, then |w| ≤ |x|. Similarly, we say that a string w is a suffix of a string x, denoted w ⊐ x, if x = yw for some y ∈ Σ*. It follows from w ⊐ x that |w| ≤ |x|. The empty string ε is both a suffix and a prefix of every string. For example, we have ab ⊏ abcca and cca ⊐ abcca. It is useful to note that for any strings x and y and any character a, we have x ⊐ y if and only if xa ⊐ ya. Also note that ⊏ and ⊐ are transitive relations. The following lemma will be useful later.
Suppose that x, y, and z are strings such that x ⊐ z and y ⊐ z. If |x| ≤ |y|, then x ⊐ y. If |x| ≥ |y|, then y ⊐ x. If |x| = |y|, then x = y.
Proof See Figure 32.3 for a graphical proof.
For brevity of notation, we shall denote the k-character prefix P[1 ‥ k] of the pattern P[1 ‥ m] by Pk. Thus, P0 = ε and Pm = P = P[1 ‥ m]. Similarly, we denote the k-character prefix of the text T as Tk. Using this notation, we can state the string-matching problem as that of finding all shifts s in the range 0 ≤ s ≤ n-m such that P ⊐ Ts+m.
In our pseudocode, we allow two equal-length strings to be compared for equality as a primitive operation. If the strings are compared from left to right and the comparison stops when a mismatch is discovered, we assume that the time taken by such a test is a linear function of the number of matching characters discovered. To be precise, the test "x = y" is assumed to take time Θ(t + 1), where t is the length of the longest string z such that z ⊏ x and z ⊏ y. (We write Θ(t + 1) rather than Θ(t) to handle the case in which t = 0; the first characters compared do not match, but it takes a positive amount of time to perform this comparison.)
< Day Day Up > |