Z Algorithm
Overview
In time complexity, the Z-array makes it possible for a quick string pattern search by eliminating all but the prefix substring lengths that repeat inside the string, with the help of the Z-algorithm that preprocesses the string. This is useful when it requires fast and reliable string searching.
How it works
-
Initialization:
- Let be the string for which we want to compute the Z-values.
- Let be the length of .
- Initialize an array of the same length where each value is 0.
- Set the boundaries of the initial comparison window, and .
-
Iterate over from the second character to the end:
- For each index , determine the Z-value as follows:
- If , you do not have any previous information about the substrings that start from , so compare onwards with the beginning of until you encounter a mismatch. Update and adjust and accordingly.
- If , then is within the bounds of a previously computed substring. Use previously computed Z-values to potentially skip some comparisons:
- Compute .
- If , then
- For each index , determine the Z-value as follows:
-
Using the Z-array:
- The Z-array can be used directly to solve various string matching problems, such as finding a pattern within a text.
Dry Run
We'll take the string to demonstrate how Z-values are computed step-by-step. Our goal is to compute the Z-array for . The Z-array, , will store the length of the longest substring starting from that matches a prefix of .
Iteration #1
- is greater than , so we need to compare from onwards.
- We compare with and find no match (because 'b' ≠ 'a').
- Thus, , and and remain unchanged.
Iteration #2
- is greater than .
- We start comparing from onwards. matches , matches , but does not match .
- So, , , and .
Iteration #3
- is equal to . Here, .
- Since , . No need to adjust and .
Iteration #4
- is greater than .
- Compare onwards. matches , but does not match .
- So, , , and .
Iteration #5
- is greater than .
- Compare onwards. matches , matches , matches , but the string ends there.
- So, , , and .
Iteration #6
- is within the bounds of and . Here, .
- because .
Iteration #7
- is also within the bounds. Here, .
- because .
The final Z-array is .
- Z[2] = 2: "ab" at index 2 matches the prefix "ab".
- Z[4] = 1: "a" at index 4 matches the prefix "a".
- Z[5] = 3: "aba" at index 5 matches the prefix "aba".
- Z[7] = 2: "ad" at index 7 does not match any prefix since it's at the string's end, but shows the character match count.
Complexity Analysis
The Z-algorithm has the following complexities:
Time Complexity: , where is the length of the string. This linear time complexity is achieved as the algorithm scans the string from left to right, calculating the Z-values in a single pass.
Space Complexity: , due to the storage of the Z-array which contains an entry for each position in the string, matching the string's length.
Example #1: 796. Rotate String
Given two strings s and goal, return true if and only if s can become goal after some number of shifts on s.
A shift on s consists of moving the leftmost character of s to the rightmost position.
For example, if s = "abcde", then it will be "bcdea" after one shift.
Since the string has been cyclically shifted we have to basically check if goal appears when we concat our initial string two times. One way to check the occurance of this is to run two loops and check the occurance of goal in the concatenation of inital string. Using z-algorithm we can minimize our time complexity to solve this problem in linear time.
In z-algorithm we create an initial array known as z-array which stores the length of longest prefix substring that occurs in our string.
The Z array, at each index, stores the length of the longest substring starting the string till the index, matching the prefix (starting characters) of the string.
The Z algorithm uses previous values from certain intervals to match with prefix string, to speed up its execution, and these values are used based on the current window. We check whether it is possible to have a string greater than the maximum length inside the current window. If it is not possible, we skip the calculation for the remaining values inside the window.
By doing this we basically have to check the occurance of goal in our concatenation of given string. We also add a character in the beginning of string which are not present in the string (usually #
or $
) to distinguish between the string to be searched and the given string. For Example,
=
=
As we know what the goal's length is, and it's in our z-array, therefore we've found the pattern.
- Python
class Solution:
def rotateString(self, s, goal):
l = len(s)
# Concatenation of our initial string with itself
s = s + s
# Making a new pattern
pattern = goal+'$'+s
def z_function(s):
n = len(s)
# Initializing Z - array
z = [0] * n
# We need to create an interval to store prefix substing
l, r = 0, 0
for i in range(1, n):
# If we have some substring that matches in this interval then we know that
# we have matched S[i.. ] matches S[k..] for atleast r - i + 1 characters
if i <= r: z[i] = min(r - i + 1, z[i - l])
# Checking the prefix length of current character
while i + z[i] < n and s[z[i]] == s[i + z[i]]: z[i] += 1
# It is possible to match S[i..] to S[0..] for more than R - i + 1
# characters in this case we calculate new interval.
if i + z[i] - 1 > r: l, r = i, i + z[i] - 1
return z
# Checking if our string exist in our pattern
return len(goal) in z_function(pattern) and len(goal) == l
Example #2: 459. Repeated Substring Pattern
Given a string s, check if it can be constructed by taking a substring of it and appending multiple copies of the substring together.
As in the previous question, here we must put our special characters in a different position and determine whether we can construct our string by repeated additions of the prefix string.
- Python
class Solution(object):
def repeatedSubstringPattern(self, s):
def z_function(s):
n = len(s)
# Initializing Z - array
z = [0] * n
# We need to create an interval to store prefix substing
l, r = 0, 0
for i in range(1, n):
# If we have some substring that matches in this interval then we know that
# we have matched S[i.. ] matches S[k..] for atleast r - i + 1 characters
if i <= r: z[i] = min(r - i + 1, z[i - l])
# Checking the prefix length of current character
while i + z[i] < n and s[z[i]] == s[i + z[i]]: z[i] += 1
# It is possible to match S[i..] to S[0..] for more than R - i + 1
# characters in this case we calculate new interval.
if i + z[i] - 1 > r: l, r = i, i + z[i] - 1
return z
for i in range(len(s) // 2):
if(len(s) % (i + 1) == 0):
z = z_function(s[:i + 1]+"$" + s[i + 1:])
count = 0
for j in z:
if (j == (i + 1)):
count += 1
if(count == len(s) // (i + 1) - 1):
return True
return False
Example #3: 2223. Sum of Scores of Built Strings
You are building a string s of length n one character at a time, prepending each new character to the front of the string. The strings are labeled from 1 to n, where the string with length i is labeled si.
For example, for s = "abaca", s1 == "a", s2 == "ca", s3 == "aca", etc. The score of si is the length of the longest common prefix between si and sn (Note that s == sn).
Given the final string s, return the sum of the score of every si.
Here we have to find the longest prefix sum of all substring present in our string, which is basically sum of all values present in our z-array and add overall length of string in our answer as prefix of whole string is our original string.
- Python
class Solution:
def sumScores(self, s):
def z_function(s):
n = len(s)
# Initializing Z - array
z = [0] * n
# We need to create an interval to store prefix substing
l, r = 0, 0
for i in range(1, n):
# If we have some substring that matches in this interval then we know that
# we have matched S[i.. ] matches S[k..] for atleast r - i + 1 characters
if i <= r:
z[i] = min(r - i + 1, z[i - l])
# Checking the prefix length of current character
while i + z[i] < n and s[z[i]] == s[i + z[i]]:
z[i] += 1
# It is possible to match S[i..] to S[0..] for more than R - i + 1
# characters in this case we calculate new interval.
if i + z[i] - 1 > r:
l, r = i, i + z[i] - 1
return z
return sum(z_function(s)) + len(s)