Skip to main content

Z Algorithm

Authors: @wingkwong @Cyber-Machine

Overview

In O(n)O(n) time complexity, the Z-array makes it possible for a quick string pattern search by eliminating all but the prefix substring lengths that repeat inside the string, with the help of the Z-algorithm that preprocesses the string. This is useful when it requires fast and reliable string searching.

How it works

  1. Initialization:

    • Let SS be the string for which we want to compute the Z-values.
    • Let nn be the length of SS.
    • Initialize an array ZZ of the same length nn where each value is 0.
    • Set the boundaries of the initial comparison window, l=0l = 0 and r=0r = 0.
  2. Iterate over SS from the second character to the end:

    • For each index ii, determine the Z-value as follows:
      • If i>ri > r, you do not have any previous information about the substrings that start from ii, so compare S[i]S[i] onwards with the beginning of SS until you encounter a mismatch. Update Z[i]Z[i] and adjust ll and rr accordingly.
      • If iri \leq r, then ii is within the bounds of a previously computed substring. Use previously computed Z-values to potentially skip some comparisons:
      • Compute k=ilk = i - l.
      • If Z[k]<ri+1Z[k] < r - i + 1, then Z[i]=Z[k]Z[i] = Z[k]
  3. Using the Z-array:

    • The Z-array can be used directly to solve various string matching problems, such as finding a pattern within a text.

Dry Run

We'll take the string S="abacabad"S = "abacabad" to demonstrate how Z-values are computed step-by-step. Our goal is to compute the Z-array for SS. The Z-array, ZZ, will store the length of the longest substring starting from S[i]S[i] that matches a prefix of SS.

Iteration #1

  • i=1i = 1 is greater than rr, so we need to compare from S[1]S[1] onwards.
  • We compare S[1]S[1] with S[0]S[0] and find no match (because 'b' ≠ 'a').
  • Thus, Z[1]=0Z[1] = 0, and ll and rr remain unchanged.

Iteration #2

  • i=2i = 2 is greater than rr.
  • We start comparing from S[2]S[2] onwards. S[2]S[2] matches S[0]S[0], S[3]S[3] matches S[1]S[1], but S[4]S[4] does not match S[2]S[2].
  • So, Z[2]=2Z[2] = 2, l=2l = 2, and r=3r = 3.

Iteration #3

  • i=3i = 3 is equal to rr. Here, k=il=32=1k = i - l = 3 - 2 = 1.
  • Since Z[1]=0Z[1] = 0, Z[3]=0Z[3] = 0. No need to adjust ll and rr.

Iteration #4

  • i=4i = 4 is greater than rr.
  • Compare S[4]S[4] onwards. S[4]S[4] matches S[0]S[0], but S[5]S[5] does not match S[1]S[1].
  • So, Z[4]=1Z[4] = 1, l=4l = 4, and r=4r = 4.

Iteration #5

  • i=5i = 5 is greater than rr.
  • Compare S[5]S[5] onwards. S[5]S[5] matches S[0]S[0], S[6]S[6] matches S[1]S[1], S[7]S[7] matches S[2]S[2], but the string ends there.
  • So, Z[5]=3Z[5] = 3, l=5l = 5, and r=7r = 7.

Iteration #6

  • i=6i = 6 is within the bounds of ll and rr. Here, k=il=65=1k = i - l = 6 - 5 = 1.
  • Z[6]=Z[1]=0Z[6] = Z[1] = 0 because Z[1]<ri+1Z[1] < r - i + 1.

Iteration #7

  • i=7i = 7 is also within the bounds. Here, k=il=75=2k = i - l = 7 - 5 = 2.
  • Z[7]=Z[2]=2Z[7] = Z[2] = 2 because Z[2]<ri+1Z[2] < r - i + 1.

The final Z-array is [0,0,2,0,1,3,0,2][0, 0, 2, 0, 1, 3, 0, 2].

  • Z[2] = 2: "ab" at index 2 matches the prefix "ab".
  • Z[4] = 1: "a" at index 4 matches the prefix "a".
  • Z[5] = 3: "aba" at index 5 matches the prefix "aba".
  • Z[7] = 2: "ad" at index 7 does not match any prefix since it's at the string's end, but shows the character match count.

Complexity Analysis

The Z-algorithm has the following complexities:

Time Complexity: O(n)O(n), where nn is the length of the string. This linear time complexity is achieved as the algorithm scans the string from left to right, calculating the Z-values in a single pass.

Space Complexity: O(n)O(n), due to the storage of the Z-array which contains an entry for each position in the string, matching the string's length.

Example #1: 796. Rotate String

Given two strings s and goal, return true if and only if s can become goal after some number of shifts on s.

A shift on s consists of moving the leftmost character of s to the rightmost position.

For example, if s = "abcde", then it will be "bcdea" after one shift.

Since the string has been cyclically shifted we have to basically check if goal appears when we concat our initial string two times. One way to check the occurance of this is to run two loops and check the occurance of goal in the concatenation of inital string. Using z-algorithm we can minimize our time complexity to solve this problem in linear time.

In z-algorithm we create an initial array known as z-array which stores the length of longest prefix substring that occurs in our string.

The Z array, at each index, stores the length of the longest substring starting the string till the index, matching the prefix (starting characters) of the string.

The Z algorithm uses previous values from certain intervals to match with prefix string, to speed up its execution, and these values are used based on the current window. We check whether it is possible to have a string greater than the maximum length inside the current window. If it is not possible, we skip the calculation for the remaining values inside the window.

By doing this we basically have to check the occurance of goal in our concatenation of given string. We also add a character in the beginning of string which are not present in the string (usually # or $) to distinguish between the string to be searched and the given string. For Example,

StringString = abcdeabcde

Pattern=bcdeaPattern = bcdea

NewStringNew String = abcde$bcdeabcdeaabcde\$bcdeabcdea

Z-array=0000000000400001\text{Z-array} = 0000000000400001

As we know what the goal's length is, and it's in our z-array, therefore we've found the pattern.

Written by @Cyber-Machine
class Solution:
def rotateString(self, s, goal):
l = len(s)
# Concatenation of our initial string with itself
s = s + s
# Making a new pattern
pattern = goal+'$'+s
def z_function(s):
n = len(s)
# Initializing Z - array
z = [0] * n
# We need to create an interval to store prefix substing
l, r = 0, 0
for i in range(1, n):
# If we have some substring that matches in this interval then we know that
# we have matched S[i.. ] matches S[k..] for atleast r - i + 1 characters
if i <= r: z[i] = min(r - i + 1, z[i - l])
# Checking the prefix length of current character
while i + z[i] < n and s[z[i]] == s[i + z[i]]: z[i] += 1
# It is possible to match S[i..] to S[0..] for more than R - i + 1
# characters in this case we calculate new interval.
if i + z[i] - 1 > r: l, r = i, i + z[i] - 1
return z
# Checking if our string exist in our pattern
return len(goal) in z_function(pattern) and len(goal) == l

Example #2: 459. Repeated Substring Pattern

Given a string s, check if it can be constructed by taking a substring of it and appending multiple copies of the substring together.

As in the previous question, here we must put our special characters in a different position and determine whether we can construct our string by repeated additions of the prefix string.

Written by @Cyber-Machine
class Solution(object):
def repeatedSubstringPattern(self, s):
def z_function(s):
n = len(s)
# Initializing Z - array
z = [0] * n
# We need to create an interval to store prefix substing
l, r = 0, 0
for i in range(1, n):
# If we have some substring that matches in this interval then we know that
# we have matched S[i.. ] matches S[k..] for atleast r - i + 1 characters
if i <= r: z[i] = min(r - i + 1, z[i - l])
# Checking the prefix length of current character
while i + z[i] < n and s[z[i]] == s[i + z[i]]: z[i] += 1
# It is possible to match S[i..] to S[0..] for more than R - i + 1
# characters in this case we calculate new interval.
if i + z[i] - 1 > r: l, r = i, i + z[i] - 1
return z
for i in range(len(s) // 2):
if(len(s) % (i + 1) == 0):
z = z_function(s[:i + 1]+"$" + s[i + 1:])
count = 0
for j in z:
if (j == (i + 1)):
count += 1
if(count == len(s) // (i + 1) - 1):
return True
return False

Example #3: 2223. Sum of Scores of Built Strings

You are building a string s of length n one character at a time, prepending each new character to the front of the string. The strings are labeled from 1 to n, where the string with length i is labeled si.

For example, for s = "abaca", s1 == "a", s2 == "ca", s3 == "aca", etc. The score of si is the length of the longest common prefix between si and sn (Note that s == sn).

Given the final string s, return the sum of the score of every si.

Here we have to find the longest prefix sum of all substring present in our string, which is basically sum of all values present in our z-array and add overall length of string in our answer as prefix of whole string is our original string.

Written by @Cyber-Machine
class Solution:
def sumScores(self, s):
def z_function(s):
n = len(s)
# Initializing Z - array
z = [0] * n
# We need to create an interval to store prefix substing
l, r = 0, 0
for i in range(1, n):
# If we have some substring that matches in this interval then we know that
# we have matched S[i.. ] matches S[k..] for atleast r - i + 1 characters
if i <= r:
z[i] = min(r - i + 1, z[i - l])
# Checking the prefix length of current character
while i + z[i] < n and s[z[i]] == s[i + z[i]]:
z[i] += 1
# It is possible to match S[i..] to S[0..] for more than R - i + 1
# characters in this case we calculate new interval.
if i + z[i] - 1 > r:
l, r = i, i + z[i] - 1
return z
return sum(z_function(s)) + len(s)