Skip to main content

Tim Sort

Author: @Bobliuuu
Contributor: @wingkwong

Overview

Timsort is a fast stable sorting algorithm based upon both insertion sort and merge sort. The algorithm was first created by Tim Peters in 2002, and is now being used in Python's sort() and Java's Arrays.sort(). The reason this algorithm is so fast is because it leverages the benefits of both merge sort and insertion sort. Let's see how it works!

Algorithm

Timsort works by first splitting an array into runs. A run is a subarray of data spliced from the original array. These runs are generated using merge sort (each run has a standard size of 32-64, to split the array into small enough pieces for insertion sort to be fast on each one), and insertion sort is used to sort each run. Finally, merge sort combines these sorted arrays together recursively.

Basically, to run timsort:

  • We split the array into runs.
  • For each run, run insertion sort to sort that section.
  • Merge runs together one by one using merge sort, by comparing values in each sorted list and combining them.

This algorithm works because each run is sorted using insertion sort, and merge sort makes sure that each subarray is merged to the original array in the correct position.

Example: 0287 - Find The Duplicate Number

An array of integers in the range [1, n] is given, where one integer is repeated. We have to find this repeated number.

Naive Approach: Using a Hashmap or a frequency array, we can store the number of times each element comes up. We then return an array by looping through the frequency array finding the value that appears twice. However, this requires O(n)O(n) space complexity, and the problem requires us to have O(1)O(1) space complexity.

For this sort of problem, we can use timsort to lower our space complexity!

Pseudocode

  • Sort the array using timsort
  • Loop through the array
  • If two values are the same, then that value must be repeated. Return that value.

Dry Run

Let's do a dry run of timsort with the array [5,4,3,1,2,6,7,4][5, 4, 3, 1, 2, 6, 7, 4], and a run size of 22.

  • Each run is sorted using insertion sort. The array becomes [4,5,1,3,2,6,4,7][4, 5, 1, 3, 2, 6, 4, 7].
  • The merges happen using recursion. We first attempt to split the array into two parts, down the middle.
  • First, the left part is merged, meaning the first two runs are merged. Then the array becomes [1,3,4,5,2,6,4,7][1, 3, 4, 5, 2, 6, 4, 7].
  • Then, the right part is merged, meaning the next two runs are merged. The the array becomes [1,3,4,5,2,4,6,7][1, 3, 4, 5, 2, 4, 6, 7].
  • Finally, the entire array is merged. The array is finally sorted: [1,2,3,4,4,5,6,7][1, 2, 3, 4, 4, 5, 6, 7].
Written by @Bobliuuu
class Solution {
public:
// initalize the size of each run
const int RUN = 32;
void insertionSort(vector<int>& nums, int left, int right) {
for (int i = left; i <= right; i++) {
int tmp = nums[i];
int j = i - 1;
while (j >= left && nums[j] > tmp) {
nums[j + 1] = nums[j];
j--;
}
nums[j + 1] = tmp;
}
}

void merge(vector<int>& nums, int left, int mid, int right) {
// maintain the two previous lists
vector<int> lt, rt;
int lenlt = mid - left + 1, lenrt = right - mid;
for (int i = 0; i < lenlt; i++) {
lt.push_back(nums[left + i]);
}
for (int i = 0; i < lenrt; i++) {
rt.push_back(nums[mid + 1 + i]);
}
// start recreating the correct list, putting the smaller one each time
int i = 0, j = 0, k = left;
while (i < lenlt && j < lenrt) {
if (lt[i] <= rt[j]) {
nums[k] = lt[i];
i++;
} else {
nums[k] = rt[j];
j++;
}
k++;
}
while (i < lenlt) {
nums[k] = lt[i];
k++; i++;
}
while (j < lenrt) {
nums[k] = rt[j];
k++; j++;
}
}
void timSort(vector<int>& nums) {
int n = nums.size();
// insertion sort on each run
for (int i = 0; i < n; i += RUN) {
insertionSort(nums, i, min((i + RUN-1), (n - 1)));
}
for (int size = RUN; size < n; size = 2 * size) {
for (int left = 0; left < n; left += 2 * size) {
// determine indices for each run for merging
int mid = left + size - 1, right = min((left + 2 * size - 1), (n - 1));
// merge the two runs if needed
if (mid < right) {
// use recursion to merge the array
merge(nums, left, mid, right);
}
}
}
}
int findDuplicate(vector<int>& nums) {
// use timsort to sort the array
timSort(nums);
for (int i = 0; i < nums.size() - 1; i++) {
// return the duplicate if found
if (nums[i] == nums[i + 1]) {
return nums[i];
}
}
return 0;
}
};

Complexity

The time complexity of this program is O(nlogn)O(n log n), since the merging takes lognlogn steps, and merges nn values each time.

The space complexity of this program is O(1)O(1), since we don't need any extra space other than our original array.

Suggested Problems

Problem NameDifficultySolution Link
0075 - Sort ColorsEasyView Solutions
0268 - Missing NumberEasyView Solutions
0448 - Find All Numbers Disappeared In An ArrayEasyN/A