Tim Sort
Overview
Timsort is a fast stable sorting algorithm based upon both insertion sort and merge sort. The algorithm was first created by Tim Peters in 2002, and is now being used in Python's sort()
and Java's Arrays.sort()
.
The reason this algorithm is so fast is because it leverages the benefits of both merge sort and insertion sort. Let's see how it works!
Algorithm
Timsort works by first splitting an array into runs. A run is a subarray of data spliced from the original array. These runs are generated using merge sort (each run has a standard size of 32-64, to split the array into small enough pieces for insertion sort to be fast on each one), and insertion sort is used to sort each run. Finally, merge sort combines these sorted arrays together recursively.
Basically, to run timsort:
- We split the array into runs.
- For each run, run insertion sort to sort that section.
- Merge runs together one by one using merge sort, by comparing values in each sorted list and combining them.
This algorithm works because each run is sorted using insertion sort, and merge sort makes sure that each subarray is merged to the original array in the correct position.
Example: 0287 - Find The Duplicate Number
An array of integers in the range [1, n] is given, where one integer is repeated. We have to find this repeated number.
Naive Approach: Using a Hashmap or a frequency array, we can store the number of times each element comes up. We then return an array by looping through the frequency array finding the value that appears twice. However, this requires space complexity, and the problem requires us to have space complexity.
For this sort of problem, we can use timsort to lower our space complexity!
Pseudocode
- Sort the array using timsort
- Loop through the array
- If two values are the same, then that value must be repeated. Return that value.
Dry Run
Let's do a dry run of timsort with the array , and a run size of .
- Each run is sorted using insertion sort. The array becomes .
- The merges happen using recursion. We first attempt to split the array into two parts, down the middle.
- First, the left part is merged, meaning the first two runs are merged. Then the array becomes .
- Then, the right part is merged, meaning the next two runs are merged. The the array becomes .
- Finally, the entire array is merged. The array is finally sorted: .
- C++
class Solution {
public:
// initalize the size of each run
const int RUN = 32;
void insertionSort(vector<int>& nums, int left, int right) {
for (int i = left; i <= right; i++) {
int tmp = nums[i];
int j = i - 1;
while (j >= left && nums[j] > tmp) {
nums[j + 1] = nums[j];
j--;
}
nums[j + 1] = tmp;
}
}
void merge(vector<int>& nums, int left, int mid, int right) {
// maintain the two previous lists
vector<int> lt, rt;
int lenlt = mid - left + 1, lenrt = right - mid;
for (int i = 0; i < lenlt; i++) {
lt.push_back(nums[left + i]);
}
for (int i = 0; i < lenrt; i++) {
rt.push_back(nums[mid + 1 + i]);
}
// start recreating the correct list, putting the smaller one each time
int i = 0, j = 0, k = left;
while (i < lenlt && j < lenrt) {
if (lt[i] <= rt[j]) {
nums[k] = lt[i];
i++;
} else {
nums[k] = rt[j];
j++;
}
k++;
}
while (i < lenlt) {
nums[k] = lt[i];
k++; i++;
}
while (j < lenrt) {
nums[k] = rt[j];
k++; j++;
}
}
void timSort(vector<int>& nums) {
int n = nums.size();
// insertion sort on each run
for (int i = 0; i < n; i += RUN) {
insertionSort(nums, i, min((i + RUN-1), (n - 1)));
}
for (int size = RUN; size < n; size = 2 * size) {
for (int left = 0; left < n; left += 2 * size) {
// determine indices for each run for merging
int mid = left + size - 1, right = min((left + 2 * size - 1), (n - 1));
// merge the two runs if needed
if (mid < right) {
// use recursion to merge the array
merge(nums, left, mid, right);
}
}
}
}
int findDuplicate(vector<int>& nums) {
// use timsort to sort the array
timSort(nums);
for (int i = 0; i < nums.size() - 1; i++) {
// return the duplicate if found
if (nums[i] == nums[i + 1]) {
return nums[i];
}
}
return 0;
}
};
Complexity
The time complexity of this program is , since the merging takes steps, and merges values each time.
The space complexity of this program is , since we don't need any extra space other than our original array.
Suggested Problems
Problem Name | Difficulty | Solution Link |
---|---|---|
0075 - Sort Colors | Easy | View Solutions |
0268 - Missing Number | Easy | View Solutions |
0448 - Find All Numbers Disappeared In An Array | Easy | N/A |