Top elements and -th element
Two twists on the same idea that show up frequently in coding interviews in various incarnations:
- given an unsorted array of objects of length , find the top objects based on a given comparison function
- given an unsorted array of objects of length , find -th object based on a given comparion function, that is an object that would appear at the -th place in an sorted array
You can expect , that is the length of the array is much bigger than the number of elements we are interested in.
In this article, we will go through 3 solutions to these problems:
- a trivial solution with complexity
- a medium complexity solution with complexity
- a complex solution that relies on prior knowledge of specific algorithms with complexity
All solutions are presented using an array of ints to minimize amount of boilerplate.
The trivial solution is given to us on a silver plate in the assignment text.
If the array would be sorted, returning either the -th element or the top elements would be trivial.
So let's just do that. Sort the array and simply return the -th element or the top elements of the sorted array.
Complexity analysis of trivial solution
We sort the array, which has elements resulting in complexity. Since sorting is a destructive operation we also need to create a copy of the array resulting in space complexity.
When we consider optimizations, we need to look at the unnecessary work we are doing. Since we do not care about the elements besides the top we do not need to sort them. The standard library already offers an appropriate function
Complexity analysis of partial sort solution
Time complexity of
partial_sort is .
We are still copying the array, so space complexity remains unchanged at .
Medium solution with space complexity
One problem with the previous straightforward solution is that we still need to make copy of the entire array, since
partial_sort is still destructive, leaving us with space complexity.
We must store the top elements, so the space complexity will be at least . If we scan through the array and remember the top elements upto now, we would end up with space complexity.
Datastructure we require must have the ability to insert elements, look a the smallest element and remove the smallest element all in time. Fortunately, heap provides such characteristics. In C++ we can use the
priority_queue (which is not necessarily implemented as a heap, but providing the same performance characteristics).
Complexity analysis of medium solution
We scan through all the elements of the array and at worst case we will insert each of them into the
priority_queue. This leads to total complexity ( calls of
pop(), each with complexity), ultimately collapsing to . As already discussed the space complexity of this approach is , as we only ever store elements in the
First observation is that if we know the -th element, getting the top elements is a simple scan through the array with complexity. So let's just concentrate on solving the -th element in time.
What would happen if we would just blindly guess that a particular element is the -the element. How would we verify that? If we use this element as a pivot and partition the array around it, we end up with all the bigger elements to its left and all the smaller elements to its right. If the picked element ends up on the -th position, we know that our guess was correct (if the array has all unique elements).
What happens if we weren't correct? Well, we know on which side of the pivot our -the element lies at least. If the pivot ended up on a position then we need to search the elements to the right of the pivot, if it ended up on a position , we need to search to the left.
This might sound very similar to the Quicksort algorithm and indeed, this is referred to as the Quickselect algorithm.
There is one more important step however. You might recall that Quicksort with a random pivot is in worst case. Quickselect suffers the same faith unfortunately.
To fix this we need the median of medians algorithm.
Median of medians
The source for the worst case of Quickselect is a corner case when we keep choosing either the smallest or the largest element as our pivot. In that case we only improve the situation by one element, recursing over elements which ends up in the class.
What if we could pick the pivot somewhat well. Let's say that we always pick a pivot that splits that array in such a way that the larger portion has elements. Now the recursive pattern would be a geometric progression of Sum of such progression is . For all the right side of the sum ends up a constant number as we can bound it bellow .
The core idea of the median of medians is to process the array in chunks of 5 elements and pick medians from these 5 element sub-arrays. Due to the constant size, picking each of these medians is . Once we have all the medians, we then recursively apply the Quickselect algorithm to pick a median from these elements. The median (for size 5) has a nice property of splitting the array into at worst and chunks.
Complexity analysis of complex solution
As already mentioned, due to the smart choice of our median, the time complexity is . Space complexity is as this is a destructive method and we need to make a copy of the array.
Next: data structure
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Šimon Tóth 2020 (email@example.com)