Tuesday, August 27, 2019

CS2040S: Lecture 5 - QuickSort

Quick sort

Does a clever split, a recursive solve and trivial combine
function Quicksort(A, low, high)
if
low < high 
p = Partition(A, low, high) //split
Quicksort (
A, low, p-1) //recursive
Quicksort (
A, p+1, high)

Lumoto's partition algo:

function Partition(A, low, high)   
    v = A[low]  
    m = low
     i = low
//Initialisation
       for i = (low + 1) to high 
           if A[i] < v 
                m++
                 swap(
A[i], A[m])
       swap(
A[m], A[low])
       return
m

Precondition
1. A is an array
2. 0< = low < = high < = A.length
3. A has no duplicate

Post condition
1. A[m] = v //mid
2. low<= k <= m-1 , A[k] < v //lower box
3. m+1 < = k < = high , A[k] > v  //upper box

Loop Invariances
1. A[low] = v
2. if low+1 <= k <= m, A[k] < v
3. if m+1 <= k <= i, A[k] > k


"Quick sort assumes that left is already sorted"
1. Choose a pivot (first index)
2. Iterate from pivot up
3. If element is smaller than pivot, m++, swap element with element at counter m
4. i++, repeat 3
5. Stop only when i = high

The idea is put all the smaller than pivot at the left, put all the bigger in the right.
Finally, put the median in the correct position



Correctness and Efficiency:

There are algorithms that are fast but sometimes incorrect.
Quicksort is n^2 depending on the situation.
This is due to the fact that Lumoto's algo will still partition even if the data structure is already sorted when the wrong pivot is chosen.


A deterministic quicksort is
T(n-1) + T(1) +cn
T(n-1) -> Cost of quicksort on n-1 elements
T(1) -> Cost of Quicksort on 1 element
cn -> Cost of partition on n elements

//Partition takes n time
p = Partition(A, low, high) //cn time
Quicksort (
A, low, p-1)  // n-1
Quicksort (
A, p+1, high)  //1



The problem is that we always pick A[low] for a pivot so instead of doing that,
we can pick the median value.

Better quicksort,
Our time complexity will  be 2T(n/2) + cn
T(n/2) -> Cost of Quicksort on n/2 high elements
t(n/2) - > Cost of quicksort on n/2 low elements
cn -> cost of partition of n elements

But for quick sort the worst is nlogn



Hoare's Partition algo

Unlike lumoto, Hoare's partition method is harder but is more efficient as it runs 3 times faster and does fewer sorts on average compared to Lumoto. Like Lumoto, it will be n^2 if the array is already sorted.

(Check if low< high) 
1. Iterate from the left up
2. If larger or equal to pivot then break out of the loop with i at the index
3. Iterate from the right down
4. If smaller than pivot, break out of the loop with j at the index
5. If the two pointers (i>= j), return j
else
swap elements at i and j
6. After left side and right side is partitioned, swap the pivot with A[j] ,which is smaller than pivot
7. Quick sort the lower half, and Quick sort the upper half


function Partition(A, low, high)
     v = A[low]
     i = low+1;
     j = high;

     while i < j
          while (A[i] < v) and (i <= high) i++
          while (A[j] > v) and (j >= low) j--
          if (i<j) swap(A[i], A[j])
     swap(A[j], A[low])
     return j


Pivot choices

Selecting Median

function Quicksort(A, low, high)
if 
low high 
mid = (low+ high) /2
p = Partition(A, mid, lowhigh) //split
Quicksort (
Alowp-1) //recursive
Quicksort (
Ap+1high)

This will allow the split to be 50:50


But,we don't need a 50:50 split for it to be nlogn

Because its always Cnlogn

Imagine a 90:10 Partition,
T(n) = T(n/10) + T(9n/10) + cn

Base case is 1 element,
If each time we split, we will reach 1 = n(9/10)^h where h is the total height
We use 9/10 because it have more elements, so it will spread the deepest
then h = clog(2) n

At every level, we take n effort to partition each level,
the time complexity is then nlogn

Randomized quicksort

function RandomizedPartition(A, low, high)
       r = Random(A, low, high) //median is randomised
       swap(
A[r], A[low])
       return Partition(
A, low, high)  

Randomly sort an elements
Just swap a random element with the first element.

Intuition: Random element should result in balance split on average

Paranoid Quicksort

Make sure that it will never satisfy the condition of 1:9 partition
Will continue to randomise until the low and high is not a pivot



Probability of picking a good pivot

8/10 (Pick the numbers 2-9)
Bad pivots would be 1 and 10


Problem: Duplicates

Quicksort will take longer when there is duplicates.

3 way partition + quicksort:

1. Cuts into 3 
   a. Less than v
   b. Equal to V
   c. Greater than V
2. If find a duplicate, increase partition
3. Else do the same as normal quick sort




Quick sort is unstable

Stable: Everything remains unsorted if the key is the same.
Unstable: The keys may be swapped even if they are the same

Quicksort is unstable because it just swaps the value without considering the original position.

Stable

- Mergesort
- Insertion sort
- Bubble sort

Unstable

- Selection sort (Arrays)
- Quick sort

<Prev                Next>




No comments:

Post a Comment