Monday, August 26, 2019

CS2040s: Binary search and Merge sort

Problem: Find customers spending between $a and $b where b>a

How fast can we find customer who spend $a in an unsorted array?
-> o(n) [Linear search]
We are not given any information so we have to look one by one.

BUT!!! Our array is sorted..

Strategy: Exploiting structure

If our Array is already sorted, and we want a value that's less than e.g 11,
we should search the left of 11.

Binary search

function binarySearch (A, key, n) 
 low = 0  high = n - 1  
while low <= high
    mid
= (low + high )/2
    if
key < A[mid] then
          high = mid - 1 //the key is below the mid, so our high is now mid     
    else if key > A[mid] 
               low = mid + 1   // the key is above the mid, so low is now mid 
     else
          return
mid   //we found it! やたあああああ
return not_found  //not found 残影 :(

Key is the number we want


Checking algorithm correctness:

Precondition:

Precondition: Condition that is true before some set of operation
(Like loops or function)

1. Sorted
2. A is of size n

Postconditions:

Postcondition: Condition that is true after some set of operation
(like if else)

1. A[mid] = key
2. or key is not found

Invariants

Its a condition that is true during some execution of a program or a set of operation

Loop Invariants:
A condition that is true before and after each iteration of a loop


1. The key must be between low and high if the key is in the array
2.  Work through the loop and consider each case

Loop Termination:

A[mid] = key
or high < low (not_Found)
This is the desired results

HOWEVER, THE BINARY SEARCH IS NOT EXACTLY CORRECT

Time complexity:
 O(logn)

Each iteration, we divide the search by 2
Iteration k: n/(2^k) = 1
1 is the base element
k = logn

Space Complexity:
O(1) - We are not creating new memory, its bounded by constant


Divide and Conquer

1. Split into subproblems
2. Solve recursively
3. Combine solutions for each subproblem to produce a solution to original problem

start = findIndex(customers, $a)
end = findIndex(customers, $b)
for (int i=start; i<=end; i++)
println(customers[i])
 

 This is n time complex because the for loop is n time complexity

Duplicates

There is something wrong with the algorithm.
-> IF there are people with the same amount, we will miss them out
Instead we need to find the first index rather than find index


Binary search to find First index

1. Find the index using binary search
2. If find the first number, use  binary search to decrease the high pointer till the next number
in the left is another number (Find last number)

function binarySearchFirst (A, key, n)          low = 0          high = n – 1
          result = not_found
          while low <= high
               mid
= (low + high)/2
               if
key < A[mid] then               high = mid - 1               else if key == A[mid] then                    result = mid
                    high = mid - 1
               else                    low = mid + 1           return result

What is the worst case
computational complexity of
binarySearchFirst()?
A. 


Time complexity of binary search first: O(logn)
Still binary searching for the first duplicate

Precondition

- A is of size n
- A is sorted

Post Condition

- A[mid] = key
- A[mid] is the first occurrence of key


We can find the last index in a similar way..

However, there's a bug in the binary search
(mid = (low +  high) /2)
This will overflow if low add high overflow

Merge sort

function mergeSort(A, low, high)
  if
low < high
  mid
= (high + low)/2
  mergeSort(
A, low, mid)
  mergeSort(
A, mid+1, high)
  merge(
A, low, mid, high)


function merge(int low, int mid, int high)
  for i = low to high
  buffer[i] = A[i]
   i = low
   j = mid + 1
   k = low

//Initialise

while i <= middle and j <= high
    if buffer[i] <= buffer[j]
       A[k] = buffer[i]
        i++
     else
     A[k] = buffer[j]
        j++
        k++

// Iterate through helper.
// At each iteration, compare elements in both segments and copy smaller element over

 while i <= middle
   A[k] = buffer[i]
    k++
    i++
 
//Copy remaining items from left segement
  
L is buffer[low,...,mid]
R is buffer[mid+1,...,high]
The array A contains sorted elements of L and R

The loop invariant:
The pointer must be between 0 and mid
and another pointer between mid and high 

Time complexity: O(n) //from while

Total Time Complexity: O(nlogn)
Merge sort is logn
Merge is n


Mergesort:
Recurse Downwards
n=2^h

O(nlogn) is the fastest for comparison based sort




<Prev                                 Next>

No comments:

Post a Comment