Parallel burden

Fork and Join Framework
Basic Structure:

compute(){

//do smaller situation
e.g
if(x<threshold){
return //smaller situation
}
//create instance
//instance fork left

return leftjoin + right.compute()
}


However, sometimes, if the job is small, it is better to not do parallel.
If program logic is linear we should avoid parallelising. Spawning thread is also time costly thus sometimes it is better to not parallel it.
Errors:
- Dependency:
if its stateful or one thing depends on the other it is hard to parallelize.
This is due to one thread needing to wait on other due to dependancy thus they will take a longer time as they have to wait for each other.
- Contension
If there are many threads trying to access the same but only one can access
- Race condition
When one task is finish before the other resulting in undesirable results
Best framework:
Left.fork , right.fork , right.join , left.join


Work stealing:
Workers will steal other thread's queue when they are done. They will only steal from the back of the queue. Thus the queue will have the shortest time needed for a job to be done nearer to the thread and have the larger time job be at the back of the queue.

Example
protected Boolean compute() {
if (high - low <= 1) {
return array[low] == toFind;
} else {
int middle = (low + high)/2;
if (array[middle] > toFind) {
BinSearch left = new BinSearch(low, middle, toFind, array);
left.fork();
return left.join();
} else {
BinSearch right = new BinSearch(middle, high, toFind, array);
return right.compute();
   }
}
}
}
 

Assume we are searching the array = {1,2,3,4,5,6}
The array is small thus searching would be faster if the code was not parallised
Since the array is small and referring to the code, when searching right where we look for the larger number, it calls compute instead of forking thus in this situation if we search for a larger number, it will be faster compared to if we search for a smaller integer.

Editors note:
Binary search also requires only one operation
ie: smaller or greater than.
Thus it is not viable to do parallelism in this example


2. Parallel Stream
Side effects and Stateful Operation
 Side-effects in behavioural parameters to stream operations are, in general, discouraged, as they can often lead to unwitting violations of the statelessness requirement, as well as other thread-safety hazards.
Example of side effects:

List<Integer> list = new ArrayList<>(
Arrays.asList(1, 3, 5, 7, 9, 11, 13, 15, 17, 19));
List<Integer> result =
new ArrayList<>( );

The following is erroneous
list.parallelStream() // list.stream().parallel().filter(x -> isPrime(x))
.forEach(x -> result.add(x));
 

This is due to it being parallized. Some calculation may end up finishing faster than the other thus when adding it back during the list. Thus the order of elements will not be the same as the original.

We can bypass this by Use .collect instead

result = list.parallelStream()
.filter(x -> isPrime(x))
.collect(Collectors.toList());


Rules for parallelizing:
- Combiner.apply(identity,i) must be equal to i
(Combiner operation must have an identity)
A* 1 = A = 1*A
In this case, the combiner is (x,y) - > x*y where 1 is the identity
- Combiner and accumulator must be associative
A*B = B * A
- combiner and accumulator must be compatible
i.e combiner.apply(u, accumulator.apply(identity, t))
must be equal to accumulator.apply(u,t)

When parallelizing streams that use accumulator, we must havae a combiner to combine the resulting accumulate from all the different streams


<Prev Stacks and Heaps                                      Next Runtime compile time > 

No comments:

Post a Comment