GitHub - hanluc/EECS-224-projects: This is a folder for eecs 221 group work on hpc server.

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
EECS224_hw1		EECS224_hw1
.gitignore		.gitignore
README		README

Repository files navigation

Algorithm

  1.suppose the number of process is p and the numeber of elements in the array is N
  2.assign each process a consecutive N/p elements 
  3.choose a element as pivot and broadcast it to p process, which takes O(logN)  time 
  4.parallelly do local rearrangement, swap the elements smaller than pivot to 
  the front of each segment, elements larger than pivot to the end of each 
  segment, which will take O(N/p) time
  5.keep track of the number of smaller elements in a p-length array
  6.calculate the start posion of smaller and larger elements of each segment by  doing prefix sum parallel, which takes O(log  N) time and O(2*p) space    
  7.put rearranged segments to an auxiliary size-N array by referring prefix sum  arrays, which takes O(N/p) time


Time Complexity and Work Analysis 
  
  D(n) = O((N/p) * log(N/p)) + O((N/p) * logp)
  W(n) = O(N * logN)
  auxiliary storage S = N + 3 * p


Performance
  
  number of process = 8
  
  Part 1 - after parallelizing two recursive calls
    N = 10000
    serial   0.00138 s ==> 7.24638M keys/s
    parallel 0.00106 s ==> 9.43396M keys/s
    N = 100000
    serial   0.014654s ==> 6.82408M keys/s
    parallel 0.012158s ==> 8.22504M keys/s
    N = 1000000
    serial   0.174923s ==> 5.7168 M keys/s
    parallel 0.137372s ==> 7.2795 M keys/s
    N = 10000000
    serial   2.14754 s ==> 4.6565 M  keys/s
    parallel 1.61317 s ==> 6.19896M keys/s

  Part 2 - after parallelizing partition
    N = 10000
    serial   0.001376s ==> 7.26744M keys/s
    parallel 0.001178s ==> 8.48896M keys/s
    N = 100000
    serial   0.018175s ==> 5.50206M keys/s
    parallel 0.013965s ==> 7.16538M keys/s
    N = 1000000
    serial   0.175867s ==> 5.68612M keys/s
    parallel 0.17791 s ==> 5.62082M keys/s
    N = 10000000
    serial   2.1359  s ==> 4.68188M keys/s
    parallel 2.59649 s ==> 3.85135M keys/s


Analysis
  
  After parallelising two recursive calls, the executing time has been reduced by 21.5% on 
  average. This is due to the parallel two recursive calls.

  After parallelizing partition part by assigning work to p process, for smaller cases such as
  N = 10000, 100000, executing time has been reduced by 20% on average. However, when 
  N = 1000000, 10000000, parallelizing is no better or even worse than serial, this is probably due to the relatively long 
  communication time between two nodes using MPI.
  
  When N = 10000, 100000, parallelizing partition reduced executing time to 1/10 of the 
  executing time of parallelizing two recursive calls. However, when N = 1000000, 10000000,
  parallelizing partition is no better than parallelizing two recursive calls, this is probably due to the copying of the 
  whole array overhead and communication with MPI overhead.


Work Partition

  Hanlu  Chen 54086716
              1. locally rearrange the array assigned to each process
              2. determine the locations in the global rearranged array
              3. optimizing storage and time performance

  Mengqi Wang 55967217
              1. determine and broadcast the pivot
              2. perform the global rearrangement  
              3. test and analyze performance