
PPro/Solaris system

The results for the line selection procedure on the PPro/Solaris system are shown in Fig. 1. The figure shows that the GTF approach delivers higher relative speedups, which translate into smaller execution times when more than 2 PEs are used. For serial (1 PE) and 2 PE parallel runs, the LTF line selection is substantially faster than the GTF algorithm. This behavior can be explained by noting that the global files are accessed through NFS mounts that use the same network as the MPI messages. In the GTF algorithm, n-1 PEs request different data blocks from the NFS server (no process was run on the NFS server itself) and send their results to the I/O PE, which writes them out to the NFS server. In the LTF algorithm, each PE reads a different input block from the NFS server and then sends its results (around the ring) to all other PEs. Upon receiving data from its left neighbor, a PE writes them to local disk. The amount of data streaming over the network can therefore be as much as twice as high for the LTF algorithm as for the GTF algorithm. This increases the execution time of the LTF approach if the network utilization is close to the maximum bandwidth. This argument ignores the time required to write the data to local disks, which would make the situation worse for the LTF approach.
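The ring distribution used by the LTF line selection can be illustrated with a small serial simulation. This is only a sketch under stated assumptions, not the MPI code used for the runs: the function name `ring_distribute` and the in-memory "local disks" are hypothetical, and it is assumed that each PE also keeps its own result block locally.

```python
def ring_distribute(blocks):
    """Simulate passing result blocks around a ring of len(blocks) PEs.

    blocks[pe] is the result block produced by PE number pe.
    Returns (local_disk, messages): local_disk[pe] maps the originating
    PE to the block stored on that PE's (simulated) local disk, and
    messages is the number of point-to-point transfers over the network.
    """
    n = len(blocks)
    # Assumption: every PE stores its own result block locally first.
    local_disk = [{pe: blocks[pe]} for pe in range(n)]
    # outgoing[pe] is the (origin, data) pair PE pe forwards next.
    outgoing = [(pe, blocks[pe]) for pe in range(n)]
    messages = 0
    for _step in range(n - 1):
        incoming = [None] * n
        for pe in range(n):
            right = (pe + 1) % n          # right neighbor on the ring
            incoming[right] = outgoing[pe]
            messages += 1
        for pe in range(n):
            origin, data = incoming[pe]   # block from the left neighbor
            local_disk[pe][origin] = data # "write to local disk"
        outgoing = incoming               # forward it in the next step
    return local_disk, messages


if __name__ == "__main__":
    disks, msgs = ring_distribute(["a", "b", "c", "d"])
    print(msgs)      # number of transfers for 4 PEs
    print(disks[0])  # blocks held by PE 0 after the ring pass
```

In this simulation every result block crosses the network n-1 times, so n PEs generate n(n-1) transfers in total. The sketch counts transfers only; it makes no statement about block sizes or about the measured traffic on the PPro/Solaris network.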

The situation is very different for the calculation of the line opacities, cf. Fig. 2. Now the LTF approach scales well (up to the maximum of 8 available machines), whereas the GTF algorithm hardly scales to more than 2 PEs. The absolute execution times for the LTF approach are up to a factor of 4 smaller than the corresponding times for the GTF algorithm, with factors around 2 being more typical. The GTF run with 8 PEs required roughly as much execution time as the LTF run with 1 PE. The reason for this is clearly the speed advantage of the local disk I/O compared to the NFS-based I/O in the GTF code. If more PEs are used in the GTF line opacity approach, the network quickly becomes saturated and the PEs have to wait for their data (the NFS server itself was not the bottleneck). The LTF approach will be limited by the fact that, as the number of PEs gets larger, the efficiency of disk caching is reduced and more physical I/O operations are required. Eventually this will limit the scaling, as the execution time becomes dominated by physical I/O to local disks.
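The distinction between relative speedup and absolute execution time made above can be sketched in a few lines. The helper names are hypothetical, the usual definition of relative speedup (time on 1 PE divided by time on n PEs of the same algorithm) is assumed, and the timings are made-up placeholders, not values read from Fig. 1 or Fig. 2.

```python
def relative_speedup(times):
    """times maps number of PEs -> execution time; must contain key 1."""
    t1 = times[1]
    return {n: t1 / t for n, t in sorted(times.items())}


def parallel_efficiency(times):
    """Relative speedup divided by the number of PEs."""
    return {n: s / n for n, s in relative_speedup(times).items()}


if __name__ == "__main__":
    # Placeholder timings (arbitrary units), for illustration only.
    placeholder = {1: 100.0, 2: 50.0, 4: 40.0}
    print(relative_speedup(placeholder))
    print(parallel_efficiency(placeholder))
```

Because the speedup is normalized to each algorithm's own 1 PE time, an algorithm with the better speedup curve can still have the larger absolute execution time, which is why both quantities are discussed in the text.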


Peter Hauschildt
2001-04-16