
Summary and Conclusions

In this paper we have discussed two algorithms for parallel spectral line selection and opacity calculations useful for direct opacity sampling models of stellar atmospheres. The GTF algorithm uses global temporary files to store the list of selected lines, whereas the LTF algorithm stores the scratch files on local disks. The two methods show very different performance characteristics on different parallel systems. On a PC cluster, where each processing element (PE) is a PC with its own local disk and operating system, networked with standard IP-based Ethernet, the LTF algorithm is at a disadvantage in the line selection procedure for larger numbers of PEs ($\ge 4$), probably because of the higher demand it places on inter-node communication via the MPI library, but it is slightly faster than the GTF approach for smaller numbers of PEs. However, the LTF code delivers far faster and better-scaling line opacity calculations, which are the more costly part of a typical atmosphere model run (the line selection is usually required only once, at the beginning of a model calculation).
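To make the distinction between the two scratch-file strategies concrete, the following is a minimal C/MPI sketch. The file names, record layout, per-PE slot scheme, and the use of MPI-IO for the global file are illustrative assumptions for this sketch only, not the actual implementation described in the paper.

/* Illustrative sketch of the two scratch-file strategies.
 * GTF: every PE writes its selected lines into one global temporary file
 *      on a shared filesystem, at an offset derived from its rank.
 * LTF: every PE writes its selected lines to a file on its own local disk.
 * The record layout (one double per line: the wavelength) is hypothetical. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_LINES 100000              /* per-PE upper bound, illustrative */

static int select_lines(int rank, double *buf)
{
    /* Placeholder for the real selection: each PE would scan its slice of
     * the master line list and keep the lines that are strong enough. */
    int n = 1000;
    for (int i = 0; i < n; i++) buf[i] = 1000.0 + rank + i * 1e-3;
    return n;
}

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *lines = malloc(MAX_LINES * sizeof(double));
    int nsel = select_lines(rank, lines);

    /* --- GTF variant: one shared file, fixed-size slot per PE --- */
    MPI_File fh;
    MPI_Offset slot = (MPI_Offset)rank * MAX_LINES * sizeof(double);
    MPI_File_open(MPI_COMM_WORLD, "global_tmp_lines.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_write_at(fh, slot, lines, nsel, MPI_DOUBLE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    /* --- LTF variant: one private file per PE on local disk --- */
    char path[64];
    snprintf(path, sizeof(path), "/tmp/local_tmp_lines.%04d", rank);
    FILE *fp = fopen(path, "wb");
    fwrite(lines, sizeof(double), nsel, fp);
    fclose(fp);

    free(lines);
    MPI_Finalize();
    return 0;
}

In the GTF case all I/O traffic goes to the shared filesystem, whereas in the LTF case each PE only touches its own disk and the shared data must later be circulated between PEs over the network.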

On the widely used IBM SP parallel supercomputers the situation changes significantly. On this machine, the small test runs so fast that the timing is dominated by side effects. In the large test case, the GTF line selection performs and scales far better than the LTF code. This surprising result is caused by the presence of a parallel filesystem (GPFS) on the IBM, which dramatically improves the performance of global I/O compared to local disk I/O. GPFS also boosts the performance of the GTF code in the line opacity calculations, which are the more important part in practical applications, for both the small and the large test cases.

Variations of the algorithms can be constructed; e.g., it is possible to store the master line database on each PE individually and thus eliminate global I/O to a single master line list entirely (this requires enough local disk space to hold both the master and the temporary databases). Other improvements are possible, e.g., optimizing the I/O blocksize for each type of machine. However, these optimizations are system dependent (and also depend on the general load of the machine) and are therefore not discussed here.

The algorithms and the results show that parallel computing can lead to dramatic speed improvements in stellar atmosphere calculations, but also that different algorithms are required for different types and capabilities of parallel machines. The speed improvements can then be used to develop physically more complex and detailed models (e.g., including massive NLTE calculations with line blanketing, models for M dwarfs with possibly billions of molecular spectral lines, or detailed models for stellar winds and for hot stars with radiative levitation, including a large number of elements and ionization stages). This takes us one step further toward a better physical understanding of stars and their spectra.

Both algorithms described here are applicable to a wide range of problems whose data requirements exceed the available memory and that therefore need to perform out-of-core calculations. They can be adapted easily to any case in which a large amount of shared data has to be used by a number of processors simultaneously and where it is not easy to apply, e.g., a domain-decomposition approach that would let each processor work on a distinct, smaller subset of the data. If the exchange of data can be arranged in a ring-like topology (as sketched below) and the communication network of the parallel computer is fast, then the LTF algorithm should be efficient; if, however, the machine has a fast parallel filesystem, then the GTF approach is both simpler to implement and more efficient.
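As an illustration of such a ring-like exchange, the following minimal C/MPI sketch passes each PE's block of data around the ring with MPI_Sendrecv until every PE has seen every block. The block size and contents are hypothetical; this is only a schematic of the communication pattern, not the paper's code.

/* Ring-style exchange: each PE holds one block of the shared data and
 * passes it to its right-hand neighbour while receiving from the left,
 * so that after size steps every PE has processed all blocks. */
#include <mpi.h>
#include <string.h>

#define BLOCK 1024                     /* illustrative block size */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double mine[BLOCK], work[BLOCK], recv[BLOCK];
    for (int i = 0; i < BLOCK; i++) mine[i] = rank;   /* this PE's block */
    memcpy(work, mine, sizeof(work));

    int right = (rank + 1) % size;
    int left  = (rank - 1 + size) % size;

    for (int step = 0; step < size; step++) {
        /* ... use work[] here: it currently holds the block that
         * originated on PE (rank - step + size) % size ... */
        MPI_Sendrecv(work, BLOCK, MPI_DOUBLE, right, 0,
                     recv, BLOCK, MPI_DOUBLE, left, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        memcpy(work, recv, sizeof(work));
    }

    MPI_Finalize();
    return 0;
}

The efficiency of this pattern depends directly on the network bandwidth and latency between nodes, which is why the LTF approach favours machines with fast interconnects, while the GTF approach shifts the burden to the (parallel) filesystem instead.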

This work was supported in part by NSF grant AST-9720704, NASA ATP grant NAG 5-8425 and LTSA grant NAG 5-3619, as well as NASA/JPL grant 961582 to the University of Georgia, and in part by NSF grant AST-97314508, NASA grant NAG5-3505, and an IBM SUR grant to the University of Oklahoma. This work was also supported in part by the Pôle Scientifique de Modélisation Numérique at ENS-Lyon. Some of the calculations presented in this paper were performed on the IBM SP2 of the UGA UCNS, on the IBM SP "Blue Horizon" of the San Diego Supercomputer Center (SDSC), with support from the National Science Foundation, and on the IBM SP of the NERSC with support from the DoE. We thank all these institutions for a generous allocation of computer time.

