
Introduction

Spectroscopy is one of the most important tools in all of astrophysics. It is through the use of spectroscopy that we have discovered the cosmological expansion and determined the elemental composition of the Sun. Currently, detailed spectroscopic analyses are used to date the age of the galaxy; to determine the structure, energies, and compositions of novae and supernovae; to probe the conditions at the time of galaxy formation by examining damped Lyman alpha clouds at high redshift; and to confirm the reality of claims for the discovery of sub-stellar objects.

We have developed the spherically symmetric, special relativistic, non-LTE generalized radiative transfer and stellar atmosphere computer code PHOENIX, which can handle very large model atoms as well as line blanketing by millions of atomic and molecular lines. The code is designed to be both flexible and highly portable: it is used to compute model atmospheres and synthetic spectra for, e.g., novae, supernovae, M and brown dwarfs, O to M giants, white dwarfs, and accretion disks in Active Galactic Nuclei (AGN). We include a large number of line transitions and solve the radiative transfer equation for each of them without using simple approximations (like the Sobolev approximation), and therefore the line profiles must be resolved in the co-moving (Lagrangian) frame. This requires many wavelength points (we typically use 150,000 to 300,000). Since the CPU time scales linearly with the number of wavelength points, the CPU time requirements of such a calculation are large. In addition, NLTE radiative rates for both line and continuum transitions must be calculated and stored at every spatial grid point for each transition, which requires large amounts of storage and can cause significant performance degradation if the corresponding routines are not optimally coded.
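To make the scaling argument concrete, the following minimal sketch (in C, not actual PHOENIX source; all routine and array names are illustrative placeholders) shows the structure of a co-moving frame wavelength loop: every one of the 150,000 to 300,000 wavelength points requires its own formal solution of the transfer equation plus an update of the NLTE rate integrals at every radial grid point, so the total work grows linearly with the number of wavelength points.

  /* Minimal sketch, not actual PHOENIX source: a co-moving frame
     wavelength loop.  Each wavelength point needs a full formal solution
     of the radiative transfer equation plus an update of the NLTE rate
     integrals at every radial grid point, so the CPU time grows linearly
     with the number of wavelength points.  All names are placeholders. */
  #include <stdio.h>

  #define N_WAVELENGTH 150000   /* typical range: 150,000 to 300,000    */
  #define N_RADIAL     50       /* illustrative number of radial points */

  static void solve_transfer(int iwl, double intensity[N_RADIAL]) {
      (void)iwl;
      for (int ir = 0; ir < N_RADIAL; ir++)
          intensity[ir] = 0.0;        /* real code: formal solution here */
  }

  static void update_nlte_rates(int iwl, const double intensity[N_RADIAL]) {
      (void)iwl; (void)intensity;     /* real code: accumulate rate integrals */
  }

  int main(void) {
      double intensity[N_RADIAL];
      for (int iwl = 0; iwl < N_WAVELENGTH; iwl++) {  /* work ~ N_WAVELENGTH */
          solve_transfer(iwl, intensity);
          update_nlte_rates(iwl, intensity);
      }
      printf("processed %d wavelength points\n", N_WAVELENGTH);
      return 0;
  }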

In order to take advantage of the enormous computing power and vast aggregate memory of modern parallel supercomputers, which potentially allow both much faster model construction and more sophisticated models, we have developed a parallel version of PHOENIX. Since the code uses a modular design, we have implemented different parallelization strategies for different modules in order to maximize the total parallel speed-up of the code. In addition, our implementation allows us to change the distribution of computational work across the nodes, both via input files and dynamically during a model run, which gives a high degree of flexibility to optimize performance both for a number of different parallel supercomputers (we currently use IBM SP2s, SGI Origin 2000s, HP/Convex SPP-2000s, and Cray T3Es) and for different model parameters.
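As a minimal sketch of how such a configurable work distribution can be expressed with MPI (an illustration under assumed names, not the actual PHOENIX implementation), the ranks of MPI_COMM_WORLD can be split into task groups with MPI_Comm_split; in a real run the group assignment would be read from an input file or changed between iterations rather than hard-coded.

  /* Sketch only: mapping processes to task groups with MPI_Comm_split.
     The even split between two hypothetical groups is an assumption; in
     practice the assignment would come from an input file and could be
     changed during a model run. */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv) {
      MPI_Init(&argc, &argv);
      int rank, size;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      /* color 0: e.g. radiative transfer tasks; color 1: e.g. NLTE rates */
      int color = (rank < size / 2) ? 0 : 1;

      MPI_Comm task_comm;
      MPI_Comm_split(MPI_COMM_WORLD, color, rank, &task_comm);

      int task_rank, task_size;
      MPI_Comm_rank(task_comm, &task_rank);
      MPI_Comm_size(task_comm, &task_size);
      printf("world rank %d -> task group %d (rank %d of %d)\n",
             rank, color, task_rank, task_size);

      MPI_Comm_free(&task_comm);
      MPI_Finalize();
      return 0;
  }

Splitting the communicator in this way keeps collective operations local to each task group, so only the data that the groups actually need to exchange has to cross group boundaries.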

Since we have both large CPU time and memory requirements, we have developed the parallel version of the code using the MPI message passing library. The single processor speed of a machine like the IBM SP2 is moderately high, so that even a small number of additional processors can lead to significant speed-up. We have chosen to work with the MPI message passing interface since it is portable (it runs on dedicated parallel machines as well as on heterogeneous workstation clusters) and available for both distributed and shared memory architectures. For our application, the distributed memory model is in fact easier to use than a shared memory model, since we then do not have to worry about locks and synchronization on small scales and, in addition, we retain full control over interprocess communication. This is especially clear once one realizes that it can be more cost-effective to avoid costly communication by executing identical code on many processing elements (or nodes), as long as the impact on the total CPU time is small, rather than parallelizing each individual module with the corresponding high cost of communication and loop overhead. Distributed massively parallel supercomputers also typically have more aggregate memory, which enables them to run much larger simulations than traditional serial computers. Our initial parallelization of the code was straightforward in that we distributed the computations among the different modules (task parallelism), and we were further able to sub-divide some of the modules by utilizing data parallelism in, e.g., the radial coordinate or individual spectral lines. Thus, PHOENIX uses both task and data parallelism at the same time in order to optimize performance and allow larger model calculations.
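The sketch below illustrates the data-parallel part of this strategy under simple assumptions (a contiguous block distribution of the wavelength points and a single collective reduction); it is not the actual PHOENIX work distribution, but it shows how each node can execute identical code on its own block of points and how the partial contributions are combined with one MPI_Allreduce call.

  /* Sketch under stated assumptions, not PHOENIX code: data parallelism
     over wavelength points.  Each rank handles a contiguous block of
     points; partial contributions to the NLTE rate integrals are then
     combined so that every rank ends up with the full result. */
  #include <mpi.h>
  #include <stdio.h>

  #define N_WL 150000   /* total wavelength points (typical lower bound) */

  int main(int argc, char **argv) {
      MPI_Init(&argc, &argv);
      int rank, size;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      /* simple block distribution of the wavelength index range */
      int chunk = (N_WL + size - 1) / size;
      int lo = rank * chunk;
      int hi = (lo + chunk < N_WL) ? lo + chunk : N_WL;

      double local_rate = 0.0;        /* stand-in for a rate integral */
      for (int iwl = lo; iwl < hi; iwl++)
          local_rate += 1.0;          /* real code: formal solution + rates */

      double global_rate = 0.0;
      MPI_Allreduce(&local_rate, &global_rate, 1, MPI_DOUBLE, MPI_SUM,
                    MPI_COMM_WORLD);

      if (rank == 0)
          printf("accumulated contributions from %d wavelength points\n",
                 (int)global_rate);
      MPI_Finalize();
      return 0;
  }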

Peter H. Hauschildt
4/27/1999