
Task and data parallel algorithms in PHOENIX

In a previous paper (Paper I) we described our method for parallelizing three separate modules: (1) the radiative transfer calculation itself, where we divide the characteristic rays among the nodes and use an MPI_REDUCE to make the mean intensities $J_{\nu}$ available to all of the radiative transfer and NLTE rate computation tasks; (2) the line opacity, which requires the calculation of about 10,000 Voigt profiles per wavelength point at each radial grid point; here we split the work among the processors both by radial grid point and by dividing the individual lines to be calculated among the processors; and (3) the NLTE calculations. The NLTE calculations involve three separate parts: the calculation of the NLTE opacities, the calculation of the rates at each wavelength point, and the solution of the NLTE rate equations. In Paper I we performed all of these parallelizations by distributing the radial grid points among the different nodes or by distributing sets of spectral lines onto different nodes. In addition, each task computing the NLTE rates is paired on the same node with the corresponding task computing the NLTE opacities and emissivities in order to reduce communication overhead. The solution of the rate equations parallelizes trivially with the use of a diagonal rate operator.
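To illustrate the first of these modules, the following sketch (in C with MPI, not the PHOENIX source itself) shows how partial mean intensities computed from each node's subset of characteristic rays can be combined. The text names MPI_REDUCE; since the summed $J_{\nu}$ must reach all tasks, the sketch uses the closely related MPI_Allreduce. The grid size, the ray assignment scheme, and all variable names are assumptions for illustration.

    #include <mpi.h>
    #include <stdlib.h>

    /* Hypothetical sketch: each rank integrates its subset of characteristic
       rays into a partial mean intensity J_partial[k] (one entry per radial
       grid point), then a global sum makes the full J_nu available to every
       task. */
    int main(int argc, char **argv)
    {
        int rank, nprocs;
        const int NLAYER = 50;            /* number of radial grid points (assumed) */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        double *J_partial = calloc(NLAYER, sizeof(double));
        double *J_nu      = calloc(NLAYER, sizeof(double));

        /* Assumed ray assignment: this rank handles characteristic rays
           rank, rank+nprocs, rank+2*nprocs, ... and adds its quadrature
           weight times the specific intensity to J_partial. */
        /* ... formal solution along the assigned rays (omitted) ... */

        /* Sum the partial J's; MPI_Allreduce leaves the complete J_nu on
           every rank, where both the radiative transfer and the NLTE rate
           computation tasks can then use it. */
        MPI_Allreduce(J_partial, J_nu, NLAYER, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);

        free(J_partial);
        free(J_nu);
        MPI_Finalize();
        return 0;
    }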

In the latest version of our code, PHOENIX 8.1, we have incorporated the additional strategy of distributing the NLTE species (all ionization stages of a particular element that are treated in NLTE) onto separate nodes. Since different species have different numbers of levels treated in NLTE (e.g., Fe II [singly ionized iron] has 617 NLTE levels, whereas H I has 30 levels), care is needed to balance the numbers of levels and NLTE transitions among the nodes to avoid unnecessary synchronization overhead.
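A simple way to balance such a species distribution is a greedy heuristic that always places the next-largest species on the currently least-loaded node. The sketch below is hypothetical and not taken from PHOENIX; the Fe II and H I level counts are those quoted above, while the other species and their counts are illustrative assumptions.

    #include <stdio.h>

    #define NSPECIES 4
    #define NNODES   2

    /* Hypothetical load-balancing sketch: assign NLTE species to nodes so
       that the total number of levels per node is roughly equal (a greedy
       longest-processing-time heuristic). */
    int main(void)
    {
        /* Fe II (617) and H I (30) are from the text; the remaining
           species and counts are assumed for illustration.  The list is
           sorted by descending level count, as the heuristic requires. */
        const char *name[NSPECIES] = { "Fe II", "Fe I", "H I", "He I" };
        int levels[NSPECIES]       = { 617, 494, 30, 19 };
        int load[NNODES]           = { 0 };

        for (int s = 0; s < NSPECIES; s++) {
            /* Find the least-loaded node and place the species there. */
            int best = 0;
            for (int n = 1; n < NNODES; n++)
                if (load[n] < load[best]) best = n;
            load[best] += levels[s];
            printf("%-6s (%3d levels) -> node %d\n", name[s], levels[s], best);
        }
        for (int n = 0; n < NNODES; n++)
            printf("node %d total levels: %d\n", n, load[n]);
        return 0;
    }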

In addition to the data parallelism discussed above, the version of PHOENIX described in Paper I also uses simultaneous task parallelism by allocating different tasks to different nodes. This can result in further speed-up and better scalability, but it requires a careful analysis of the workload of the different tasks to obtain optimal load balancing (the workload is also a function of wavelength, since, e.g., the number of lines that overlap at a given wavelength point varies).
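One common way to realize such task parallelism with MPI is to split the global communicator into per-task groups, each of which then performs its own data-parallel work internally. The sketch below is an illustrative assumption rather than the PHOENIX implementation; the task names and the static one-third split are hypothetical, and a production code would size the groups according to the measured, wavelength-dependent workload of each task.

    #include <mpi.h>
    #include <stdio.h>

    /* Hypothetical sketch: split MPI_COMM_WORLD into task groups
       (radiative transfer, line opacity, NLTE) so that each group gets
       its own communicator. */
    enum { TASK_RT = 0, TASK_LINE = 1, TASK_NLTE = 2 };

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Assumed static split: first third radiative transfer, second
           third line opacity, remainder NLTE.  In practice the split
           would be tuned to balance the tasks' workloads. */
        int task = (rank < nprocs / 3)     ? TASK_RT
                 : (rank < 2 * nprocs / 3) ? TASK_LINE
                                           : TASK_NLTE;

        MPI_Comm task_comm;
        MPI_Comm_split(MPI_COMM_WORLD, task, rank, &task_comm);

        int trank;
        MPI_Comm_rank(task_comm, &trank);
        printf("world rank %d -> task %d, task-local rank %d\n",
               rank, task, trank);

        MPI_Comm_free(&task_comm);
        MPI_Finalize();
        return 0;
    }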

Peter H. Hauschildt
4/27/1999