In the case of a static model atmosphere (), Eq. 2 can be
solved independently for each wavelength point because the non-coherent
scattering is handled in the rate-operator formalism. This means that
for static atmospheres the parallelization over wavelength is simple
and involves no communication or synchronization during the spectrum
calculations. For our large set of wavelength points, this leads
to good parallel performance. This is illustrated in Fig. 3
for a NLTE model atmosphere run with parameters appropriate for the A0V
star Vega (
,
, solar abundances). The model
includes about 4500 NLTE levels with nearly 51000 primary NLTE lines
(with detailed Voigt profiles for nearly 39000 of them), about
320000 background LTE lines and 340000 secondary NLTE lines (dynamically
selected). The calculation was performed on a grid of about 270000
wavelength points. This is a typical case of a main-sequence star NLTE
model. The memory requirements of this calculation are high; therefore,
we had to use at least 2 worker nodes per wavelength cluster on one of
the IBM SP2s that we were using for this test. This model is a static
atmosphere, so that different wavelength points are independent of
each other and no communication between clusters is required until the
spectrum calculation is complete. Therefore, the scalability of the
calculation is excellent, in particular on the SGI Origin 2000 and the IBM SP2 runs with a single worker node per wavelength cluster. Clearly, it is more
effective for this model type to use the minimum number of worker nodes
per wavelength cluster to minimize communication and other overheads. The
overhead due to a limited number of IO nodes and limited IO bandwidth
available on the production IBM SP2 we used for the tests reduces the
speedup for large numbers of nodes when nodes start to compete for the
available IO bandwidth.
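
The independence of the wavelength points in the static case can be illustrated with a minimal sketch. It is not the code used for the tests: the function names, the round-robin assignment of wavelength points to clusters, and the use of threads in place of wavelength clusters are all assumptions made for illustration, and `solve_point` is a hypothetical stand-in for the per-wavelength solution of the transfer problem.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_wavelengths(n_points, n_clusters):
    """Assumed round-robin split: cluster c gets points c, c + n_clusters, ..."""
    return [list(range(c, n_points, n_clusters)) for c in range(n_clusters)]


def solve_point(index):
    """Hypothetical placeholder for the work done at one wavelength point."""
    return float(index) ** 0.5


def run_cluster(indices):
    # A cluster touches only its own wavelength points; nothing is
    # exchanged with other clusters while it runs.
    return {i: solve_point(i) for i in indices}


def compute_spectrum(n_points, n_clusters):
    chunks = partition_wavelengths(n_points, n_clusters)
    # Threads stand in for wavelength clusters in this sketch.
    with ThreadPoolExecutor(max_workers=n_clusters) as pool:
        partial_results = list(pool.map(run_cluster, chunks))
    # Single gather step once every cluster has finished.
    spectrum = [0.0] * n_points
    for part in partial_results:
        for i, value in part.items():
            spectrum[i] = value
    return spectrum
```

In this sketch the only point at which results are combined is the final gather, which mirrors the statement that no communication between clusters is required until the spectrum calculation is complete.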
For a very small number of processors, the wavelength parallelization is less effective than the spatial parallelization. This is caused by processors competing for IO bandwidth rather than by synchronization problems. However, as the number of processors increases, the wavelength parallelization scales significantly better than the spatial parallelization. Therefore, it is optimal to use the minimal number of worker nodes per wavelength cluster (defined such that the code fits completely into the memory available at each node) and to use as many wavelength clusters as possible.
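
The layout rule stated above can be written down as a short sketch. The function name, its parameters, and the assumption that the memory footprint of the model divides evenly among the worker nodes of a cluster are hypothetical and serve only to make the rule concrete; the memory figures in the usage note are illustrative, not measured values.

```python
import math


def choose_layout(total_nodes, model_memory, node_memory):
    """Return (workers_per_cluster, n_clusters) for the rule described above.

    Assumes, for illustration only, that the model's memory footprint is
    split evenly among the worker nodes of one wavelength cluster.
    """
    if total_nodes < 1 or model_memory <= 0 or node_memory <= 0:
        raise ValueError("all arguments must be positive")
    # Smallest number of workers for which the per-node share fits in memory.
    workers_per_cluster = max(1, math.ceil(model_memory / node_memory))
    if workers_per_cluster > total_nodes:
        raise ValueError("model does not fit on the available nodes")
    # Use every remaining degree of parallelism for wavelength clusters.
    n_clusters = total_nodes // workers_per_cluster
    return workers_per_cluster, n_clusters
```

For example, with 64 nodes and a model that needs 1.5 times the memory of a single node, `choose_layout(64, 1.5, 1.0)` yields 2 worker nodes per cluster and 32 wavelength clusters.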