In the case of a static model atmosphere (), Eq. 2 can be solved independently for each wavelength point because the non-coherent scattering is handled in the rate-operator formalism. For static atmospheres the parallelization over wavelength is therefore simple and involves no communication or synchronization during the spectrum calculation. For our large set of wavelength points, this leads to good parallel performance. This is illustrated in Fig. 3 for an NLTE model atmosphere run with parameters appropriate for the A0V star Vega (, , solar abundances). The model includes about 4500 NLTE levels with nearly 51000 primary NLTE lines (with detailed Voigt profiles for nearly 39000 of them), about 320000 background LTE lines, and 340000 dynamically selected secondary NLTE lines. The calculation was performed on a grid of about 270000 wavelength points. This is a typical case of an NLTE model of a main sequence star. The memory requirements of this calculation are high; therefore, we had to use at least 2 worker nodes per wavelength cluster on one of the IBM SP2s that we used for this test. Because the model is static, different wavelength points are independent of each other and no communication between clusters is required until the spectrum calculation is complete. Therefore, the scalability of the calculation is excellent, in particular on the SGI Origin 2000 and the IBM SP2 runs with a single worker node per wavelength cluster. Clearly, it is more effective for this model type to use the minimum number of worker nodes per wavelength cluster in order to minimize communication and other overheads. On the production IBM SP2 that we used for the tests, the limited number of IO nodes and the limited IO bandwidth reduce the speedup for large numbers of nodes, once nodes start to compete for the available IO bandwidth.
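The independence of the wavelength points in the static case can be illustrated with a minimal Python sketch. It assumes a simple round-robin assignment of wavelength indices to wavelength clusters; the function name `assign_wavelengths` and the round-robin scheme are illustrative assumptions, not a description of the actual implementation. Each cluster receives a disjoint set of indices and can process them without exchanging data with any other cluster.

```python
def assign_wavelengths(n_points, n_clusters):
    """Distribute wavelength indices over clusters in round-robin order.

    Hypothetical helper: cluster c receives indices c, c + n_clusters, ...
    Because the static problem decouples in wavelength, each list can be
    processed without communication with the other clusters.
    """
    if n_clusters < 1:
        raise ValueError("need at least one wavelength cluster")
    return [list(range(c, n_points, n_clusters)) for c in range(n_clusters)]


# Small toy grid; the model in the text uses about 270000 points.
assignment = assign_wavelengths(10, 3)
for cluster, indices in enumerate(assignment):
    print(cluster, indices)
```

With this scheme the work per cluster differs by at most one wavelength point, so the load stays balanced as long as the cost per point is roughly uniform, which is itself an assumption of the sketch.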
Clearly, for a very small number of processors the wavelength parallelization is less effective than the spatial parallelization. This is caused by processors competing for IO bandwidth rather than by synchronization problems. However, as the number of processors increases, the wavelength parallelization scales significantly better than the spatial parallelization. Therefore, it is optimal to use the minimum number of worker nodes per wavelength cluster (defined such that the code fits completely into the memory available at each node) and to use as many wavelength clusters as possible.
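The layout rule stated above can be sketched as a small Python function. The sketch assumes that the memory footprint of the calculation divides evenly among the worker nodes of a cluster, which is a simplification; the function name `plan_layout` and all numerical values are hypothetical and serve only as illustration.

```python
import math


def plan_layout(total_nodes, memory_needed_mb, memory_per_node_mb):
    """Return (workers_per_cluster, n_clusters) for a given machine.

    Hypothetical helper. Assumes the memory needed by the calculation is
    shared evenly among the worker nodes of one wavelength cluster.
    """
    # Smallest number of workers such that the code fits into node memory.
    workers = max(1, math.ceil(memory_needed_mb / memory_per_node_mb))
    # Use all remaining parallelism for independent wavelength clusters.
    clusters = total_nodes // workers
    if clusters == 0:
        raise ValueError("not enough nodes to hold the calculation in memory")
    return workers, clusters


# Illustrative numbers only, not taken from the test runs.
print(plan_layout(64, 300, 256))  # calculation does not fit on one node
print(plan_layout(64, 200, 256))  # calculation fits on one node
```

When the calculation fits on a single node, every node becomes its own wavelength cluster, which corresponds to the best-scaling configuration described in the text.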