
M dwarfs

We have run a realistic NLTE M dwarf test calculation with the following model parameters: $T_{\rm eff}=2700\,{\rm K}$, $\log(g)=5.0$, and solar abundances. As NLTE species we include H I (10 levels), Na I (3 levels), Ti I (395 levels), Ti II (204 levels), C I (228 levels), C II (85 levels), N I (252 levels), N II (152 levels), O I (36 levels), and O II (171 levels), for a total of 1591 levels and 15,062 primary NLTE lines (treated in detail with individual Voigt profiles). We use 113,433 wavelength points and include 288,775 background atomic LTE lines (28,011 of which are strong enough to be included with individual Voigt profiles) as well as 12,861,979 molecular LTE lines (including 3,753,353 with Voigt profiles).

In Table 4 we give the wall-clock times for one iteration on the IBM SP2. We used a ``small'' code configuration with blocksizes appropriate for machines with about 128MB RAM per node, although the test machine had up to 300 per node paging space, and we used very large search windows for the atomic, molecular, and NLTE lines in order to obtain a ``worst case'' scenario. Table 4 shows that the calculation is dominated by the LTE atomic and molecular line opacity, whereas the NLTE opacities and rates are only a second-order contribution to the total time per iteration. The scaling of the calculation is, therefore, very good up to the largest configuration that we have tested. We could not run the test model on a single SP2 CPU because of both wall-clock time and memory restrictions; this demonstrates the importance of parallelization for practical applications.
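The link between a dominant, well-distributed workload and good scaling can be illustrated with Amdahl's law. The sketch below is not part of the timing analysis itself: the function name `amdahl_speedup` and the fractions and node counts are hypothetical and are not taken from Table 4. It assumes that a fraction of the single-node time divides evenly across nodes while the remainder does not speed up at all.

```python
def amdahl_speedup(parallel_fraction: float, n_nodes: int) -> float:
    """Ideal speedup on n_nodes when parallel_fraction of the single-node
    time divides evenly across nodes and the rest stays serial.

    Illustrative model only; the fractions used below are assumptions,
    not measured values.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_nodes)


if __name__ == "__main__":
    # Hypothetical fractions: the larger the distributed share of the
    # work, the closer the speedup stays to the node count.
    for fraction in (0.90, 0.99):
        for nodes in (1, 5, 20, 50):
            speedup = amdahl_speedup(fraction, nodes)
            print(f"fraction={fraction:.2f} nodes={nodes:3d} speedup={speedup:6.2f}")
```

In this simple model the speedup approaches the node count only when nearly all of the work is distributed, which is the situation the text describes for the line-opacity calculation.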

The wall-clock time could be reduced further by, e.g., using larger blocksizes and a specially tuned load distribution. The last SP2 entry in Table 4 shows that an alternative load distribution can easily improve the overall speed, although some of the sub-tasks then require more wall-clock time.

We also include the timing results of the test run that we obtained on a single processor of a Cray C90 (CPU times). Table 4 shows that the C90 is about as fast as 5 nodes of the SP2, which is roughly the relative performance ratio of a single SP2 node to a single C90 processor. The wall-clock time on the C90 was much worse than on the SP2 because of the time-sharing operation of the C90 CPUs.


Peter H. Hauschildt
4/27/1999