Figure 18 shows the scaling properties of the MPI version of the 3D radiation transport code for the test case. The runs were performed on two parallel compute clusters: one equipped with 1.8 GHz dual Opteron CPUs and an Infiniband interconnect (from Delta computer), and one equipped with 2.0 GHz dual G5 CPUs and a Gbit ethernet network (Xserves from Apple computer). The speedup we obtain in the MPI version is close to optimal: about a factor of 28 with 32 MPI processes on 32 CPUs (16 compute nodes), i.e., a parallel efficiency of about 88%. This shows that the load balancing is close to optimal and that the time spent in the MPI communication routines is negligible compared to the compute time.
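The speedup quoted above can be turned into a parallel efficiency with a few lines of arithmetic. The following is a minimal sketch, not part of the transport code itself: the function names are hypothetical, and the Karp-Flatt serial fraction is an additional standard diagnostic that we introduce here for illustration only. The inputs are the speedup of 28 and the 32 MPI processes given in the text.

```python
def parallel_efficiency(speedup: float, n_procs: int) -> float:
    """Fraction of the ideal (linear) speedup that is actually achieved."""
    return speedup / n_procs


def karp_flatt_serial_fraction(speedup: float, n_procs: int) -> float:
    """Experimentally determined serial fraction (Karp-Flatt metric).

    Not used in the text; included only as an illustrative diagnostic.
    """
    return (1.0 / speedup - 1.0 / n_procs) / (1.0 - 1.0 / n_procs)


# Values taken from the text: speedup of about 28 on 32 MPI processes.
speedup = 28.0
n_procs = 32

efficiency = parallel_efficiency(speedup, n_procs)
serial_fraction = karp_flatt_serial_fraction(speedup, n_procs)

print(f"parallel efficiency: {efficiency:.3f}")
print(f"Karp-Flatt serial fraction: {serial_fraction:.4f}")
```

A serial fraction well below one percent would be consistent with the statement that communication and load imbalance contribute little to the total run time.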
With voxels, the code uses about 0.6 GB of memory. With 10 CPUs and angles, the wallclock times on the 2.0 GHz Xserve G5s (values for the 1.8 GHz Opterons in parentheses) are about 310 sec (400 sec) for a formal solution, 9 sec (9 sec) for the required MPI communication, and between 3 and 26 sec (12 and 120 sec) to solve the linear system. Since the linear system is solved iteratively, the time for its solution decreases as the overall convergence limit is approached.
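To put the timings above in proportion, the share of the MPI communication in one iteration can be estimated as follows. This is a rough sketch under the assumption that an iteration consists only of the three phases listed (formal solution, communication, linear solve) and that they run one after another; the text does not state this explicitly. The dictionary layout and the function name are hypothetical.

```python
# Wallclock times in seconds, taken from the text (10 CPUs).
# The linear-solve time is a (min, max) range because it shrinks
# as the iteration converges.
timings = {
    "Xserve G5 2.0GHz": {"formal": 310.0, "comm": 9.0, "solve": (3.0, 26.0)},
    "Opteron 1.8GHz": {"formal": 400.0, "comm": 9.0, "solve": (12.0, 120.0)},
}


def comm_fraction_range(t: dict) -> tuple[float, float]:
    """Return (lowest, highest) communication share of one iteration.

    Assumes the three phases are sequential and make up the whole iteration.
    """
    solve_min, solve_max = t["solve"]
    total_short = t["formal"] + t["comm"] + solve_min  # near convergence
    total_long = t["formal"] + t["comm"] + solve_max   # early iterations
    return t["comm"] / total_long, t["comm"] / total_short


fractions = {name: comm_fraction_range(t) for name, t in timings.items()}

for name, (low, high) in fractions.items():
    print(f"{name}: communication is {100 * low:.1f}% to {100 * high:.1f}% of an iteration")
```

Under these assumptions the communication share stays below 3% on both clusters, and the formal solution dominates the cost of each iteration.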