Figure 18 shows the scaling properties of the
MPI version of the 3D radiation transport code for the
test case. The runs were performed on two parallel compute clusters,
one equipped with 1.8GHz dual Opteron CPUs and an InfiniBand interconnect from
Delta Computer, and one equipped with 2.0GHz dual G5 CPUs and a Gbit Ethernet
network from Apple Computer (Xserves). The speedup we obtain in
the MPI version is close to optimal, about a factor of 28 with 32 MPI processes
on 32 CPUs (or 16 compute nodes). This near-optimal speedup shows that the
load is well balanced and that the time spent in the MPI communication
routines is negligible compared to the compute time.
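As a rough consistency check, writing $S$ for the measured speedup and $N_{\mathrm{proc}}$ for the number of MPI processes (notation introduced here only for illustration), the parallel efficiency implied by the quoted numbers is
\[
E = \frac{S}{N_{\mathrm{proc}}} = \frac{28}{32} \approx 0.88,
\]
i.e., close to 90% of ideal linear scaling.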
With voxels, the code uses about 0.6GB of memory. With 10 CPUs and
angles, the wallclock time for a formal solution is about 310sec
(400sec) on 2.0GHz Xserve G5s (on 1.8GHz Opterons), 9sec (9sec) for the
required MPI communication, and between 3 and 26sec (12 and 120sec) to solve the
linear system. Since the linear system is solved iteratively, the time for its
solution decreases as the overall convergence limit is approached.
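To put these timings in perspective (using only the numbers quoted above), the fraction of a formal solution spent in MPI communication is roughly
\[
\frac{t_{\mathrm{MPI}}}{t_{\mathrm{formal}}} \approx \frac{9\,\mathrm{sec}}{310\,\mathrm{sec}} \approx 3\%
\]
on the Xserve G5s (about 2% on the Opterons), consistent with the statement that communication time is negligible compared to the compute time.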