For the numerical solution of the special relativistic radiative
transfer equation (RTE) at every wavelength point we use an iterative
scheme based on the operator splitting approach. The RTE
is written in its characteristic form and the formal solution along the
characteristics is done using a piecewise parabolic (PPM) integration. We use the exact
band-matrix subset of the discretized $\Lambda$-operator as the
'approximate
$\Lambda$-operator' (ALO) in the operator splitting
iteration scheme. This has the advantage of
giving very good convergence and a high speed-up when compared to diagonal
ALOs.
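To make the structure of one such iteration concrete, the following sketch shows an operator splitting step for the simple illustrative case of a two-level-atom source function $S = (1-\epsilon)J + \epsilon B$, with the tridiagonal ALO $\Lambda^*$ stored in SciPy banded form. With these definitions one step solves $(I - (1-\epsilon)\Lambda^*)\,\delta J = \Lambda[S] - J$ for the correction $\delta J$. This is a minimal sketch, not the production code; the names ali_step and formal_solve are placeholders:

```python
import numpy as np
from scipy.linalg import solve_banded

def ali_step(J_old, B, eps, alo_banded, formal_solve):
    """One operator splitting (ALI) step for S = (1 - eps)*J + eps*B.

    alo_banded   -- tridiagonal ALO Lambda* in SciPy banded storage, shape (3, n)
    formal_solve -- callable: S -> Lambda[S], the formal solution of the RTE
    """
    S = (1.0 - eps) * J_old + eps * B       # source function from the current J
    J_fs = formal_solve(S)                  # formal solution: J_fs = Lambda[S]
    # Solve (I - (1 - eps)*Lambda*) dJ = J_fs - J_old for the correction dJ;
    # (1 - eps) scales the columns of Lambda*, i.e. the last axis in banded form.
    M = -(1.0 - eps) * alo_banded
    M[1, :] += 1.0                          # add the identity on the main diagonal
    dJ = solve_banded((1, 1), M, J_fs - J_old)
    return J_old + dJ
```

Because the ALO is banded, the correction solve costs only $O(n)$ operations per iteration, consistent with the observation below that constructing and using a tri-diagonal ALO is cheap compared to the formal solutions.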
The serial radiative transfer code has been optimized for superscalar and vector computers and is numerically very efficient. It is therefore crucial to optimize the ratio of communication to computation in the parallel implementation of the radiative transfer method. In terms of CPU time, the most costly parts of the radiative transfer code are the setup of the PPM interpolation coefficients and the formal solutions (which have to be performed in every iteration). The construction of a tri-diagonal ALO requires about the same CPU time as a single formal solution of the RTE and is thus not a very important contributor to the total CPU time required to solve the RTE at every given wavelength point.
In principle, the computation of the PPM coefficients does not require
any communication and thus could be distributed arbitrarily between
the nodes. However, the formal solution is recursive along each
characteristic. Within the formal solution, communication is only
required during the computation of the mean intensities $J$, as they
involve angular integrals at every radial point. Thus, a
straightforward and efficient way to parallelize the radiative transfer code is to
distribute sets of characteristics onto different nodes. This will
minimize the communication during the iterations, and thus optimize
the performance. Within one iteration step, the current values of the
mean intensities need to be broadcast to all radiative transfer nodes and the new
contributions of every radiative transfer node to the mean intensities at every
radius must be sent to the master node. The master radiative transfer node then
computes and broadcasts an updated $J$ vector using the operator splitting scheme,
the next iteration begins, and the process continues until the solution
is converged to the required accuracy. The setup, i.e., the
computation of the PPM interpolation coefficients and the construction
of the ALO, can be parallelized using the same method and node
distribution. The communication overhead for the setup is roughly
equal to the communication required for a single iteration.
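The resulting communication pattern can be outlined as follows. This is an illustrative sketch using mpi4py rather than the actual implementation; my_chars, formal_solution, and ali_update are placeholders for the node's set of characteristics, the PPM formal solution along one characteristic, and the operator splitting update described above, and rank 0 plays the role of the master radiative transfer node:

```python
import numpy as np
from mpi4py import MPI

def parallel_rt_iteration(my_chars, formal_solution, ali_update, nr, tol=1e-6):
    """Iterate the radiative transfer solution until J is converged.

    my_chars        -- characteristics assigned to THIS node
    formal_solution -- callable: (characteristic, J) -> contribution to J, shape (nr,)
    ali_update      -- callable: (J_old, J_formal) -> J_new, the ALI step
    nr              -- number of radial points
    """
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    J = np.zeros(nr)
    converged = False
    while not converged:
        comm.Bcast(J, root=0)                 # current mean intensities to all nodes
        J_part = np.zeros(nr)
        for ch in my_chars:                   # recursive formal solution along each
            J_part += formal_solution(ch, J)  # characteristic; angular quadrature
                                              # weights are included per radial point
        J_fs = np.zeros(nr) if rank == 0 else None
        comm.Reduce(J_part, J_fs, op=MPI.SUM, root=0)  # gather on the master node
        if rank == 0:
            J_new = ali_update(J, J_fs)                # operator splitting update
            converged = bool(np.max(np.abs(J_new - J)) < tol * np.max(np.abs(J_new)))
            J = J_new
        converged = comm.bcast(converged, root=0)      # all nodes stop together
    return J
```

Per iteration, each node communicates only two vectors of length nr (one broadcast, one reduction), while the work on the characteristics themselves requires no communication at all.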
An important point to consider is the load balancing between the
radiative transfer nodes. The workload to compute the formal solution along
each characteristic is proportional to the number of intersection points
of the characteristic with the concentric spherical shells of the radial
grid (the `number of points' along each characteristic). Therefore, if
the total number of points is $N_{\rm points}$ and the calculation
uses $N_{\rm node}$ radiative transfer nodes, the optimum solution would be
to let each radiative transfer node work on $N_{\rm points}/N_{\rm node}$
points. This optimum can,
in general, not be reached exactly because it would require splitting
characteristics between nodes (which involves both communication and
synchronization). A simple load distribution that assigns
$N_{\rm char}/N_{\rm node}$ characteristics to each node, where
$N_{\rm char}$ is the total number of characteristics, is
far from optimal because the characteristics do not have the same number
of intersection points (consider tangential characteristics with
different impact parameters). We therefore chose a compromise of
distributing the characteristics to the radiative transfer nodes so that the total
number of points that are calculated by each node is roughly the same
and that every node works on a different set of characteristics.
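One concrete way to realize such a compromise (a sketch, not necessarily the scheme used in the actual code) is the classic longest-processing-time heuristic: sort the characteristics by their number of intersection points and repeatedly assign the largest remaining one to the currently least-loaded node:

```python
import heapq

def distribute_characteristics(points_per_char, n_nodes):
    """Longest-processing-time heuristic: assign each characteristic,
    largest first, to the currently least-loaded node, so that the total
    number of intersection points per node comes out roughly equal.

    points_per_char[i] -- number of intersection points of characteristic i
    Returns one list of characteristic indices per node.
    """
    # min-heap of (load, node): the least-loaded node is always on top
    loads = [(0, node) for node in range(n_nodes)]
    heapq.heapify(loads)
    assignment = [[] for _ in range(n_nodes)]
    order = sorted(range(len(points_per_char)),
                   key=lambda i: points_per_char[i], reverse=True)
    for i in order:
        load, node = heapq.heappop(loads)
        assignment[node].append(i)
        heapq.heappush(loads, (load + points_per_char[i], node))
    return assignment
```

For example, characteristics with point counts (10, 9, 3, 3, 2, 1) distributed over two nodes give loads of 14 and 14 points, whereas simply splitting the list in half would give 22 versus 6.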