The calculation of $\Lambda^*$ using the
algorithm outlined can be vectorized and parallelized with respect to the
ray index k and the row index j for any given bandwidth of
$\Lambda^*$.
In addition, quantities like
,
,
and
can be pre-calculated
and stored, a process which is fully vectorizable and parallelizable.
For each point on a ray, the computation of the specific intensity uses
about 7 floating point operations (flops), whereas the computation
of the and
takes only 1
flop per intersection point. In addition, about 3 flops are
needed for the integration over the angle coordinate
in order to
compute the mean intensities J and the
$\Lambda^*$-operator. We have
to calculate the formal solution for
points, where
is the number
of discrete shells,
is the number of core intersecting
characteristics and
is the number of tangent
rays. Therefore, the number of flops required for the computation of the
specific intensities at all points is
. To estimate the number of flops required
for the calculation of a
$\Lambda^*$-operator with a bandwidth of
, we assume that each point of a ray has
nearest neighbors, thus overestimating the number of operations. In
this approximation, we have to compute
auxiliary variables
or
. Therefore, about
floating point operations are
needed to compute the
$\Lambda^*$-operator, and the ratio of the numerical
work needed for the computation of a
$\Lambda^*$-operator with a bandwidth
of
and one formal solution is of the order of
. This expression actually significantly overestimates the
number of operations required for the construction of the $\Lambda^*$
operator, in particular for larger bandwidths (the effects of the
boundaries become more important for larger bandwidths). For example,
according to this estimate the computation of the full
$\Lambda$-matrix
for
takes about the same time as 20 formal solutions;
however, the actual time used for the construction of the full
$\Lambda$-matrix
corresponds to only about 6 formal solutions on many machines. This
indicates that the number of iterations must be rather small in order
to make ALOs with small bandwidth competitive in terms of speed for the
solution of radiative transfer problems and that the initial guess for the
source function will have a large influence on the optimum bandwidth. The
best strategy is to use monitoring to predict the ``optimum'' bandwidth
that gives the shortest time for the solution of the SSRTE at any given
wavelength point in an ``adaptive bandwidth operator splitting'' method;
see Ref. [25] for details and results for a number of machines.
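
The trade-off described above can be illustrated with a minimal sketch. It is not the scheme of Ref. [25]; it only assumes a simple cost model in which the total time for a bandwidth is the construction cost of the operator plus the predicted number of iterations, with each iteration costing about one formal solution (the cost of solving the banded system is ignored). The function name `best_bandwidth`, the candidate bandwidths, and all numbers are hypothetical, except that the widest operator is given a construction cost of 6 formal solutions to echo the figure quoted in the text.

```python
# Hypothetical cost model: all costs are in units of one formal solution.
def best_bandwidth(build_cost, predicted_iterations, cost_per_iteration=1.0):
    """Return the bandwidth with the smallest estimated total cost.

    build_cost[b]           -- cost of constructing the operator with bandwidth b
    predicted_iterations[b] -- iterations expected to reach convergence with b
    """
    def total_cost(b):
        return build_cost[b] + predicted_iterations[b] * cost_per_iteration

    return min(build_cost, key=total_cost)


# Illustrative (made-up) construction costs for four candidate bandwidths.
build_cost = {1: 0.3, 3: 0.8, 9: 2.0, 49: 6.0}

# Made-up iteration counts for a poor and a good initial source function.
iters_poor_guess = {1: 40, 3: 25, 9: 15, 49: 5}
iters_good_guess = {1: 4, 3: 4, 9: 3, 49: 2}

best_poor = best_bandwidth(build_cost, iters_poor_guess)
best_good = best_bandwidth(build_cost, iters_good_guess)
print(best_poor, best_good)
```

With these invented numbers the widest operator wins for the poor initial guess and the narrowest for the good one, which mirrors the statement that the initial guess for the source function strongly influences the optimum bandwidth. In practice the iteration counts would have to come from monitoring the convergence rate during the calculation.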
In order to accelerate convergence, the Ng method [30] or the Orthomin method [31] may be used (see Auer [32] for a review of different acceleration methods). These methods can reduce the number of iterations required to reach a prescribed accuracy by a factor of two or more with only a small increase in computational overhead.
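
As an illustration of the kind of acceleration meant here, the following is a minimal sketch of the commonly used three-term form of Ng extrapolation, which combines the last four iterates with coefficients chosen by least squares. It is written in plain Python for self-containment and is not taken from Refs. [30]-[32]; the function name `ng_accelerate`, the unweighted scalar product, and the toy linear iteration used to exercise it are assumptions made for this example.

```python
def dot(u, v):
    """Unweighted scalar product (an assumption; weighted forms are also used)."""
    return sum(ui * vi for ui, vi in zip(u, v))


def ng_accelerate(x0, x1, x2, x3):
    """Extrapolate from four successive iterates; x3 is the newest."""
    q1 = [d - 2.0 * c + b for b, c, d in zip(x1, x2, x3)]
    q2 = [d - c - b + a for a, b, c, d in zip(x0, x1, x2, x3)]
    q3 = [d - c for c, d in zip(x2, x3)]

    # Normal equations of the least-squares fit q3 ~ a*q1 + b*q2.
    a11, a12, a22 = dot(q1, q1), dot(q1, q2), dot(q2, q2)
    c1, c2 = dot(q1, q3), dot(q2, q3)
    det = a11 * a22 - a12 * a12
    if det == 0.0:
        return list(x3)  # degenerate case: fall back to the newest iterate

    a = (c1 * a22 - c2 * a12) / det
    b = (c2 * a11 - c1 * a12) / det
    return [(1.0 - a - b) * d + a * c + b * e for e, c, d in zip(x1, x2, x3)]


# Toy fixed-point iteration x -> m*x + c (component-wise), standing in for
# the operator-splitting iteration; its fixed point is c / (1 - m).
m = [0.9, 0.5]
c = [1.0, 1.0]
x_star = [ci / (1.0 - mi) for mi, ci in zip(m, c)]

iterates = [[0.0, 0.0]]
for _ in range(3):
    iterates.append([mi * xi + ci for mi, xi, ci in zip(m, iterates[-1], c)])

x_acc = ng_accelerate(*iterates)
print(iterates[-1], x_acc, x_star)
```

In this toy problem the error contains only two decaying modes, so a single extrapolation recovers the fixed point to round-off, whereas the plain iterate after three steps is still far from it. For a real transfer problem the gain is more modest, and the extrapolation is typically applied only every few iterations.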