The calculation of $\Lambda^*$ with the algorithm outlined above can be
vectorized and parallelized with respect to the ray index *k* and the row
index *j* for any given bandwidth of $\Lambda^*$. In addition, several
auxiliary quantities used in this calculation can be pre-calculated and
stored, a step that is itself fully vectorizable and parallelizable.

For each point on a ray, the computation of the specific intensity uses
about 7 floating point operations (flops), whereas the computation
of the auxiliary variables needed for $\Lambda^*$ takes only 1
flop *per intersection point*. In addition, about 3 flops are
needed for the integration over the angle coordinate in order to
compute the mean intensities *J* and the $\Lambda^*$-operator. The formal
solution has to be calculated at every intersection point of the
characteristics with the shells. The number of these points is set by the
number of discrete shells, the number of core-intersecting characteristics
and the number of tangent rays. Therefore, the number of flops required for
the computation of the specific intensities at all points is about 7 times
the number of points. To estimate the number of flops required for the
calculation of a $\Lambda^*$-operator of a given bandwidth, we assume that
every point of a ray has a full set of nearest neighbors within the band,
thus *overestimating* the number of operations. In this approximation, one
auxiliary variable has to be computed for every point and every neighbor
within the band. The number of floating point operations needed to compute
the $\Lambda^*$-operator is therefore proportional to the number of points
times the bandwidth. Since one formal solution costs a fixed number of flops
per point, the ratio of the numerical work needed for a $\Lambda^*$-operator
to that of one formal solution does not depend on the number of points and
grows linearly with the bandwidth. This estimate actually *significantly*
overestimates the
number of operations required for the construction of the operator, in
particular for larger bandwidths. Boundary effects (points near the ends of
a ray have fewer neighbors within the band) become more important as the
bandwidth grows. For example, according to this estimate the computation of
the full $\Lambda$-matrix takes about the same time as 20 formal solutions,
whereas the measured time for constructing the full $\Lambda$-matrix
corresponds to only about 6 formal solutions on many machines. This
indicates two things. First, the number of iterations must be rather small
to make ALOs with small bandwidth competitive in speed for the solution of
radiative transfer problems. Second, the initial guess for the source
function will have a large influence on the optimum bandwidth. The best
strategy is to monitor the solution process and use it to predict the
``optimum'' bandwidth, i.e., the bandwidth that gives the shortest time for
the solution of the SSRTE at any given wavelength point. This is done in an
``adaptive bandwidth operator splitting'' method; see Ref. [25] for details
and results for a number of machines.
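The operation count above can be condensed into a small cost model. The Python sketch below is a hypothetical illustration, not the adaptive method of Ref. [25]. It rests on three assumptions:

- One formal solution costs about 7 + 3 flops per point (specific intensity plus angle integration).
- A $\Lambda^*$ of bandwidth $b$ costs about $b\,(1 + 3)$ flops per point (one auxiliary variable plus its angle integration per neighbor).
- Each iteration costs roughly one formal solution.

The names `build_cost_ratio`, `Candidate`, `total_cost` and `best_bandwidth` are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed flop counts per point, following the estimate in the text.
FLOPS_INTENSITY = 7  # specific intensity at one point of a ray
FLOPS_AUX = 1        # one auxiliary variable of Lambda* at one point
FLOPS_ANGLE = 3      # angle integration (for J and for each Lambda* element)


def build_cost_ratio(bandwidth: int) -> float:
    """Estimated cost of building Lambda*, in units of one formal solution.

    Assumes every point has `bandwidth` neighbours inside the band, so, like
    the estimate in the text, it overestimates near the ends of the rays.
    The number of points cancels, so the ratio is linear in the bandwidth.
    """
    formal_solution = FLOPS_INTENSITY + FLOPS_ANGLE
    operator = bandwidth * (FLOPS_AUX + FLOPS_ANGLE)
    return operator / formal_solution


@dataclass
class Candidate:
    bandwidth: int
    build_cost: float  # cost of building Lambda*, in formal solutions
    iterations: int    # predicted iterations to reach the target accuracy


def total_cost(c: Candidate) -> float:
    # Hypothetical model: each iteration costs about one formal solution.
    return c.build_cost + c.iterations


def best_bandwidth(candidates: list[Candidate]) -> int:
    """Return the bandwidth with the smallest predicted total cost."""
    return min(candidates, key=total_cost).bandwidth
```

As a usage example, consider `best_bandwidth([Candidate(1, build_cost_ratio(1), 30), Candidate(3, build_cost_ratio(3), 12), Candidate(50, 6.0, 1)])`. Here the iteration counts are made-up inputs, and the last entry stands for a full matrix with the measured cost of about 6 formal solutions. With these numbers the full matrix is selected. Lowering the iteration counts to 3, 2 and 1, as a better initial guess might, shifts the optimum to bandwidth 3.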

In order to accelerate convergence, the Ng method [30] or the Orthomin method [31] can be used (see Auer [32] for a review of different acceleration methods). These methods can reduce the number of iterations required to reach a prescribed accuracy by a factor of two or more, with only a small increase in computational overhead.
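As an illustration of how little code an acceleration step needs, here is a minimal Python sketch of the common second-order form of Ng's method. It is not code taken from Refs. [30] to [32]. It extrapolates from four successive iterates $S_0, \dots, S_3$ of a fixed-point map and returns $(1-a-b)S_3 + aS_2 + bS_1$. The coefficients $a$ and $b$ minimize the (optionally weighted) least-squares norm of the predicted residual. The function name `ng_accelerate` and the `weights` argument are assumptions made for this example.

```python
def ng_accelerate(s0, s1, s2, s3, weights=None):
    """Second-order Ng extrapolation from four successive iterates.

    s0..s3 are equal-length lists of floats with s_{k+1} = G(s_k) for some
    fixed-point map G (for example, one ALI step for the source function).
    """
    n = len(s3)
    w = weights if weights is not None else [1.0] * n
    # Differences of successive iterates: q3 is the latest residual,
    # q1 and q2 are the changes that the extrapolation can cancel.
    q1 = [s3[i] - 2.0 * s2[i] + s1[i] for i in range(n)]
    q2 = [s3[i] - s2[i] - s1[i] + s0[i] for i in range(n)]
    q3 = [s3[i] - s2[i] for i in range(n)]

    def dot(u, v):
        return sum(w[i] * u[i] * v[i] for i in range(n))

    # Normal equations of min || q3 - a*q1 - b*q2 ||^2.
    a11, a12, a22 = dot(q1, q1), dot(q1, q2), dot(q2, q2)
    c1, c2 = dot(q1, q3), dot(q2, q3)
    det = a11 * a22 - a12 * a12
    if det == 0.0:
        return list(s3)  # degenerate case: keep the last iterate
    a = (c1 * a22 - a12 * c2) / det
    b = (a11 * c2 - a12 * c1) / det
    return [(1.0 - a - b) * s3[i] + a * s2[i] + b * s1[i] for i in range(n)]
```

For a linear two-component iteration $x \mapsto Mx + c$, a single extrapolation from three steps already recovers the fixed point up to rounding, as long as $q_1$ and $q_2$ are linearly independent. In a real ALI run the step is typically applied every few iterations, and the iteration then continues from the extrapolated source function.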