The calculation of $\Lambda^*$ using the algorithm outlined can be vectorized and parallelized with respect to the ray index k and the row index j for any given bandwidth of $\Lambda^*$. In addition, quantities like , , and can be pre-calculated and stored, a process which is fully vectorizable and parallelizable.
For each point on a ray, the computation of the specific intensity uses about 7 floating point operations (flops), whereas the computation of the and takes only 1 flop per intersection point. In addition, about 3 flops are needed for the integration over the angle coordinate in order to compute the mean intensities J and the $\Lambda^*$-operator. We have to calculate the formal solution for points, where is the number of discrete shells, is the number of core intersecting characteristics and is the number of tangent rays. Therefore, the number of flops required for the computation of the specific intensities at all points is . To estimate the number of flops required for the calculation of a $\Lambda^*$-operator with a bandwidth of , we assume that each point of a ray has nearest neighbors, thus overestimating the number of operations. In this approximation, we have to compute auxiliary variables or . Therefore, about floating point operations are needed to compute the $\Lambda^*$-operator, and the ratio of the numerical work needed for the computation of a $\Lambda^*$-operator with a bandwidth of to that of one formal solution is of the order of .
This expression significantly overestimates the number of operations required for the construction of the operator, in particular for larger bandwidths, where the effects of the boundaries become more important. For example, according to this estimate the computation of the full $\Lambda^*$-matrix for takes about the same time as 20 formal solutions; however, the actual time used for the construction of the full $\Lambda^*$-matrix corresponds to only about 6 formal solutions on many machines. This indicates that the number of iterations must be rather small in order to make ALOs with small bandwidth competitive in terms of speed for the solution of radiative transfer problems, and that the initial guess for the source function will have a large influence on the optimum bandwidth.
The best strategy is to use monitoring to predict the ``optimum'' bandwidth, that is, the bandwidth that gives the shortest time for the solution of the SSRTE at any given wavelength point, in an ``adaptive bandwidth operator splitting'' method; see Ref.  for details and results for a number of machines.
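The trade-off behind the bandwidth choice can be illustrated with a minimal sketch. It assumes a deliberately simple cost model that is not taken from the reference: the total work, measured in units of one formal solution, is the construction cost of the operator plus one formal solution per iteration, and the cost of solving the banded system is neglected. The function names and all numbers in `monitored` are hypothetical placeholders, not measurements.

```python
def total_cost(construction_cost, n_iterations):
    """Total work in units of one formal solution.

    Assumed model: operator construction plus one formal solution
    per iteration (the banded solve itself is neglected).
    """
    return construction_cost + n_iterations


def pick_bandwidth(candidates):
    """Return the bandwidth label with the smallest estimated total cost.

    candidates maps a label to (construction_cost, n_iterations).
    """
    return min(candidates, key=lambda b: total_cost(*candidates[b]))


# Hypothetical monitoring data (illustrative only).
monitored = {
    "narrow": (0.5, 12),  # cheap operator, many iterations
    "medium": (2.0, 7),
    "full": (6.0, 4),     # expensive operator, few iterations
}

best = pick_bandwidth(monitored)
print(best, total_cost(*monitored[best]))
```

In an adaptive scheme, the iteration counts would come from monitoring earlier wavelength points rather than from a fixed table as in this sketch.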
In order to accelerate convergence, the Ng method or the Orthomin method may be used (see Auer for a review of different acceleration methods). These methods can cut down the number of iterations required to reach a prescribed accuracy by a factor of two or more with only a small increase in computational overhead.
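As an illustration of the idea behind Ng acceleration, the following sketch applies the commonly used four-iterate form to a generic linear fixed-point iteration rather than to the radiative transfer problem itself. The function name `ng_accelerate`, the unweighted least-squares fit, and the 2x2 test matrix are assumptions made for this example only. The accelerated estimate is the linear combination of the last three iterates that minimizes the residual of the iteration in the least-squares sense.

```python
import numpy as np


def ng_accelerate(x0, x1, x2, x3):
    """Ng extrapolation from four successive iterates (oldest first)."""
    d1, d2, d3 = x1 - x0, x2 - x1, x3 - x2
    # Minimize |d3 - a*(d3 - d2) - b*(d3 - d1)|^2 over a and b.
    q = np.column_stack([d3 - d2, d3 - d1])
    (a, b), *_ = np.linalg.lstsq(q, d3, rcond=None)
    return (1.0 - a - b) * x3 + a * x2 + b * x1


# Hypothetical slowly converging linear iteration x -> M x + c.
M = np.array([[0.9, 0.05], [0.05, 0.8]])
c = np.array([1.0, 2.0])
x_star = np.linalg.solve(np.eye(2) - M, c)  # exact fixed point

iterates = [np.zeros(2)]
for _ in range(3):
    iterates.append(M @ iterates[-1] + c)

x_acc = ng_accelerate(*iterates)
print("plain error:      ", np.linalg.norm(iterates[-1] - x_star))
print("accelerated error:", np.linalg.norm(x_acc - x_star))
```

For this two-dimensional linear example the two fit parameters can remove the residual entirely, so the accelerated estimate agrees with the fixed point to rounding error. In a realistic problem the gain is smaller, and the extrapolation is typically applied only every few iterations.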