
Numerical Considerations

The calculation of $\Lambda^*$ using the algorithm outlined can be vectorized and parallelized with respect to the ray index $k$ and the row index $j$ for any given bandwidth of $\Lambda^*$. In addition, quantities like $\exp(-\Delta\tau^k_{i-1})$, $\alpha^k_i$, $\beta^k_i$ and $\gamma^k_i$ can be pre-calculated and stored, a process which is fully vectorizable and parallelizable.

For each point on a ray, the computation of the specific intensity uses about 7 floating point operations (flops), whereas the computation of the $\lambda^k_{i,j}$ and $\hat\lambda^k_{i,j}$ takes only 1 flop per intersection point. In addition, about 3 flops are needed for the integration over the angle coordinate $\mu$ in order to compute the mean intensities $J$ and the $\Lambda^*$-operator. We have to calculate the formal solution for $N_{\rm T}(N_{\rm T}+1)+N_{\rm T}+2N_{\rm S}N_{\rm C}$ points, where $N_{\rm S}$ is the number of discrete shells, $N_{\rm C}$ is the number of core intersecting characteristics and $N_{\rm T}=N_{\rm S}-1$ is the number of tangent rays. Since $N_{\rm T}(N_{\rm T}+1)+N_{\rm T}=(N_{\rm S}-1)(N_{\rm S}+1)$, the number of flops required for the computation of the specific intensities at all points, counting $7+3$ flops per point, is $\approx 10[(N_{\rm S}+1)(N_{\rm S}-1)+2N_{\rm S}N_{\rm C}]$. To estimate the number of flops required for the calculation of a $\Lambda^*$-operator with a bandwidth of $N_{\rm B} \le N_{\rm S}$, we assume that each point of a ray has $N_{\rm B}$ nearest neighbors, thus overestimating the number of operations. In this approximation, we have to compute $\le N_{\rm B}N_{\rm T}(N_{\rm T}+2)+2N_{\rm B}N_{\rm S}N_{\rm C}$ auxiliary variables $\lambda^k_{i,j}$ or $\hat\lambda^k_{i,j}$. Therefore, at most about $4N_{\rm B}[(N_{\rm S}-1)(N_{\rm S}+1)+2N_{\rm S}N_{\rm C}]$ floating point operations ($1+3$ flops per auxiliary variable) are needed to compute the $\Lambda^*$-operator, and the ratio of the numerical work needed for the computation of a $\Lambda^*$-operator with a bandwidth of $N_{\rm B}$ to that of one formal solution is of the order of $2N_{\rm B}/5$.

This expression significantly overestimates the number of operations required for the construction of the $\Lambda^*$-operator, in particular for larger bandwidths (the effects of the boundaries become more important for larger bandwidths). For example, according to this estimate the computation of the full $\Lambda$-matrix for $N_{\rm S}=50$ takes about the same time as 20 formal solutions; however, the actual time used for the construction of the full $\Lambda$-matrix corresponds to only about 6 formal solutions on many machines. This indicates that the number of iterations must be rather small in order to make ALOs with small bandwidth competitive in terms of speed for the solution of radiative transfer problems, and that the initial guess for the source function will have a large influence on the optimum bandwidth. The best strategy is to use monitoring to predict the ``optimum'' bandwidth that gives the shortest time for the solution of the SSRTE at any given wavelength point in an ``adaptive bandwidth operator splitting'' method; see Ref. [25] for details and results for a number of machines.
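The operation counts above can be checked with a few lines of arithmetic. The following Python sketch simply evaluates the estimates quoted in the text; the function names and the value $N_{\rm C}=10$ are illustrative assumptions and are not taken from the paper.

```python
def n_points(n_s, n_c):
    """Number of points at which the formal solution is evaluated."""
    n_t = n_s - 1  # number of tangent rays, N_T = N_S - 1
    return n_t * (n_t + 1) + n_t + 2 * n_s * n_c


def formal_solution_flops(n_s, n_c):
    """About 7 flops per point for the intensity plus 3 for the angle integration."""
    return 10 * n_points(n_s, n_c)


def lambda_star_flops_bound(n_s, n_c, n_b):
    """Upper estimate: 1 flop per auxiliary variable plus 3 for the angle integration."""
    if not 1 <= n_b <= n_s:
        raise ValueError("bandwidth must satisfy 1 <= N_B <= N_S")
    return 4 * n_b * n_points(n_s, n_c)


def cost_ratio(n_s, n_c, n_b):
    """Estimated cost of building Lambda^* in units of one formal solution."""
    return lambda_star_flops_bound(n_s, n_c, n_b) / formal_solution_flops(n_s, n_c)


# Full Lambda-matrix (N_B = N_S) for N_S = 50; N_C = 10 is a hypothetical value.
ratio_full = cost_ratio(50, 10, 50)
```

In this estimate the ratio reduces to $2N_{\rm B}/5$ regardless of $N_{\rm C}$, so `ratio_full` reproduces the figure of 20 formal solutions quoted in the text. The measured cost of about 6 formal solutions is an empirical result and cannot be derived from this count.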

In order to accelerate convergence, the Ng method [30] or the Orthomin method [31] may be used (see Auer [32] for a review of different acceleration methods). These methods can cut down the number of iterations required to reach a prescribed accuracy by a factor of two or more, with only a small increase in computational overhead.
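As an illustration of how such an acceleration step works, the sketch below implements one commonly used, unweighted form of Ng extrapolation that combines the last four iterates using two least-squares coefficients. The text does not specify which variant is used, so this form, the function names and the linear toy iteration are assumptions made for the example only.

```python
def ng_accelerate(history):
    """Extrapolate from the last four iterates (oldest first) of a fixed-point iteration."""
    x0, x1, x2, x3 = history  # x3 is the most recent iterate
    q1 = [a - 2.0 * b + c for a, b, c in zip(x3, x2, x1)]
    q2 = [a - b - c + d for a, b, c, d in zip(x3, x2, x1, x0)]
    q3 = [a - b for a, b in zip(x3, x2)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    # Normal equations of the least-squares problem for the coefficients a, b.
    a1, b1, b2 = dot(q1, q1), dot(q1, q2), dot(q2, q2)
    c1, c2 = dot(q1, q3), dot(q2, q3)
    det = a1 * b2 - b1 * b1
    if abs(det) <= 1e-12 * max(a1 * b2, 1e-300):
        return list(x3)  # (nearly) singular system: skip the acceleration step
    a = (c1 * b2 - c2 * b1) / det
    b = (c2 * a1 - c1 * b1) / det
    return [(1.0 - a - b) * u + a * v + b * w for u, v, w in zip(x3, x2, x1)]


def toy_iteration(x):
    """Hypothetical linear fixed-point map with fixed point (10, 2)."""
    return [0.9 * x[0] + 1.0, 0.5 * x[1] + 1.0]


history = [[0.0, 0.0]]
for _ in range(3):
    history.append(toy_iteration(history[-1]))

accelerated = ng_accelerate(history)
```

For this two-component linear toy problem the extrapolation recovers the fixed point to rounding error after only three ordinary iterations, whereas the latest plain iterate is still far from it. A real operator splitting iteration has many more components, so the gain there is the more modest reduction in iteration count described above.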

Peter H. Hauschildt