Parallel Supercomputing In Stellar Atmosphere Simulations
Peter H. Hauschildt
Department of Physics & Astronomy
and
Center for Simulational Physics
The University of Georgia
Athens, GA 30602
in collaboration with
E. Baron (Univ. of OK)
F. Allard (WSU & CNRS/Lyon)
D. Lowenthal (UGA)
Overview
1. Motivation
2. The Computational Problem
3. Solution through parallelization
4. Parallel algorithms
5. Conclusions & The Future
Motivation
- astrophysical information through spectroscopy:
  - velocities
  - temperatures
  - densities
  - element abundances
  - luminosities
- detailed modeling required to extract physical data from observed spectra:
  - follow radiative transfer of photons of all wavelengths through the atmosphere in great detail
  - synthesize spectrum to compare to observations
- basic physical model:
  - spherical shell
  - static (stars) or expanding (novae, winds, SNe)
  - hydrostatic or hydrodynamical equilibrium
  - central source provides energy
- Constraint equations:
  - energy conservation → temperature structure
  - momentum conservation → pressure & velocity structure
- "Auxiliary" equations: radiative transfer & statistical equilibrium
Radiative Transfer
Assumptions:
- spherical symmetry,
- time independence ($\partial/\partial t = 0$),
- full special relativistic treatment in the Lagrangian frame.
Spherically symmetric, special relativistic equation of radiative transfer:
- partial integro-differential equation,
- telegrapher's equation: boundary value problem in $r$ and initial value problem in $\lambda$ (certain restrictions apply)
The equation of radiative transfer:

\begin{displaymath}
\gamma(\mu+\beta)\frac{\partial I_\lambda}{\partial r}
+\frac{\partial}{\partial\mu}\left\{\gamma(1-\mu^{2})\left[\frac{1+\beta\mu}{r}
 -\gamma^{2}(\mu+\beta)\frac{\partial\beta}{\partial r}\right]I_\lambda\right\}
-\frac{\partial}{\partial\lambda}\left\{\gamma\lambda\left[\frac{\beta(1-\mu^{2})}{r}
 +\gamma^{2}\mu(\mu+\beta)\frac{\partial\beta}{\partial r}\right]I_\lambda\right\}
+\gamma\left[\frac{2\mu+\beta(3-\mu^{2})}{r}
 +\gamma^{2}(1+\mu^{2}+2\beta\mu)\frac{\partial\beta}{\partial r}\right]I_\lambda
=\eta_\lambda-\chi_\lambda I_\lambda
\end{displaymath}

with

\begin{displaymath}
\beta=\frac{v}{c}
\end{displaymath}

and

\begin{displaymath}
\gamma=\frac{1}{\sqrt{1-\beta^{2}}}\;.
\end{displaymath}

- $I_\lambda$: specific intensity scaled by $r^{2}$,
- $r$: radial coordinate,
- $\mu$: cosine of the direction angle,
- $v$: velocity,
- $\chi_\lambda$: extinction coefficient,
- $\eta_\lambda$: emissivity.
Example for the emissivity $\eta_\lambda$:

\begin{displaymath}
\eta_\lambda=\kappa_\lambda B_\lambda+\sigma_\lambda J_\lambda+\eta_\lambda^{\rm lines}
\end{displaymath}

with
- $\kappa_\lambda B_\lambda$: thermal emission,
- $\sigma_\lambda J_\lambda$: electron scattering,
- $\eta_\lambda^{\rm lines}$: spectral line emissivity.
Numerical solution:
- basic idea: discretize $\lambda$ and treat the boundary value problem for each wavelength point individually,
- operator splitting (OS) method:
  - solve along characteristics of the RTE
  - iterative method:
    - piecewise parabolic ansatz to calculate $I$ for given $J$
    - iterate to self-consistent solution for $J$
  - eigenvalues of the iteration matrix are close to unity
    → use operator splitting to reduce the eigenvalues of the amplification matrix
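As an illustration of the operator splitting step, here is a minimal Fortran 90 sketch for a schematic coherent scattering problem with source function $S=(1-\epsilon)J+\epsilon B$ and a diagonal approximate operator $\Lambda^{*}$; the routine formal_solution stands in for the characteristics solver, and all names and tolerances are illustrative assumptions, not PHOENIX code:

! Sketch: accelerated Lambda iteration (operator splitting) for a
! scattering problem with S = (1-eps)*J + eps*B and a diagonal
! approximate operator; all names are illustrative.
subroutine ali_iterate(nr, eps, b, lstar, j)
  implicit none
  integer, intent(in)    :: nr         ! number of depth points
  real(8), intent(in)    :: eps(nr)    ! thermal coupling parameter
  real(8), intent(in)    :: b(nr)      ! Planck function B
  real(8), intent(in)    :: lstar(nr)  ! diagonal of the approximate operator
  real(8), intent(inout) :: j(nr)      ! mean intensity J (initial guess on entry)
  real(8) :: s(nr), jfs(nr), dj(nr)
  real(8), parameter :: tol = 1.0d-8
  integer :: iter
  do iter = 1, 1000
     s = (1.0d0 - eps)*j + eps*b       ! current source function
     call formal_solution(nr, s, jfs)  ! jfs = Lambda[S], solved along characteristics
     ! operator splitting: solve (1 - Lstar*(1-eps)) dj = jfs - j;
     ! plain Lambda iteration (dj = jfs - j) stalls because the
     ! eigenvalues of its iteration matrix are close to unity.
     dj = (jfs - j)/(1.0d0 - lstar*(1.0d0 - eps))
     j = j + dj
     if (maxval(abs(dj)) <= tol*maxval(abs(j))) exit
  end do
end subroutine ali_iterate

With a diagonal $\Lambda^{*}$ the splitting step costs only one division per depth point, yet it removes the near-unity eigenvalues that stall the plain $\Lambda$-iteration.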
Statistical Equilibrium Equations

Line and continuum scattering prevents the use of the $\Lambda$-iteration for the solution of the rate equations!
Solution of the statistical equilibrium equations:

Operator Splitting method:
- define a "rate operator" $[R_{ij}]$ in analogy to the $\Lambda$-operator:
  $R_{ij} = [R_{ij}]\,[n]$
- define an "approximate rate operator" $[R_{ij}^{*}]$ and write the iteration scheme in the form:
  $R_{ij} = [R_{ij}^{*}]\,[n_{\rm new}] + \bigl([R_{ij}] - [R_{ij}^{*}]\bigr)\,[n_{\rm old}]$
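For context, a sketch of the statistical equilibrium equations that the rate operator feeds into, written in the standard textbook form (the notation here is an assumption, not a quote from the talk):

\begin{displaymath}
\sum_{j\neq i} n_j R_{ji} \;-\; n_i \sum_{j\neq i} R_{ij} \;=\; 0
\quad\mbox{for all levels}\ i,
\end{displaymath}

where the $n_i$ are the level occupation numbers. Inserting the split rates turns each iteration into a solvable system for $[n_{\rm new}]$, whereas evaluating all rates with $[n_{\rm old}]$ (the analogue of the $\Lambda$-iteration) would converge far too slowly.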
The Computational Problem
1. input data size
   - number of atomic/ionic spectral lines: GB
   - number of diatomic molecular lines: GB
   - number of hot water vapor lines:
2. line list growth & data handling
   - before 1994:
   - 1994: GB
   - 1997: GB
   - expected 1998: GB
   - all lines need to be accessed in a line selection procedure
   - line selection creates sub-lists that can be as large as the original list
   → poses a significant data handling problem!
3. memory/IO requirements
   - line lists too large for memory
     → scratch files & block algorithm
     → trade memory for IO bandwidth
   - number of individual energy levels: MB
   - number of individual transitions: MB
   - EOS data storage: MB
   - auxiliary storage: MB
   - total memory requirement: MB
   - number of individual energy levels and transitions will increase dramatically
     → memory requirements GB
4. (serial) CPU time
   - small for each individual point on the wavelength grid: msec
   - number of wavelength points for radiative transfer: 30,000-300,000 (can be $>10^{6}$)
   - up to 30,000 sec to "sweep" once through all wavelength points
   - typically, many iterations (sweeps) are required to obtain an equilibrium model
     → CPU days (see the estimate below)
   - there are, literally, 100's of models in a typical grid ...
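To make the scale concrete, a rough worked estimate from the numbers above; the per-point cost of $\sim 0.1$ s follows from the quoted sweep time, while the iteration count of $\sim 20$ is an illustrative assumption:

\begin{displaymath}
t_{\rm sweep}\approx N_\lambda\,t_\lambda\approx 3\times10^{5}\times 0.1\,{\rm s}
=3\times10^{4}\,{\rm s}\approx 8\,{\rm h},
\end{displaymath}

\begin{displaymath}
t_{\rm model}\approx 20\times t_{\rm sweep}\approx 6\times10^{5}\,{\rm s}
\approx 7\ {\rm CPU\ days}.
\end{displaymath}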
Solution through parallelization
1. portability issues
   - large number of simulations
   - complex code (verification on several different architectures)
   - need to be able to run efficiently on different parallel supercomputers
   → Fortran 90 & MPI:
     - available on all major platforms
     - public domain implementation MPICH
2. memory issues
   - MPI available on distributed memory systems
   - large aggregate memory of distributed memory machines
   - allows reduction of memory requirements per node
   → larger model calculations possible!
3. IO issues
   - parallel filesystems allow file partitioning
     → results in higher IO throughput
   - parallel code: each node needs to read only part of the data
     → IO reduction on a per-node basis (see the sketch after this list)
   - but sometimes better scalability means more IO
4. scalability issues
   - allows more efficient usage of multiple CPUs
   - reduces wall-clock time for typical simulations
   - depends very often on the type of model:
     some simulations (stars) allow algorithms that scale very well, but
     some simulations (novae, SNe) do not
   → implement several algorithms that can be selected at run-time
     to obtain "best" overall performance while making simulations feasible!
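A minimal sketch of the per-node partial read from item 3, assuming a plain unformatted direct-access file of fixed-length line records; the file name, record layout, and block decomposition are illustrative assumptions, not the actual PHOENIX file format:

! Sketch: each node reads only its block of a shared direct-access
! line-list file (record layout and sizes are assumed, and the recl
! unit -- bytes vs. words -- is compiler dependent).
program partial_read
  use mpi
  implicit none
  integer, parameter :: reclen = 24       ! bytes per record: 3 real(8) values
  integer :: rank, nprocs, ierr, i
  integer :: nrec, first, last, chunk
  real(8) :: record(3)                    ! e.g. wavelength, gf-value, E_lower
  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
  nrec  = 1000000                         ! total records (assumed known, e.g. from a header)
  chunk = (nrec + nprocs - 1)/nprocs      ! block decomposition over the nodes
  first = rank*chunk + 1
  last  = min(first + chunk - 1, nrec)
  open(10, file='lines.dat', access='direct', recl=reclen, &
       form='unformatted', status='old', action='read')
  do i = first, last                      ! this node touches only its part
     read(10, rec=i) record
     ! ... select/process the line here ...
  end do
  close(10)
  call MPI_Finalize(ierr)
end program partial_read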
Parallel algorithms
1. The PHOENIX code
2. Spectral line selection
   - serial version can take 3h for the large line lists!
   - 3 separate line lists → task parallelism
   - data parallelism: use client-server model (see the first sketch after this list)
     - clients: select lines in assigned chunks of the line list files
     - server: collects data and creates sorted sub-list files
   - potentially reduces the time in proportion to the number of client nodes
   - creates large message traffic
3. Spectral line opacity calculations
4. Radiative transfer
   - characteristic "rays" are independent
     → parallelize the "formal solution" part of the iteration
   - problem: number of mesh points along each characteristic is different
   - need to balance load by distributing sets of characteristics to nodes
     (see the second sketch after this list)
   - scalability is low because of load-balance and communication-overhead problems
5. Parallelizing the wavelength loop
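Below, a hedged sketch of the client-server line selection from item 2: rank 0 acts as the server and collects the lines each client selected from its chunk; the message layout, tags, and the helpers select_lines/merge_sorted are hypothetical stand-ins for the real routines:

! Sketch: client-server line selection. Rank 0 (server) collects the
! lines each client selected from its chunk of the list; the helpers
! select_lines() and merge_sorted() are hypothetical stand-ins.
subroutine parallel_line_select()
  use mpi
  implicit none
  integer, parameter :: maxsel = 100000   ! per-client buffer size (assumed)
  integer :: rank, nprocs, ierr, src, ncnt, nsel
  integer :: stat(MPI_STATUS_SIZE)
  real(8) :: buf(3, maxsel)               ! wavelength, gf-value, E_lower (assumed)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
  if (rank == 0) then
     ! server: receive one batch from every client and merge it
     ! into the sorted sub-list files
     do src = 1, nprocs - 1
        call MPI_Recv(buf, 3*maxsel, MPI_DOUBLE_PRECISION, src, 1, &
                      MPI_COMM_WORLD, stat, ierr)
        call MPI_Get_count(stat, MPI_DOUBLE_PRECISION, ncnt, ierr)
        call merge_sorted(buf, ncnt/3)    ! ncnt/3 lines in this batch
     end do
  else
     ! client: select lines from its assigned chunk of the line list
     call select_lines(rank, nprocs, buf, nsel)
     call MPI_Send(buf, 3*nsel, MPI_DOUBLE_PRECISION, 0, 1, &
                   MPI_COMM_WORLD, ierr)
  end if
end subroutine parallel_line_select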
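And a hedged sketch of the load balancing from item 4: characteristics are sorted by their number of mesh points and assigned greedily to the least-loaded node, so every node gets roughly the same total work; the greedy strategy is illustrative, not necessarily the scheme used in PHOENIX:

! Sketch: static load balancing of the characteristics. Rays are
! sorted by cost (number of mesh points) and assigned greedily to
! the node with the smallest accumulated load.
subroutine balance_rays(nray, npts, nprocs, owner)
  implicit none
  integer, intent(in)  :: nray, nprocs
  integer, intent(in)  :: npts(nray)   ! mesh points per characteristic
  integer, intent(out) :: owner(nray)  ! node assigned to each characteristic
  integer :: load(nprocs), idx(nray)
  integer :: i, j, k, node
  idx = (/ (i, i = 1, nray) /)
  ! index-sort by descending cost (selection sort, for clarity only)
  do i = 1, nray - 1
     k = i
     do j = i + 1, nray
        if (npts(idx(j)) > npts(idx(k))) k = j
     end do
     j = idx(i); idx(i) = idx(k); idx(k) = j
  end do
  ! greedy pass: biggest remaining ray goes to the least-loaded node
  load = 0
  do i = 1, nray
     node = minloc(load, dim=1)
     owner(idx(i)) = node
     load(node) = load(node) + npts(idx(i))
  end do
end subroutine balance_rays

Sorting by descending cost before the greedy pass is the standard longest-processing-time heuristic and keeps the worst-case imbalance small.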
Conclusions & The Future
- parallelization of PHOENIX allows physically more detailed models
- decrease in wall-clock time per model is substantial for many types of simulations
- coding effort to implement MPI calls relatively small (about 7400 lines)
- logic for algorithm selection and load balancing fairly complex
- parallel version of PHOENIX is regularly used in production
- depending on simulation type we use between 4 and 64 nodes (single CPU)
In the Future
- parallel asynchronous IO (MPI-2)
- radiative transfer for arbitrary velocity fields (requires 70-700 GB of memory/disk space)
- time dependence
- 3D RT in the Lagrangian frame for optically thick moving configurations