Parallelization model in F95 Parallel DFT
Effective parallelization of a program for a large number of processors requires
that almost all parts of the program are parallelized. The algorithm of this package
combines several steps of significant computational expense, and each of those steps
demands its own parallelization and optimization strategy. To keep the program
structure manageable for such a complex algorithm, a master-driven overall
parallelization has been selected, using a multiple-program, multiple-data (MPMD)
model (see Figure 1, which shows a master/workers implementation of the original
program using the MPI message-passing library).
In our previous serial program for numerical DFT, the processing time is almost the
same from job to job, so either a block or a cyclic distribution over the grid
balances the load well. However, the Linux cluster in our lab is built from Intel
Pentium II and III processors and is therefore a heterogeneous environment, so
giving every processor tasks of the same size does not yield good performance.
Moreover, in the further development of this program there will certainly be cases
where the processing time varies significantly from job to job and neither block
nor cyclic distribution achieves load balance, for example integrals over Gaussian
basis functions.
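The load-balancing argument above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the package's scheduler: the function names, the task costs and the relative processor speeds are all made up for illustration. It compares the finish time of a static block or cyclic assignment with a master that hands the next task to whichever worker becomes idle first.

```python
import heapq


def block_distribution(n_tasks, n_workers):
    """Contiguous chunks: worker w gets tasks [w*chunk, (w+1)*chunk)."""
    chunk = -(-n_tasks // n_workers)  # ceiling division
    return [list(range(w * chunk, min((w + 1) * chunk, n_tasks)))
            for w in range(n_workers)]


def cyclic_distribution(n_tasks, n_workers):
    """Round-robin: task i goes to worker i mod n_workers."""
    return [list(range(w, n_tasks, n_workers)) for w in range(n_workers)]


def static_makespan(assignment, cost, speed):
    """Finish time of the slowest worker under a fixed assignment."""
    return max(sum(cost[i] for i in tasks) / speed[w]
               for w, tasks in enumerate(assignment))


def dynamic_makespan(cost, speed):
    """Master gives the next task to the worker that becomes idle first."""
    idle_at = [(0.0, w) for w in range(len(speed))]  # (time free, worker id)
    heapq.heapify(idle_at)
    for c in cost:
        t, w = heapq.heappop(idle_at)
        heapq.heappush(idle_at, (t + c / speed[w], w))
    return max(t for t, _ in idle_at)


# Hypothetical example: 12 equal tasks, two slow and two fast processors.
cost = [1.0] * 12
speed = [1.0, 1.0, 2.0, 2.0]
print(static_makespan(block_distribution(12, 4), cost, speed))
print(static_makespan(cyclic_distribution(12, 4), cost, speed))
print(dynamic_makespan(cost, speed))
```

With these made-up numbers both static schemes finish at time 3.0, because the slow processors receive as many tasks as the fast ones, while the master-driven scheme finishes at 2.0.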
The master controls the order of the tasks executed by the workers by sending
messages to them. Each worker uses the message identifier of an incoming message to
decide how to process its contents and, after finishing, waits for a new message
from the master. At some points of the algorithm, direct communication between all
hosts is necessary. Such phases are started by a message from the master to all
workers. From that point on, the workers execute an algorithm that is specially
adapted to the given task and its communication needs, and fall back to the normal
master-driven behavior at the end of the special algorithm.
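The control flow described above can be sketched as follows. The sketch uses Python threads and queues in place of MPI processes and messages, and the message identifiers (MSG_TASK, MSG_COLLECTIVE, MSG_STOP), the function names and the toy workload are assumptions made for illustration only; the package's real identifiers are not given in the text.

```python
import queue
import threading

# Hypothetical message identifiers, chosen only for this sketch.
MSG_TASK, MSG_COLLECTIVE, MSG_STOP = 1, 2, 3


def worker(rank, inbox, outbox, barrier):
    """Wait for a message, dispatch on its identifier, then wait again."""
    while True:
        tag, payload = inbox.get()
        if tag == MSG_TASK:
            # Ordinary master-driven work on the message contents.
            outbox.put((rank, "task", sum(x * x for x in payload)))
        elif tag == MSG_COLLECTIVE:
            # Special phase: the workers synchronize among themselves,
            # then fall back to the master-driven loop.
            barrier.wait()
            outbox.put((rank, "collective", payload))
        elif tag == MSG_STOP:
            break


def run_master(n_workers=3):
    """Drive the workers through a task phase and one special phase."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    outbox = queue.Queue()
    barrier = threading.Barrier(n_workers)
    threads = [threading.Thread(target=worker,
                                args=(r, inboxes[r], outbox, barrier))
               for r in range(n_workers)]
    for t in threads:
        t.start()
    for r, box in enumerate(inboxes):
        box.put((MSG_TASK, [r, r + 1]))
    # The master starts the special phase with a message to all workers.
    for box in inboxes:
        box.put((MSG_COLLECTIVE, "phase-1"))
    for box in inboxes:
        box.put((MSG_STOP, None))
    for t in threads:
        t.join()
    results = []
    while not outbox.empty():
        results.append(outbox.get())
    return sorted(results)


if __name__ == "__main__":
    for item in run_master():
        print(item)
```

In an MPI program the message identifier would typically be the message tag, and the barrier stands in for whatever direct worker-to-worker communication the special phase needs.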
A further design consideration is an error-handling procedure that aborts the
calculation in a controlled way at any point of the program.
If a worker detects an error, it sends an error message to the master.
After receiving an error message or detecting an error itself, the master outputs
the message, sends the workers a message that forces them to abort program
execution, and then aborts the program itself.
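The error protocol above can be sketched in the same thread-and-queue style. Again this is an illustration under assumptions, not the package's code: the message identifiers, the function names and the toy task (a square root that rejects negative input) are hypothetical, and the master returns a status code where a real program would terminate.

```python
import queue
import threading

# Hypothetical message identifiers, chosen only for this sketch.
MSG_TASK, MSG_RESULT, MSG_ERROR, MSG_STOP, MSG_ABORT = range(1, 6)


def checked_worker(rank, inbox, to_master):
    """Process tasks; report an error to the master instead of dying."""
    while True:
        tag, payload = inbox.get()
        if tag in (MSG_STOP, MSG_ABORT):
            return  # normal end or controlled abort ordered by the master
        try:
            if payload < 0:
                raise ValueError(f"negative input {payload}")
            to_master.put((MSG_RESULT, rank, payload ** 0.5))
        except ValueError as exc:
            to_master.put((MSG_ERROR, rank, str(exc)))


def run_with_error_handling(tasks, n_workers=2):
    """Return 0 on success, 1 if the run was aborted after an error."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    to_master = queue.Queue()
    threads = [threading.Thread(target=checked_worker,
                                args=(r, inboxes[r], to_master))
               for r in range(n_workers)]
    for t in threads:
        t.start()
    for i, task in enumerate(tasks):
        inboxes[i % n_workers].put((MSG_TASK, task))

    error = None
    for _ in tasks:
        tag, rank, body = to_master.get()
        if tag == MSG_ERROR:
            error = f"error on worker {rank}: {body}"
            break

    if error is not None:
        print(error)                       # 1. output the message
    final = MSG_STOP if error is None else MSG_ABORT
    for box in inboxes:                    # 2. tell every worker to stop
        box.put((final, None))
    for t in threads:
        t.join()
    return 0 if error is None else 1       # 3. master ends with a status


if __name__ == "__main__":
    print(run_with_error_handling([4.0, 9.0, 16.0]))
    print(run_with_error_handling([4.0, -1.0, 9.0]))
```

Because the queues here are first-in, first-out, a worker still finishes tasks already in its inbox before it sees the abort message. An MPI implementation could instead check for the abort message between tasks or call MPI_Abort.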
The generic programming interface for PVM and MPI in this package implements its own
automatic administration of send and receive buffers in the MPI variant, as many
other packages do. This is similar to the Global Arrays Toolkit, which provides an
efficient and portable "shared-memory" programming interface for distributed-memory
computers. However, MPI allows the programmer closer control of the communication,
which can be preferable, in particular when large amounts of data have to be
transmitted.
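To make the idea of automatically administered send and receive buffers concrete, here is a minimal sketch. It does not reproduce the package's interface: the class names SendBuffer and ReceiveBuffer, their methods and the wire layout (a 4-byte count followed by the values) are assumptions for illustration. The caller packs values without sizing anything by hand, and the receiver unpacks them in the same order.

```python
import struct


class SendBuffer:
    """Hypothetical send buffer that grows as values are packed."""

    def __init__(self):
        self._data = bytearray()

    def pack_ints(self, values):
        # Layout: 4-byte count, then that many 4-byte integers.
        self._data += struct.pack(f"<i{len(values)}i", len(values), *values)

    def pack_doubles(self, values):
        # Layout: 4-byte count, then that many 8-byte doubles.
        self._data += struct.pack(f"<i{len(values)}d", len(values), *values)

    def commit(self):
        """Return the packed message and reset the buffer for reuse."""
        data, self._data = bytes(self._data), bytearray()
        return data


class ReceiveBuffer:
    """Hypothetical receive buffer that tracks its own read position."""

    def __init__(self, data):
        self._data = data
        self._pos = 0

    def _unpack(self, code):
        (n,) = struct.unpack_from("<i", self._data, self._pos)
        self._pos += 4
        values = struct.unpack_from(f"<{n}{code}", self._data, self._pos)
        self._pos += n * struct.calcsize("<" + code)
        return list(values)

    def unpack_ints(self):
        return self._unpack("i")

    def unpack_doubles(self):
        return self._unpack("d")


# Usage: pack on the sending side, unpack in the same order on receipt.
out = SendBuffer()
out.pack_ints([1, 2, 3])
out.pack_doubles([0.5, 2.0])
message = out.commit()

incoming = ReceiveBuffer(message)
print(incoming.unpack_ints(), incoming.unpack_doubles())
```

The convenience comes at the cost of an extra copy into the buffer, which is one reason direct control over communication can be preferable for large transfers.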
Figure 2 shows the parallel performance on the SP2 at the Minnesota Supercomputing
Institute for the minimum-energy structure of UO2+24H2O with a water cluster.
Send your comments to Dr. Anguang Hu, email: ahu@chem.umn.edu