Parallelization model in F95 Parallel DFT


Effective parallelization of a program for a large number of processors requires that almost all parts of the program be parallelized. The algorithm of this package combines several different steps of significant computational expense, and each of these steps demands its own parallelization and optimization strategy. To keep the program structure manageable for this complex algorithm, a master-driven overall parallelization has been selected, using a multiple-program, multiple-data (MPMD) model (see Figure 1, which shows a master/workers implementation of the original program using the MPI message-passing library).

In our previous serial numerical DFT program, the processing time is almost the same from job to job, so either a block or a cyclic distribution over the grids would balance the load well. However, the Linux cluster in our lab is built from Intel Pentium II and III processors and is therefore a heterogeneous environment. To obtain good performance, the tasks assigned to the processors cannot all be the same size. Moreover, in the further development of this program there will certainly be cases where the processing time varies significantly from job to job and neither a block nor a cyclic distribution achieves load balance, for example, integrals over Gaussian basis functions.

The master controls the order of the tasks executed by the workers by sending messages to them. The workers use the message identifier of each incoming message to decide how to process its contents and, after finishing, wait for new messages from the master. At some points of the algorithm, direct communication between all hosts is necessary. Such phases are started by messages from the master to all the workers. From that point on, the workers execute an algorithm that is specially adapted to the given task and its communication needs, and fall back to the normal master-driven behavior at the end of the special algorithm.
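The master-driven control flow described above can be illustrated with a minimal sketch. It is written in Python, with threads and queues standing in for MPI processes and message passing, so it is not the package's Fortran 95 code. The tag names (TAG_TASK, TAG_RESULT, TAG_STOP), the functions worker and run_master, and the sum-of-squares workload are all hypothetical. The master hands out one task at a time and gives the next task to whichever worker reports back first. This is one common way to balance load when processors or task sizes differ; the text does not state that the package uses exactly this scheme.

```python
import queue
import threading
from collections import deque

# Hypothetical message identifiers; the package's real tags are not given in the text.
TAG_TASK, TAG_RESULT, TAG_STOP = 1, 2, 3


def worker(rank, inbox, to_master):
    """Wait for messages from the master and dispatch on the message identifier."""
    while True:
        tag, payload = inbox.get()
        if tag == TAG_STOP:
            return
        if tag == TAG_TASK:
            task_id, n = payload
            value = sum(i * i for i in range(n))  # stand-in for real work
            to_master.put((TAG_RESULT, rank, task_id, value))


def run_master(task_sizes, n_workers):
    """Hand out tasks one at a time; a worker gets a new task when it reports back."""
    to_master = queue.Queue()
    inboxes = [queue.Queue() for _ in range(n_workers)]
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], to_master))
        for rank in range(n_workers)
    ]
    for t in threads:
        t.start()

    pending = deque(enumerate(task_sizes))  # (task_id, size) pairs
    results = {}
    handled = [0] * n_workers  # number of tasks completed by each worker
    outstanding = 0

    # Seed every worker with one task, as far as tasks are available.
    for inbox in inboxes:
        if pending:
            inbox.put((TAG_TASK, pending.popleft()))
            outstanding += 1

    # Whoever answers first gets the next pending task.
    while outstanding:
        _tag, rank, task_id, value = to_master.get()
        results[task_id] = value
        handled[rank] += 1
        outstanding -= 1
        if pending:
            inboxes[rank].put((TAG_TASK, pending.popleft()))
            outstanding += 1

    # Tell all workers to leave their message loop.
    for inbox in inboxes:
        inbox.put((TAG_STOP, None))
    for t in threads:
        t.join()
    return results, handled
```

For example, run_master([10, 1000, 5, 200, 50], 3) returns a dictionary with one result per task and a list counting how many tasks each worker handled; the counts depend on timing, which is the point of handing out work on demand.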
Another design consideration is an error-handling procedure that aborts the calculation in a controlled way at any point of the program. If a worker detects an error, it sends an error message to the master. After receiving an error message or detecting an error itself, the master outputs the message, sends a message to the workers that forces them to abort program execution, and then aborts the program itself.
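The controlled abort described above can be sketched in the same way. Again this is a Python illustration with threads and queues in place of MPI processes, not the package's code. The tags (TAG_ERROR, TAG_ABORT and the others), the function names, and the square-root workload that fails on negative input are assumptions made for the example. In an MPI program the master would end the run after notifying the workers; here run_master simply returns the error text so that the behavior can be inspected.

```python
import queue
import threading
from collections import deque

# Hypothetical message identifiers, chosen for this sketch only.
TAG_TASK, TAG_RESULT, TAG_ERROR, TAG_ABORT, TAG_STOP = range(1, 6)


def worker(rank, inbox, to_master):
    """Process tasks; report an error to the master instead of failing silently."""
    while True:
        tag, payload = inbox.get()
        if tag in (TAG_ABORT, TAG_STOP):
            return
        try:
            if payload < 0:
                raise ValueError(f"negative input {payload}")
            to_master.put((TAG_RESULT, rank, payload ** 0.5))
        except ValueError as exc:
            to_master.put((TAG_ERROR, rank, str(exc)))


def run_master(inputs, n_workers=2):
    """Dispatch tasks; on the first error, print it and tell every worker to abort."""
    to_master = queue.Queue()
    inboxes = [queue.Queue() for _ in range(n_workers)]
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], to_master))
        for rank in range(n_workers)
    ]
    for t in threads:
        t.start()

    pending = deque(inputs)
    results, error, busy = [], None, 0
    for inbox in inboxes:  # one task in flight per worker
        if pending:
            inbox.put((TAG_TASK, pending.popleft()))
            busy += 1

    while busy and error is None:
        tag, rank, payload = to_master.get()
        busy -= 1
        if tag == TAG_ERROR:
            error = f"worker {rank}: {payload}"
        else:
            results.append(payload)
            if pending:
                inboxes[rank].put((TAG_TASK, pending.popleft()))
                busy += 1

    if error is not None:
        print("ERROR", error)  # the master outputs the message first
    final_tag = TAG_ABORT if error is not None else TAG_STOP
    for inbox in inboxes:  # then notifies every worker
        inbox.put((final_tag, None))
    for t in threads:
        t.join()
    return results, error  # a real master would now stop the program
```

Because each worker has at most one task in flight, the abort message is the next thing it reads after finishing its current task, so no queued work is started after an error has been reported.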

The generic programming interface for PVM and MPI in this package implements its own automatic administration of send and receive buffers in the MPI variant, as many other packages do. The approach is similar to that of the Global Arrays Toolkit, which provides an efficient and portable "shared-memory" programming interface for distributed-memory computers. However, MPI gives the programmer closer control of the communication, which can be preferable, in particular when large amounts of data have to be transmitted.

Figure 2 shows the parallel performance on the SP2 at the Minnesota Supercomputing Institute for the minimum-energy structure of UO2+24H2O with water cluster.




Send your comments to Dr. Anguang Hu, email: ahu@chem.umn.edu