Multithreaded Computational Engine

The objective of this project is to develop a multithreaded communication engine to support the GrACE infrastructure for adaptive mesh refinement (AMR) applications based on adaptive grid hierarchies. The overall goal is to improve performance by overlapping computations on individual grid blocks with the associated (inter-level and intra-level) communications.

Dynamically adaptive methods for the solution of partial differential equations that employ locally optimal approximations can yield highly advantageous cost/accuracy ratios when compared to methods based upon static uniform approximations. These techniques improve the accuracy of the solution by dynamically refining the computational grid in regions of high local solution error.
Distributed implementations of these adaptive methods offer the potential for the accurate solution of realistic models of important physical systems. These implementations, however, lead to interesting challenges in dynamic resource allocation, data distribution and load balancing, communication and coordination, and resource management. The overall efficiency of the algorithms is limited by the ability to partition the underlying data structures at run time so as to expose all inherent parallelism, minimize communication/synchronization overheads, and balance the load. This motivates the need for an efficient communication engine that minimizes communication and synchronization overheads.
Performance analysis of AMR applications showed that, in certain cases, up to 50% of the total execution time is spent synchronizing ghost regions between grid blocks at different levels of the grid hierarchy, which limits the overall scalability of these applications. We concluded that, to improve performance, synchronization time has to be minimized by exploiting the parallelism inherent in the block computations. This can be done by incorporating multithreading into the library.

The multithreaded engine enables overlap between the computations and communications on grid blocks owned by a processor. As the processor cycles through its grid blocks sequentially, communication of the ghost regions of completed blocks is scheduled concurrently. The multithreaded engine consists of two classes of threads: synchronization threads and computation threads. Synchronization threads are normally dormant and are activated only when communication is required. Computation threads are responsible for all management and computational tasks (i.e. setting up the grid hierarchy, data and storage management, and load balancing). The key motivation for such a simple model is that only one thread interacts with MPI at any time, so the implementation does not depend on a thread-safe MPI implementation, which is not available on all platforms. The operation of the threaded engine is as follows:
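The two-thread-class design can be sketched as follows. This is an illustrative Python sketch of the scheme (Python's threading module standing in for pthreads); the names and the use of a queue are assumptions for illustration, not GrACE's actual API. The point demonstrated is that the computation thread cycles through its blocks while a normally dormant communication thread, the only thread that would touch MPI, handles ghost-region traffic concurrently.

```python
import threading
import queue

comm_queue = queue.Queue()          # completed blocks awaiting ghost exchange
DONE = object()                     # sentinel telling the comm thread to exit
sent = []                           # stands in for actual MPI traffic

def communication_thread():
    # Dormant while the queue is empty; in the real engine this is the
    # only thread that would issue MPI calls.
    while True:
        block = comm_queue.get()    # sleeps until a task is scheduled
        if block is DONE:
            break
        sent.append(block)          # placeholder for sending ghost regions

def computation_thread(blocks):
    comm = threading.Thread(target=communication_thread)
    comm.start()
    for block in blocks:            # cycle through locally owned blocks
        pass                        # compute on this block (stencil update, etc.)
        comm_queue.put(block)       # schedule its ghost exchange concurrently
    comm_queue.put(DONE)
    comm.join()                     # all communication complete on return

computation_thread(["block0", "block1", "block2"])
print(sent)
```

The sentinel-and-join pattern at the end mirrors the engine's guarantee that no communication is outstanding once the computation loop returns.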

Ghost Synchronizations:  The multithreaded engine guarantees the semantics of a computational loop followed by a ghost synchronization in which all ghost regions are updated on return from the synchronization call. When the computational loop is initiated (e.g. the GrACE forall loop), the send thread and receive thread are signaled, and all communication tasks are offloaded to these two threads. At the end of the computational loop, all block communications are guaranteed to be complete.
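The loop-end guarantee can be sketched with a pending-task counter: the loop offloads each block's ghost exchange as it finishes computing, then waits until the counter drains to zero before returning. This is a hypothetical sketch (class and method names are illustrative, not the GrACE interface), with a single communication worker standing in for the send and receive threads.

```python
import threading
import queue

class GhostSync:
    def __init__(self):
        self.tasks = queue.Queue()
        self.pending = 0                       # exchanges not yet finished
        self.cv = threading.Condition()
        self.ghosts_updated = []
        threading.Thread(target=self._comm_loop, daemon=True).start()

    def _comm_loop(self):
        # Stands in for the send/receive threads; sole "MPI" caller.
        while True:
            block = self.tasks.get()
            self.ghosts_updated.append(block)  # placeholder ghost exchange
            with self.cv:
                self.pending -= 1
                if self.pending == 0:
                    self.cv.notify_all()

    def forall(self, blocks, compute):
        for b in blocks:
            compute(b)                 # computation on this block
            with self.cv:
                self.pending += 1
            self.tasks.put(b)          # offload its ghost communication
        with self.cv:                  # loop-end semantics: do not return
            while self.pending:        # until every exchange has completed
                self.cv.wait()

sync = GhostSync()
sync.forall([0, 1, 2, 3], compute=lambda b: None)
print(sorted(sync.ghosts_updated))
```

When forall returns, every block's ghost regions have been updated, which is exactly the contract the engine provides to the application.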

Redistribution: During redistribution, the grid data is communicated by the communication thread using the signaling mechanisms and mutexes provided by the thread library. The computation thread signals the send and receive threads once computation has begun on the blocks; the send, receive, and computation threads synchronize using condition variables and send and receive queues.
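The condition-variable handshake between the computation thread and a send thread can be sketched as a classic producer/consumer exchange over a send queue. The sketch below uses Python's threading.Condition (which pairs a mutex with a condition variable, as in pthreads); the queue contents and names are illustrative assumptions, not GrACE code.

```python
import threading

send_queue = []                       # blocks of grid data awaiting transfer
cv = threading.Condition()            # mutex + condition variable in one object
transferred = []
STOP = object()                       # sentinel to shut the sender down

def send_thread():
    while True:
        with cv:
            while not send_queue:     # dormant until signaled that work exists
                cv.wait()
            item = send_queue.pop(0)
        if item is STOP:
            break
        transferred.append(item)      # placeholder for the actual data transfer

def redistribute(blocks):
    sender = threading.Thread(target=send_thread)
    sender.start()
    for b in blocks:                  # computation proceeds block by block
        with cv:
            send_queue.append(b)      # hand the block's data to the sender
            cv.notify()               # signal: work is available
    with cv:
        send_queue.append(STOP)
        cv.notify()
    sender.join()

redistribute(["patchA", "patchB"])
print(transferred)
```

The wait-in-a-loop idiom (re-checking the queue after each wakeup) matches the standard pthreads condition-variable usage the engine relies on.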

The multithreaded communication engine is being developed using the POSIX pthreads library to ease porting to different operating systems. Current implementation and experimentation is being done on Sun E10K machines running Sun HPC 5.0.