Adaptive Runtime Management for Dynamic Applications

Project Director:
Manish Parashar

Visiting Scientist:
Johan Steensland

Graduate Students:
Sumir Chandra
Xiaolin Li
Li Zhang

Alumni:
Dhaval Kapadia
Sivapriya Ramanathan
Taher Saif
Swati Sharma
Shweta Sinha
Hailan Zhu

Objective:
    This projects investigates the design and evaluation of an adaptive runtime management framework for distributed adaptive grid hierarchies that underlie parallel adaptive mesh-refinement (AMR) techniques for the solution of partial differential equations. The framework uses application and system state information to select the appropriate partitioning scheme, distribution parameters (e.g. granularity per processor), communication mechanism, number and type of processors to be used, etc., at runtime. The overall objective of the framework is the design and development of policy driven tools for automated configuration and runtime management of distributed adaptive applications on dynamic and heterogeneous networked computing environments. 

Introduction:
    Dynamically adaptive methods for the solution of partial differential equations that employ locally optimal approximations can yield highly advantageous ratios for cost/accuracy when compared to methods based upon static uniform approximations. These techniques seek to improve the accuracy of the solution by dynamically refining the computational grid in regions of high local solution error. Distributed implementations of these adaptive methods offer the potential for the accurate solution of realistic models of important physical systems. These implementations, however, lead to interesting challenges in dynamic resource allocation, data-distribution and load balancing, communications and coordination, and resource management. The overall efficiency of the algorithms is limited by the ability to partition the underlying data-structures at runtime so as to expose all inherent parallelism, minimize communication/synchronization overheads, and balance load. A critical requirement while partitioning adaptive grid hierarchies is the maintenance of logical locality, both across different levels of the hierarchy under expansion and contraction of the adaptive grid structure, and within partitions of grids at all levels when they are decomposed and mapped across processors. The former enables efficient computational access to the grids while the latter minimizes the total communication and synchronization overheads. Furthermore, application adaptivity results in application grids being created, moved and deleted on-the-fly, making it is necessary to efficiently re-partition the hierarchy so that it continues to meet these goals. 

Moving these applications to dynamic and heterogeneous networked computing environments introduces a new level of complexity. These environments require selecting and configuring application components based on available resources. However, the complexity and heterogeneity of the environment make the selection of a "best" match between system resources, application algorithms, problem decompositions, mappings and load distributions, communication mechanisms, etc., non-trivial. System dynamics coupled with application adaptation makes application configuration and runtime management a significant challenge. Clearly, there is a need for "smart" tools that can automate the configuration and management process. This project is a step towards that direction.

Application sensitive adaptation uses the current state of the application to drive the runtime selection of:
1) Partitioning scheme and partitioner (SFC, Vampire, ParMetis, ITB, CGD, ...)
2) Partitioning granularity (patch size, aspect ratio, ....)
3) Number and type of processors used

Application sensitive adaptation is based on:
1) application-centric characterization of dynamic partitioning/load-balancing schemes based on a 5-component quality metric
2) characterization of application execution state based on the octant approach

System sensitive adaptation investigates the use of current system state to drive runtime adaptation. System state (computing, network) information is obtained using SNMP MIB's. Adaptation include the selection of:
1) number of processors to be used
2) work allocation per processors
3) distribution type (e.g. optimize for latency tolerance or load-balance)

Publications:

1. Characterization of Domain-based Partitioners for Parallel SAMR Applications  [PDF]
    Johan Steensland, Sumir Chandra, Manish Parashar, and Michael Thune.
    IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS), Las Vegas, USA, pp. 425-430, November 2000.

2. An Evaluation of Partitioners for Parallel SAMR Applications  [PDF]
    Sumir Chandra and Manish Parashar.
    Accepted for publication in the proceedings of European Conference on Parallel Computing (Euro-Par '01), Manchester, UK, August 2001.

3. An Application-centric Characterization of Domain-based Partitioners for Parallel Adaptive Mesh Refinement  [PDF]
    Johan Steensland, Sumir Chandra, and Manish Parashar.
    Submitted to IEEE Transactions on Parallel and Distributed Systems, February 2001.

4. An Experimental Study of Adaptive Application Sensitive Partitioning Strategies for SAMR Applications  [PDF]
    Sumir Chandra, Johan Steensland, and Manish Parashar.
    Submitted to IEEE/ACM Supercomputing, May 2001.

5. System Sensitive Runtime Management of Adaptive Applications  [PDF]
    Shweta Sinha and Manish Parashar.
    Accepted at Tenth IEEE Heterogenous Computing Workshop (HCW), San Francisco, USA, 2001.