1. Motivation and introduction

High-performance simulations for physical phenomena and mathematical problems executing on distributed, heterogeneous and dynamic Grid environments are playing an increasingly critical role in science and engineering. As the size, dynamics, complexity and costs of these simulations grow, it becomes more and more important to be able to monitor, control, adapt and optimize a simulation application’s execution at runtime based on its state and the state of the computational environment. Experts should be able to define and deploy rules to enable the running simulation to be automatically monitored, to respond to specific conditions in its execution, and invoke appropriate operations on the expert’s behalf, so as to make those simulations self-healing, self-managed and self-optimized.

DIOS++, which forms the back end of DISCOVER, is built on DIOS. It enables rule-based autonomic management and optimization of distributed and parallel applications. DIOS++ provides three things: (1) abstractions for enhancing existing application objects with sensors and actuators for interrogation and control; (2) a control network that connects and manages the distributed sensors and actuators and enables external discovery, interrogation, monitoring and manipulation of these objects at runtime; and (3) a distributed rule engine that enables the runtime definition, deployment and execution of rules for adapting application objects.

2. Description

a. Autonomic object

Autonomic objects enhance application computational objects (data structures, algorithms) with: interaction interfaces that allow the object’s state to be externally monitored and enable application interaction and adaptation; access policies that provide different levels of security based on user certification and state; and rule interfaces that dynamically bind rule agents, which carry rules and are responsible for rule evaluation and execution.
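The description above can be made concrete with a minimal Python sketch. It is not the DIOS++ API: the class name AutonomicObject, its methods, and the reduction of access policies to a set of permitted user names per interface are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, Iterable, Set


class AutonomicObject:
    """Hypothetical wrapper that adds interaction interfaces to a plain object."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sensors: Dict[str, Callable[[], Any]] = {}     # read-only views of state
        self.actuators: Dict[str, Callable[..., Any]] = {}  # commands that change state
        self.policy: Dict[str, Set[str]] = {}               # interface name -> allowed users
        self.rule_agent = None                              # bound later via the rule interface

    def export_sensor(self, name: str, fn: Callable[[], Any], allowed: Iterable[str]) -> None:
        self.sensors[name] = fn
        self.policy[name] = set(allowed)

    def export_actuator(self, name: str, fn: Callable[..., Any], allowed: Iterable[str]) -> None:
        self.actuators[name] = fn
        self.policy[name] = set(allowed)

    def _check(self, user: str, name: str) -> None:
        # Simplified access policy: a user either may or may not use an interface.
        if user not in self.policy.get(name, set()):
            raise PermissionError(f"{user} may not use {self.name}.{name}")

    def sense(self, user: str, name: str) -> Any:
        self._check(user, name)
        return self.sensors[name]()

    def actuate(self, user: str, name: str, *args: Any) -> Any:
        self._check(user, name)
        return self.actuators[name](*args)

    def bind_rule_agent(self, agent: Any) -> None:
        self.rule_agent = agent


# Usage: wrap a plain Python list as an autonomic object.
data = [3, 1, 2]
obj = AutonomicObject("List")
obj.export_sensor("getLength", lambda: len(data), allowed={"alice", "bob"})
obj.export_actuator("clear", data.clear, allowed={"alice"})

print(obj.sense("bob", "getLength"))  # bob is allowed to monitor
obj.actuate("alice", "clear")         # only alice is allowed to steer
```

Keeping sensors and actuators in separate tables makes it straightforward to grant monitoring rights more widely than steering rights.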

b. Rule

In the DIOS++ framework, rules are separated from application logic. This separation lets users create, delete and change rules dynamically, without modifying application source code and without stopping and restarting the application. Users apply these external rules to monitor and control their applications at run time. Rules are composed from the exported application and system views and commands, and are interpreted at interaction time. Rules are handled by a deductive shell, which is responsible for defining, deploying, evaluating and executing object rules and application rules. The deductive shell and rule operations are discussed in section 2.c (Control network).

Every rule has the form "IF condition_expression THEN then_action_list ELSE else_action_list". There are two kinds of rules:

1) Object rules are applied to only one specific object. The conditions and actions are evaluated and executed within one object.

2) Application rules are defined for a group of objects. Application rules have to collect condition information from several objects, evaluate them and execute actions on several objects.
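The rule form and the two rule kinds can be sketched as a small data structure. This is an illustrative representation only; the Rule class, its fields and the fire() method are hypothetical names, not part of DIOS++.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List


@dataclass
class Rule:
    """IF condition THEN then_actions ELSE else_actions (hypothetical representation)."""
    condition: Callable[[], bool]
    then_actions: List[Callable[[], None]]
    else_actions: List[Callable[[], None]] = field(default_factory=list)
    objects: FrozenSet[str] = frozenset()  # names of the objects the rule touches

    @property
    def kind(self) -> str:
        # One object means an object rule; several objects mean an application rule.
        return "object" if len(self.objects) == 1 else "application"

    def fire(self) -> bool:
        """Evaluate the condition and run the matching action list."""
        result = self.condition()
        for action in (self.then_actions if result else self.else_actions):
            action()
        return result


# Usage: a rule over a single object, fired before and after its state changes.
log: List[str] = []
values = [5]
rule = Rule(
    condition=lambda: len(values) < 2,
    then_actions=[lambda: log.append("then")],
    else_actions=[lambda: log.append("else")],
    objects=frozenset({"List"}),
)
rule.fire()            # condition holds, so the THEN list runs
values.extend([1, 2])
rule.fire()            # condition no longer holds, so the ELSE list runs
```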

c. Control network

The DIOS++ control network is automatically configured at run time using the underlying messaging environment (e.g. MPI) and the number of available processors. The control network has a hierarchical structure composed of the rule engine and gateway, computation nodes, autonomic objects and rule agents.

The gateway is the interaction proxy for the entire application. It contains registries of all the exported objects, manages a registry of the interaction interfaces for all the objects in the application, and maintains a list of the access policies related to each exported interface. It is also responsible for interfacing with external interaction servers or brokers and for transferring rule operations to the rule engine.

An embedded deductive shell is co-located with the gateway to enable rule creation, deletion, modification, evaluation and execution. The deductive shell is composed of the rule engine and rule agents. The rule engine maintains a list of user-defined rules and their status, coordinates the operation of the rule agents and keeps track of their execution status. Rule agents hold a list of object rules and decomposed application rules, evaluate and execute those rules, and keep track of rule status. The rule engine dynamically creates rule agents, assigns rules and execution scripts to them, and delegates them to objects in the control network. Rule agents execute rules according to their execution scripts and report status to the rule engine.
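The division of labour between the rule engine and the rule agents might look roughly like the following sketch. The class and method names (RuleEngine, RuleAgent, deploy, delete, run) and the status strings are assumptions, and execution scripts are left out for brevity.

```python
from typing import Callable, Dict, List, Tuple

AgentRule = Tuple[str, Callable[[], bool], Callable[[], None]]


class RuleAgent:
    """Holds the rules delegated to one object and reports status to the engine."""

    def __init__(self, object_name: str, report: Callable[[str, str], None]) -> None:
        self.object_name = object_name
        self.report = report
        self.rules: List[AgentRule] = []

    def assign(self, rule_id: str, condition: Callable[[], bool],
               action: Callable[[], None]) -> None:
        self.rules.append((rule_id, condition, action))

    def run(self) -> None:
        # Evaluate every rule locally and report the outcome back to the engine.
        for rule_id, condition, action in self.rules:
            if condition():
                action()
                self.report(rule_id, "fired")
            else:
                self.report(rule_id, "not fired")


class RuleEngine:
    """Keeps the list of rules and their status, and creates agents on demand."""

    def __init__(self) -> None:
        self.status: Dict[str, str] = {}
        self.agents: Dict[str, RuleAgent] = {}

    def _record(self, rule_id: str, state: str) -> None:
        self.status[rule_id] = state

    def deploy(self, rule_id: str, object_name: str,
               condition: Callable[[], bool], action: Callable[[], None]) -> None:
        agent = self.agents.get(object_name)
        if agent is None:  # create the agent the first time an object gets a rule
            agent = RuleAgent(object_name, self._record)
            self.agents[object_name] = agent
        agent.assign(rule_id, condition, action)
        self.status[rule_id] = "deployed"

    def delete(self, rule_id: str) -> None:
        for agent in self.agents.values():
            agent.rules = [r for r in agent.rules if r[0] != rule_id]
        self.status.pop(rule_id, None)


# Usage: deploy one rule to a hypothetical Counter object and run its agent once.
engine = RuleEngine()
counter = {"n": 0}
engine.deploy("r1", "Counter", lambda: counter["n"] < 3,
              lambda: counter.update(n=counter["n"] + 1))
for agent in engine.agents.values():
    agent.run()
```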

3. Example

In the examples below, List is an instance of a random number list generator and SortSelector is an instance of a sorting component.

1) Object rule

a. IF List.getLength() < 2 THEN List.generate()

b. IF System.usedMemory() > 80% AND List.getLength() > 100 THEN List.generate()

The gateway receives object rules and transfers them to the corresponding rule engine. The rule engine injects them into the corresponding local rule agents. In this example, the object rules are injected into the rule agent of object List.
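As an illustration, the two object rules above can be evaluated against a stand-in List object. Everything here is assumed for the sketch: the RandomList class, the behaviour of generate() (it refills the list with ten numbers), and the stub used_memory_percent(), which stands in for System.usedMemory() and returns a fixed value.

```python
import random


class RandomList:
    """Stand-in for the random number list generator called List in the text."""

    def __init__(self, length=0):
        self.values = [random.random() for _ in range(length)]

    def getLength(self):
        return len(self.values)

    def generate(self, length=10):
        # Assumption: generate() replaces the contents with `length` fresh numbers.
        self.values = [random.random() for _ in range(length)]


def used_memory_percent():
    """Hypothetical stub for System.usedMemory(); a real system view would query the OS."""
    return 85.0


List = RandomList(length=1)

object_rules = [
    # a. IF List.getLength() < 2 THEN List.generate()
    (lambda: List.getLength() < 2, List.generate),
    # b. IF System.usedMemory() > 80% AND List.getLength() > 100 THEN List.generate()
    (lambda: used_memory_percent() > 80 and List.getLength() > 100, List.generate),
]

# The rule agent of List evaluates each rule locally, in order.
fired = []
for condition, action in object_rules:
    hit = condition()
    if hit:
        action()
    fired.append(hit)
```

With these assumed values, rule a fires because the list starts with a single element, and rule b does not fire because the regenerated list is still far shorter than 100 elements.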

2) Application rule

IF List.getLength() < 30 THEN SortSelector.sequentialSort() ELSE SortSelector.quickSort()

The gateway receives application rules and transfers them to the rule engine, which stores them. The rule engine breaks each rule into triggers and injects those triggers into the local rule agents of the corresponding objects. In this example, "List.getLength() < 30" is injected into the rule agent of object List as a condition trigger. Similarly, "SortSelector.sequentialSort()" and "SortSelector.quickSort()" are injected into the rule agent of object SortSelector as action triggers.
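The decomposition described above can be sketched as follows. The Trigger class, the decompose() and execute() helpers, and the tagging of action triggers as "then" or "else" are hypothetical choices for this illustration; the source does not specify how triggers are represented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Part = Tuple[str, str, Callable[[], object]]  # (target object, source text, callable)


@dataclass
class Trigger:
    kind: str                    # "condition", or an action trigger tagged "then"/"else"
    target: str                  # object whose rule agent receives the trigger
    label: str                   # source text of the trigger
    run: Callable[[], object]


class SortSelector:
    """Stand-in sorting component that records which algorithm was selected."""

    def __init__(self) -> None:
        self.used = None

    def sequentialSort(self) -> None:
        self.used = "sequential"

    def quickSort(self) -> None:
        self.used = "quick"


def decompose(condition: Part, then_actions: List[Part],
              else_actions: List[Part]) -> Dict[str, List[Trigger]]:
    """Split one application rule into triggers grouped by destination object."""
    agents: Dict[str, List[Trigger]] = {}
    groups = (("condition", [condition]), ("then", then_actions), ("else", else_actions))
    for kind, parts in groups:
        for target, label, fn in parts:
            agents.setdefault(target, []).append(Trigger(kind, target, label, fn))
    return agents


def execute(agents: Dict[str, List[Trigger]]) -> bool:
    """Engine side: collect condition results, then invoke the matching action triggers."""
    triggers = [t for ts in agents.values() for t in ts]
    result = all(t.run() for t in triggers if t.kind == "condition")
    branch = "then" if result else "else"
    for t in triggers:
        if t.kind == branch:
            t.run()
    return result


numbers = list(range(50))  # plays the role of List, with 50 elements
selector = SortSelector()

# IF List.getLength() < 30 THEN SortSelector.sequentialSort() ELSE SortSelector.quickSort()
agents = decompose(
    ("List", "List.getLength() < 30", lambda: len(numbers) < 30),
    [("SortSelector", "SortSelector.sequentialSort()", selector.sequentialSort)],
    [("SortSelector", "SortSelector.quickSort()", selector.quickSort)],
)
outcome = execute(agents)
```

Because the stand-in list holds 50 elements, the condition trigger evaluates to false and the ELSE action trigger selects quickSort.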

4. Experimental results

  1. Minimal overhead: This experiment measures the runtime overhead introduced to the application in DIOS++ rule execution mode. In this experiment, the application automatically updated the DISCOVER server and its connected clients with the current state of the autonomic objects and the rule status. Explicit interaction and rule execution were disabled during the experiment. The application’s run times with and without DIOS++ are plotted in figure 1. The overhead due to the DIOS++ runtime is very small and lies within the measurement error.
  2. Comparing computation time and rule deployment time for successive iterations: In this experiment, we deployed two object rules and two application rules in 4 successive iterations. Figure 2 shows that object rules need less deployment time than application rules. This is expected: for an object rule, the rule engine only creates the rule agent in the destination object and transfers the rule to it, whereas for an application rule it has to decompose the rule, create the destination rule agents and inject triggers into them.
  3. Comparing computation time, object rule execution time and application rule execution time for successive application iterations: Figure 3 shows that application rules require more execution time than object rules, since the rule engine has to collect the execution results of all the triggers, check whether the conditions are true or false, and invoke the corresponding actions.

Modified by Hua Liu