The University of Florida, the University of Arizona, and Rutgers, The State University of New Jersey, have established a national Center for Autonomic Computing (CAC).

This center is funded by the Industry/University Cooperative Research Center program of the National Science Foundation, CAC members from industry and government, and university matching funds.

Autonomic Computing Engines

Principal researchers: Andres Quiroz, Shivangi Chaudhari, Manish Parashar (Rutgers)

Current collaborators: Brian Hammond (Microsoft)

Status: Ongoing

Consolidated and virtualized cluster-based computing centers have become dominant computing platforms in industry and research for enabling complex and compute-intensive applications. However, as scales, operating costs, and energy requirements increase, maximizing the efficiency, cost-effectiveness, and utilization of these systems becomes paramount. Furthermore, the complexity, dynamism, and often time-critical nature of application workloads make on-demand scalability, integration of geographically distributed resources, and incorporation of utility computing services essential. Finally, the heterogeneity and dynamics of the system, application, and computing environment require context-aware dynamic scheduling and runtime management.

This project envisions an autonomic computing engine capable of: (1) supporting dynamic, utility-driven, on-demand scale-out of resources and applications, where organizations incorporate computational resources based on perceived utility, including resources within the enterprise, across virtual organizations, and from emerging utility computing clouds; (2) enabling complex and highly dynamic application workflows consisting of heterogeneous and coupled tasks/jobs, through programming and runtime support for a range of computing patterns (e.g., master-slave, pipelined, data-parallel, asynchronous, system-level acceleration); and (3) providing integrated runtime management (including scheduling and dynamic adaptation) across the different dimensions of application metrics and execution context. Context awareness includes system awareness, to manage heterogeneous resource costs, capabilities, availabilities, and loads; application awareness, to manage heterogeneous and dynamic application resource, data, and interaction/coordination requirements; and ambient awareness, to manage the dynamics of the execution context, such as heat/temperature and power.
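The sketch below illustrates the kind of utility-driven scale-out decision described in point (1): candidate resources (enterprise, grid, or cloud) are ranked by a perceived-utility score that weighs expected throughput gain against monetary cost and power, and resources are added greedily while their utility remains positive. The Resource class, the utility weights, and the greedy policy are illustrative assumptions for this sketch, not the actual Comet or Rudder interfaces.

# Hypothetical sketch of a utility-driven scale-out decision.
# Resource fields and weights are illustrative assumptions only.
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    cost_per_hour: float   # monetary cost of using this resource
    capacity: int          # tasks it can run concurrently
    load: int              # tasks currently running
    power_watts: float     # ambient/power context

def utility(resource: Resource, pending_tasks: int,
            w_cost: float = 1.0, w_power: float = 0.2) -> float:
    """Perceived utility: expected throughput gain minus weighted cost and power penalties."""
    free_slots = max(resource.capacity - resource.load, 0)
    throughput_gain = min(free_slots, pending_tasks)
    return throughput_gain - w_cost * resource.cost_per_hour - w_power * resource.power_watts / 100.0

def select_scale_out(candidates: list[Resource], pending_tasks: int) -> list[Resource]:
    """Greedily pick resources (enterprise, grid, or cloud) whose utility is positive."""
    chosen = []
    for r in sorted(candidates, key=lambda r: utility(r, pending_tasks), reverse=True):
        if pending_tasks <= 0:
            break
        if utility(r, pending_tasks) > 0:
            chosen.append(r)
            pending_tasks -= max(r.capacity - r.load, 0)
    return chosen

if __name__ == "__main__":
    pool = [
        Resource("local-cluster", cost_per_hour=0.0, capacity=64,   load=60,  power_watts=400),
        Resource("campus-grid",   cost_per_hour=0.1, capacity=128,  load=100, power_watts=800),
        Resource("utility-cloud", cost_per_hour=1.5, capacity=1024, load=0,   power_watts=0),
    ]
    print([r.name for r in select_scale_out(pool, pending_tasks=200)])

In a real deployment the utility function would also have to account for factors such as data locality, deadlines, and trust across virtual organizations; the point here is only the shape of the decision.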

As part of this project we have developed the Comet computing substrate, which provides a foundation and core capabilities for the envisioned autonomic computing engine. Comet supports different programming abstractions for parallel computing, including master/worker, data-parallel, and asynchronous iterations, in a dynamic and widely distributed environment. It provides the abstraction of virtual semantic shared spaces, which forms the basis for flexible scheduling, associative coordination, and content-based, asynchronous, and decoupled interactions. Comet builds on a self-organizing and fault-tolerant dynamic overlay of computing resources. It is currently deployed on a range of platforms, including local clusters, campus grids, and wide-area computing platforms (e.g., PlanetLab), and supports several computational applications from science, engineering, and finance. We have also designed the Rudder autonomic coordination middleware, which builds on Comet and supports the composition, interaction, and management of dynamic application workflows. Through the use of context-aware agents, Rudder enables dynamic discovery, configuration, and composition of workflow components at run time for application scale-out, and provides for dynamic switching of workflows, components, and component interaction patterns to maintain high resource utilization.
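As a rough illustration of the associative, decoupled coordination style that the semantic shared-space abstraction supports, the toy example below runs a master/worker computation over an in-process "space": workers take task tuples by tag without knowing who produced them, and results flow back the same way. The SemanticSpace class and its put/take interface are simplifications invented for this sketch and do not reflect the actual Comet API or its distributed, self-organizing overlay.

# Toy associative master/worker coordination over an in-process "space".
# This is an illustrative simplification, not the Comet API.
import queue
import threading

class SemanticSpace:
    """A minimal content-addressable space: tuples are tagged, and consumers
    'take' tuples matching a tag without knowing the producer."""
    def __init__(self):
        self._queues = {}
        self._lock = threading.Lock()

    def put(self, tag, payload):
        with self._lock:
            q = self._queues.setdefault(tag, queue.Queue())
        q.put(payload)

    def take(self, tag):
        with self._lock:
            q = self._queues.setdefault(tag, queue.Queue())
        return q.get()

def worker(space, wid):
    while True:
        task = space.take("task")
        if task is None:                  # poison pill: shut down
            break
        space.put("result", (wid, task * task))

if __name__ == "__main__":
    space = SemanticSpace()
    workers = [threading.Thread(target=worker, args=(space, i)) for i in range(4)]
    for w in workers:
        w.start()
    for n in range(10):                   # master inserts tasks; workers pick them up associatively
        space.put("task", n)
    results = [space.take("result") for _ in range(10)]
    for _ in workers:
        space.put("task", None)
    for w in workers:
        w.join()
    print(sorted(r for _, r in results))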

Ongoing efforts are focused on application-level autonomics by building an autonomic computational engine on the Microsoft Windows Compute Cluster Server (CCS) platform. A key aspect of the design is the use of the advanced networking support provided by CCS, such as RDMA (Winsock Direct) and offloading (TCP Chimney), to enable low-latency communication and latency-hiding techniques. The effort is driven by scientific and business applications.
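The minimal sketch below shows the latency-hiding pattern in isolation: the next data block is prefetched on a background thread while the current block is being processed, so transfer time overlaps with computation. The fetch_block and compute functions only simulate a remote transfer and local work; on CCS the transfer path would use RDMA (Winsock Direct) or TCP offload rather than this thread-based stand-in.

# Latency hiding via prefetching: overlap the next "transfer" with computation.
# fetch_block/compute are simulated placeholders, not CCS networking calls.
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_block(i):
    time.sleep(0.05)                      # stands in for a remote transfer
    return list(range(i * 100, (i + 1) * 100))

def compute(block):
    time.sleep(0.05)                      # stands in for local computation
    return sum(block)

def pipelined(n_blocks):
    """Prefetch block i+1 on a background thread while computing on block i."""
    total = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch_block, 0)
        for i in range(n_blocks):
            block = future.result()
            if i + 1 < n_blocks:
                future = pool.submit(fetch_block, i + 1)   # overlap next transfer
            total += compute(block)
    return total

if __name__ == "__main__":
    start = time.time()
    print(pipelined(8), f"({time.time() - start:.2f}s with overlap)")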

References

Z. Li and M. Parashar, "A Computational Infrastructure for Grid-based Asynchronous Parallel Applications," Proceedings of the 16th International Symposium on High-Performance Distributed Computing (HPDC), Monterey, CA, USA, pp. 229, June 2007.

Z. Li and M. Parashar, "Rudder: An Agent-based Infrastructure for Autonomic Composition of Grid Applications," Multiagent and Grid Systems - An International Journal, IOS Press, Vol. 1, No. 4, pp. 183-195, 2005.