Educational Activities

The GreenHPC initiative provides educational opportunities for students at different levels, including graduate and undergraduate students at Rutgers University.

Active and motivated students are also supported to conduct research within the different activities of the GreenHPC initiative at The Applied Software Systems Laboratory.

The main educational activities of the GreenHPC initiative are in the following forms:

  • Master's Thesis
    • Sharat Chandra, Study of Application Aware techniques for System and Runtime Power Management, Fall 2012
    • Karthik Elangovan, Adaptive Memory Power Management Techniques for HPC Workloads, Fall 2011
    • Siddharth Wagh, Towards Autonomic Virtual Machine Management, Fall 2010
  • Special Problems (both graduate and undergraduate students)
  • Projects related to the topic of Energy Efficiency in the following courses taught at the Department of Electrical and Computer Engineering, Rutgers University:
    • 14:332:451 - Introduction to Parallel and Distributed Programming (undergraduate)
    • 16:332:566 - Introduction to Parallel and Distributed Programming (graduate)
    • 14:332:438 - Capstone Design in Software Systems (senior undergraduate students)
    • 16:332:572 - Parallel and Distributed Computing (graduate)
  • Specific programs involving high school students (e.g., New Jersey Governor's School of Engineering and Technology at Rutgers University)

Summer 2011: New Jersey Governor's School of Engineering and Technology


The New Jersey Governor's School of Engineering and Technology at Rutgers University is an intensive residential summer program that brings together some of New Jersey's most talented and motivated high school students.


The 2011 Governor's School of Engineering and Technology Research Symposium was held on Friday, July 22nd, 2011.

Prof. Manish Parashar and Dr. Ivan Rodero from CAC/ECE mentored a group of four of New Jersey's most talented and motivated high school students (Sarah Anne Coe, Eric Principato, Omar Rizwan and Katherine Ye) who developed the project "Autonomic Data Center Thermal Management". This project, which addresses one of the most important problems in the management of modern data centers, is part of the educational activities of the GreenHPC initiative (http://nsfcac.rutgers.edu/GreenHPC) at The Applied Software Systems Laboratory and the Cloud and Autonomic Computing Center.

The students were introduced to the fundamentals of autonomic computing, virtualization and thermal management for virtualized data centers. They developed an autonomic scheduler to manage virtual machines and distribute the input workload so as to minimize heat production.

Empirical experimentation conducted on a research testbed showed that autonomic management techniques, a software solution to the problem of data center thermal management, can be effective in reducing average server temperatures. The students concluded that effective thermal management using autonomic computing can significantly improve data center reliability and productivity, and can therefore reduce data center costs through both energy and maintenance savings.


Autonomic Data Center Thermal Management

Rationale

Data centers play the critical role of providing online services such as hosting company data. As data centers have increased in size and scope, they have also started to generate increasing amounts of heat, and thus require increasing amounts of power for cooling. Data center cooling now demands increased efficiency and autonomy in order to better protect the environment, improve server stability, and minimize costs. Autonomic computing provides a viable solution: by programming computers to self-manage and self-optimize their processes, they can maximize their own efficiency without human intervention. To explore autonomic approaches to data center environmental management, an autonomic scheduler to manage virtual machines was developed and evaluated.


Algorithms

Three algorithms were implemented and evaluated:

  • Sequential algorithm: Non-autonomic algorithm, which does not account for machine load or monitoring data. Tasks are scheduled node-by-node, with one VM assigned to each submitted task.
  • Random algorithm: Non-autonomic algorithm, which schedules tasks to random nodes.
  • Autonomic algorithm: Prioritizes task scheduling to low-temperature nodes (sketches of the strategies follow below).
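
As an illustration of the two non-autonomic strategies, the following is a minimal Python sketch. The project report's actual implementation is not reproduced here; the Node and VM interfaces (node.vms, vm.is_free(), vm.assign()) are hypothetical stand-ins used only to show the placement logic.

    import random

    def schedule_sequential(nodes, task):
        # Non-autonomic: scan nodes in a fixed order and assign the task to the
        # first free VM found, ignoring load and monitoring data.
        for node in nodes:
            for vm in node.vms:
                if vm.is_free():
                    vm.assign(task)
                    return vm
        return None  # no free VM anywhere

    def schedule_random(nodes, task):
        # Non-autonomic: pick a random node that has a free VM, then a random
        # free VM on that node.
        candidates = [n for n in nodes if any(vm.is_free() for vm in n.vms)]
        if not candidates:
            return None
        node = random.choice(candidates)
        vm = random.choice([vm for vm in node.vms if vm.is_free()])
        vm.assign(task)
        return vm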

The autonomic algorithm manages virtual machines and tasks in several ways. It continually analyzes temperature reports from the monitoring module. Before scheduling a task, the scheduler ensures the safe operation of all servers. If any node is operating above a defined temperature threshold (e.g., 55 degrees Celsius), the scheduler immediately pauses a virtual machine on that node to reduce load. As long as a node's temperature remains too high, the scheduler continues pausing virtual machines on each temperature check. Once the temperature has decreased to a safe level, the scheduler unpauses the virtual machines on the node. After this safety check, the scheduler identifies the coolest node on which to schedule the task and selects a random VM on that node. It always schedules tasks to the coolest node to avoid overheating servers. If the chosen VM is occupied, or if the node is too hot, the scheduler waits five seconds and tries again.
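
The temperature-aware loop described above can be sketched as follows. This is an illustrative Python sketch, not the project's code: get_temperature() stands in for the monitoring module, and pause_one_vm(), unpause_vms(), random_vm(), is_free() and assign() are hypothetical interfaces assumed for the example.

    import time

    TEMP_THRESHOLD_C = 55.0   # example safety threshold from the description above
    RETRY_DELAY_S = 5         # wait before retrying when the chosen VM or node is unavailable

    def schedule_autonomic(nodes, task, get_temperature):
        # get_temperature(node) is a hypothetical callback returning the node's
        # current temperature in degrees Celsius, as reported by the monitoring module.
        while True:
            # Safety check: pause a VM on any node running above the threshold,
            # and resume VMs once the node has cooled to a safe level.
            for node in nodes:
                if get_temperature(node) > TEMP_THRESHOLD_C:
                    node.pause_one_vm()     # reduce load on the hot node
                else:
                    node.unpause_vms()      # node is back at a safe temperature

            # Always prefer the coolest node, then try a random VM on it.
            coolest = min(nodes, key=get_temperature)
            vm = coolest.random_vm()
            if get_temperature(coolest) <= TEMP_THRESHOLD_C and vm.is_free():
                vm.assign(task)
                return vm

            time.sleep(RETRY_DELAY_S)       # chosen VM busy or node too hot: retry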

Results

The sequential scheduler biased scheduling toward lower-numbered nodes, since tasks are scheduled to the first available VM on the first available node. As expected, assigning tasks sequentially to VMs resulted in significant increases in heat across all nodes, as shown in Figure 1 (top-left).

The random scheduler created a more equal distribution of tasks across nodes. The more equal distribution of tasks did not, however, lower the average temperature. In fact, random scheduling generally produced higher temperatures on the servers, as shown in Figure 1 (top-right).

Finally, the autonomic scheduler distributed tasks fairly equally among nodes. Autonomic scheduling reduced the peak temperature of the servers, as shown in Figure 1 (bottom). As a trade-off, tasks took longer to complete, although the servers usually ran at lower temperatures.



Figure 1. Server temperatures from different scheduling algorithms


The autonomic scheduler flattened the peak of the average temperature, as observed in Figure 2, since it immediately paused virtual machines on any servers that were running at excessively high temperatures.


Figure 2. Average temperatures of servers


Additional information can be found in their project report.