This work investigates the use of clouds and autonomic cloud-bursting to support medical image registration. The goal is to enable a virtual computational cloud that integrates local computational environments and public cloud services on the fly, and supports image registration requests from different distributed research groups with varied computational requirements and QoS constraints. The virtual cloud essentially implements shared and coordinated task-spaces, which coordinate the scheduling of jobs submitted by a dynamic set of research groups to their local job queues. A policy-driven scheduling agent uses the QoS constraints, along with performance history and the state of the resources, to determine the appropriate size and mix of public and private cloud resources that should be allocated to a specific request. The virtual computational cloud and the medical image registration service have been developed using the CometCloud engine and have been deployed on a combination of private clouds at Rutgers University and the Cancer Institute of New Jersey and Amazon EC2. An experimental evaluation is presented that demonstrates the effectiveness of autonomic cloudbursts and policy-based autonomic scheduling for this application.
Nonlinear image registration is the process of determining the mapping T between two images of the same object or similar objects acquired at different times, in different positions, or using different acquisition parameters or modalities. Both intensity/area-based and landmark-based methods have been reported to be effective in handling various registration tasks. Hybrid methods that integrate both techniques have demonstrated advantages in the literature. In general, intensity/area-based methods are widely accepted for fully automatic registration. Landmark-based methods, though also commonly used, sometimes still rely on human intervention in selecting landmark points and/or performing point matching. Point matching in medical images is particularly challenging due to the variability in image acquisition and anatomical structures.
We developed an alternative landmark point detection and matching method as part of our hybrid image registration algorithm for both 2D and 3D images. The algorithm starts with automatic detection of a set of landmarks in both the fixed and moving images, followed by a coarse-to-fine estimation of the nonlinear mapping using the landmarks. For 2D images, multi-resolution orientation histograms and intensity templates are combined to obtain a fast, affine-invariant local descriptor of the detected landmarks. For 3D volumes, considering both speed and accuracy, a global registration is first applied to pre-align the two 3D images. Intensity template matching is then used to obtain the point correspondence between landmarks in the fixed and moving images. Because the initial landmark correspondence contains a large proportion of outliers, a robust estimator, RANSAC, is applied to reject them. The refined inliers are then used to robustly estimate a Thin Plate Spline (TPS) transform to complete the final nonlinear registration. The proposed algorithm can handle much larger transformations and deformations than common image registration methods such as the finite element method (FEM) or B-spline fitting, while still providing good registration results. The flowchart of the hybrid image registration algorithm is shown in the accompanying figure.
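Separately from the flowchart, the outlier-rejection step can be made concrete with a minimal Python sketch of RANSAC over 2D landmark correspondences. It is not the authors' implementation. The text does not say which model the RANSAC step fits, so an affine map estimated from three correspondences is assumed here as a stand-in, and the function names, the tolerance, and the iteration count are all hypothetical. The final TPS fit on the surviving inliers is not shown.

```python
import random


def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, rhs):
    """Solve the 3x3 system m @ x = rhs with Cramer's rule; None if singular."""
    d = det3(m)
    if abs(d) < 1e-9:
        return None
    solution = []
    for col in range(3):
        replaced = [list(row) for row in m]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(det3(replaced) / d)
    return solution


def fit_affine(pairs):
    """Exact affine map from three ((x, y), (u, v)) correspondences."""
    m = [[src[0], src[1], 1.0] for src, _dst in pairs]
    row_u = solve3(m, [dst[0] for _src, dst in pairs])
    row_v = solve3(m, [dst[1] for _src, dst in pairs])
    if row_u is None or row_v is None:
        return None  # the three source points were collinear
    return row_u, row_v


def apply_affine(model, p):
    """Map point p = (x, y) through the affine model."""
    (a, b, c), (d, e, f) = model
    return a * p[0] + b * p[1] + c, d * p[0] + e * p[1] + f


def ransac_inliers(pairs, iterations=200, tol=2.0, seed=0):
    """Return the largest set of correspondences consistent with one affine map.

    Assumption: an affine model and a fixed pixel tolerance; both are
    illustrative choices, not taken from the source.
    """
    rng = random.Random(seed)
    best = []
    for _ in range(iterations):
        model = fit_affine(rng.sample(pairs, 3))
        if model is None:
            continue
        inliers = []
        for src, dst in pairs:
            px, py = apply_affine(model, src)
            if (px - dst[0]) ** 2 + (py - dst[1]) ** 2 <= tol ** 2:
                inliers.append((src, dst))
        if len(inliers) > len(best):
            best = inliers
    return best
```

In a pipeline like the one described, the list returned by `ransac_inliers` would play the role of the refined inliers passed on to the nonlinear fit.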
An overview of the operation of the CometCloud-based medical image registration application scenario is presented below.
In this application scenario, there are multiple (possibly distributed) job queues through which users insert image registration requests into CometCloud. Each of these entry points represents a research site in a research collaboration and maintains its own storage where medical images are kept. Each site generates its own requests with its own policies and QoS constraints. Note that a site can join the collaboration and CometCloud at any time (provided it has the right credentials) and can then submit requests. The requests (tasks) generated by the different sites are logged in the CometCloud virtual shared space, which spans master nodes at each of the sites. These tasks are then consumed by workers, which may run on local computational nodes at the site, in a shared datacenter, or on a public cloud infrastructure. These workers can access the space using appropriate credentials, retrieve the tasks they are authorized for (i.e., image registration requests), and return results to the appropriate master, which is indicated in the task itself.
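The interaction described above (sites insert tasks, authorized workers take them, and results go back to the master named in the task) can be illustrated with a small in-memory Python sketch. This is not the CometCloud API. The class names, fields, and the representation of credentials as a set of site names are all assumptions made for illustration, and the sketch omits distribution across master nodes entirely.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    task_id: int
    master: str     # site whose master should receive the result
    image_ref: str  # hypothetical reference to an image on that site's server


@dataclass
class SharedTaskSpace:
    pending: deque = field(default_factory=deque)
    results: dict = field(default_factory=dict)  # master -> [(task_id, result)]

    def insert(self, task):
        """A site's job queue logs a request in the shared space."""
        self.pending.append(task)

    def take(self, authorized_sites):
        """Remove and return the oldest task this worker may access, if any."""
        for task in list(self.pending):
            if task.master in authorized_sites:
                self.pending.remove(task)
                return task
        return None

    def report(self, task, result):
        """Route the result to the master indicated in the task itself."""
        self.results.setdefault(task.master, []).append((task.task_id, result))
```

A worker holding credentials for a single site would call `take({"site_b"})` and only ever see that site's tasks; tasks from other sites stay in the space for other workers.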
The virtual cloud environment used for the experiments consisted of two research sites located at Rutgers University and the University of Medicine and Dentistry of New Jersey, one public cloud, i.e., Amazon Web Services (AWS) EC2, and one private datacenter at Rutgers, i.e., TW. The two research sites hosted their own image servers and job queues, and workers running on EC2 or TW accessed these image servers to retrieve the image described in the task assigned to them (see Figure below).
Each image server has 250 images, resulting in a total of 500 tasks. Each image is two-dimensional and its size is between 17KB and 65KB. On EC2, we used standard small instances with a computing cost of $0.10/hour and data transfer costs of $0.10/GB for inward transfers and $0.17/GB for outward transfers.
Costs for the TW datacenter included hardware investment, software, electricity, etc., and were estimated at $1.37/hour per rack. In the experiments we set the maximum number of available nodes to 25 for TW and 100 for EC2. Note that TW nodes outperform EC2 nodes but are more expensive. We used a budget-based scheduling policy, in which the scheduling agent tries to complete tasks as soon as possible without violating the budget. We set the maximum available budget in the experiments to $3 to complete all tasks. The motivation for this choice is as follows. If the available budget were sufficiently high, all the available nodes on TW would be allocated, and tasks would be assigned until all the tasks were completed. If the budget were too small, the scheduling agent would not be able to complete all the tasks within the budget. Hence, we set the budget to an arbitrary value in between. Finally, the monitoring component of the scheduling agent evaluated the performance every minute. The results from the experiments are shown below.
Figure (a) shows the scheduled number of workers on TW and EC2. Figure (b) shows the average cost per task in each scheduling period for TW and EC2.
Note that since the scheduling interval is 1 minute, the X-axis corresponds to both time (in minutes) and the scheduling iteration number. Initially, the CometCloud scheduling agent does not know the cost of completing a task. Hence, it initially allocates 10 nodes each from TW and EC2.
In the beginning, since the budget is sufficient, the scheduling agent tries to allocate TW nodes even though they cost more than EC2 nodes. In the 2nd scheduling iteration, there are 460 tasks still remaining, and the agent attempts to allocate 180 TW nodes and 280 EC2 nodes to finish all tasks as soon as possible within the available budget. If TW and EC2 could provide the requested nodes, all the tasks would be completed by the next iteration. However, since the maximum number of available TW nodes is only 25, it allocates these 25 TW nodes and estimates a completion time of 7.2 iterations. The agent then decides on the number of EC2 workers to be used based on this estimated number of rounds.
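The 7.2-iteration estimate is consistent with a simple reading in which each node completes one task per scheduling iteration, so that 180 tasks on 25 TW nodes need 180 / 25 = 7.2 rounds. The Python sketch below reproduces that arithmetic under this assumption. The rule used to size EC2 (enough nodes to finish the EC2 share in the same number of rounds) is a guess at what "based on the estimated rounds" means, and the source does not report the resulting EC2 count; the function and parameter names are hypothetical.

```python
import math


def plan_allocation(tw_tasks, ec2_tasks, tw_cap, ec2_cap):
    """Cap the TW request, then size EC2 to finish in the same number of rounds.

    Assumption: every node completes exactly one task per scheduling iteration.
    Returns (tw_nodes, estimated_rounds, ec2_nodes).
    """
    tw_nodes = min(tw_tasks, tw_cap)
    if tw_nodes <= 0:
        raise ValueError("at least one TW task and one TW node are required")
    rounds = tw_tasks / tw_nodes
    ec2_nodes = min(math.ceil(ec2_tasks / rounds), ec2_cap)
    return tw_nodes, rounds, ec2_nodes


# Numbers from the 2nd scheduling iteration: 180 TW tasks, 280 EC2 tasks,
# at most 25 TW nodes and 100 EC2 nodes.
tw_nodes, rounds, ec2_nodes = plan_allocation(180, 280, 25, 100)
print(tw_nodes, rounds, ec2_nodes)  # 25 7.2 39
```

Under these assumptions the EC2 request drops from 280 nodes to 39, since there is no benefit in finishing the EC2 share sooner than the TW share.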
In the case of EC2, it takes around 1 minute to launch an instance (from the start of the virtual machine to the state where it is ready to consume tasks), and as a result, by the 4th iteration the cost per task for EC2 increases. At this point, the scheduling agent decides to decrease the number of TW nodes, which are expensive, and instead to increase the number of EC2 nodes using the available budget. By the 9th iteration, 22 tasks are still remaining. The scheduling agent now decides to release 78 EC2 nodes because they will not have jobs to execute. The reason why the remaining jobs have not completed by the 10th iteration (i.e., 10 minutes), even though there are 22 nodes still working, is that there was an unexplained decrease in EC2 performance during our experiments. The variations in the cost per task in Figure (b) arise because the task completions are not uniformly distributed across the time intervals. Since the cost per interval is fixed (defined by AWS), the cost per task varies depending on the number of tasks completed in a particular time interval.
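The dependence of cost per task on the number of completions can be illustrated with a short Python sketch. It assumes, purely for illustration, that the hourly rate is pro-rated over the scheduling interval; the node counts and completion counts in the example are hypothetical and are not measurements from the experiments.

```python
def cost_per_task(nodes, hourly_rate, interval_minutes, tasks_completed):
    """Average cost per completed task in one scheduling interval.

    Assumption: the hourly node rate is pro-rated over the interval.
    Returns None when no task completed, since the average is undefined.
    """
    interval_cost = nodes * hourly_rate * interval_minutes / 60.0
    if tasks_completed == 0:
        return None
    return interval_cost / tasks_completed


# Same 20 nodes at $0.10/hour over a 1-minute interval, different completions.
busy = cost_per_task(20, 0.10, 1, 20)  # every node finished a task
slow = cost_per_task(20, 0.10, 1, 5)   # only 5 tasks finished
print(busy, slow)
```

The interval cost is identical in both calls, so finishing a quarter as many tasks makes each one four times as expensive, which is the effect visible as spikes in the cost curve.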
Figure (c) shows the used budget over time. It shows all the tasks were completed within the budget and took around 13 minutes.
This figure shows a comparison of execution time and used budget with and without the CometCloud scheduling agent. In the case where only EC2 nodes are used, when the number of EC2 nodes is decreased from 100 to 50 and 25, the execution time increases and the budget used decreases, as shown in (a) and (b). Comparing the same number of EC2 and TW nodes (25 EC2 and 25 TW), the execution time for 25 TW nodes is approximately half that for 25 EC2 nodes; however, the cost for 25 TW nodes is significantly higher than that for 25 EC2 nodes. When the CometCloud autonomic scheduling agent is used, the execution time is close to that obtained using 25 TW nodes, but the cost is much smaller and the tasks complete within the budget. The reason why the execution time in this case is larger than that for the 100 EC2 node case is as follows: the cost peaks at time = 11 mins, as seen in Figure (b), and this causes the autonomic scheduler to reduce the number of EC2 nodes to approximately 20 (see Figure (a)), causing the execution time to increase.
An interesting observation from the plots is that if you don’t have any limit on the number of EC2 nodes used, then a better solution is to allocate as many EC2 nodes as you can. However, if you only have a limited number of EC2 nodes and want a guarantee that your job is completed within a limited budget, then the autonomic scheduling approach achieves an acceptable tradeoff. Since different cloud services will have different performance and cost profiles, the scheduling agent will have to use historical data and more complex models to compute schedules as we extend CometCloud to include other service providers.