High-Throughput Asynchronous Data Transfers
Principal researchers: Ciprian Docan, Manish Parashar (Rutgers University)
Current collaborators: Geoff Jiang (NEC)
Scott Klasky (Oak Ridge National Laboratory)
Status: Ongoing
Summary:
Large-scale applications such as financial analytics, engineering simulations, process monitoring and control, and enterprise system management typically run on distributed platforms such as large datacenters, clusters, and HPC systems, and generate massive amounts of data. This data must be extracted from the system and transported in a timely manner to remote consumers for online processing, analysis, monitoring, and decision-making. However, managing and transporting this data is becoming a significant bottleneck, imposing considerable overheads on the applications and leading to inefficient resource utilization and frequent QoS violations. Advanced interconnect architectures and communication protocols, such as customized high-speed interconnection buses and one-sided remote direct memory access (RDMA) with zero-copy, OS-bypass, and application-bypass data transfers, have been introduced to address these challenges. However, these advances also significantly increase the complexity exposed to applications, which must be adapted and managed at runtime to use these capabilities effectively.
In this project, we developed DART (Decoupled and Asynchronous Remote Data Transfer), an autonomic data management and transport substrate that builds on communication technologies such as RDMA to provide applications with high-throughput, low-overhead data extraction and transport capabilities. The key objectives of DART are (1) to offload expensive I/O operations from the nodes running the simulation to dedicated I/O nodes, so that the application can continue doing useful computational work, (2) to minimize the impact of I/O operations on the application, (3) to maximize data throughput from the application, and (4) to minimize data transfer latency. DART provides the application layer with a simple, asynchronous API. Its underlying mechanisms autonomically adapt to heterogeneous and dynamic data types, data volumes, data rates, and application loads. DART has three key components: (1) a DART Client that runs in-line with the application and provides communication calls similar to file operations for ease of use, (2) a DART Server that runs as a service independent of the application and coordinates, schedules, and extracts data from the application, and (3) a DART Receiver that transports data to a remote location.
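The decoupling described above can be illustrated with a minimal Python sketch. It is not DART's API: the class name AsyncWriter, its write/close methods, and the sink callback are hypothetical, and a background thread stands in for the separate server and receiver processes. The sketch also copies each buffer, whereas an RDMA-based transport would aim to avoid that copy. It only shows the general idea of a file-like call that returns immediately while the transfer proceeds elsewhere.

```python
import queue
import threading


class AsyncWriter:
    """Hypothetical file-like client whose write() returns without waiting for the transfer."""

    def __init__(self, sink):
        self._sink = sink                      # callable that performs the slow transfer
        self._pending = queue.Queue()          # hand-off point between application and worker
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def write(self, name, data):
        # Snapshot the buffer so the caller may reuse it, then return at once.
        # (Assumption of this sketch: a copy is acceptable; RDMA would avoid it.)
        self._pending.put((name, bytes(data)))

    def _drain(self):
        # Runs off the application's critical path, like a dedicated I/O service.
        while True:
            item = self._pending.get()
            if item is None:                   # sentinel posted by close()
                break
            name, data = item
            self._sink(name, data)

    def close(self):
        # Flush everything queued so far, then stop the worker.
        self._pending.put(None)
        self._worker.join()


# Usage: the "simulation" loop keeps computing while transfers drain in the background.
received = {}
writer = AsyncWriter(lambda name, data: received.__setitem__(name, data))
for step in range(3):
    field = bytearray([step] * 4)              # stand-in for one timestep of output
    writer.write(f"step{step}", field)
    field[:] = b"\xff" * 4                     # buffer reuse does not affect queued data
writer.close()
print(sorted(received))
```

The design choice worth noting is that the application only pays for the enqueue; scheduling and the actual transfer are left to the component on the other side of the queue.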
An initial implementation of DART on a Cray XT3/XT4 machine at Oak Ridge National Laboratory, using the Portals RDMA library, demonstrated that DART can efficiently extract large volumes of data from a live simulation and either stream it to a remote downstream node or save it to a local storage system, achieving effective throughputs of over 1 terabyte per hour from 2048 cores. Evaluations used a synthetic benchmark application as well as two different scientific applications simulating complex plasma phenomena. The application ran on 1024 and 2048 nodes and produced 500 GB and 1 TB of data, respectively. The I/O overhead on the applications in these experiments was 0.4% and 0.6%, respectively. The results of these initial experiments show that DART is a viable transfer method that can also be explored for other transfer problems, such as data or service replication for load balancing and transparent OS image migration for online services (e.g., the Amazon cloud).
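To put the reported throughput in more familiar units, the short calculation below converts 1 terabyte per hour into bytes per second. It assumes a decimal terabyte (10^12 bytes) and, for the per-core figure, that all 2048 cores contribute equally; neither assumption is stated in the evaluation, and the reported value is a lower bound ("over" 1 terabyte per hour).

```python
TB = 10**12                 # assumption: decimal terabyte
SECONDS_PER_HOUR = 3600
CORES = 2048                # core count reported for the throughput figure

aggregate_bytes_per_s = TB / SECONDS_PER_HOUR
per_core_bytes_per_s = aggregate_bytes_per_s / CORES   # assumes an even split across cores

print(f"aggregate: {aggregate_bytes_per_s / 1e6:.1f} MB/s")
print(f"per core:  {per_core_bytes_per_s / 1e3:.1f} kB/s")
```

Under these assumptions the aggregate rate is roughly 278 MB/s, or about 136 kB/s per core.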
References:
- C. Docan, S. Klasky, and M. Parashar. High Speed Asynchronous Data Transfers on the Cray XT3. Technical Report TR-284, May 2007.
- C. Docan, S. Klasky, and M. Parashar. Enabling High Speed Asynchronous Data Extraction and Transfer Using DART. HPDC, 2008.


