How RDMA is Solving AI’s Scalability Problem

October 25th, 2017


Artificial Intelligence (AI) is already impacting many aspects of our day-to-day lives. Through the use of AI, we have been introduced to autonomous vehicles, real-time fraud detection, public safety, advanced drug discovery, cancer research and much more. AI has already enabled scientific achievements once thought impossible, while also delivering on the promise of improving humanity.

Today, AI and machine learning are becoming completely intertwined with society and the way we interact with computers, but the real barrier to tackling the even bigger challenges of tomorrow is scalable performance. As future research, development and simulations require the processing of larger data sets, the interconnect will be the key to breaking through the performance barrier created by highly parallelized computing and its communication overhead. The more parallel processes we add to solve a complex problem, the more communication and data movement are needed. Remote Direct Memory Access (RDMA) fabrics, such as InfiniBand and RDMA over Converged Ethernet (RoCE), are key to unlocking scalable performance for the most demanding AI applications being developed and deployed today.
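
To make the scaling point concrete, here is a rough back-of-the-envelope sketch. It is not taken from any particular framework: the function names are hypothetical, and the ring all-reduce cost model (each worker sends 2(n-1)/n times the payload) is a common textbook approximation that ignores latency and protocol overhead.

```python
def all_to_all_messages(n):
    """Point-to-point messages when each of n processes sends to every other one."""
    return n * (n - 1)


def ring_allreduce_bytes_per_worker(n, payload_bytes):
    """Approximate bytes each worker sends in a ring all-reduce.

    Assumed model: a reduce-scatter phase followed by an all-gather phase,
    each moving (n - 1) / n of the payload per worker.
    """
    if n < 2:
        return 0.0  # a single worker has nothing to exchange
    return 2 * (n - 1) / n * payload_bytes


def ring_allreduce_total_bytes(n, payload_bytes):
    """Approximate bytes crossing the fabric in total for one all-reduce."""
    return n * ring_allreduce_bytes_per_worker(n, payload_bytes)


if __name__ == "__main__":
    payload = 100 * 1024 * 1024  # hypothetical 100 MiB of gradients
    for workers in (2, 8, 64, 512):
        total_gib = ring_allreduce_total_bytes(workers, payload) / 2**30
        print(f"{workers:4d} workers: {all_to_all_messages(workers):7d} "
              f"all-to-all messages, {total_gib:8.2f} GiB moved per all-reduce")
```

Under this model the per-worker traffic stays close to twice the payload, but the total volume crossing the fabric grows roughly linearly with the number of workers, which is why the interconnect becomes the limiting factor as systems scale.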

The InfiniBand Trade Association’s (IBTA) InfiniBand Roadmap lays out a clear and attainable path for performance gains, detailing 1x, 4x and 12x port widths with bandwidths reaching 600Gb/s this year and outlining plans for future speed increases. For those already deploying InfiniBand in their HPC and AI systems, the roadmap provides specific milestones for expected performance improvements, protecting their investment with the assurance of backward and forward compatibility across generations. While high bandwidth is very important, the low latency of RDMA is equally essential for the advancement of machine learning and AI. The ultra-low latency provided by RDMA keeps processing overhead minimal and greatly accelerates overall application performance, which AI requires when moving massive amounts of data, exchanging messages and computing results. InfiniBand’s low latency and high bandwidth will help address AI scalability and efficiency needs as systems tackle challenges involving even larger and more complex data sets.
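
As a quick illustration of how port width relates to the headline figure, the sketch below multiplies a per-lane rate by the lane count. The 50 Gb/s per-lane value is an assumption, not stated in this post; it is simply what a 12x port needs per lane to reach 600Gb/s. The names are hypothetical.

```python
# Assumed per-lane data rate in Gb/s. This figure is not given in the post;
# it is inferred from 600 Gb/s divided across a 12x port.
ASSUMED_LANE_GBPS = 50

# Port widths named in the roadmap: 1x, 4x and 12x.
PORT_WIDTHS = (1, 4, 12)


def port_bandwidth_gbps(lanes, lane_gbps=ASSUMED_LANE_GBPS):
    """Aggregate port bandwidth: number of lanes times the per-lane rate."""
    return lanes * lane_gbps


def bandwidth_table(lane_gbps=ASSUMED_LANE_GBPS):
    """Map each port width label (e.g. '4x') to its aggregate bandwidth in Gb/s."""
    return {f"{w}x": port_bandwidth_gbps(w, lane_gbps) for w in PORT_WIDTHS}


if __name__ == "__main__":
    for width, gbps in bandwidth_table().items():
        print(f"{width:>3}: {gbps} Gb/s")
```

Passing a different `lane_gbps` shows how a future per-lane speed increase would lift all three port widths at once.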

The InfiniBand Architecture Specification is an open standard developed in a vendor-neutral, community-centric manner. The IBTA has a long history of addressing HPC and enterprise application requirements for I/O performance and scalability, providing a reliable ecosystem for end users through the promotion of open standards and roadmaps, compliant and interoperable products, success stories and educational resources. Furthermore, many institutions advancing AI research and development use InfiniBand and RoCE because they satisfy both performance needs and requirements for non-proprietary, open technologies.

Deep learning is one of the most critical elements of a cognitive computing application. Training a data model to the highest degree of accuracy takes a considerable amount of time. While this could be done over a traditional network such as Ethernet, the training time required makes that impractical. Today, all major frameworks (e.g., TensorFlow, Microsoft Cognitive Toolkit, Baidu’s PaddlePaddle and others) and even communications libraries such as NVIDIA’s NCCL are natively enabled to take advantage of the low-level verbs interface of the InfiniBand standard. This not only greatly improves overall accuracy in training, but also considerably reduces the amount of time needed to deploy the solution (as highlighted in a recent IBM PowerAI DDL demonstration).

The supercomputing industry has been aggressively marching towards Exascale. RDMA is the core offload technology that solves the scalability issues hindering the advancement of HPC. Since machine learning shares the same underlying hardware and interconnect needs as HPC, RDMA is unlocking the power of AI through the use of InfiniBand. As machine learning demands advance even further, InfiniBand will continue to lead and drive the industries that rely on it.

Be sure to check back on the IBTA blog for future posts on RDMA’s role in AI and machine learning.

Scot Schultz, Sr. Director of HPC/AI & Technical Computing at Mellanox
