Archive for the ‘RDMA’ Category

Latest InfiniBand and RoCE Developments a Major Focus at the OpenFabrics Alliance Workshop 2018

May 31st, 2018

The annual OpenFabrics Alliance (OFA) Workshop is a premier means of fostering collaboration among those in the OpenFabrics community and advanced networking industry as a whole. Known for being the only event of its kind, the OFA Workshop allows attendees to discuss emerging fabric technologies, collaborate on future industry requirements, and address remaining challenges. The week-long event is made up of sessions covering a wide range of pressing topics, including talks related to InfiniBand and RDMA over Converged Ethernet (RoCE).

This year’s agenda featured sessions highlighting a variety of InfiniBand and RoCE updates and emerging applications. Below is a list of all OFA Workshop 2018 sessions covering RDMA technologies and the associated presentations.


RoCE Containers - Status update

Parav Pandit, Mellanox Technologies

Using RDMA securely in containerized environments is highly desirable, and RDMA over Converged Ethernet (RoCE) must operate within, and honor, network namespaces other than the default init_net. This session focused on recent and upcoming functionality and security enhancements for RoCE. The key areas to address for supporting RoCE devices in container environments span various modules of the InfiniBand stack, including the connection manager, user verbs, the core, statistics, resource tracking, device discovery and visibility to applications, and net device migration across namespaces.
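To make the visibility requirement concrete, here is a minimal sketch (not from the session) of how a verbs application enumerates RDMA devices with libibverbs; inside a properly namespaced container, only the RoCE devices assigned to that container's net namespace should appear in this list.

  /* Minimal RDMA device discovery with libibverbs (link with -libverbs).
   * Inside a container, this list should contain only the RoCE devices
   * whose net namespace matches the container's namespace. */
  #include <stdio.h>
  #include <infiniband/verbs.h>

  int main(void)
  {
      int num = 0;
      struct ibv_device **devs = ibv_get_device_list(&num);

      if (!devs) {
          perror("ibv_get_device_list");
          return 1;
      }
      for (int i = 0; i < num; i++)
          printf("visible RDMA device: %s\n", ibv_get_device_name(devs[i]));

      ibv_free_device_list(devs);
      return 0;
  }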

Building Efficient Clouds for HPC, Big Data, and Neuroscience Applications over SR-IOV-enabled InfiniBand Clusters

Xiaoyi Lu, The Ohio State University

Single Root I/O Virtualization (SR-IOV) technology has been steadily gaining momentum for high performance interconnects such as InfiniBand. SR-IOV can deliver near-native performance but lacks locality-aware communication support. This talk presented an efficient approach to building HPC clouds based on MVAPICH2 and RDMA-Hadoop with SR-IOV. The talk highlighted high-performance designs of the virtual machine and container aware MVAPICH2 library over SR-IOV-enabled HPC clouds, and also presented a high-performance virtual machine migration framework for MPI applications on SR-IOV-enabled InfiniBand clouds. The presenter discussed how to leverage high-performance networking features (e.g., RDMA, SR-IOV) in cloud environments to accelerate data processing through the RDMA-Hadoop package. To show the performance benefits of the proposed designs, the team co-designed a scalable and distributed tool with MVAPICH2 for statistical evaluation of brain connectomes in the neuroscience domain, which can run on top of container-based cloud environments while natively utilizing RDMA interconnects and delivering near-native performance.

Non-Contiguous Memory Registration

Tzahi Oved, Mellanox Technologies

Memory registration enables contiguous memory regions to be accessed with RDMA. This talk showed how registration could be extended beyond access rights to describe complex memory layouts. Many HPC applications receive regular structured data, such as a column of a matrix. In this case, the application would typically receive a chunk of data and scatter it with the CPU, or use multiple RDMA writes to transfer each element in-place. Both options introduce significant overhead. By using a memory region that specifies strided access, this overhead can be completely eliminated: the initiator posts a single RDMA write and the target HCA scatters each element into place. Similarly, standard memory regions cannot describe non-contiguous memory allocations, forcing applications to generate remote keys for each buffer. By allowing a non-contiguous memory region to span multiple address ranges, however, an application may scatter remote data with a single remote key. Using non-contiguous memory registration, such memory layouts may be created, accessed and invalidated through efficient, non-privileged, user-level interfaces.
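As a point of reference, the sketch below shows the standard contiguous path with plain libibverbs: register one buffer and post a single RDMA WRITE. The non-contiguous and strided memory regions described in the session generalize this model so a single write can be scattered into a strided remote layout; the queue pair, remote address and rkey in this sketch are assumed to have been set up and exchanged elsewhere.

  /* Contiguous-MR baseline that non-contiguous/strided MRs generalize.
   * Assumes pd, qp, remote_addr and remote_rkey were exchanged out of
   * band; MR deregistration and completion polling are omitted. */
  #include <stddef.h>
  #include <stdint.h>
  #include <infiniband/verbs.h>

  int post_single_write(struct ibv_pd *pd, struct ibv_qp *qp,
                        void *local_buf, size_t len,
                        uint64_t remote_addr, uint32_t remote_rkey)
  {
      /* Register one contiguous local buffer. */
      struct ibv_mr *mr = ibv_reg_mr(pd, local_buf, len,
                                     IBV_ACCESS_LOCAL_WRITE |
                                     IBV_ACCESS_REMOTE_WRITE);
      if (!mr)
          return -1;

      struct ibv_sge sge = {
          .addr   = (uintptr_t)local_buf,
          .length = (uint32_t)len,
          .lkey   = mr->lkey,
      };

      struct ibv_send_wr wr = { 0 }, *bad_wr = NULL;
      wr.opcode              = IBV_WR_RDMA_WRITE;
      wr.sg_list             = &sge;
      wr.num_sge             = 1;
      wr.send_flags          = IBV_SEND_SIGNALED;
      wr.wr.rdma.remote_addr = remote_addr;
      wr.wr.rdma.rkey        = remote_rkey;

      /* One work request moves the whole buffer; with a strided MR the
       * target HCA could scatter it element by element instead. */
      return ibv_post_send(qp, &wr, &bad_wr);
  }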

Dynamically-Connected Transport

Alex Rosenbaum, Mellanox Technologies

Dynamically-Connected (DC) transport combines features of the existing UD and RC transports: like UD, DC can send every message to a different destination, and like RC, it is a reliable transport supporting RDMA and Atomic operations. The crux of the transport is dynamically connecting and disconnecting on the fly in hardware as destinations change. As a result, a DC endpoint may communicate with any peer, providing the full RC feature set while maintaining a fixed memory footprint regardless of the size of the network. This talk presented the unique characteristics of the transport, showed how it can be leveraged to reach peak all-to-all communication performance, and reviewed the DC transport objects and their semantics, the upstream Linux DC API, and its usage.
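For orientation, the fragment below is a rough sketch of creating a DC initiator (DCI) through the rdma-core mlx5dv provider interface, which is where the upstream user-space DC API currently lives; it assumes a Mellanox device and omits the matching DC target (DCT), address handles and connection handshaking, so treat it as an approximation rather than a verified recipe.

  /* Rough sketch of creating a DC initiator (DCI) with the rdma-core
   * mlx5dv provider API; capability checks, CQ/PD setup and the
   * matching DC target (DCT) are omitted. Field usage is an
   * approximation, not a verified recipe. */
  #include <infiniband/verbs.h>
  #include <infiniband/mlx5dv.h>

  struct ibv_qp *create_dci(struct ibv_context *ctx, struct ibv_pd *pd,
                            struct ibv_cq *cq)
  {
      struct ibv_qp_init_attr_ex attr_ex = {
          .qp_type   = IBV_QPT_DRIVER,          /* provider-specific QP */
          .send_cq   = cq,
          .recv_cq   = cq,
          .pd        = pd,
          .comp_mask = IBV_QP_INIT_ATTR_PD,
          .cap       = { .max_send_wr = 64, .max_send_sge = 1 },
      };

      struct mlx5dv_qp_init_attr dv_attr = {
          .comp_mask    = MLX5DV_QP_INIT_ATTR_MASK_DC,
          .dc_init_attr = { .dc_type = MLX5DV_DCTYPE_DCI },
      };

      /* A single DCI can target any DCT in the fabric; the hardware
       * connects and disconnects on the fly as destinations change. */
      return mlx5dv_create_qp(ctx, &attr_ex, &dv_attr);
  }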

T10-DIF offload

Tzahi Oved, Mellanox Technologies

T10-DIF is a standard that defines how to protect the integrity of storage data blocks. Every storage block is followed by a Data Integrity Field (DIF). This field contains a CRC of the preceding block, the LBA (block number within the storage device) and an application tag. Normally, the DIF is saved in the storage device along with the data block itself so that it can later be used to verify data integrity.
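As an illustration of the layout just described, the struct below sketches the 8-byte DIF appended to each block; the field names are ours, and the guard is the CRC-16 defined by T10 (polynomial 0x8BB7).

  /* Illustrative layout of the 8-byte DIF appended to each data block:
   * a CRC guard over the block, an application tag, and a reference tag
   * that typically carries the low 32 bits of the LBA. Fields are
   * big-endian on the media; names here are not taken from the standard. */
  #include <stdint.h>

  struct t10_dif {
      uint16_t guard_crc; /* CRC-16 of the preceding data block */
      uint16_t app_tag;   /* application-defined tag */
      uint32_t ref_tag;   /* reference tag, usually derived from the LBA */
  } __attribute__((packed));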

Modern storage systems and adapters allow those DIFs to be created, verified and stripped while data is read from and written to the storage device, as requested by the user and supported by the OS. The T10-DIF offload feature brings this capability to RDMA-based storage protocols. Using this feature, RDMA-based protocols can request that the RDMA device generate, strip and/or verify the DIF while sending or receiving a message. The DIF operation is configured in a new Signature Memory Region; every memory access through this MR (local or remote) results in a DIF operation on the data as it moves between the wire and memory. This session described how this feature is configured and operated using the verbs API.

NVMf Target Offload

Liran Liss, Mellanox Technologies

NVMe is a standard that defines how to access a solid-state storage device over PCIe in a very efficient way. It defines how to create and use multiple submission and completion queues between software and the device, over which storage operations are issued and completed.
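For orientation, the sketch below approximates the fixed 64-byte submission queue entry that software places on such a queue; the field names are illustrative, and the NVMe base specification remains the authoritative reference.

  /* Approximate shape of the fixed 64-byte NVMe submission queue entry
   * (SQE). Field names are illustrative; see the NVMe base specification
   * for the authoritative layout. */
  #include <stdint.h>

  struct nvme_sqe {
      uint8_t  opcode;        /* command opcode */
      uint8_t  flags;         /* fused-operation and PSDT bits */
      uint16_t command_id;    /* completions are matched on this ID */
      uint32_t nsid;          /* namespace identifier */
      uint64_t reserved;
      uint64_t metadata_ptr;  /* optional metadata pointer */
      uint64_t prp1;          /* data pointer: PRP entry 1 (or SGL) */
      uint64_t prp2;          /* data pointer: PRP entry 2 */
      uint32_t cdw10_15[6];   /* command-specific dwords 10-15 */
  };                          /* 64 bytes total */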

NVMe-over-Fabrics is a newer standard that maps NVMe onto RDMA to allow remote access to storage devices over an RDMA fabric using the same NVMe language. Since NVMe queues look and act very much like RDMA queues, bridging between the two is a natural application. In fact, a couple of software packages today implement an NVMe-over-Fabrics target on top of local NVMe devices.

The NVMe-oF Target Offload feature is such an implementation, done in hardware. A supporting RDMA device is configured with the details of the queues of an NVMe device, and an incoming client RDMA connection (QP) is then bound to those NVMe queues. From that point on, every I/O request arriving over the network from the client is submitted, using PCIe peer-to-peer access, to the respective NVMe queue without any software intervention. This session described how such a feature is configured and operated using verbs.

High-Performance Big Data Analytics with RDMA over NVM and NVMe-SSD

Xiaoyi Lu, The Ohio State University

The convergence of Big Data and HPC has been pushing innovation in accelerating Big Data analytics and management on modern HPC clusters. Recent studies have shown that the performance of Apache Hadoop, Spark, and Memcached can be significantly improved by leveraging high performance networking technologies, such as Remote Direct Memory Access (RDMA). Most of these studies are based on "DRAM+RDMA" schemes. On the other hand, Non-Volatile Memory (NVM) and NVMe-SSD technologies can support RDMA access with low latency, high throughput and persistence on HPC clusters. NVMs and NVMe-SSDs provide the opportunity to build novel high-performance and QoS-aware communication and I/O subsystems for data-intensive applications. This talk proposed new communication and I/O schemes for these data analytics stacks, designed with RDMA over NVM and NVMe-SSD. The studies show that the proposed designs can significantly improve communication, I/O and application performance for Big Data analytics and management middleware such as Hadoop, Spark and Memcached. In addition, the talk discussed how to design QoS-aware schemes in these frameworks with NVMe-SSD.

Comprehensive, Synchronous, High Frequency Measurement of InfiniBand Networks in Production HPC Systems

Michael Aguilar, Sandia National Laboratories

In this presentation, we showed InfiniBand performance information gathered from a large Sandia HPC system, Skybridge. We showed detection of network hot spots that may affect data exchanges for tightly coupled parallel threads. We quantified the overhead cost (application impact) when data is being collected.

At Sandia Labs, we are continuing to develop an InfiniBand fabric switch port sampler that can be used to gather remote data from InfiniBand switches. Using coordinated InfiniBand switch and HCA port samplers, a real-time snapshot of InfiniBand traffic can be retrieved from the fabric on a large-scale HPC computing platform. Because LDMS data retrieval is time-stamped and lightweight, production job runs can be instrumented to provide research data that can be used to specify computing platforms with improved data performance.

Our implementation of synchronous monitoring of large-scale HPC systems provides insights into how to improve computing performance. Our sampler takes advantage of the OpenFabrics software stack for metric gathering. The OFED stack provides a common, interoperable software foundation with the inherent ability to gather traffic metrics from selected connection points within a network fabric. We use OFED MAD and UMAD to collect the remote switch port traffic metrics.
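As a rough illustration of the kind of query such a sampler issues, the fragment below uses libibmad (part of the OFED management stack) to read the PortCounters attribute from a given LID and port; the specific fields decoded here are our choice, not necessarily what the LDMS sampler collects.

  /* Sketch of one PortCounters (PMA) query against a fabric LID, in the
   * spirit of what a switch-port sampler does periodically. Uses
   * libibmad (link with -libmad -libumad); error handling and the
   * 64-bit extended counters are omitted. */
  #include <stdio.h>
  #include <infiniband/mad.h>

  int query_port_counters(char *ca_name, int ca_port, int lid, int port)
  {
      int mgmt_classes[] = { IB_SMI_CLASS, IB_SA_CLASS, IB_PERFORMANCE_CLASS };
      struct ibmad_port *srcport =
          mad_rpc_open_port(ca_name, ca_port, mgmt_classes, 3);
      if (!srcport)
          return -1;

      ib_portid_t portid = { .lid = lid };
      uint8_t buf[1024] = { 0 };

      /* Fetch the PortCounters attribute for the requested port. */
      if (!pma_query_via(buf, &portid, port, 0, IB_GSI_PORT_COUNTERS, srcport)) {
          mad_rpc_close_port(srcport);
          return -1;
      }

      uint32_t xmit = 0, rcv = 0;
      mad_decode_field(buf, IB_PC_XMT_BYTES_F, &xmit);
      mad_decode_field(buf, IB_PC_RCV_BYTES_F, &rcv);
      printf("LID %d port %d: PortXmitData=%u PortRcvData=%u\n",
             lid, port, xmit, rcv);

      mad_rpc_close_port(srcport);
      return 0;
  }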

The OFA Workshop is extremely valuable to InfiniBand Trade Association members and the fabrics community as a whole, aiming to identify, discuss and overcome the industry's most significant challenges. We look forward to participating again next year. Videos of each presentation from the OFA Workshop 2018 are now available online on insideHPC.com.

Bill Lee

InfiniBand Leads the TOP500 List, is Preferred Fabric of Leading AI and Deep Learning Systems

December 7th, 2017

The latest iteration of the bi-annual TOP500 List reveals that InfiniBand not only powers the world's first and fourth fastest supercomputers, but is also the preferred interconnect for Artificial Intelligence (AI) and Deep Learning systems. Furthermore, the latest results show InfiniBand continues to be the most used high-speed interconnect in the TOP500, reinforcing its status as the industry's leading high performance interconnect technology.

As High Performance Computing (HPC) demands evolve, especially in the case of emerging AI and Deep Learning applications, the industry can rely on InfiniBand to meet their rigorous network performance requirements and scalability needs. System architects will continue to turn to the unmatched combination of scalable network bandwidth, low latency and efficiency that InfiniBand offers.

Top of the List:

  • InfiniBand accelerates two of the top five systems – including the first (China) and fourth (Japan) fastest supercomputers
  • InfiniBand connects 77% of new HPC systems
  • InfiniBand is the most used high-speed interconnect on the TOP500 List
  • InfiniBand is the preferred interconnect for leading AI and Deep Learning systems
  • All 23 systems running Ethernet at 25Gb/s or higher are RoCE capable

InfiniBand continues to prove that it can deliver on the increasing demands for performance, scalability and speed that are required of today’s HPC systems, efficiently tackling challenges involving even larger and more complex data sets. Read the full IBTA announcement for more information on InfiniBand and RoCE’s status in the world’s top supercomputers.

Bill Lee

RoCE Initiative Launches New Online Product Directory for CIOs and IT Professionals

May 24th, 2017

The RoCE Initiative is excited to announce the launch of the online RoCE Product Directory, the latest technical resource to supplement the IBTA’s RoCE educational program. The new online resource is intended to inform CIOs and enterprise data center architects about their options for deploying RDMA over Converged Ethernet (RoCE) technology within their Ethernet infrastructure.

The directory comprises a growing catalogue of RoCE-enabled solutions provided by IBTA members, including Broadcom, Cavium, Inc. and Mellanox Technologies. The new online tool allows users to search by product type and/or brand, connecting them directly to each item's specific product page. The product directory currently boasts over 65 products and counting that accelerate performance over Ethernet networks while lowering latency.

For more information on the RoCE Product Directory and members currently involved, view the press release here.

Explore the RoCE Product Directory on the RoCE Initiative Product Search page here.

Bill Lee

IBTA to Feature Optimized Testing, Debugging Procedures Onsite at Plugfest 31

March 19th, 2017

The IBTA boasts one of the industry’s top compliance and interoperability programs, which provides device and cable vendors the opportunity to test their products for compliance with the InfiniBand architecture specification as well as interoperability with other InfiniBand and RoCE products. The IBTA Integrators’ List program produces two lists, the InfiniBand Integrators’ List and the RoCE Interoperability List, which are updated twice a year following bi-annual plugfests.

We’re pleased to announce that the results from Plugfest 29 are now available on the IBTA Integrators’ List webpage, while Plugfest 30 results will be made available in the coming weeks. These results are designed to support data center managers, CIOs and other IT decision makers with their planned deployment of InfiniBand and RoCE solutions in both small clusters and large-scale clusters of 1,000 nodes or more.

Changes for Plugfest 31

IBTA Plugfest 31, taking place April 17-28 at the University of New Hampshire Interoperability Lab, is just around the corner and we are excited to announce some significant updates to our testing processes and procedures. These changes originated from efforts at last year’s plugfests and will be fully implemented onsite for the first time at Plugfest 31.

Changes:

  1. We will no longer be testing QDR, but we are adding HDR (200 Gb/s) testing.
  2. Keysight VNA testing is now performed using a 32-port VNA to enable testing of all 8 lanes.
  3. Software Forge (SFI) has developed all-new MATLAB code that allows real-time processing of the 32-port s-parameter files generated by the Keysight VNA. This allows us to test and post-process VNA results in less than 2 minutes per cable.
  4. Anritsu, Keysight and Software Forge have teamed up to bring hardware and software solutions that allow for real-time VNA and ATD testing. This allows direct vendor participation and validation during the Plugfest.

Benefits:

  1. Anritsu and Keysight bring the best leading edge equipment to the Plugfest twice per year.
    1. See the Methods of Implementation for details.
  2. The IBTA also has access to SFI software that allows the Plugfest engineers to post-process the results in real time. Therefore, we are now able to do real-time interactive testing and debugging while your engineers are at the Plugfest.
  3. We are offering a dedicated, guaranteed 5-hour time slot for each vendor to debug and review their test results. Additional time will be available but will be allocated during the Plugfest after all vendors have received their initial 5 hours. See the registration to choose your time slot.
  4. Arbitration will occur during the Plugfest and not afterwards, because we only have access to the EDR and HDR test equipment at the bi-annual IBTA Plugfests.
  5. Results from the IBTA Plugfest will now be available much more quickly since the post-processing time has been reduced so dramatically.
  6. We are strongly encouraging vendors to send engineers to this event so that you can compare your results with ours and do any necessary debugging and validation. This interactive debugging and testing opportunity is the best in any of the high speed industries and is provided to you as part of your IBTA Membership. Please take advantage of it.
  7. We will be providing both InfiniBand and RoCE Interoperability testing at PF31.

Interested in attending IBTA Plugfest 31? Registration can be completed on the IBTA Plugfest page. The March 20 registration deadline is fast approaching, so don’t delay!

Rupert Dance, IBTA CIWG

Incorporate Networking into Hyperconverged Integrated Systems to Gain a Market Advantage

August 22nd, 2016

The concept of hyperconverged integrated systems (HCIS) emerged as data centers considered new ways to increase resource utilization by reducing infrastructure inefficiencies and complexities. HCIS is primarily a software-defined platform that integrates compute, storage and networking resources. The HCIS market is expected to grow 79 percent to reach almost $2 billion this year, driving it into mainstream use in the next five years, according to Gartner.

Since this market is growing so rapidly, Gartner released an exciting new report, “Use Networking to Differentiate Your Hyperconverged System.” In the report, Gartner advises HCIS vendors to focus on networking to gain competitive market advantage by integrating use-case-specific guidelines and case studies in go-to-market efforts.

According to the report, more than 10 percent of HCIS deployments will suffer from avoidable network-induced performance problems by 2018, up from less than one percent today. HCIS vendors can help address expected challenges and add value for buyers by considering high performance networking protocols, such as InfiniBand and RDMA over Converged Ethernet (RoCE), during the system design stage.

The growing scale of HCIS clusters creates challenges such as expanding workload coverage and diminishing competitive product differentiation. This will force HCIS vendors to alter their product lines and marketing efforts to help their offerings stand out from the rest. Integrating the right networking capabilities will become even more important as a growing number of providers look to differentiate their products. The Gartner report states that by 2018, 60 percent of providers will start to offer integration of networking services, together with compute and storage services, inside of their HCIS products.

Until recently, HCIS vendors have often treated networking simply as a “dumb” interconnect. However, when clusters grow beyond a handful of nodes and higher workloads are introduced, issues begin to arise. This Gartner report stresses that treating the network as “fat dumb pipes” will make it harder to troubleshoot application performance problems from an end-to-end perspective. The report also determines that optimizing the entire communications stack is key to driving latency down and it names InfiniBand and RoCE as important protocols to implement for input/output (I/O)-intensive workloads.

As competition in the HCIS market continues to grow, vendors must change their perception of networking and begin to focus on how to integrate it in order to keep a competitive edge. To learn more about how HCIS professionals can achieve this market advantage, download the full report from the InfiniBand Reports page.

GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally, and is used herein with permission. All rights reserved.

Bill Lee

Dive into RDMA’s Impact on NVMe Devices at the 2016 Flash Memory Summit

August 5th, 2016

Next week, storage experts will gather at the 2016 Flash Memory Summit (FMS) in Santa Clara, CA, to discuss the current state of flash memory applications and how these technologies are enabling new designs for many products in the consumer and enterprise markets. This year’s program will include three days packed with sessions, tutorials and forums on a variety of flash storage trends, including new architectures, systems and standards.

NVMe technology, and its impact on enterprise flash applications, is among the major topics that will be discussed at the show. The growing industry demand to unlock flash storage's full potential by leveraging high performance networking has led the NVMe community to develop a new standard for fabrics. NVMe over Fabrics (NVMe/F) allows flash storage devices to communicate over RDMA fabrics, such as InfiniBand and RDMA over Converged Ethernet (RoCE), thereby enabling all-flash arrays to overcome existing performance bottlenecks.

Attending FMS 2016?

If you’re attending FMS 2016 and are interested in learning more about the importance of RDMA fabrics for NVMe/F solutions, I recommend the following two educational sessions:

NVMe over Fabrics Panel – Which Transport Is Best?
Tuesday, August 9, 2016 (9:45-10:50 a.m.)

Representatives from the IBTA will join a panel to discuss the value of RDMA interconnects for the NVMe/F standard. Attendees can expect to receive an overview of each RDMA fabric and the benefits they bring to specific applications and workloads. Additionally, the session will cover the promise that NVMe/F has for unleashing the potential performance of NVMe drives via mainstream high performance interconnects.

Beer, Pizza and Chat with the Experts
Tuesday, August 9, 2016 (7-8:30 p.m.)

This informal event encourages attendees to “sit and talk shop” with experts about a diverse set of storage and networking topics. As IBTA’s Marketing Work Group Co-Chair, I will be hosting a table focused on RDMA interconnects. I’d love to meet with you to answer questions about InfiniBand and RoCE and discuss the advantages they provide the flash storage industry.

Additionally, there will be various IBTA member companies exhibiting on the show floor, so stop by their booths to learn about the new InfiniBand and RoCE solutions:

  • HPE (#600)
  • Keysight Technologies (#810)
  • Mellanox Technologies (#138)
  • Tektronix (#641)
  • University of New Hampshire InterOperability Lab (#719)

For more information on the FMS 2016 program and exhibitors, visit the event website.

Bill Lee

InfiniBand Experts Discuss Latest Trends and Opportunities at OFA Workshop 2016

May 24th, 2016

Each year, OpenFabrics Software (OFS) users and developers gather at the OpenFabrics Alliance (OFA) Workshop to discuss and tackle the most recent challenges facing the high performance storage and networking industry. OFS is open-source software that enables maximum application efficiency and performance agnostically over RDMA fabrics, including InfiniBand and RDMA over Converged Ethernet (RoCE). The work of the OFA supports mission-critical applications in High Performance Computing (HPC) and enterprise data centers, and is also quickly becoming significant in cloud and hyper-converged markets.

In our previous blog, we showcased an IBTA-sponsored session that provided an update on InfiniBand virtualization support. In addition to our virtualization update, there were a handful of other notable sessions highlighting the latest InfiniBand developments, case studies and tutorials. Below is a collection of notable InfiniBand-focused sessions that we recommend you check out:

InfiniBand as Core Network in an Exchange Application
Ralph Barth, Deutsche Börse AG; Joachim Stenzel, Deutsche Börse AG

Group Deutsche Boerse is a global financial services organization covering the entire value chain from trading, market data, clearing and settlement to custody. While reliability has been a fundamental requirement for exchanges since the introduction of electronic trading systems in the 1990s, for roughly the last 10 years low and predictable latency of the entire system has also been a major design objective. Both issues were important architecture considerations when Deutsche Boerse started to develop an entirely new derivatives trading system, T7, for its options market in the US (ISE) in 2008. A combination of InfiniBand with IBM® WebSphere® MQ Low Latency Messaging (WLLM) as the messaging solution was determined to be the best fit at the time. Since then the same system has been adopted for EUREX, one of the largest derivatives exchanges in the world, and is now also being extended to cover cash markets. The session presents the design of the application and its interdependence with the combination of InfiniBand and WLLM, and reflects on practical experiences with InfiniBand over the last couple of years.

Download: Slides / Video


Experiences in Writing OFED Software for a New InfiniBand HCA
Knut Omang, Oracle

This talk presents experiences, challenges and opportunities as lead developer in initiating and developing OFED stack support (kernel and user-space driver) for Oracle's InfiniBand HCA, integrated in the new SPARC Sonoma SoC CPU. In addition to the physical HCA function, SR-IOV is supported, with vHCAs visible to the interconnect as if connected to virtual switches. Individual driver instances for the vHCAs maintain page tables set up for the HCA's MMU, covering the memory accessible from the HCA. The HCA is designed to scale to a large number of QPs. For minimal overhead and maximal flexibility, administrative operations such as memory invalidations also use an asynchronous work request model similar to normal InfiniBand traffic.

Download: Slides / Video

Fabrics and Topologies for Directly Attached Parallel File Systems and Storage Networks
Susan Coulter, Los Alamos National Laboratory

InfiniBand fabrics supporting directly attached storage systems are designed to handle unique traffic patterns, and they contain different stress points than other fabrics. These SAN fabrics are often expected to be extensible in order to allow for expansion of existing file systems and the addition of new file systems. The character and lifetime of these fabrics are distinct from those of internal compute fabrics or multi-purpose fabrics. This presentation covers the approach to InfiniBand SAN design and deployment as experienced by the High Performance Computing effort at Los Alamos National Laboratory.

Download: Slides / Video


InfiniBand Topologies and Routing in the Real World
Susan Coulter, Los Alamos National Laboratory; Jesse Martinez, Los Alamos National Laboratory

As with all sophisticated and multifaceted technologies, designing, deploying and maintaining high-speed networks and topologies in a production environment and/or at larger scales can be unwieldy, and their behavior can be surprising. This presentation illustrates that fact via a case study from an actual fabric deployed at Los Alamos National Laboratory.

Download: Slides / Video


InfiniBand Routers Premier
Mark Bloch, Mellanox Technologies; Liran Liss, Mellanox Technologies

InfiniBand has come a long way in providing efficient large-scale high performance connectivity. InfiniBand subnets have been shown to scale to tens of thousands of nodes, both in raw capacity and in management. As demand for computing capacity increases, future cluster sizes might exceed the number of addressable endpoints in a single IB subnet (around 40K nodes). To accommodate such clusters, a routing layer with the same latency and bandwidth characteristics as switches is required.

In addition, as data center deployments evolve, it becomes beneficial to consolidate resources across multiple clusters. For example, several compute clusters might require access to a common storage infrastructure. Routers can enable such connectivity while reducing management complexity and isolating intra-subnet faults. The bandwidth capacity to storage may be provisioned as needed.

This session reviews InfiniBand routing operation and how it can be used in the future. Specifically, we will cover topology considerations, subnet management issues, name resolution and addressing, and potential implications for the host software stack and applications.

Download: Slides

Bill Lee

OpenFabrics Software Users and Developers Receive InfiniBand Virtualization Update at the 2016 OFA Workshop

April 26th, 2016

The InfiniBand architecture is a proven network interconnect standard that provides benefits for bandwidth, efficiency and latency, while also boasting an extensive roadmap of future performance increases. Initially adopted by the High Performance Computing industry, a growing number of enterprise data centers are demanding the performance capabilities that InfiniBand has to offer. InfiniBand data center use cases vary widely, ranging from physical network foundations transporting compute and storage traffic to enabling Platform-as-a-Service (PaaS) in cloud service providers.

Today’s enterprise data center and cloud environments are also seeing an increased use of virtualized workloads. Using virtualized servers allows data center managers to create a common shared pool of resources from a single host. Virtualization support in the Channel Adapter enables different software entities to interact independently with the fabric. This effectively creates an efficient service-centric computing model capable of dynamic resource utilization and scalable performance, while reducing overhead costs.

Earlier this month at the OpenFabrics Alliance (OFA) Workshop 2016 in Monterey, CA, Liran Liss of member company Mellanox Technologies provided an update on the IBTA’s ongoing work to standardize InfiniBand virtualization support. He explained that the IBTA Management Working Group’s goals include making the InfiniBand Virtualization Annex scalable, explicit, backward compatible and, above all, simple in both implementation and management. Liss specifically covered the concepts of InfiniBand Virtualization, and its manifestation in the host software stack, subnet management and monitoring tools.

The IBTA effort to support virtualization is nearing completion as the annex enters its final review period from other working groups. If you were unable to attend the OFA Workshop 2016 and would like to learn more about InfiniBand virtualization, download the official slides or watch a video of the presentation via insideHPC.

Bill Lee

Changes to the Modern Data Center – Recap from SDC 15

October 19th, 2015

The InfiniBand Trade Association recently had the opportunity to speak on RDMA technology at the 2015 Storage Developer Conference. For the first time, SDC15 introduced Pre-conference Primer Sessions covering topics such as Persistent Memory, Cloud and Interop, and Data Center Infrastructure. Intel's David Cohen, System Architect, and Brian Hausauer, Hardware Architect, spoke on behalf of the IBTA in a pre-conference session, discussing "Nonvolatile Memory (NVM), four trends in the modern data center and implications for the design of next generation distributed storage systems."

Below is a high level overview of their presentation:

The modern data center continues to transform as applications and uses change and develop. Most recently, we have seen users abandon traditional storage architectures for the cloud. Cloud storage is founded on data-center-wide connectivity and scale-out storage, which delivers significant increases in capacity and performance, enabling application deployment anytime, anywhere. Additionally, job scheduling and system balance capabilities are boosting overall efficiency and optimizing a variety of essential data center functions.

Trends in the modern data center are appearing as cloud architecture takes hold. First, the performance of network bandwidth and storage media is growing rapidly. Furthermore, operating system vendors (OSV) are optimizing the code path of their network and storage stacks. All of these speed and efficiency gains to network bandwidth and storage are occurring while single processor/core performance remains relatively flat.

Data comes in a variety of flavors, some of which is accessed frequently for application I/O requests and some of which is rarely retrieved. To enable higher performance and resource efficiency, cloud storage uses a tiering model that places data according to how often it is accessed. Data that is regularly accessed is stored on expensive, high performance media (solid-state drives), while data that is rarely or never retrieved is relegated to less expensive media with the lowest $/GB (rotational drives). This model follows a hot, warm and cold data pattern and allows faster access to the data used most often.

The growth of high performance storage media is driving the need for innovation in the network, primarily addressing application latency. This is where Remote Direct Memory Access (RDMA) comes into play. RDMA is an advanced, reliable transport protocol that enhances the efficiency of workload processing. Essentially, it increases data center application performance by offloading the movement of data from the CPU. This lowers overhead and allows the CPU to focus its processing power on running applications, which in turn reduces latency.

As demand for cloud storage increases, the need for RDMA and high performance storage networking grows as well. With this in mind, the InfiniBand Trade Association is continuing its work developing the RDMA architecture for InfiniBand and Ethernet (via RDMA over Converged Ethernet, or RoCE) topologies.

Bill Lee

RoCE Benefits on Full Display at Ignite 2015

May 27th, 2015

On May 4-8, IT professionals and enterprise developers gathered in Chicago for the 2015 Microsoft Ignite conference. Attendees were given a first-hand glimpse at the future of a variety of Microsoft business solutions through a number of sessions, presentations and workshops.

Of particular note were two demonstrations of RDMA over Converged Ethernet (RoCE) technology and the resulting benefits for Windows Server 2016. In both demos, RoCE technology showed significant improvements over Ethernet implementations without RDMA in terms of throughput, latency and processor efficiency.

Below is a summary of each presentation featuring RoCE at Ignite 2015:

Platform Vision and Strategy (4 of 7): Storage Overview
This demonstration highlighted the extreme performance and scalability of Windows Server 2016 through RoCE-enabled servers populated with NVMe and SATA SSDs. It simulated application and user workloads using SMB3 servers with Mellanox ConnectX-4 100 GbE RDMA-enabled Ethernet adapters, Micron DRAM, and enterprise NVMe SSDs for performance and SATA SSDs for capacity.

During the presentation, the use of RoCE compared to TCP/IP showcased drastically different performance. With RDMA enabled, the SMB3 server was able to achieve about twice the throughput, half the latency and around 33 percent less CPU overhead than that attained by TCP/IP.

Check out the video to see the demonstration in action.

Enabling Private Cloud Storage Using Servers with Local Disks

Claus Joergensen, a principal program manager at Microsoft, demonstrated Windows Server 2016's Storage Spaces Direct with Mellanox's ConnectX-3 56Gb/s RoCE adapters, Micron RAM and M500DC local SATA storage.

The goal of the demo was to highlight the value of running RoCE on a system as it relates to performance, latency and processor utilization. The system achieved a combined 680,000 4KB IOPS at 2ms latency with RoCE disabled. With RoCE enabled, the system increased the 4KB IOPS to about 1.1 million and reduced the latency to 1ms. This translated to roughly a 60 percent increase in IOPS at half the latency, all while utilizing the same amount of CPU resources.

For additional information, watch a recording of the presentation (demonstration starts at 57:00).

For more videos from Ignite 2015, visit Ignite On Demand.

Bill Lee