Archive for March, 2017

IBTA to Feature Optimized Testing, Debugging Procedures Onsite at Plugfest 31

March 19th, 2017

The IBTA boasts one of the industry’s top compliance and interoperability programs, which provides device and cable vendors the opportunity to test their products for compliance with the InfiniBand architecture specification as well as interoperability with other InfiniBand and RoCE products. The IBTA Integrators’ List program produces two lists, the InfiniBand Integrators’ List and the RoCE Interoperability List, which are updated twice a year following bi-annual plugfests.

We’re pleased to announce that the results from Plugfest 29 are now available on the IBTA Integrators’ List webpage, while Plugfest 30 results will be made available in the coming weeks. These results are designed to support data center managers, CIOs and other IT decision makers with their planned deployment of InfiniBand and RoCE solutions in both small clusters and large-scale clusters of 1,000 nodes or more.

Changes for Plugfest 31

IBTA Plugfest 31, taking place April 17-28 at the University of New Hampshire Interoperability Lab, is just around the corner and we are excited to announce some significant updates to our testing processes and procedures. These changes originated from efforts at last year’s plugfests and will be fully implemented onsite for the first time at Plugfest 31.

Changes:

  1. We will no longer be testing QDR, but we are adding HDR (200 Gb/s) testing.
  2. Keysight VNA testing is now performed using a 32-port VNA to enable testing of all 8 lanes.
  3. Software Forge (SFI) has developed all-new MATLAB code that allows real-time processing of the 32-port S-parameter files generated by the Keysight VNA. This allows us to test and post-process VNA results in less than 2 minutes per cable.
  4. Anritsu, Keysight and Software Forge have teamed up to bring hardware and software solutions that allow for real-time VNA and ATD testing. This allows direct vendor participation and validation during the Plugfest.

Benefits:

  1. Anritsu and Keysight bring the best leading edge equipment to the Plugfest twice per year.
    1. See the Methods of Implementation for details.
  2. The IBTA also has access to SFI software that allows the Plugfest engineers to post-process the results in real time. We are therefore now able to do real-time interactive testing and debugging while your engineers are at the Plugfest.
  3. We are offering a dedicated, guaranteed 5-hour time slot for each vendor to debug and review their test results. Additional time will be available, but it will be allocated during the Plugfest after all vendors have been allocated their initial 5 hours. See the registration to choose your time slot.
  4. Arbitration will occur during the Plugfest and not afterwards. This is because we only have access to the EDR and HDR test equipment at the bi-annual IBTA Plugfests.
  5. Results from the IBTA Plugfest will now be available much more quickly, since the post-processing time has been reduced so dramatically.
  6. We strongly encourage vendors to send engineers to this event so that you can compare your results with ours and do any necessary debugging and validation. This interactive debugging and testing opportunity is the best in any of the high-speed industries and is provided to you as part of your IBTA Membership. Please take advantage of it.
  7. We will be providing both InfiniBand and RoCE Interoperability testing at PF31.

Interested in attending IBTA Plugfest 31? Registration can be completed on the IBTA Plugfest page. The March 20 registration deadline is fast approaching, so don’t delay!

Rupert Dance, IBTA CIWG

Author: admin Categories: IBTA Plugfest, Plugfest, RDMA, RoCE Tags:

InfiniBand and RoCE to Make Their Mark at OFA Workshop 2017

March 16th, 2017

The OpenFabrics Alliance (OFA) workshop is an annual event devoted to advancing the state of the art in networking. The workshop is known for showcasing a broad range of topics all related to network technology and deployment through an interactive, community-driven event. The comprehensive event includes a rich program made up of more than 50 sessions covering a variety of critical networking topics, which range from current deployments of RDMA to new and advanced network technologies.

To view the full list of abstracts, visit the OFA Workshop 2017 Abstracts and Agenda page.

This year’s workshop program will also feature some notable sessions that showcase the latest developments in InfiniBand and RoCE technology. Below is the collection of OFA Workshop 2017 sessions that we recommend you check out:

Developer Experiences of the First Paravirtual RDMA Provider and Other RDMA Updates
Presented by Adit Ranadive, VMware

VMware’s Paravirtual RDMA (PVRDMA) device is a new NIC in vSphere 6.5 that allows VMs in a cluster to communicate using Remote Direct Memory Access (RDMA), while maintaining latency and bandwidth close to those of physical hardware. Recently, the PVRDMA driver was accepted as part of the Linux 4.10 kernel and our user-library was added as part of the new rdma-core package.

In this session, we will provide a brief overview of our PVRDMA design and capabilities. Next, we will discuss our development approach and the challenges of joint device and driver development. Further, we will highlight our experience upstreaming the driver and library with the new changes to the core RDMA stack.

We will provide an update on the performance of the PVRDMA device along with upcoming updates to the device capabilities. Finally, we will provide new results on the performance achieved by several HPC applications using VM DirectPath I/O.

This session seeks to engage the audience in discussions on: 1) new RDMA provider development and acceptance, and 2) hardware support for RDMA virtualization.

Experiences with NVMe over Fabrics
Presented by Parav Pandit, Mellanox

NVMe is an interface specification to access non-volatile storage media over PCIe buses. The interface enables software to interact with devices using multiple, asynchronous submission and completion queues, which reside in memory. Consequently, software may leverage the inherent parallelism and low latency of modern NVMe devices with minimal overhead. Recently, the NVMe specification has been extended to support remote access over fabrics, such as RDMA and Fibre Channel. Using RDMA, NVMe over Fabrics (NVMe-oF) provides the high-bandwidth and low-latency characteristics of NVMe to remote devices. Moreover, these performance traits are delivered with negligible CPU overhead as the bulk of the data transfer is conducted by RDMA.

In this session, we present an overview of NVMe-oF and its implementation in Linux. We point out the main design choices and evaluate NVMe-oF performance for both InfiniBand and RoCE fabrics.

Validating RoCEv2 for Production Deployment in the Cloud Datacenter
Presented by Sowmini Varadhan, Oracle

With the increasing prevalence of Ethernet switches and NICs in Data Center Networks (DCNs), we have been experimenting with the deployment of RDMA over Converged Ethernet (RoCE) in our DCN. RDMA needs a lossless transport, and, in theory, this can be achieved on Ethernet by using Priority-based Flow Control (PFC, IEEE 802.1Qbb) and ECN (IETF RFC 3168).

We describe our experiences in trying to deploy these protocols in a RoCEv2 testbed running at 100 Gb/s, consisting of a multi-level Clos network.

In addition to addressing the documented limitations around PFC/ECN (livelock, pause-frame-storm, memory requirements for supporting multiple priority flows), we also hope to share some of the performance metrics gathered, as well as some feedback on ways to improve the tooling for observability and diagnosability of the system in a vendor-agnostic, interoperable way.

Host Based InfiniBand Network Fabric Monitoring
Presented by Michael Aguilar, Sandia National Laboratories

Synchronized host based InfiniBand network counter monitoring of local connections at 1Hz can provide a reasonable system snapshot understanding of traffic injection/ejection into/from the fabric. This type of monitoring is currently used to enable understanding about the data flow characteristics of applications and inference about congestion based on application performance degradation. It cannot, however, enable identification of where congestion occurs or how well adaptive routing algorithms and policies react to and alleviate it. Without this critical information the fabric remains opaque and congestion management will continue to be largely handled through increases in bandwidth. To reduce fabric opacity, we have extended our host based monitoring to include internal InfiniBand fabric network ports. In this presentation we describe our methodology along with preliminary timing and overhead information. Limitations and their sources are discussed along with proposed solutions, optimizations, and planned future work.
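As a rough illustration of the host-side half of this kind of monitoring (not the presenters' tooling), the sketch below polls the Linux sysfs port counters once per second and turns two successive samples into a byte rate. It assumes the standard sysfs layout under /sys/class/infiniband and that port_xmit_data and port_rcv_data count in units of 4 octets; the device name mlx5_0, the helper names, and the 64-bit wrap handling are assumptions chosen for the example, and some hardware exposes narrower counters that saturate instead of wrapping.

```python
import time
from pathlib import Path

SYSFS_ROOT = Path("/sys/class/infiniband")  # standard Linux RDMA sysfs root


def read_counter(device: str, port: int, name: str) -> int:
    """Read one port counter, e.g. port_xmit_data, from sysfs."""
    path = SYSFS_ROOT / device / "ports" / str(port) / "counters" / name
    return int(path.read_text().strip())


def data_rate_bytes_per_sec(prev: int, curr: int, interval_s: float,
                            width_bits: int = 64) -> float:
    """Convert two samples of a *_data counter into bytes per second.

    The counters are in units of 4 octets, hence the factor of 4.
    The modulo tolerates a single wrap of a width_bits-wide counter
    (an assumption; saturating counters need different handling).
    """
    delta = (curr - prev) % (1 << width_bits)
    return delta * 4 / interval_s


def monitor(device: str = "mlx5_0", port: int = 1,
            interval_s: float = 1.0, samples: int = 5) -> None:
    """Print injection/ejection rates for one local port at about 1 Hz."""
    prev_tx = read_counter(device, port, "port_xmit_data")
    prev_rx = read_counter(device, port, "port_rcv_data")
    for _ in range(samples):
        time.sleep(interval_s)
        tx = read_counter(device, port, "port_xmit_data")
        rx = read_counter(device, port, "port_rcv_data")
        print(f"tx={data_rate_bytes_per_sec(prev_tx, tx, interval_s):.0f} B/s "
              f"rx={data_rate_bytes_per_sec(prev_rx, rx, interval_s):.0f} B/s")
        prev_tx, prev_rx = tx, rx
```

Calling monitor() on a host with an InfiniBand adapter prints one line per sample. As the abstract notes, local counters like these describe injection and ejection only; they do not show where in the fabric congestion occurs.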

IBTA TWG - Recent Topics in the IBTA, and a Look Ahead
Presented by Bill Magro, Intel on behalf of InfiniBand Trade Association

This talk discusses some recent activities in the IBTA including recent specification updates. It also provides a glimpse into the future for the IBTA.

InfiniBand Virtualization
Presented by Liran Liss, Mellanox on behalf of InfiniBand Trade Association

InfiniBand Virtualization allows a single Channel Adapter to present multiple transport endpoints that share the same physical port. To software, these endpoints are exposed as independent Virtual HCAs (VHCAs), and thus may be assigned to different software entities, such as VMs. VHCAs are visible to Subnet Management, and are managed just like physical HCAs. This session provides an overview of the InfiniBand Virtualization Annex, which was released in November 2016. We will cover the Virtualization model, management, and addressing modes, and discuss deployment considerations.

IPoIB Acceleration
Presented by Tzahi Oved, Mellanox

The IPoIB protocol encapsulates IP packets over InfiniBand datagrams. As a direct RDMA Upper Layer Protocol (ULP), IPoIB cannot support HW features that are specific to the IP protocol stack. Nevertheless, RDMA interfaces have been extended to support some of the prominent IP offload features, such as TCP/UDP checksum and TSO. This provided reasonable performance for IPoIB.

However, new network interface features are one of the most active areas of the Linux kernel. Examples include TSS and RSS, tunneling offloads, and XDP. In addition, the basic IP offload features are insufficient to cope with the increasing network bandwidth. Rather than continuously porting IP network interface developments into the RDMA stack, we propose adding abstract network data-path interfaces to RDMA devices.

In order to present a consistent interface to users, the IPoIB ULP continues to represent the network device to the IP stack. The common code also manages the IPoIB control plane, such as resolving path queries and registering to multicast groups. Data path operations are forwarded to devices that implement the new API, or fall back to the standard implementation otherwise. Using the foregoing approach, we show how IPoIB closes the performance gap compared to state-of-the-art Ethernet network interfaces.

Packet Processing Verbs for Ethernet and IPoIB
Presented by Tzahi Oved, Mellanox

As a prominent user-level networking API, the RDMA stack has been extended to support packet processing applications and user-level TCP/IP stacks, initially focusing on Ethernet. This allowed delivering low latency and high message-rate to these applications.

In this talk, we provide an extensive introduction to both current and upcoming packet processing Verbs, such as checksum offloads, TSO, flow steering, and RSS. Next, we describe how these capabilities may also be applied to IPoIB traffic.

In contrast to Ethernet support, which was based on Raw Ethernet QPs that receive unmodified packets from the wire, IPoIB packets are sent over a “virtual wire”, managed by the kernel. Thus, processing selective IP flows from user-space requires coordination with the IPoIB interface.

The Linux SoftRoCE Driver
Presented by Liran Liss, Mellanox

SoftRoCE is a software implementation of the RDMA transport protocol over Ethernet. Thus, any host can conduct RDMA traffic without requiring a RoCE-capable NIC, allowing RDMA development anywhere.

This session presents the Linux SoftRoCE driver, RXE, which was recently accepted to the 4.9 kernel. In addition, the RXE user-level driver is now part of rdma-core, the consolidated RDMA user-space codebase. RXE is fully interoperable with HW RoCE devices, and may be used for both testing and production. We provide an overview of the RXE driver, detail its configuration, and discuss the current status and remaining challenges in RXE development.

Ubiquitous RoCE
Presented by Alex Shpiner, Mellanox

In recent years, the usage of RDMA in datacenter networks has increased significantly, with RoCE (RDMA over Converged Ethernet) emerging as the canonical approach to deploying RDMA in Ethernet-based datacenters.

Initially, RoCE required a lossless fabric for optimal performance. This is typically achieved by enabling Priority Flow Control (PFC) on Ethernet NICs and switches. The RoCEv2 specification introduced RoCE congestion control, which allows throttling transmission rate in response to congestion. Consequently, packet loss may be minimized and performance is maintained even if the underlying Ethernet network is lossy.

In this talk, we discuss the details of the latest developments in RoCE congestion control. Hardware congestion control reduces the latency of the congestion control loop; it reacts promptly in the face of congestion by throttling the transmission rate quickly and accurately; when congestion is relieved, bandwidth is immediately recovered. The short control loop also prevents network buffers from overfilling in many congestion scenarios.

In addition, fast hardware retransmission complements congestion control in heavy congestion scenarios, by significantly reducing the penalty of packet drops.

Keep an eye out as videos of the OFA Workshop 2017 sessions will be published on both the OFA website and insideHPC. Interested in attending? Registration for the 13th Annual OFA Workshop will be available online and onsite up until the opening day of the event, March 27. Visit the OFA Workshop 2017 Registration page for more information.

Bill Lee

Author: admin Categories: InfiniBand, OpenFabrics Alliance, RoCE Tags: