Archive for February, 2012

RoCE and InfiniBand: Which should I choose?

February 13th, 2012

The IBTA wrapped up its four-part fall Webinar Series in December; if you didn’t have the opportunity to attend these events live, a recorded version is available on the IBTA’s website. In the webinar series, we suggested that it makes sense to take a fresh look at I/O in light of recent developments in I/O and data center architecture. We took a high-level look at two RDMA technologies: InfiniBand and a relative newcomer called RoCE (RDMA over Converged Ethernet).

RDMA is an interesting network technology that has been dominant in the HPC marketplace for quite a while and is now finding increasing application in modern commercial data centers, especially in performance-sensitive environments or environments that depend on an agile, cost-constrained approach to computing, such as almost any form of cloud computing. So it’s no surprise that several questions arose during the webinar series about the differences between a “native” InfiniBand RDMA fabric and one based on RoCE. In a nutshell, the questions boiled down to this: “What can InfiniBand do that RoCE cannot? If I start down the path of deploying RoCE, why not simply stick with it, or should I plan to migrate to IB?”

As a quick review, RoCE is a new technology that is best thought of as a network that delivers many of the advantages of RDMA, such as lower latency or improved CPU utilization, but using an Ethernet switched fabric instead of InfiniBand adapters and switches. This is illustrated in the diagram below. Conceptually, RoCE is simple enough, but there is a subtlety that is easy to overlook. Many of us, when we think of Ethernet, naturally envision the complete IP architecture consisting of TCP, IP and Ethernet. But the truth is that RoCE bears no relationship to traditional TCP/IP/Ethernet, even though it uses an Ethernet layer. The diagram also compares the two RDMA technologies to traditional TCP/IP/Ethernet. As the drawing makes clear, RoCE and InfiniBand are sibling technologies, but are only distant cousins to TCP/IP/Ethernet. Indeed, RoCE’s heritage is found in the basic InfiniBand architecture, and RoCE is fully supported by the open source software stacks provided by the OpenFabrics Alliance. So if it’s possible to use Ethernet and still harvest the benefits of RDMA, what’s to choose between the two? Naturally, there are trade-offs to be made.



During the webinar we presented the following chart as a way to illustrate some of the trade-offs that one might encounter in choosing an I/O architecture. The first column shows a pure Ethernet approach, as is common in most data centers today. In this scenario, the data center rides the wave of improvements in Ethernet speeds. Naturally, using traditional TCP/IP/Ethernet, you don’t get any of the RDMA advantages. For this blog, our interest is mainly in the middle and right-hand columns, which focus on the two alternative implementations of RDMA technology.


From the application perspective, both RoCE and native InfiniBand present the same API and provide about the same sets of services. So what are the differences between them? They really break down into four distinct areas.

  • Wire speed and the bandwidth roadmap. The roadmap for Ethernet is maintained by the IEEE and is designed to suit the needs of a broad range of applications, from home networks to corporate LANs to data center interconnects and even wide area networking. Naturally, each type of application has its own requirements, including different speed requirements. For example, client networking does not have the speed requirements that are typical of a data center application. Given this wide range of applications, the Ethernet roadmap naturally tends to reflect the bulk of its intended market, even though speed grades more representative of data center needs (40 and 100GbE) have recently been introduced. The InfiniBand roadmap, on the other hand, is maintained by the InfiniBand Trade Association and has one focus, which is to be the highest performance data center interconnect possible. Commodity InfiniBand components (NICs and switches) at 40Gb/s have been in wide distribution for several years now, and a new 56Gb/s speed grade has recently been announced. Although the InfiniBand and Ethernet roadmaps are slowly converging, it is still true that the InfiniBand bandwidth roadmap leads the Ethernet roadmap. So if bandwidth is a serious concern, you would probably want to think about deploying an InfiniBand fabric.

                      InfiniBand Speed Roadmap

  • Adoption curve. Historically, next generation Ethernet has been deployed first as a backbone (switch-to-switch) technology and has eventually trickled down to the end nodes. 10GbE was ratified in 2002, but until 2007 almost all servers connected to the Ethernet fabric using 1GbE, with 10GbE reserved for the backbone. The same appears to be true for 40 and 100GbE; although the specs were ratified by the IEEE in 2010, an online search for 40GbE NICs reveals only one 40GbE NIC product in the marketplace today. Server adapters for InfiniBand, on the other hand, are ordinarily available coincident with each newly announced speed bump, allowing servers to connect to an InfiniBand network at the very latest speed grades right away. 40Gb/s InfiniBand HCAs, known as QDR, have been available for a number of years now, and new adapter products matching the next roadmap speed bump, known as FDR, were announced at SC11 this past fall. The important point here is that one trade-off to be made in deciding between RoCE and native InfiniBand is that RoCE allows you to preserve your familiar Ethernet switched fabric, but at the price of a slower adoption curve compared to native InfiniBand.
  • Fabric management. RoCE and InfiniBand both offer many of the features of RDMA, but there is a fundamental difference between an RDMA fabric built on Ethernet using RoCE and one built on top of native InfiniBand wires. The InfiniBand specification describes a complete management architecture based on a central fabric management scheme, which is very much in contrast to traditional Ethernet switched fabrics, which are generally managed autonomously. InfiniBand’s centralized management architecture, which gives its fabric manager a broad view of the entire layer 2 fabric, allows it to provide advanced fabric features such as support for arbitrary layer 2 topologies, partitioning, QoS and so forth. These may or may not be important in any particular environment, but by avoiding the limitations of the traditional spanning tree protocol, InfiniBand fabrics can maximize bisection bandwidth and thereby take full advantage of the fabric capacity. That’s not to say that there are no proprietary solutions in the Ethernet space, or that there is no work underway to improve Ethernet management schemes, but again, if these features are important in your environment, that may influence your choice between native InfiniBand and an Ethernet-based RoCE solution. So when choosing between an InfiniBand fabric and a RoCE fabric, it makes sense to consider the management implications.
  • Link level flow control vs. DCB. RDMA, whether native InfiniBand or RoCE, works best when the underlying wires implement a so-called lossless fabric. A lossless fabric is one where packets on the wire are not routinely dropped. By comparison, traditional Ethernet is considered a lossy fabric since it frequently drops packets, relying on the TCP transport layer to notice these lost packets and to adjust for them. InfiniBand, on the other hand, uses a technique known as link level flow control, which ensures that packets are not dropped in the fabric except in the case of serious errors. This technique helps explain much of InfiniBand’s traditionally high bandwidth utilization efficiency. In other words, you get all the bandwidth for which you’ve paid. When using RoCE, you can accomplish almost the same thing by deploying the latest version of Ethernet, sometimes known as Data Center Bridging, or DCB. DCB comprises five new specifications from the IEEE which, taken together, provide almost the same lossless characteristic as InfiniBand’s link level flow control. But there’s a catch: getting the full benefit of DCB requires that your switches and NICs implement the important parts of these new IEEE specifications. I would be very interested to hear from anybody who has experience with these new features in terms of how complex they are to implement in products, how well they work in practice, and whether there are any special management challenges.
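To make the lossless versus lossy distinction in the last item more concrete, here is a toy Python sketch of a single sender feeding a receiver with a small buffer. It is only an illustration under simplifying assumptions: it is not the actual InfiniBand credit protocol or the DCB mechanisms, it ignores TCP retransmission on the lossy side, and the function name and parameters are invented for the example.

```python
def simulate(total_packets, buffer_slots, send_per_tick, drain_per_tick, credit_based):
    """Toy single-link model. Returns (delivered, dropped, ticks).

    Assumes buffer_slots, send_per_tick and drain_per_tick are all positive.
    """
    to_send = total_packets   # packets still waiting at the sender
    queued = 0                # packets sitting in the receiver buffer
    delivered = dropped = ticks = 0
    while to_send > 0 or queued > 0:
        ticks += 1
        free = buffer_slots - queued
        # With credits, the sender transmits only into buffer slots it knows are free.
        allowed = min(send_per_tick, free) if credit_based else send_per_tick
        sent = min(to_send, allowed)
        to_send -= sent
        accepted = min(sent, free)
        dropped += sent - accepted   # on the lossy link, overflow is discarded
        queued += accepted
        # The receiver drains its buffer, which frees slots (returns credits).
        drained = min(queued, drain_per_tick)
        queued -= drained
        delivered += drained
    return delivered, dropped, ticks


if __name__ == "__main__":
    for mode in (False, True):
        delivered, dropped, ticks = simulate(100, 8, 4, 2, credit_based=mode)
        label = "credit-based" if mode else "lossy"
        print(f"{label}: delivered={delivered} dropped={dropped} ticks={ticks}")
```

In this model the credit-based sender is simply held back when the receiver is full, so nothing is dropped, while the lossy sender keeps transmitting and loses whatever does not fit. On a real lossy network a transport such as TCP would have to detect and resend those packets.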

As we pointed out in the webinars, there are many practical routes to follow on the path to an RDMA fabric.  In some environments, it is entirely likely that RoCE will be the ultimate destination, providing many of the benefits of RDMA technology while preserving major investments in existing Ethernet.  In some other cases, RoCE presents a great opportunity to become familiar with RDMA on the way toward implementing the highest performance solution based on InfiniBand.  Either way, it makes sense to understand some of these key differences in order to make the best decision going forward.

If you didn’t get a chance to attend any of the webinars or missed one of the parts, be sure to check out the recording here on the IBTA website.  Or, if you have any lingering questions about the webinars or InfiniBand and RoCE, email me at

Paul Grun
System Fabric Works

Author: admin Categories: Uncategorized Tags:

InfiniBand: What about it ?

February 10th, 2012

Recently, there has been a lot of conversation around InfiniBand. Members of the IBTA often take our knowledge of InfiniBand technology for granted, which is why we are happy to see more exploratory discussion and educational conversations happening. If you’re interested in finding out more about InfiniBand, the IBTA has a number of resources for you to check out, including a product roadmap put together by the IBTA’s members.

Additionally, we wanted to share a recent blog post by Oracle’s Neeraj Gupta, which succinctly introduces the InfiniBand technology to those who may be unfamiliar with it.

Looking forward to more discussion and education on InfiniBand in the coming weeks.

Brian Sparks and Skip Jones
IBTA Marketing Working Group Co-Chairs

InfiniBand: What about it ?

Heard the buzzword - InfiniBand? And wondering what it is? Here is some information to get you started.

I am quite sure that you are already familiar with more common networking technologies like Ethernet and various wireless media these days. InfiniBand is yet another, but it does not reach out to us in our daily lives as much as the others, and that’s probably the reason you are still interested in reading about it here :)

InfiniBand is meant to provide an interconnect for high end computing environments by providing high bandwidth with extremely low latency. In other words, it enables computing end points to exchange more data, faster. Let’s compare InfiniBand with Ethernet based on various product offerings today.

Ethernet most commonly offers 1Gb/sec and 10Gb/sec bandwidth. InfiniBand offers up to 40Gb/s bandwidth with lower latency than is observed on Ethernet media.

I would like to point out that these are raw bandwidths, and the actual throughput is usually lower, depending on the messaging protocols used across end points. I will talk about this more later.
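As a rough illustration of the gap between raw bandwidth and usable data rate, the Python sketch below accounts for line encoding only, which is just one contributor alongside the protocol overhead mentioned above. The per-lane signaling rates and encoding schemes are the commonly published figures for 4x InfiniBand links rather than numbers taken from this post, and the function name is made up for the example.

```python
def data_rate_gbps(lane_signal_gbps, lanes, payload_bits, coded_bits):
    """Usable data rate in Gb/s after line-encoding overhead.

    payload_bits / coded_bits is the encoding efficiency,
    e.g. 8/10 for 8b/10b or 64/66 for 64b/66b.
    """
    return lane_signal_gbps * lanes * payload_bits / coded_bits


# Assumed link parameters: (per-lane signaling rate in Gb/s, lanes, payload bits, coded bits)
LINKS = {
    "QDR 4x": (10.0, 4, 8, 10),        # 40 Gb/s raw, 8b/10b encoding
    "FDR 4x": (14.0625, 4, 64, 66),    # about 56 Gb/s raw, 64b/66b encoding
}

if __name__ == "__main__":
    for name, params in LINKS.items():
        raw = params[0] * params[1]
        print(f"{name}: raw {raw:.2f} Gb/s, data {data_rate_gbps(*params):.2f} Gb/s")
```

Under these assumptions a 40Gb/s link carries 32Gb/s of data before any protocol headers are counted, so measured application throughput will be lower still.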

In recent years of technology evolution, computing platforms’ capabilities have reached a point where they can make use of a better, higher speed network to communicate with peer platforms more efficiently. When the network cannot keep up with the platforms it connects, we call it a bottleneck. In highly demanding computing environments, InfiniBand solves this problem by allowing computers to exchange more data faster.

So, what do you need to get on this high speed data highway? It’s not likely that the same equipment will work. You are right!

InfiniBand requires specialized hardware equipment. Each computing end point needs an I/O card that we call a Host Channel Adapter, or HCA. These connect to InfiniBand switches using special cables that are engineered to carry your data at this high rate with precision.

Oh wow! So, do I need to re-write my applications here? I do not have time to do that!

I know you will ask this at this point. The answer is “no”. Before I go any further, let me just state that InfiniBand follows a well-known industry standard model for networking, known as Open Systems Interconnection or OSI. This model defines seven layers and, just like Ethernet, they apply to InfiniBand as well. Now, let me come back to the original point. We don’t need to re-write our entire applications because InfiniBand technology enables very seamless integration.

The new hardware that we just talked about integrates and presents itself to your application in a very similar way to Ethernet. Your view into the network remains the same and you continue to interact with sockets made up of IP addresses and ports.
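To make that point concrete, here is a minimal sketch of an ordinary TCP socket exchange in Python. Nothing in it is InfiniBand-specific. It runs over the loopback address so that it is self-contained; the assumption is that on a host where the InfiniBand interface has been given an IP address (for example through IP over InfiniBand), the same code would work by substituting that address. The helper names are invented for the example.

```python
import socket
import threading


def serve_once(listener):
    """Accept one connection and echo back whatever arrives, upper-cased."""
    conn, _ = listener.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


def echo_roundtrip(host="127.0.0.1", message=b"hello fabric"):
    """Send message to a one-shot echo server on host and return the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind((host, 0))   # port 0 asks the OS for any free port
        listener.listen(1)
        port = listener.getsockname()[1]
        server = threading.Thread(target=serve_once, args=(listener,))
        server.start()
        # The client side only ever sees an IP address and a port.
        with socket.create_connection((host, port), timeout=5) as client:
            client.sendall(message)
            reply = client.recv(1024)
        server.join()
    return reply


if __name__ == "__main__":
    print(echo_roundtrip())
```

The application code deals only with an address and a port, which is why existing socket programs would not need to be rewritten to run over a different interconnect.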

That’s all for this blog. I will come back with more information on this later and cover the topic in detail. Thanks for reading!

*Reposted with permission from Oracle’s Networking Blog and Neeraj Gupta


Statement from QLogic

February 6th, 2012

Recently, QLogic Corporation sold its InfiniBand business unit to Intel Corporation. QLogic continues its focus on and expansion of its Ethernet product line, as well as the RoCE and iWARP protocols, amongst others, over Ethernet. QLogic remains committed to its close relationships and positions within the IBTA.

Skip Jones
IBTA Marketing Working Group Co-Chair
