Infiniband – SNIA on Storage

Last month, the SNIA Networking Storage Forum (NSF) hosted a webcast on how increases in networking speeds are impacting storage. If you missed the live webcast, New Landscape of Network Speeds, it’s now available on-demand. We received several interesting questions on this topic. Here are our experts’ answers: Q. What are the cable distances for 2.5 and 5G Ethernet? A. 2.5GBASE-T and 5GBASE-T Ethernet are designed to run on existing UTP cabling, so they should reach 100 meters on both Cat5e and Cat6 cabling. The reach of 5GBASE-T on Cat5e may be less under some conditions, for example if many cables are bundled tightly together. Cabling guidelines and field test equipment are available to aid in the transition.

At our recent live Webcast “Benefits of RDMA in Accelerating Ethernet Storage Connectivity,” experts from Emulex, Intel and Microsoft had an insightful discussion on the ways RDMA is having an impact on Ethernet storage. The live event was attended by nearly 200 people and feedback was overwhelmingly positive, with several attendees thanking us for our vendor-neutral presentation and one attendee commenting that it was, “Probably the most clearly comprehensible yet comprehensive webinar I’ve attended in some time.” If you missed the Webcast, it’s now available on demand. We did not have time to get to everyone’s questions, so as promised, below are answers to all of them. If you have additional questions, please ask them in the comments section of this blog and we’ll get back to you as soon as possible.

Q. Is RDMA over RoCEv2 in production?

A. The IBTA released the RoCEv2 Specification in September 2014. To support that specification, changes may be required across the RDMA stack, including firmware, drivers and operating systems. Schedules for implementing the specification will vary by operating system. For example, the OpenFabrics Alliance (OFA) has not yet released an OpenFabrics Enterprise Distribution (OFED) version that implements the standard, although work is in progress. Once OFA completes its OFED stack implementation, the Linux distribution vendors will incorporate and support the updated OFED stack. Implementations provided before full OFA and distro vendor support would be preliminary and potentially incompatible with the OFED release, and would require confirmation from the distro vendor regarding the nature and level of support they would provide.

Q. I would have liked a list of Windows applications that take advantage of SMB Direct – both in a Hyper-V host or bare metal.

A. In Windows, any file-based application can make use of SMB3 and SMB Direct due to the native file-based programming interface support. No application changes are required. For certain enterprise applications such as Hyper-V and SQL Server, SMB3 is officially supported, and more information can be found in the product catalog at www.microsoft.com.

Q. Are there any particular benefits in using one network protocol over another for SMB Direct/RDMA (iWARP vs. RoCE vs. IB)?

A. There are no hard and fast rules; any adapter or protocol can be suitable for many scenarios. Of the Ethernet-based protocols we considered in today’s webcast:

iWARP offers the benefit of operation over TCP with its reliability and routability, well-suited to a broad range of installed infrastructure.
RoCE offers a lightweight, efficient protocol when a DCB-enabled switched fabric is available. RoCE, however, is not routable.
RoCEv2 offers similar properties to RoCE, with the possibility to scale to larger routed and DCB-enabled fabrics.

Q. Who are the vendors offering iWARP capable RNICs?

A. Chelsio Communications has production iWARP adapters today, and both Intel and QLogic have publicly committed to future iWARP controllers.

Q. How much testing has been done with SMB3, and in particular SMB Direct, over WAN connections?

A. The SMB2 protocol was originally designed to adapt to WAN scenarios. Its credit-based flow control allows large amounts of data to be outstanding at once, which makes the best use of long, WAN-type pipes. The SMB3 protocol retains these design attributes, and the SMB Direct protocol supports similarly deep pipelining. The iWARP protocol, being layered on standard TCP, is well suited to such deployments, and RoCE WAN adapters are potentially available. Please contact the respective technology vendors for information on any available testing results.
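To illustrate why deep pipelining matters on a long pipe, here is a rough sizing sketch in Python. It computes the bandwidth-delay product (the bytes that must be in flight to keep a link busy for one round trip) and the number of requests that would need to be outstanding to cover it. The link speed, round-trip time, and request size are illustrative assumptions, not figures from the webcast, and the function names are hypothetical. The model ignores protocol overhead and congestion.

```python
def bandwidth_delay_product_bytes(link_bits_per_s: int, rtt_us: int) -> int:
    """Bytes in flight needed to keep the link busy for one round trip."""
    return link_bits_per_s * rtt_us // (8 * 1_000_000)


def requests_needed(link_bits_per_s: int, rtt_us: int, request_bytes: int) -> int:
    """Outstanding requests of a given size needed to cover the pipe."""
    bdp = bandwidth_delay_product_bytes(link_bits_per_s, rtt_us)
    return max(1, -(-bdp // request_bytes))  # integer ceiling division


def achievable_bits_per_s(outstanding: int, request_bytes: int,
                          rtt_us: int, link_bits_per_s: int) -> int:
    """Idealized throughput when only `outstanding` requests are in flight."""
    in_flight_bits = outstanding * request_bytes * 8
    return min(link_bits_per_s, in_flight_bits * 1_000_000 // rtt_us)


# Assumed example: 10 Gb/s link, 50 ms round trip, 1 MiB requests.
LINK = 10_000_000_000
RTT_US = 50_000
REQ = 1024 * 1024

print(bandwidth_delay_product_bytes(LINK, RTT_US))   # 62500000
print(requests_needed(LINK, RTT_US, REQ))            # 60
print(achievable_bits_per_s(1, REQ, RTT_US, LINK))   # 167772160
```

Under these assumed numbers, a single outstanding 1 MiB request would use under 2% of the link, while roughly 60 outstanding requests would be needed to fill it.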

Q. I would love a future webcast on RDMA-enabled distributed filesystems.

A. Thanks for the suggestion! We’re always looking for ideas for future webcasts and SNIA-ESF will consider this as a potential follow-on.

Q. Is Live Migration the scenario where “packet size” is 1MB?

A. All SMB Direct scenarios have workloads that range anywhere up to 8MB. For large file copies, most SMB3 clients request from 1MB to 8MB per operation. For Hyper-V live migration, transfers are typically of similar size during the bulk transfer phase.

Q. SMB3 is being compared to FC for enterprise. If Ethernet based protocols are of interest, wouldn’t FCoE give the same performance as FC (same stack) vs. SMB3?

A. SMB3 with SMB Direct enables many workloads not possible with Fibre Channel over Ethernet, and performance comparisons are therefore difficult. Perhaps another SNIA webcast could investigate this!

Q. Regarding your SMB Direct example with lots of small operations, how do you deal with the overhead of registering and unregistering buffers for the RDMA operations?

A. As answered later in the session, registration and unregistration are not a protocol matter. In the Windows implementation, however, they are performed strictly for the specific buffers of each operation, which is critical for security, data integrity, and system protection. The standard “Fast Register Work Request” method is used, and careful implementation has shown that the overhead does not negatively impact performance, even for small I/O (4KB/operation). Check out Jose Barreto’s blog, which contains many benchmark results.

Q. But isn’t Live Migration done in 1MB “chunks”? So not “small” I/Os?

A. As answered later in the session, Hyper-V Live Migration is done in several phases. The first phase is the initial bulk copy of memory, done in large chunks. Immediately after it comes a second phase that copies the individual pages dirtied by the live-running VM in the meantime. These operations are typically 4KB. Note: The faster the initial phase goes, the less work there is in the second phase. In both phases, faster is better, and RDMA accelerates both.
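The relationship between the two phases can be sketched with a toy model in Python. It assumes a constant transfer rate and a constant page-dirtying rate, which real VMs do not have, and it is not a description of the actual Hyper-V algorithm. The memory size, link rates, and dirty rate below are made-up illustrative values, and the function name is hypothetical.

```python
PAGE_BYTES = 4096  # typical page size, matching the 4KB operations above


def pages_dirtied_during_bulk_copy(memory_bytes: int,
                                   link_bytes_per_s: int,
                                   dirty_pages_per_s: int) -> int:
    """Pages the running VM dirties while the first (bulk) phase is copying.

    These are the pages the second phase has to resend. The result is
    capped at the total number of pages in the VM.
    """
    total_pages = memory_bytes // PAGE_BYTES
    # bulk_seconds = memory_bytes / link_bytes_per_s, kept in integer math
    dirtied = dirty_pages_per_s * memory_bytes // link_bytes_per_s
    return min(total_pages, dirtied)


# Assumed example: 16 GiB VM, 20,000 pages dirtied per second.
VM_MEMORY = 16 * 1024**3
DIRTY_RATE = 20_000

slow = pages_dirtied_during_bulk_copy(VM_MEMORY, 1_000_000_000, DIRTY_RATE)
fast = pages_dirtied_during_bulk_copy(VM_MEMORY, 5_000_000_000, DIRTY_RATE)

print(slow)  # 343597 pages left for the second phase at 1 GB/s
print(fast)  # 68719 pages left for the second phase at 5 GB/s
```

In this simplified model, a bulk copy that runs five times faster leaves about one fifth as many 4KB pages for the second phase.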

Q. Are iSER and iWARP alternatives to one another?

A. No. iWARP is an RDMA protocol, while iSER is a mapping of iSCSI onto RDMA transports, including iWARP as well as RoCE and InfiniBand.

Q. What’s Intel’s roadmap for RoCE and/or iWARP?

A. Intel is committed to iWARP and plans to incorporate it in future server chipsets and SOCs. See http://www.intel.com/content/www/us/en/ethernet-products/accelerating-ethernet-iwarp-video.html for more information.

Q. Is there any transport other than IB being used to create a reliable transport for RoCEv2? Is that possible in principle?

A. RoCE was developed to leverage InfiniBand as much as possible. For that reason, the InfiniBand transport was chosen when the RoCE standard was developed. When the RoCEv2 standard was developed, the underlying InfiniBand network protocol was replaced with IPv4/IPv6 to provide layer 3 routability, and UDP was added to provide stateless encapsulation (and indication) of the InfiniBand transport header, which was retained. While it may be possible to develop a reliable transport to replace InfiniBand, the RoCE standards body has elected not to go that route as of this writing.
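The layering change described above can be sketched as a comparison of header stacks in Python. The header sizes are the standard minimum sizes, assuming no VLAN tag, no IP options, and no extended InfiniBand transport headers. The EtherType and UDP port constants are the registered values for RoCE and RoCEv2. The list and function names are hypothetical and exist only for this illustration.

```python
ROCE_V1_ETHERTYPE = 0x8915   # EtherType that identifies RoCE (v1) frames
ROCE_V2_UDP_DPORT = 4791     # UDP destination port that identifies RoCEv2

# (layer name, header bytes), outermost first.
# RoCE (v1): the InfiniBand network header (GRH) sits directly on Ethernet.
ROCE_V1_LAYERS = [("Ethernet", 14), ("IB GRH", 40), ("IB BTH", 12)]

# RoCEv2: GRH is replaced by IP + UDP; the IB transport header (BTH) is kept.
ROCE_V2_IPV4_LAYERS = [("Ethernet", 14), ("IPv4", 20), ("UDP", 8), ("IB BTH", 12)]
ROCE_V2_IPV6_LAYERS = [("Ethernet", 14), ("IPv6", 40), ("UDP", 8), ("IB BTH", 12)]

# Both versions end with the InfiniBand invariant CRC and the Ethernet FCS.
TRAILER_BYTES = 4 + 4


def header_bytes(layers):
    """Total bytes of headers in front of the RDMA payload."""
    return sum(size for _, size in layers)


for name, layers in [("RoCE v1", ROCE_V1_LAYERS),
                     ("RoCEv2 over IPv4", ROCE_V2_IPV4_LAYERS),
                     ("RoCEv2 over IPv6", ROCE_V2_IPV6_LAYERS)]:
    stack = " / ".join(layer for layer, _ in layers)
    print(f"{name}: {stack} = {header_bytes(layers)} header bytes "
          f"+ {TRAILER_BYTES} trailer bytes")
```

The InfiniBand transport header appears unchanged in all three stacks. Only the network layer beneath it differs, which is what makes RoCEv2 routable at layer 3.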
