Category: RoCE
RoCE vs. iWARP Q&A
Dive into NVMe at Storage Developer Conference – a Chat with SNIA Technical Council Co-Chair Bill Martin
The SNIA Storage Developer Conference (SDC) is coming up September 24-27, 2018 at the Hyatt Regency in Santa Clara, CA. The agenda is now live!
SNIA on Storage is teaming up with the SNIA Technical Council to dive into major themes of the 2018 conference. The SNIA Technical Council takes a leadership role to develop the content for each SDC, so SNIA on Storage spoke with Bill Martin, SNIA Technical Council Co-Chair and SSD I/O Standards at Samsung Electronics, to understand why SDC is bringing NVMe and NVMe-oF to conference attendees.
SNIA On Storage (SOS): What is NVMe and why is SNIA emphasizing it as one of their key areas of focus for SDC?
Bill Martin (BM): NVMe™, also known as NVM Express®, is an open collection of standards and information to fully expose the benefits of non-volatile memory (NVM) in all types of computing environments from mobile to data center.
SNIA is very supportive of NVMe. In fact, earlier this year, SNIA, the Distributed Management Task Force (DMTF), and the NVM Express organizations formed a new alliance to coordinate standards for managing solid state drive (SSD) storage devices. This alliance brings together multiple standards to address the issue of scale-out management of SSDs. It’s designed to enable an all-inclusive management experience by improving the interoperable management of information technologies.
With interest from architects, developers, and implementers both within and outside of SNIA in how these standards work, the SNIA Technical Council decided to bring even more sessions on this important area to the SDC audience this year. We are proud to include 16 sessions on NVMe topics over the four days of the conference.
SOS: What will I learn about NVMe at SDC?
Ethernet Networked Storage – FAQ
At our SNIA Ethernet Storage Forum (ESF) webcast “Re-Introduction to Ethernet Networked Storage,” we provided a solid foundation on Ethernet networked storage, covering the move to higher speeds, challenges, use cases, and benefits. Here are answers to the questions we received during the live event.
Q. Within the iWARP protocol there is a layer called MPA (Marker PDU Aligned Framing for TCP) inserted for storage applications. What is the point of this protocol?
A. MPA is an adaptation layer between the iWARP Direct Data Placement Protocol and TCP/IP. It provides framing and CRC protection for Protocol Data Units. MPA enables packing of multiple small RDMA messages into a single Ethernet frame. It also enables an iWARP NIC to place frames received out-of-order (instead of dropping them), which can be beneficial on best-effort networks. More detail can be found in IETF RFC 5044 and IETF RFC 5041.
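To make that framing more concrete, here is an illustrative C sketch of the MPA layout described in RFC 5044. It is a simplified, conceptual view (the actual wire encoding is big-endian with a variable-length payload), not a usable wire-format definition.

```c
/* Conceptual sketch of MPA framing per RFC 5044 (simplified; not a real
 * wire-format definition). An FPDU wraps one DDP segment with a length
 * field and a CRC-32c, and optional markers let a receiver locate FPDU
 * boundaries even when TCP segments arrive out of order. */
#include <stdint.h>

struct mpa_fpdu {
    uint16_t ulpdu_length;   /* length of the DDP segment that follows         */
    /* uint8_t ulpdu[];         variable-length DDP/RDMAP payload              */
    /* uint8_t pad[];           0-3 bytes of padding to a 4-octet boundary     */
    uint32_t crc;            /* CRC-32c over the FPDU, if CRCs were negotiated */
};

/* Optional marker inserted at fixed 512-octet intervals in the TCP stream;
 * it points back to the start of the FPDU it falls within, which is what
 * allows an iWARP NIC to place out-of-order frames directly. */
struct mpa_marker {
    uint16_t reserved;
    uint16_t fpdu_ptr;
};
```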
Q. What is the API for RDMA network IPC?
A. The general API for RDMA is called verbs. The OpenFabrics Verbs Working Group oversees the definition and functionality of verbs in the OpenFabrics Software (OFS) code, and training content is available from the OpenFabrics Alliance. General information about RDMA over Converged Ethernet (RoCE) is available at the InfiniBand Trade Association website. Information about the Internet Wide Area RDMA Protocol (iWARP) can be found in IETF RFC 5040, RFC 5041, RFC 5042, RFC 5043, and RFC 5044.
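For readers new to verbs, the following is a minimal, illustrative C sketch using libibverbs: it enumerates RDMA devices, opens one, allocates a protection domain, and registers a buffer so the RNIC can access it directly. A real application would go on to create queue pairs, post work requests, and poll completion queues; this only shows the setup steps.

```c
/* Minimal verbs (libibverbs) sketch: enumerate RDMA devices, open one,
 * create a protection domain, and register a buffer for RDMA access.
 * Build with: cc verbs_sketch.c -libverbs */
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) {
        fprintf(stderr, "no RDMA-capable devices found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ctx ? ibv_alloc_pd(ctx) : NULL;
    char *buf = malloc(4096);
    if (!ctx || !pd || !buf) {
        fprintf(stderr, "device/PD/buffer setup failed\n");
        return 1;
    }

    /* Registration pins the memory and returns the keys (lkey/rkey) that
     * local and remote RNICs use to address it in RDMA operations. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, 4096,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (mr) {
        printf("registered 4KB buffer: lkey=0x%x rkey=0x%x\n",
               mr->lkey, mr->rkey);
        ibv_dereg_mr(mr);
    }

    free(buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```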
Q. RDMA requires TCP/IP (iWARP), InfiniBand, or RoCE to operate on with respect to NVMe over Fabrics. Therefore, what are the advantages and disadvantages of iWARP vs. RoCE?
A. Both RoCE and iWARP support RDMA over Ethernet. iWARP uses TCP/IP while RoCE uses UDP/IP. Debating which one is better is beyond the scope of this webcast, but you can learn more by watching the SNIA ESF webcast, “How Ethernet RDMA Protocols iWARP and RoCE Support NVMe over Fabrics.”
Q. 100Gb Ethernet Optical Data Center solution?
A. 100Gb Ethernet optical interconnect products were first available around 2011 or 2012 in a 10x10Gb/s design (100GBASE-CR10 for copper, 100GBASE-SR10 for optical), which required thick cables and bulky CXP or CFP MSA housings. These were generally used only for switch-to-switch links. Starting in late 2015, the more compact 4x25Gb/s design (using the QSFP28 form factor) became available in copper (DAC), optical cabling (AOC), and transceivers (100GBASE-SR4, 100GBASE-LR4, 100GBASE-PSM4, etc.). The optical transceivers allow 100GbE connectivity at distances up to 100m, 2km, or 10km, depending on the type of transceiver and fiber used.
Q. Where is FCoE being used today?
A. FCoE is primarily used in blade server deployments where there could be contention for PCI slots and only one built-in NIC. These NICs typically support FCoE at 10Gb/s speeds, passing both FC and Ethernet traffic over a connection to a Top-of-Rack FCoE switch, which directs traffic to the respective fabrics (FC and Ethernet). However, FCoE has not gained much acceptance outside of the blade server use case.
Q. Why did iSCSI start out mostly in lower-cost SAN markets?
A. When it first debuted, iSCSI packets were processed by software initiators, which consumed CPU cycles and showed higher latency than Fibre Channel. Achieving high performance with iSCSI required expensive NICs with iSCSI hardware acceleration, and iSCSI networks were typically limited to 100Mb/s or 1Gb/s while Fibre Channel was running at 4Gb/s. Fibre Channel is also a lossless protocol, while TCP/IP is lossy, which caused concerns for storage administrators. Now, however, iSCSI can run on 25, 40, 50 or 100Gb/s Ethernet with various types of TCP/IP acceleration or RDMA offloads available on the NICs.
Q. What are some of the differences between iSCSI and FCoE?
A. iSCSI runs SCSI protocol commands over TCP/IP (except iSER which is iSCSI over RDMA) while FCoE runs Fibre Channel protocol over Ethernet. iSCSI can run over layer 2 and 3 networks while FCoE is Layer 2 only. FCoE requires a lossless network, typically implemented using DCB (Data Center Bridging) Ethernet and specialized switches.
Q. You pointed out at least twice that people incorrectly predicted the end of Fibre Channel, but it didn’t happen. What makes you say Fibre Channel is actually going to decline this time?
A. Several things are different this time. First, Ethernet is now much faster than Fibre Channel instead of the other way around. Second, Ethernet networks now support lossless and RDMA options that were not previously available. Third, several new solutions–like big data, hyper-converged infrastructure, object storage, most scale-out storage, and most clustered file systems–do not support Fibre Channel. Fourth, none of the hyper-scale cloud implementations use Fibre Channel and most private and public cloud architects do not want a separate Fibre Channel network–they want one converged network, which is usually Ethernet.
Q. Which storage protocols support RDMA over Ethernet?
A. The Ethernet RDMA options for storage protocols are iSER (iSCSI Extensions for RDMA), SMB Direct, NVMe over Fabrics, and NFS over RDMA. There are also storage solutions that use proprietary protocols supporting RDMA over Ethernet.
Q&A on Exactly How iSCSI has Evolved
Our recent SNIA ESF Webcast, “The Evolution of iSCSI” drew a big and diverse group of attendees. From beginners looking for iSCSI basics, to experts with a lot of iSCSI deployment experience, there were plenty of good questions. Our presenters, Andy Banta and Fred Knight, did a great job answering as many as they could during the live event, but we didn’t have time to get to them all. So here are answers to them all. And by the way, if you missed the Webcast, it’s now available on-demand.
Q. What are the top 3 reasons to choose iSCSI over FC SAN?
A. 1. Use of commodity equipment and protocols. It means that you don’t have to set up a completely separate network. It means you don’t have to buy separate HBAs.
2. Inherent networking capability. Built on top of TCP/IP, it benefits from any networking technology to come along. These include routing, tunneling, authentication, encryption, etc.
3. Ease of automation and configuration. In its simplest form, an iSCSI host only needs to know the IP address of the target system. In more complex systems, hosts and storage provide APIs to allow automation through scripting or management tools.
Q. Please comment on why SCSI went from being a widely used protocol for all sorts of devices to being focused as only essentially a storage protocol?
A. SCSI was originally designed as both a protocol and a bus (the original Parallel SCSI). Because there were no other busses, the SCSI bus did it all: disks, tapes, scanners, printers, optical drives (CDs), media changers, etc. As other busses came onto the market (think USB), many of those devices moved to the new bus (CDs, printers, scanners, etc.). Commodity devices used commodity busses (IDE, SATA, USB), and enterprise devices used enterprise busses (FC, SAS); and so, disks, tapes, and media changers mostly stayed on SCSI.
The name SCSI can be confusing for some, as the term originally was used for both the SCSI protocol and the SCSI bus. The term for the SCSI protocol is all that remains today; the SCSI bus (the old SCSI parallel bus) is no longer in wide use. Today, the FC bus, or the SAS bus, or the SoP bus, or the SRP bus are used to carry the SCSI protocol. The SCSI Architecture Model (SAM) describes a very distinct separation between the device layer (the SCSI protocol) and the transport layer (the bus).
And, the SCSI command set has become the basis for many subsequent command sets. The JEDEC group used the SCSI command set as a model (JEDEC devices are in your cell phone), the ATAPI devices used SCSI commands, and many SCSI commands and SATA commands have a common heritage. The Mt. Fuji group (a standards group in Japan) also uses SCSI as the basis for new DVD and Blu-ray devices. So, while not widely known, the SCSI command family has grown well beyond what is managed by the ANSI/INCITS T10 committee that originally defined SCSI, into a broad set of capabilities that are used across the industry by a broad group of organizations. But, that all said, scanners and printers are still on USB, and SCSI is almost all about storage in one form or another.
Q. How does iSCSI support software-defined storage?
A. Answered during the talk. SDS provides more automation and more control knobs over storage capabilities, but SDS still needs a way to transport the data, and iSCSI works perfectly fine for that. They are complementary technologies, not competing ones.
Q. With 40Gb and faster coming soon to a server near you, what kind of impact will that have on CPU utilization? Will smaller servers be able to push that much traffic?
A. More throughput simply requires more CPU. With good multithreaded drivers available, this can mean simply adding cores to keep the pipe as full as possible. As we mentioned near the end, using iSCSI with RDMA lightens the load on the CPU even more, so you’ll probably be seeing more of that.
Q. Is IPSec commonly supported on iSCSI targets?
A. Yes, IPsec must be implemented on an iSCSI target for it to be a compliant device. However, it is not commonly enabled by customers, and although compliant devices MUST provide IPsec, there are a lot of non-compliant initiators and targets on the market.
Q. I’m told direct connect with iSCSI is discouraged, and that there should be a switch in place to handle the buffering, latency, acknowledgement, etc. Is this true, or is it just a best practice to make sure switches are part of the design?
A. If you have no need to connect to multiple targets or multiple initiators, there’s no harm in direct connections.
Q. Ethernet was not designed to support storage traffic. The TCP/IP protocol suite was not designed to support storage traffic. SCSI was not designed to be encapsulated. So TCP/IP FTW? I think not. The reason iSCSI exists is [perceived] cost savings. I get fed up with people constantly looking for ways to squeeze another penny out of something. To me it illustrates that they’re not very creative. Fibre Channel is a stupid name, but it is a purpose-built protocol that works as designed.
A. Ethernet is a general purpose network. It is capable of handling lots of different traffic (including storage). By putting iSCSI onto an existing Ethernet infrastructure, it can (as you point out) create a substantial cost savings over installing a FC network (although that infrastructure savings comes with other costs – such as the impact of a shared wire). However, installing a dedicated Ethernet network provides many of the advantages of a dedicated FC network, but at an added cost over that of a shared Ethernet infrastructure. While most consider FC a purpose-built storage network, it is worth pointing out that some also consider it a general purpose network (for example FC-Avionics is built into Fighter Jets, and it’s not for storage). And while not designed to be encapsulated, (it was designed for a parallel bus), SCSI today is encapsulated on every transport that carries it (yes, that includes FCP and SAS).
There are many kinds of storage at different price points: USB storage, SATA devices, rotating media (at different RPMs), SSD devices, SAS devices, FC devices, single spindles, arrays, cloud, drop boxes, etc., all with the corresponding transport wires. iSCSI is one of those wires. Each protocol and wire offers specific advantages and disadvantages. There can be a lot of confusion about which to use, but just as everyone does not drive the same type of car (a FORD FUSION for example), everyone does not need the same type of storage (FC devices/arrays). Yes, I drive a FORD FUSION, and I like FC storage, but I use a USB stick on my laptop, and I pray my bank never puts my financial records out in the cloud. Selecting the right storage (and wire) for the job at hand can be one of a system administrator’s most interesting problems to solve. As for the name – that is often what happens in committees…
Q. As a best practice for Windows servers, disable hardware acceleration features in NICs (TOE etc.)? Are any NIC features valuable given modern multicore CPUs?
A. Yes. Typically the only reason to disable TOE is that multiple or virtual TCP/IP stacks are going to be using the same NIC. TSO, LRO and jumbo frames will benefit any OS that can take advantage of them.
Q. What is the advantage of iSCSI when compared with NVMe?
A. NVMe and iSCSI are very different protocols. NVMe started life as a direct attach protocol to communicate to native PCIe devices (not even outside the box). iSCSI was a network protocol from day one. iSCSI has to deal with the potential for long network induced delays, and complex out of order error recovery issues. NVMe operates over an interlocked bus, and as such, does not have those issues.
But, NVMe is now being extended over fabrics. NVMe over a RoCE V1 transport will be a data center network (since there is no IP routing). NVMe over a RoCE V2 transport or an iWARP transport will have the same routing capabilities that iSCSI has. When it comes to the raw command set, they are very similar (but there are some differences). SCSI is a more fully featured command set than NVMe – it has been developed over a span of more than 25 years, and has developed solutions for all the problems that have been discovered during that time span. NVMe has a more limited (or more focused) command set (for example, there are no tape commands in the NVMe command set). iSCSI is available today, as is direct attach NVMe, but NVMe over Fabrics is still in the development phases (the specification is expected to be available the first week of June, 2016). NVMe products will take some time to mature and to develop solutions for the problems they have not discovered yet. Another example of this is the ability to support shared storage – it existed on day one in iSCSI, but did not exist in the first NVMe specification. To support shared storage in NVMe over Fabrics, that capability has since been added, and it was done using a SCSI compatible method (to make it easier for host S/W that already performs this function).
There is a large community working to develop NVMe over Fabrics. As memory-based storage devices get cheaper and the solution space matures, NVMe will become more attractive.
Q. How often do iSCSI installations provide encryption of data in flight? How: IPsec, IKEv2-SCSI + ESP-SCSI, etc.?
A. Rarely. More often than not, if in-flight data security is needed, it will be run on an isolated network. Well under 100% of installations are 100% compliant. VMware never qualified IPsec with iSCSI and didn’t have any obvious switch to turn it on. Side note: We standards guys can be overly picky about words. Since the question is “provide” the answer is – 100% of compliant installations PROVIDE encryption (IPsec V2 – see above), however, in practice, installations that require that type of security typically run on isolated networks, rather than turn on encryption.
Q. How do multiple independent applications inside the same initiator map to iSCSI sessions to the same target? E.g., iSCSI session one-to-one with application?
A. There is no relationship between applications and sessions. When an iSCSI initiator discovers a target, the initiator logs in and establishes a session. If iSCSI MCS (multi connection session) is being used, multiple TCP connections may be established and used in parallel to process operations for that session.
Applications send reads and writes to the operating system. Those IO requests make their way through the file system and caching layers into the device driver. The device driver issues the IO request to the device (over the iSCSI session) and retains information about that IO. When a completion is received from a device (the WRITE command or READ command completed), it is matched up with the request. That completion status (success or error) is passed back through the operating system (file system, etc.) to the application. So it is the responsibility of the device driver to mux/demux the requests from all the applications out over the iSCSI session and track the responses as the operations are completed.
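To make the mux/demux idea concrete, here is a simplified, hypothetical C sketch (not taken from any real driver) of how an initiator-side driver could track outstanding commands by Initiator Task Tag (ITT), the tag carried in each iSCSI PDU, so that completions can be matched back to the originating OS request.

```c
/* Hypothetical sketch of per-command tracking in an iSCSI initiator driver.
 * Each outgoing SCSI command gets an Initiator Task Tag (ITT); when the
 * target's response arrives carrying the same ITT, the driver finds the
 * original OS request and completes it. */
#include <stdint.h>
#include <stddef.h>

#define MAX_OUTSTANDING 256

struct outstanding_cmd {
    uint32_t itt;        /* Initiator Task Tag placed in the iSCSI PDU header */
    void    *request;    /* the OS-level I/O request this command belongs to  */
    int      in_use;
};

static struct outstanding_cmd table[MAX_OUTSTANDING];
static uint32_t next_itt = 1;

/* Called when the driver issues a SCSI command on the session. */
uint32_t track_command(void *os_request)
{
    for (size_t i = 0; i < MAX_OUTSTANDING; i++) {
        if (!table[i].in_use) {
            table[i].itt = next_itt++;
            table[i].request = os_request;
            table[i].in_use = 1;
            return table[i].itt;      /* tag goes out in the command PDU */
        }
    }
    return 0;                         /* queue full: caller must wait    */
}

/* Called when a response PDU arrives; returns the matching OS request. */
void *complete_command(uint32_t itt)
{
    for (size_t i = 0; i < MAX_OUTSTANDING; i++) {
        if (table[i].in_use && table[i].itt == itt) {
            table[i].in_use = 0;
            return table[i].request;  /* hand back up the storage stack  */
        }
    }
    return NULL;                      /* unknown tag: protocol error     */
}
```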
When an operating system is using MPIO (multi-pathing), the device driver may create multiple sessions between the initiator and the target. This is where operating system MPIO policies such as round-robin, shortest queue, LRU, etc. come into play. In this case, the MPIO driver will send an IO operation to the device using what it considers to be the most appropriate path (based on the selected policy). But again, there is no relationship between the application and the path used for IO (any application can have its IO sent via any path).
Today, MPIO is used more commonly than MCS.
Q. Will Microsoft iSCSI implement iSER?
A. This is a question for Microsoft or for an iSER-capable NIC vendor that provides Microsoft drivers.
Q. Zadara has some iSER deployments using Linux and VMware clients going to the Zadara cloud storage.
A. There’s an answer, all by itself.
Q. In the case of iWARP, the TCP layer takes care of out-of-order IP packet reception. Which layer handles out-of-order packet management in RoCE?
A. RoCE headers contain a 24-bit “Packet Sequence Number” that is used to validate the required ordering and detect lost packets. As such, ordering still occurs, just in a different way.
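For illustration, here is a conceptual C view of the InfiniBand Base Transport Header (BTH) that every RoCE packet carries. The struct is a simplification (the real header is a big-endian bit layout), but it shows where the 24-bit PSN lives and how a responder can check that packets arrive in sequence modulo 2^24.

```c
/* Conceptual view of the 12-byte InfiniBand Base Transport Header (BTH)
 * carried by RoCE packets. Fields are simplified; on the wire they are
 * big-endian bit fields. */
#include <stdint.h>

struct bth {
    uint8_t  opcode;        /* operation: SEND, RDMA WRITE, ACK, ...           */
    uint8_t  flags;         /* solicited event, migration, pad count, version  */
    uint16_t partition_key;
    uint32_t dest_qp;       /* 8 reserved bits + 24-bit destination queue pair */
    uint32_t psn;           /* ack-request bit + 7 reserved + 24-bit PSN       */
};

/* The responder expects PSNs in sequence (modulo 2^24); a gap indicates a
 * lost packet and triggers recovery, e.g. a NAK back to the requester. */
static inline int psn_in_order(uint32_t expected_psn, uint32_t received_psn)
{
    return ((received_psn - expected_psn) & 0xFFFFFF) == 0;
}
```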
Q. Correction: RoCE is over Ethernet packets and is not routable. RoCEv2 is the one over UDP/IP and *is* routable.
A. You are correct. RoCE is not routable by IP. RoCE transmits raw Ethernet frames with just Ethernet MAC headers and no IP headers, and as such, it is not routable by IP. RoCE V2 puts the information into UDP packets (with appropriate IP headers), and therefore it is routable by IP.
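As a concrete illustration of that difference, here is a small, hypothetical C helper of the kind a packet-capture tool might use to tell the two apart. RoCE (v1) frames are identified by their dedicated EtherType (0x8915), while RoCEv2 packets are ordinary IPv4/IPv6 plus UDP addressed to destination port 4791, which is exactly why they can be routed like any other IP traffic.

```c
/* Sketch: classify a frame as RoCE (v1) or RoCEv2 from its outer headers.
 * RoCE v1 uses EtherType 0x8915 (InfiniBand headers follow Ethernet);
 * RoCEv2 rides inside UDP/IP with destination port 4791. Parsing of the
 * outer headers is assumed to have been done by the caller. */
#include <stdint.h>

#define ETHERTYPE_ROCE_V1  0x8915
#define ETHERTYPE_IPV4     0x0800
#define ETHERTYPE_IPV6     0x86DD
#define ROCE_V2_UDP_DPORT  4791

enum roce_kind { NOT_ROCE, ROCE_V1, ROCE_V2 };

enum roce_kind classify_roce(uint16_t ethertype, int is_udp, uint16_t udp_dport)
{
    if (ethertype == ETHERTYPE_ROCE_V1)
        return ROCE_V1;                       /* layer 2 only: not IP-routable */
    if ((ethertype == ETHERTYPE_IPV4 || ethertype == ETHERTYPE_IPV6) &&
        is_udp && udp_dport == ROCE_V2_UDP_DPORT)
        return ROCE_V2;                       /* UDP/IP encapsulated: routable */
    return NOT_ROCE;
}
```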
Q. How prevalent is iSER today in deployment? And what are some of the typical applications that leverage iSER?
A. Not terribly prevalent today, but higher speed Ethernet might drive more adoption, due to the CPU savings demonstrated.
How Ethernet RDMA Protocols iWARP and RoCE Support NVMe over Fabrics
NVMe (Non-Volatile Memory Express) over Fabrics is of tremendous interest among storage vendors, flash manufacturers, and cloud and Web 2.0 customers. Because it offers efficient remote and shared access to a new generation of flash and other non-volatile memory storage, it requires fast, low latency networks, and the first version of the specification is expected to take advantage of RDMA (Remote Direct Memory Access) support in the transport protocol.
Many customers and vendors are now familiar with the advantages and concepts of NVMe over Fabrics but are not familiar with the specific protocols that support it. Join us on January 26th for this live Webcast that will explore and compare the Ethernet RDMA protocols and transports that support NVMe over Fabrics and the infrastructure needed to use them. You’ll hear:
- Why NVMe Over Fabrics requires a low-latency network
- How the NVMe protocol is mapped to the network transport
- How RDMA-capable protocols work
- Comparing available Ethernet RDMA transports: iWARP and RoCE
- Infrastructure required to support RDMA over Ethernet
- Congestion management methods
The event is live, so please bring your questions. We look forward to answering them.
Benefits of RDMA in Accelerating Ethernet Storage Q&A
At our recent live Webcast “Benefits of RDMA in Accelerating Ethernet Storage Connectivity” experts from Emulex, Intel and Microsoft had an insightful discussion on the ways RDMA is having an impact on Ethernet storage. The live event was attended by nearly 200 people and feedback was overwhelmingly positive, with several attendees thanking us for our vendor-neutral presentation and one attendee commenting that it was, “Probably the most clearly comprehensible yet comprehensive webinar I’ve attended in some time.” If you missed the Webcast, it’s now available on demand. We did not have time to get to everyone’s questions, so as promised, below are answers to all of them. If you have additional questions, please ask them in the comments section in this blog and we’ll get back to you as soon as possible.
Q. Is RDMA over RoCEv2 in production?
A. The IBTA released the RoCEv2 Specification in September 2014. In order to support that specification, changes may be required across the RDMA stack, including firmware, drivers, and operating systems. Schedules for implementation of that specification will vary by operating system. For example, the OpenFabrics Alliance (OFA) has not released an Open Fabrics Enterprise Distribution (OFED) version that implements that standard yet, although it is in process now. Once OFA completes their OFED stack implementation, the Linux distribution vendors will then incorporate and support the updated OFED stack. Implementations provided prior to full OFA and distro vendor support would be preliminary, potentially incompatible with the OFED release, and would require confirmation by the distro vendor with regard to the nature and level of support they would be providing.
Q. I would have liked a list of Windows applications that take advantage of SMB Direct – both in a Hyper-V host or bare metal.
A. In Windows, any file-based application can make use of SMB3 and SMB Direct due to the native file-based programming interface support. No application changes are required. For certain enterprise applications such as Hyper-V and SQL Server, SMB3 is officially supported, and more information can be found in the product catalog at www.microsoft.com.
Q. Are there any particular benefits in using one network protocol over another for SMB Direct/RDMA (iWARP vs. RoCE vs. IB)?
A. There are no hard and fast rules; any adapter or protocol can be suitable for many scenarios. Of the Ethernet-based protocols we considered in today’s webcast:
- iWARP offers the benefit of operation over TCP with its reliability and routability, well-suited to a broad range of installed infrastructure.
- RoCE offers a lightweight, efficient protocol when a DCB-enabled switched fabric is available. RoCE, however, is not routable.
- RoCEv2 offers similar properties to RoCE, with the possibility to scale to larger routed and DCB-enabled fabrics.
Q. Who are the vendors offering iWARP capable RNICs?
A. Chelsio Communications has production iWARP adapters today, and both Intel and Qlogic have publicly committed to future iWARP controllers.
Q. How much testing has been done with SMB3, and in particular SMB direct, over WAN connections?
A. The SMB2 protocol was originally designed to adapt to WAN scenarios, and supports a credit-based management of large amounts of data to be outstanding, to make best use of WAN-type long pipes. The SMB3 protocol retains these design attributes, and the SMB Direct protocol also supports similar deep pipelining. The iWARP protocol, being layered on standard TCP, is well suited to such deployments, and RoCE WAN adapters are potentially available. Please contact the respective technology vendors for information on any available testing results.
Q. I’d love a future webcast on RDMA-enabled distributed filesystems.
A. Thanks for the suggestion! We’re always looking for ideas for future webcasts and SNIA-ESF will consider this as a potential follow-on.
Q. Is Live Migration the scenario where “packet size” is 1MB?
A. All SMB Direct scenarios have workloads that range anywhere up to 8MB. For large file copies, most SMB3 clients request from 1MB to 8MB per operation; for Hyper-V live migration, transfers are typically similar during the bulk transfer phase.
Q. SMB3 is being compared to FC for enterprise. If Ethernet based protocols are of interest, wouldn’t FCoE give the same performance as FC (same stack) vs. SMB3?
A. SMB3 with SMB Direct enables many workloads not possible with Fibre Channel over Ethernet, and performance comparisons are therefore difficult. Perhaps another SNIA webcast could investigate this!
Q. Regarding your SMB direct example with lots of small operations, how do you deal with the overhead of registering and unregistering buffers for the RDMA operations?
A. As answered later in the session, the registration and unregistration is not a protocol matter, but in the case of the Windows implementation, it is strictly performed for the specific buffers of each operation, which is critical for security, data integrity, and system protection. The standard “Fast Register Work Request” method is used, and careful implementation has shown that the overhead does not negatively impact performance, even for small I/O (4KB/operation). Check out Jose Barreto’s blog, which contains many benchmark results.
Q. But isn’t Live Migration done in 1MB “chunks”? So not “small” I/Os?
A. As answered later in the session, Hyper-V Live Migration is done in several phases. The first phase is the initial bulk copy of memory, done in large chunks; immediately afterward, a second phase copies the individual pages that were dirtied by the live-running VM. These operations are typically 4KB. Note: the faster the initial phase goes, the less work there is in the second phase, but in both phases, the faster the better, and RDMA accelerates both.
Q. Are iSER and iWARP alternatives to one another?
A. iWARP is an RDMA protocol, and iSER is a mapping of iSCSI to iWARP, as well as RoCE/InfiniBand.
Q. What’s Intel’s roadmap for RoCE and/or iWARP?
A. Intel is committed to iWARP and plans to incorporate it in future server chipsets and SOCs. See http://www.intel.com/content/www/us/en/ethernet-products/accelerating-ethernet-iwarp-video.html for more information.
Q. Is there any transport other than IB being used to create a reliable transport for RoCEv2? Is it theoretically possible?
A. RoCE was developed to leverage InfiniBand as much as possible. For that reason, the InfiniBand transport was chosen when the RoCE standard was developed. As the RoCEv2 standard was developed, the underlying InfiniBand network protocol was replaced with IPv4/IPv6 in order to provide layer 3 routability, and UDP was added to provide stateless encapsulation (and indication) of the InfiniBand transport header that was retained. While it may be possible to develop a reliable transport to replace the InfiniBand one, the RoCE standards body has elected not to go that route as of this writing.
New ESF Webcast: Benefits of RDMA in Accelerating Ethernet Storage Connectivity
We’re kicking off our 2015 ESF Webcasts on March 4th with what we believe is an intriguing topic – how RDMA technologies can accelerate Ethernet Storage. Remote Direct Memory Access (RDMA) has existed for many years as an interconnect technology, providing low latency and high bandwidth in computing clusters. More recently, RDMA has gained traction as a method for accelerating storage connectivity and interconnectivity on Ethernet. In this Webcast, experts from Emulex, Intel and Microsoft will discuss:
- Storage protocols that take advantage of RDMA
- Overview of iSER for block storage
- Deep dive of SMB Direct for file storage
- Benefits of available RDMA technologies to accelerate your Ethernet storage connectivity, both iWARP and RoCE
Register now. This live Webcast will provide attendees with a vendor-neutral look at RDMA technologies and should prove to be an interactive and informative event. I hope you’ll join us!
Expanding Your Data Center with FCoE – Q&A
At our recent live ESF Webcast, “Expert Insights: Expanding the Data Center with FCoE,” we examined the current state of FCoE and looked at how this protocol can expand the agility of the data center. If you missed it, it’s now available on-demand. We did not have time to address all the questions, so here are answers to them all. If you think of additional questions, please feel free to comment on this blog.
Q. You mentioned using 40 and 100G for inter-switch links. Are there use cases for end point (FCoE target and initiator) 40 and 100G connectivity?
A. Today most end points are only supporting 10G, but we are starting to see 40G server offerings enter the market, and activity among the storage vendors designing these 40G products into their arrays.
Q. What about interoperability between FCoE switch vendors?
A. Each switch vendor has its own support matrix, which would need to be examined independently.
Q. Is FCoE supported on copper cable?
A. Yes, FCoE supports “Twin Ax” copper and is widely used for server to top of rack switch connections to seven meters. In fact, Converged Network Adapters are now available that support 10GBASE-T copper cables with the familiar RJ-45 jack. At least one major switch vendor has qualified FCoE running over 10GBASE-T to 30 meters.
Q. What distance does FCoE support?
A. Distance limits are dependent on the hardware in use and the buffering available for Priority Flow Control. The lengths can vary from 3m up to over 80km. Top of rack switches would fall into the 3m range while larger class switch/directors would support longer lengths.
Q. Can FCoE take part in management/orchestration by OpenStack Neutron?
A. As of this writing there are no OpenStack extensions in Neutron for FCoE-specific plugins.
Q. So how is this FC-BB-6 different than FIP snooping?
A. FIP Snooping is a part of FC-BB-5 (Appendix D), which allows switch devices to identify an FCoE Frame format and create a forwarding ACL to a known FCF. FC-BB-6 creates additional architectural elements for deployments, including a “switch-less” environment (VN2VN), and a distributed switch architecture with a controlling FCF. Each of these cases is independent from the other, and you would choose one instead of the others. You can learn more about VN2VN from our SNIA-ESF Webcast, “How VN2VN Will Help Accelerate Adoption of FCoE.”
Q. You mentioned DCB at the beginning of the presentation. Are there other purposes for DCB? Seems like a lot of change in the network to create a DCB environment for just FCoE. What are some of the other technologies that can take advantage of DCB?
A. First, DCB is becoming very ubiquitous. Unlike the early days of the standard, where only a few switches supported it, today most enterprise switches support DCB protocols. As far as other use cases for DCB, iSCSI benefits from DCB, since it eliminates dropped packets and the TCP/IP protocol’s backoff algorithm when packets are dropped, smoothing out response time for iSCSI traffic. There is a protocol known as RoCE or RDMA over Converged Ethernet. RoCE requires the lossless fabric DCB creates to achieve consistent low latency and high bandwidth. This is basically the InfiniBand API running over Ethernet. Microsoft’s latest version of file serving protocol, SMB Direct, and the Hyper-V Live Migration can utilize RoCE, and there is an extension to iSCSI known as iSER, which replaces TCP/IP with RDMA for the iSCSI datamover; enabling all iSCSI reads and writes to be done as RDMA operations using RoCE.
Q. Great point about RoCE. iSCSI RDMA (iSER) requires DCB if the adapters use RoCE, right?
A. Agreed. Please see the answer above to the DCB question.
Q. Did that Boeing Aerospace diagram still have traditional FC links, and if yes, where?
A. There was no Fibre Channel storage attached in that environment. Having the green line in the legend was simply to show that Fibre Channel would have its own color should there be any links.
Q. What is the price of a 10Gbps CNA compared to a 10Gbps NIC?
A. Price is dependent on vendor and economics. But, there are several approaches to delivering the value of FCoE which can influence pricing:
- Purpose-built silicon that offloads the FC and Ethernet protocol functions offers a number of advantages, including high performance, low CPU overhead, advanced features, etc., though even this depends on the vendor’s implementation. These added features come with the expectation of additional cost, but the processing of the protocols has to be done somewhere, and if you need your server CPUs to process applications instead of network protocols, then the value is justified.
- With the introduction of Open FCoE drivers with DCB supported NICs, new options are available for customers to deploy the value of FCoE at the host. Open FCoE offloads the FC processing onto the host CPU and standard 10GbE NICs with DCB support can be used to manage the Ethernet transport functions. Where you have excess CPU capacity on your server, you might be in a position to reduce costs and deploy a software driver with a 10GbE or faster NIC enhanced with the limited set of hardware offloads necessary to achieve full performance with Open FCoE. However, Open FCoE isn’t available with every OS or every NIC, so you need to consider OS support and availability.
- A third consideration is that most enterprise servers include some form of advanced 10GbE networking on the motherboard that either supports purpose built silicon or DCB enabled silicon. So, depending upon which server and OS you deploy, you may have several options via embedded silicon.