Simon Gordon – SNIA on Storage

Storage traffic running over Ethernet-based networks has been around for as long as we have had Ethernet-based networks. Of course sometimes it is technically not accurate to think of the protocols as fundamentally Ethernet protocols – whilst FCoE, by definition, only runs on Ethernet, iSCSI, SMB, and NFS, they are, in reality, IP-based storage protocols and whilst most commonly run on Ethernet, could run on any network that supports IP. That notwithstanding, it is increasingly important to understand the real nature of Ethernet, and in particular, the nature of the new enhancements that we put under the umbrella of Data Center Bridging (DCB).

Although there is a great deal of information around DCB, there is also a lot of confusion where even the best articles miss describing a number of its elements. As such, with 10GbE ramping now is a good time to try to clarify the reality of what DCB does and does not do.

Perhaps the first and most important point is that DCB is, in reality, a task group in IEEE responsible for the development enhancements to the 802.1 bridge specifications that apply specifically to Ethernet Switching (or as IEEE say Bridging) in the datacenter. As such, DCB is not in itself a standard nor is the DCB group solely involved in those standards that apply to I/O and network convergence. The mots recent work of this task group falls into two distinct areas both of which apply to the datacenter one is the now completed standards around network and I/O convergence (802.1Qau,Qaz, Qbb) but the other are those standards that address the impact of server virtualization technology (802.1Qbg, BR, and the now withdrawn Qbh).

Also critical to understand, so that we do not overstate either the limitations of traditional Ethernet nor the advantages of the new standards around I/O and network convergence, is that these new standards build on top of many well understood, well used mature capabilities that already exist within the IEEE Ethernet standards set. Indeed IMHO, the most important element of this is that the DCB Convergence standards are building on top of the 802.1p capability to specify eight different classes of service through a 3-bit PCP field in the 802.1Q header the VLAN header. Or to say that in English Ethernet has for some considerable time had the ability to separate traffic into 8 separate categories to ensure that those different categories got different treatment – or more bluntly the fundamentals of I/O and network convergence are nothing new to Ethernet. Not only that, but the VLAN identification itself can be used to apply QoS on different sets of traffic, as does the fact that we can usually identify different traffic types by the Ethertype or IP socket.

So what is all the fuss about? As much as there were some good convergence capabilities, it was recognized that these could be further enhanced.

802.1Qbb or Priority-based Flow Control (PFC), far from adding into Ethernet a non-existent capability for lossless, simply takes the existing capability for lossless 802.3X and enhances it. 802.3X, when deployed with both RX and TX pause, can give lossless Ethernet as both recognized by many in the iSCSI community as well as by the FCoE specifications. However, the pause mechanisms apply at the port level, which means giving one traffic class lossless causes blocking of other traffic classes. All 802.1Qbb does, along with 802.3bd, is allow the pause mechanisms to be applied individually to specific priorities or traffic classes aka pause FCoE or iSCSI or RoCE whilst allowing other traffic to flow.

802.1Qaz or ETS (let’s ignore that DCBX is also part of this document and discussed in another SNIA-ESF blog) is not bandwidth allocation to your individual priorities, rather it is the ability to create a group of priorities and apply bandwidth rules to that group. In English, it adds a new tier to your QoS schedulers so you can now apply bandwidth rules to port, priority or class group, individual priority, and VLAN. The standard suggests a practice of at least three groups, one for best effort traffic classes, one for PFC-lossless classes, and one for strict priority though it does allow more groups.

Last but not least, 802.1Qau or QCN is not a mechanism to provide lossless capabilities. Where pause and PFC are point-to-point flow control mechanisms QCN allows flow control to be applied by a message from the congestion point all the way back to the source. Being Ethernet level mechanisms it is of course across multiple hops within a layer 2 domain and so cannot cross either IP routing or FCF FCoE-based forwarding. If QCN is applied to a non PFC priority, then it would most likely reduce drop by telling the source device to slow down rather than having frames be dropped and allowing the TCP congestion window to trigger slowing down at the TCP level. If QCN is applied to a PFC priority, then it could reduce back propagation of PFC pause and so congestion propagation within that priority.

Although not part of the standards for DCB-based convergence, but mentioned in the standards, devices that implement DCB typically have some form of buffer carving or partitioning such that the different traffic classes are not just on different priorities or classes as they flow through the network, but are being queued in and utilizing separate buffer queues. This is important as the separated queuing and buffer allocation is another aspect of how fate sharing is limited or avoided between the different traffic classes. It also makes conversations around microbursts, burst absorption, and latency bubbles all far more complex than before when there was less or no buffer separation.

It is important to remember that what we are describing here are the layer 2 Ethernet mechanisms around I/O and network convergence, QoS, flow control. These are not the only tools available (or in operation) and any datacenter design needs to fully consider what is happening at every level of the network and server stack including, but not limited to, the TCP/IP layer, SCSI layer, and indeed application layer. The interactions between the layers are often very interesting but that is perhaps the subject for another blog.

In summary, with the set of enhanced convergence protocols now fully standardized and fairly commonly available on many platforms, along with the many capabilities that exist within Ethernet, and the increasing deployment of networks with 10GbE or above, more organizations are benefiting from convergence – but to do so they quickly find that they need to learn about aspects of Ethernet that in the past were perhaps of less interest in a non-converged world.

I guess this is a blog that could either be very short or very long… The full name of the protocol – Data Center Bridging capability eXchange (DCBX) basically tells you all you need to know or maybe nothing at all. At its simplest, DCBX does what it says on the tin and the way it is in effect used is no more or less than the DCB auto negotiation capability to make sure that the data center network is correctly and consistently configured. It is important to note that technically you can debate if this is an auto negotiation protocol or not, but in reality that’s how it is actually used.

Now it is important to note that there are many misnomers around DCB itself. Let’s remember that DCB is actually a group within IEEE responsible for many separate standards – basically anything for Ethernet (or as IEEE say bridging) that is assumed to be specific to the data center. Currently, discussed are those standards and protocols related to I/O Convergence (PFC, ETS, QCN, DCBX) and those related to server virtualization (Virtual Ethernet Port Aggregator or VEPA and others). So in essence the intent of DCBX is to help two adjacent devices share information about how these protocols are, or need to be, configured. DCBX actually does this by leveraging good old LLDP – just as PFC, ETS and QCN leverage 802.1p. What is particularly nice though is that DCBX not only allows the simple exchange of information around the DCB protocols themselves but also around how upper level protocols might want to use the DCB layer.

This brings us nicely to a very critical point – like most things in this area, DCBX purely works at the link level to allow a pair of connected ports (node to switch or switch to switch) to exchange their specific port configuration. This is an important point as in a multi-hop environment you need to keep in mind that every link may successfully complete its DCBX negotiation but unless some higher level intelligence (you) ensures that things are set right on each and every link, you may still not be meeting the needs of an end-to-end traffic flow. Even in a simple case of device-switch-switch-device I could have Fibre Channel over Ethernet (FCoE) negotiated on the first device-switch and last switch-device connection, and nothing configured on the intermediate switch-switch connection – and the two FCoE end points would happily talk to each other thinking that they have end-to-end lossless connectivity. In a more complex scenario let’s also remember that many L2/L3 switches have not just the ability to route between L2 domains, but also have the ability to reclassify traffic from one 802/1p priority to another. For this reason it is often simpler to use DCB to support 8 independent forwarding planes across the data center as this means we can simply configure all ports pretty much identically. I believe the term here around being clever is ‘here be dragons’.

Anyone that has spent a little time with DCB or FCoE will actually know that DCBX doesn’t just help at the level of the layer 2 protocols, but also helps at the level of the actual upper level protocol we care about. Most well known is that DCBX can carry specific exchanges to ensure the correct configuration of DCB to support FCoE and many people may be aware that it can do the same for iSCSI as well. Far less known however is that these two examples of setting up DCB for upper level protocols are in fact just that – examples. DCBX actually has a generic application type-length-value (TLV) format whereby you can specify what you would like for any upper level protocol that can be identified by either Ethertype or IP socket. Thus DCBX has like the rest of DCB been carefully architected to support the full broad needs of I/O and network convergence and not just the needs of storage convergence. DCBX as a protocol allows you to have an NFS Application TLV, an SMB Application TLV, a RDMA over Converged Ethernet or RoCE Application TLV, iWarp Application TLV, an SNMP Application TLV – etc.

A final and very practical point that any article on DCBX needs to cover is that we are in an evolving world and there are multiple different, and indeed incompatible, versions of DCBX available. Just reviewing the common DCB equipment available today you need to consider DCBX 1.0 as used by pre-standards FCoE products, DCBX 1.01 sometimes referred to as the Converged Enhanced Ethernet (CEE) or baseline version as found most commonly on shipping products today, and DCBX IEEE as actually defined in the standards (physically mostly contained within the ETS standard). It is also important to note that while some products have mechanisms to auto discover and select which version of DCBX to use, there is in fact no standard for such mechanisms. In this case the term is I assume ‘caviat emptor – buyer beware’.

All that said, maybe I should have started this blog reminding everyone that the I/O convergence parts of DCB are not just about allowing storage traffic to be mixed with non-storage traffic without fate sharing problems, but is actually about collapsing the multiple networks of different networks into a single network. I believe the average server is said to have about 6 NICs’ today? As such in the 10GbE and up Ethernet world, the full capabilities of DCBX really are a critical enabler for simplifying the operation of the modern converged virtualized data center.

Author: Simon Gordon

Resolving the Confusion around DCB (I Hope)

What Up with DCBX?