• Home
  • About

    Ethernet Roadmap for Networked Storage Q&A

    July 17th, 2015

    Almost 200 people attended our joint Webcast with the Ethernet Alliance: “The 2015 Ethernet Roadmap for Networked Storage.” We had a lot of great questions during the live event, but we did not have time to answer them all. As promised, we’ve complied answers for all of the questions that came in. If you think of additional questions, please feel free to comment on this blog.

    Q. What did you mean by parity of flash with HDD?

    A. We were referring to the O’Reilly article in “Network Computing.”  O’Reilly is predicting parity in BOTH capacity and price in 2016.

    Q. When do we expect IEEE standards ratification for 25G speed?

    A. 2016.  You can see the exact schedule here.

    Q. Do you envision the Enterprise, Cloud Providers, HPC, Financials getting rid of their 10/40GbE infrastructure and replacing that with 25/100GbE infrastructure in 2017? Will these customers deploy 100GbE/25GbE switch in the leaf layer in 2017?

    A. Deployment will occur over a multi-year time span overall if only because switch infrastructure is expensive to upgrade, as reflected in the Crehan Research forecast.  New deployments will likely move to 25/100GbE as new switches with 100GbE downstream ports become available in 2016.   Just because the Cloud Service Providers are currently the most aggressive in driving new infrastructure purchases, they represent the largest early volumes for 25/100 GbE.  Enterprise is still in the midst of the transition from 1GbE to 10GbE.

    Q. What are some of the developments on spanning-tree derivatives vs. Dykstra based derivatives such as OSPF, FSPF for switches?

    A. Beyond the scope of this presentation on Ethernet.  Ethernet is defined by the IEEE for L1 and L2 in the ISO model.  Your questions are at L3 and L4, which is handled by organizations like IETF.

    Q. With all the speeds possible who is working on flow control?

    A. Flow control at the 802.1 level is supported in the Layer 1/2 PHY & MAC by setting upper bounds on the delay through each layer which allows higher layers to comprehend the delays & response times to pause frames. Each new speed & PHY in 802.3 is accompanied by delay constraint specifications to support this.

    Q.  Do you have an overlay graphic that shows the Ethernet RDMA roadmap?  If so, is Ethernet storage the primary driver for that technology?

    A.  Beyond the scope of this presentation on Ethernet.  Ethernet is defined by the IEEE for L1 and L2 in the ISO model.  Your questions are at L3 and L4, which is handled by organizations like IETF and the InfiniBand Trade Association.

    Q. The adoption of faster and new Ethernet always has to do with the costs of acquiring new technology. How long do you think it will take to adopt/acquire faster Ethernet in datacenters now that the development is happening much faster than the last 20 years?

    A. Please see the chart on slide 7 where Crehan Research predicts how fast the technology will diffuse into deployments.

    Q. What do you expect as cost comparison between Ethernet and InfiniBand going forward?
    Also, what work is being done to reduce latency?

    A. Beyond the scope of this presentation.  Latency is primarily a consequence of design methodologies and semiconductor process technology, and thus under the control of the silicon device manufacturers.  Some vendors prioritize latency more than others.

    Q. What’s the technical limitation as speeds go higher and higher?

    A. A number of factors limit speeds going faster and faster, but the main problem is that materials attenuate signals as they travel at higher frequencies.

    Q. Will 1GbE used for manageability purposes disappear from public cloud? If so, what is the expected time frame?

    A. This is a choice for end users.  Most equipment is managed on a separate network for security concerns, but users can eliminate these management networks at any time.

    Q. What are the relative market size predictions for the expanding number of standards (25G, 50G, 100G, 200G, etc.)?

    A. See the Crehan Research forecast in the presentation.

    Q. What is the major difference between SMF & MMF for the not so initiated?

    A. The SMF has a 9um core while the MMF has a 50um core.  Different lasers are used for each fiber type and MMF typically goes 100 meters above 10GbE and SMF goes from 500m to 10km.

    Q. Will 25G be available through both copper and fibre connectivity?

    A. Yes.  IEEE 802.3 work is currently underway to specify 25Gb/s on twinax (“direct attach copper)” to 5 meters, printed circuit backplane up to ~1m, twisted pair copper to 30m, multimode fiber to 100m.  There is no technology barrier to 25G on SMF, just that a standards project to specify it has not started yet.

    Q. This is interesting from a hardware viewpoint, but has nothing to do with storage yet.  Are we going to get to how this relates to storage other than saying flash drives are fast and only Ethernet can keep up?

    A. Beyond the scope of this presentation on Ethernet.  Ethernet is defined by the IEEE for L1 and L2 in the ISO model.  Your questions are directed at the higher layers.  The key point of this webcast is that storage networking engineers need to pay much more attention to the Ethernet roadmap than they have historically, primarily because of NVM.

    Q. How does “SFP 28″ fit in this mix?  Is it required for 25G?

    A. SFP28 connectors and modules are required for 25GbE because they give better performance than SFP+ that only works to 10GbE.

    Q. Can you provide the quick difference between copper & optical on speed & costs?

    A. Copper and optical Ethernet links are usually standardized at the same speed.  400GbE is not defining a copper link but an active Direct Attached Cable (DAC) will probably support 400GbE.  Cost depends on volume and many factors and is beyond the scope of this presentation.  Copper is usually a fraction of the cost of optical links.

    Q. Do you think people will try to use multiple CAT 5e to get more aggregate bandwidth to the access points to avoid having to run Fibre to them?

    A. IEEE is defining 2.5GBASE-T and 5GBASE-T to enable Cat5e to support faster wireless access points.

    Q. When are higher speeds and PoE going to reach the point when copper based Ethernet will become a viable heat source for buildings thus helping the environment?

    A. :)  IEEE is defining 4 wire PoE to deliver at least 60W to end devices.  You can find out more here.

    Q. What are the use cases for 2.5Gb and 5.0Gb Base-T?

    A. The leading use case for 2.5G/5GBASE-T is to provide the uplink for wireless LAN access points that support 802.11ac and future wireless technology.  Wireless LAN technology has advanced to the point where >1Gb/s BW is needed upstream from the AP, and 2.5G/5G provide a higher speed uplink while preserving the user’s investment in Cat5e/Cat6 cabling.

    Q. Why not have only CFP2 sockets right away with things disabled for lower speeds for all the intervening years leading to full-fledged CFP2?

    A. CFP2 is defined for 100GbE and 8 ports can be used on a 1U switch. 100GbE switches are shifting to QSFP28 so that 32 ports of 100GbE is supported in a 1U switch at low cost.  The CFP2 is much more expensive than QSFP28 and will not be used for lower speeds because of the high cost.
















    Relentless Advance Of Ethernet – And Ethernet Storage Networking

    March 31st, 2014

    As one Cisco colleague once said to me, “After the nuclear holocaust, there will be two things left: cockroaches and Ethernet.”  Not sure I like Ethernet’s unappealing company in that statement, but the truth it captures is that Ethernet, now entering its fifth decade (wow!), is ubiquitous and still continuing to advance at a breathtaking pace.  And as it advances, it advances the capabilities of storage networking based on the Ethernet backbone, be it file storage like NFS or SMB or block storage like iSCSI or FCoE.

    Most recent evidence of Ethernet’s continuing and relentless evolution is illustrated in the 28 March 2014 announcement from the Ethernet Alliance congratulating the IEEE on formation of their IEEE P802.3bs™ Task Force:

    The new group is chartered with the development of the IEEE P802.3bs 400 Gigabit Ethernet (GbE) project, which will define Ethernet Media Access Control (MAC) parameters, physical layer specifications, and management parameters for the transfer of Ethernet format frames at 400 Gb/s. As the leading voice of the Ethernet ecosystem, the Ethernet Alliance is ideally positioned to support this latest move towards standardizing and advancing 400Gb/s technologies through efforts such as the launch of the Ethernet Alliance’s own 400 GbE Subcommittee.

    Ethernet is in production today from multiple vendors at 40GbE and supports all storage protocols, including FCoE, at those speeds.  Market forecasters expect the first 100GbE adapters to appear in 2015.  Obviously, it is too early to forecast when 400GbE will arrive, but the train is assuredly in motion.  And support for all the key storage protocols we see today on 10GbE and 40GbE will naturally extend to 100GbE and 400GbE.  Jim O’Reilly makes similar points in his recent Information Week article, “Ethernet: The New Storage Area Network where he argues, “Ethernet wins on schedule, cost, and performance.”

    Beyond raw transport speed, the rich Ethernet infrastructure offers techniques to catapult your performance even beyond the fastest single-pipe speed.  The Ethernet world has established techniques for what is alternately referred to as link aggregation, channel bonding, or teaming.  The levels available are determined by the capabilities provided in system software and what switch vendors will support.  And those capabilities, in turn, are determined by what they respectively see as market demand.  VMware, for example, today will let you bond eight 10GbE channels into a single 80GbE pipe.  And that’s today with mainstream 10GbE technology.

    Ethernet will continue to evolve in many different ways to support the needs of the industry.  Serving as a backbone for all storage networking traffic is just one of many such roles for Ethernet.  In fact, precisely because of the increasing breadth of usage models Ethernet supports, it will also continue to offer cost advantages.  The argument here is a very simple volume argument:

    Total Server-class Adapter and LOM Market Ports


    Enough said, except to also note that volume is what funds speed roadmaps.



    What Up with DCBX?

    January 31st, 2013

    I guess this is a blog that could either be very short or very long… The full name of the protocol – Data Center Bridging capability eXchange (DCBX) basically tells you all you need to know or maybe nothing at all. At its simplest, DCBX does what it says on the tin and the way it is in effect used is no more or less than the DCB auto negotiation capability to make sure that the data center network is correctly and consistently configured. It is important to note that technically you can debate if this is an auto negotiation protocol or not, but in reality that’s how it is actually used.

    Now it is important to note that there are many misnomers around DCB itself. Let’s remember that DCB is actually a group within IEEE responsible for many separate standards – basically anything for Ethernet (or as IEEE say bridging) that is assumed to be specific to the data center. Currently, discussed are those standards and protocols related to I/O Convergence (PFC, ETS, QCN, DCBX) and those related to server virtualization (Virtual Ethernet Port Aggregator or VEPA and others). So in essence the intent of DCBX is to help two adjacent devices share information about how these protocols are, or need to be, configured. DCBX actually does this by leveraging good old LLDP – just as PFC, ETS and QCN leverage 802.1p. What is particularly nice though is that DCBX not only allows the simple exchange of information around the DCB protocols themselves but also around how upper level protocols might want to use the DCB layer.

    This brings us nicely to a very critical point – like most things in this area, DCBX purely works at the link level to allow a pair of connected ports (node to switch or switch to switch) to exchange their specific port configuration. This is an important point as in a multi-hop environment you need to keep in mind that every link may successfully complete its DCBX negotiation but unless some higher level intelligence (you) ensures that things are set right on each and every link, you may still not be meeting the needs of an end-to-end traffic flow. Even in a simple case of device-switch-switch-device I could have Fibre Channel over Ethernet (FCoE) negotiated on the first device-switch and last switch-device connection, and nothing configured on the intermediate switch-switch connection – and the two FCoE end points would happily talk to each other thinking that they have end-to-end lossless connectivity. In a more complex scenario let’s also remember that many L2/L3 switches have not just the ability to route between L2 domains, but also have the ability to reclassify traffic from one 802/1p priority to another. For this reason it is often simpler to use DCB to support 8 independent forwarding planes across the data center as this means we can simply configure all ports pretty much identically. I believe the term here around being clever is ‘here be dragons’.

    Anyone that has spent a little time with DCB or FCoE will actually know that DCBX doesn’t just help at the level of the layer 2 protocols, but also helps at the level of the actual upper level protocol we care about. Most well known is that DCBX can carry specific exchanges to ensure the correct configuration of DCB to support FCoE and many people may be aware that it can do the same for iSCSI as well. Far less known however is that these two examples of setting up DCB for upper level protocols are in fact just that – examples. DCBX actually has a generic application type-length-value (TLV) format whereby you can specify what you would like for any upper level protocol that can be identified by either Ethertype or IP socket. Thus DCBX has like the rest of DCB been carefully architected to support the full broad needs of I/O and network convergence and not just the needs of storage convergence. DCBX as a protocol allows you to have an NFS Application TLV, an SMB Application TLV, a RDMA over Converged Ethernet or RoCE Application TLV, iWarp Application TLV, an SNMP Application TLV – etc.

    A final and very practical point that any article on DCBX needs to cover is that we are in an evolving world and there are multiple different, and indeed incompatible, versions of DCBX available. Just reviewing the common DCB equipment available today you need to consider DCBX 1.0 as used by pre-standards FCoE products, DCBX 1.01 sometimes referred to as the Converged Enhanced Ethernet (CEE) or baseline version as found most commonly on shipping products today, and DCBX IEEE as actually defined in the standards (physically mostly contained within the ETS standard). It is also important to note that while some products have mechanisms to auto discover and select which version of DCBX to use, there is in fact no standard for such mechanisms. In this case the term is I assume ‘caviat emptor – buyer beware’.

    All that said, maybe I should have started this blog reminding everyone that the I/O convergence parts of DCB are not just about allowing storage traffic to be mixed with non-storage traffic without fate sharing problems, but is actually about collapsing the multiple networks of different networks into a single network. I believe the average server is said to have about 6 NICs’ today? As such in the 10GbE and up Ethernet world, the full capabilities of DCBX really are a critical enabler for simplifying the operation of the modern converged virtualized data center.