Principles of Networked Solid State Storage – Q&A

At this month’s SNIA Ethernet Storage Forum Webcast, “Architectural Principles for Networked Solid State Storage Access,” Doug Voigt, Chair of the SNIA NVM Programming Technical Working Group, and a member of the SNIA Technical Council, outlined key architectural principles surrounding the application of networked solid state technologies. We had a flurry of questions near the end of the Webcast that we did not have enough time to answer. Here are Doug’s answers to all the questions we received during the event:

Q. Are there wait cycles in accessing persistent memory?

A. It depends entirely on which persistent memory (PM) technology is being accessed and how the memory interconnect is used.  Some technologies have write times that are quite different from read times.  When using tightly timed interconnects such as DDR with those technologies it may be difficult to avoid wait cycles.

Q. How do Pmalloc and malloc share the virtual address space of the application?

A. This is entirely up to the OS and other libraries operating within any constraints of the processor architecture-specific memory management units.  A good mental model would be fairly large regions of contiguous address space in both the physical and virtual domains, where each region will comprise a single type of memory. Capacity will be reserved for pmalloc and malloc in the appropriate regions.
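
To make that mental model concrete, here is a minimal C sketch, assuming a file on a DAX-mounted persistent memory filesystem (the path /mnt/pmem/region is purely illustrative) stands in for the region a pmalloc-style allocator would manage. The two pointers land in different regions of the same virtual address space.

```c
/* Minimal sketch: a volatile malloc() allocation and a persistent,
 * file-backed mapping coexisting in one virtual address space.
 * Assumes /mnt/pmem is a DAX-mounted persistent memory filesystem;
 * a real pmalloc() library would sub-allocate inside such a mapping. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    size_t len = 64 * 1024 * 1024;

    /* Volatile allocation: placed in the process's normal heap region. */
    char *heap_buf = malloc(len);

    /* Persistent allocation: a separate, file-backed region of the
     * same virtual address space. */
    int fd = open("/mnt/pmem/region", O_CREAT | O_RDWR, 0600);
    if (fd < 0 || ftruncate(fd, (off_t)len) != 0)
        return 1;
    void *pm_buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (pm_buf == MAP_FAILED)
        return 1;

    printf("heap region at %p, persistent region at %p\n",
           (void *)heap_buf, pm_buf);

    munmap(pm_buf, len);
    close(fd);
    free(heap_buf);
    return 0;
}
```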

Q. Always flush after doing your memory-mapped IO.  Is that simply good hygiene?

A. Not exactly. The term “Memory Mapped IO” is used to reference control plane (as opposed to data plane) access.  It is often reasonable to set up control plane memory as uncacheable. The need for strict order of access to physical control plane registers is so pervasive that caching is generally not useful. Uncacheable writes are always flushed by the processor, as opposed to the application.

Generally with memory mapped IO devices the data plane uses direct memory access (DMA).  With memory mapped files (as opposed to memory mapped IO) Load/Store (more commonly referred to as “Ld/St”), not DMA, is used in the data plane. Disabling caching in the data plane is generally a big performance sacrifice for small byte range access.

In the Ld/St datapath, strategically placed flushing is required to retain both performance and power failure recovery. The SNIA NVM Programming Model describes this type of functionality.
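
As a rough illustration of that strategically placed flushing, the C sketch below stores a record through a persistent memory mapping and flushes only the bytes it touched. msync() is used as a portable stand-in for the programming model's flush action; an optimized path would use CLWB plus a fence or a persistent memory library call instead.

```c
/* Sketch of the store-then-flush discipline in the Ld/St data path.
 * 'pm_buf' is assumed to be the start of a MAP_SHARED mapping of a
 * persistent memory file of size 'map_len'. msync() is a portable
 * stand-in for an optimized flush (e.g. CLWB + SFENCE). */
#include <string.h>
#include <sys/mman.h>

int persist_record(char *pm_buf, size_t map_len,
                   const char *rec, size_t rec_len)
{
    if (rec_len > map_len)
        return -1;

    memcpy(pm_buf, rec, rec_len);   /* Ld/St store; data may still sit in CPU caches */

    /* Flush only the touched range, keeping performance while making
     * the update durable across power failure. */
    return msync(pm_buf, rec_len, MS_SYNC);
}
```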

Q. Once NVDIMM support becomes pervasive, with support from NVMe drives in the server box, should network storage be more focused on SAS Flash or just SAS HDDs?

A. Not necessarily.  NVMe over Fabric, Fibre Channel and iSCSI are also types of networked storage that will likely retain significant market share relative to SAS.

Q. Are the ‘Big Data’ Data Warehouse applications starting to use persistent memory and domain technologies in their applications?

A. It is too early to see much of this yet. PM technologies might become a priority as a staging area for analytic applications with high ingest or checkpoint rates. NVDIMMs are likely to be too expensive to store anything “big” for quite a while.

Q. Also, are persistent memory/domains being used in the Hyper-converged and Converged hardware infrastructures?

A. Persistent memory is quintessentially (Hyper-) converged.  It wouldn’t be unreasonable to expect some traction with hyper-converged solutions that experience high storage-performance demand.

Q. What distance would you associate with 10’s of microseconds?

A. In terms of transmission delay, tens of microseconds align with a campus or small-city scale (signal propagation in fiber is roughly 5 µs per km, so 50 µs of one-way delay corresponds to about 10 km), but the distance itself is often not the primary factor. Switching delays, transmission line properties and software overhead are generally bigger factors.

Q. So latency would be the binding factor for distances…not a question, an observation.

A. Yes, in effect, either through transmission or relay.  See above.

Q. Aren’t there multi-threaded SSDs?

A. Yes, but since the primary metric in this presentation is latency we ignore multi-threading.  It can enable more work to get done, but it generally increases latency rather than reducing it.

Q. Is Pmalloc universal usage?

A. The term is starting to be recognized among developers and has been used in research. Various similar names have been used in early research prototypes, such as pmalloc in Mnemosyne and nvmalloc in SCMFS.

Q. So how would PM help in a (stock broking) requirement, where we currently prophesize an RDMA or iWARP solution?

A. With PM the answer is always lower latency. PM can be integrated like memory or like flash. RDMA network paths for both of these options were discussed in the presentation. In either case, PM is low-latency enough that networking and software overheads will completely determine performance, even when using RDMA. The performance boost from PM is greatest when it is accessed locally. If remote access is a requirement then the new work being done in the RDMA community should help.

Q. If data stored in memory needs to be copied to a different host’s memory (for consistency), how does PM assist, or is there an extension to PM? Coherency between multiple hosts in a cluster, if you will?

A. PM technology does not help with this; the methods of managing consistency across hosts remain unchanged by PM.  All PM offers is low latency persistence.

Coordination across hosts or nodes in a cluster must use existing clustering techniques such as locking and quorums. In addition, the relative timescales of memory access and network communication suggest the application of asynchronous remote replication techniques used in today’s storage solutions.

Regarding coherency, PM brings nothing new to the known techniques for managing coherency.  Classical cluster architecture must be applied outside of symmetric multi-processing coherency domains. Within coherency domains, all of the logic is above the PM level in a processor side memory controller or a software emulation of the same algorithms.

Q&A on Exactly How iSCSI has Evolved

Our recent SNIA ESF Webcast, “The Evolution of iSCSI” drew a big and diverse group of attendees. From beginners looking for iSCSI basics, to experts with a lot of iSCSI deployment experience, there were plenty of good questions. Our presenters, Andy Banta and Fred Knight, did a great job answering as many as they could during the live event, but we didn’t have time to get to them all. So here are answers to them all. And by the way, if you missed the Webcast, it’s now available on-demand.

Q. What are the top 3 reasons to choose iSCSI over FC SAN?

A. 1. Use of commodity equipment and protocols. It means that you don’t have to set up a completely separate network or buy separate HBAs. 2. Inherent networking capability. Built on top of TCP/IP, iSCSI benefits from any networking technology to come along, including routing, tunneling, authentication, encryption, etc. 3. Ease of automation and configuration. In its simplest form, an iSCSI host only needs to know the IP address of the target system. In more complex systems, hosts and storage provide APIs to allow automation through scripting or management tools.

Q. Please comment on why SCSI went from being a widely used protocol for all sorts of devices to being focused as only essentially a storage protocol?

A. SCSI was originally designed as both a protocol and a bus (the original parallel SCSI). Because there were no other busses, the SCSI bus did it all: disks, tapes, scanners, printers, optical drives (CDs), media changers, etc. As other busses came onto the market (think USB), many of those devices moved to the new bus (CDs, printers, scanners, etc.). Commodity devices used commodity busses (IDE, SATA, USB), and enterprise devices used enterprise busses (FC, SAS); and so disks, tapes, and media changers mostly stayed on SCSI.

The name SCSI can be confusing for some, as the term originally was used for both the SCSI protocol and the SCSI bus. The term for the SCSI protocol is all that remains today; the SCSI bus (the old SCSI parallel bus) is no longer in wide use. Today, the FC bus, or the SAS bus, or the SoP bus, or the SRP bus are used to carry the SCSI protocol. The SCSI Architecture Model (SAM) describes a very distinct separation between the device layer (the SCSI protocol) and the transport layer (the bus).

And the SCSI command set has become the basis for many subsequent command sets. The JEDEC group used the SCSI command set as a model (JEDEC devices are in your cell phone), the ATAPI devices used SCSI commands, and many SCSI commands and SATA commands have a common heritage. The Mt. Fuji group (a standards group in Japan) also uses SCSI as the basis for new DVD and Blu-ray devices. So, while not widely known, the SCSI command family has grown well beyond what is managed by the ANSI/INCITS T10 committee that originally defined SCSI, into a broad set of capabilities that are used across the industry by a broad group of organizations. But, that all said, scanners and printers are still on USB, and SCSI is almost all about storage in one form or another.

Q. How does iSCSI support software-defined storage?

A. Answered during the talk. SDS provides more automation and knobs on the storage capabilities. But SDS still needs a way to transport the storage and iSCSI works perfectly fine for that. They are complementary technologies, not competing.

Q. With 40Gb and faster coming soon to a server near you, what kind of impact will that have on CPU utilization? Will smaller servers be able to push that much traffic?

A. More throughput simply requires more CPU. With good multithreaded drivers available, this can mean simply adding cores to keep the pipe as full as possible. As we mentioned near the end, using iSCSI with RDMA lightens the load on the CPU even more, so you’ll probably be seeing more of that.

Q. Is IPSec commonly supported on iSCSI targets?  

A. Yes, IPsec is required to be implemented on an iSCSI target for it to be a compliant device. However, it is not commonly enabled by customers. And while compliant devices MUST provide IPsec, there are a lot of non-compliant initiators and targets on the market.

Q. I’m told direct connect with iSCSI is discouraged, that there should be a switch in place to handle the buffering, latency, acknowledgement etc….. Is this true or a best practice to make sure switches are part of the design?

A. If you have no need to connect to multiple targets or multiple initiators, there’s no harm in direct connections.

Q. Ethernet was not designed to support storage traffic. The TCP/IP protocol suite was not designed to support storage traffic. SCSI was not designed to be encapsulated. So TCP/IP FTW? I think not. The reason iSCSI exists is [perceived] cost savings. I get fed up with people constantly looking for ways to squeeze another penny out of something. To me it illustrates that they’re not very creative. Fibre Channel is a stupid name, but it is a purpose-built protocol that works as designed.

A. Ethernet is a general purpose network. It is capable of handling lots of different traffic (including storage). By putting iSCSI onto an existing Ethernet infrastructure, it can (as you point out) create a substantial cost savings over installing an FC network (although that infrastructure savings comes with other costs – such as the impact of a shared wire). However, installing a dedicated Ethernet network provides many of the advantages of a dedicated FC network, but at an added cost over that of a shared Ethernet infrastructure. While most consider FC a purpose-built storage network, it is worth pointing out that some also consider it a general purpose network (for example, FC-Avionics is built into fighter jets, and it’s not for storage). And while SCSI was not designed to be encapsulated (it was designed for a parallel bus), today it is encapsulated on every transport that carries it (yes, that includes FCP and SAS).

There are many kinds of storage at different price points: USB storage, SATA devices, rotating media (at different RPMs), SSD devices, SAS devices, FC devices, single spindles, arrays, cloud, drop boxes, etc., all with the corresponding transport wires. iSCSI is one of those wires. Each protocol and wire offers specific advantages and disadvantages. There can be a lot of confusion about which to use, but just as everyone does not drive the same type of car (a FORD FUSION, for example), everyone does not need the same type of storage (FC devices/arrays). Yes, I drive a FORD FUSION, and I like FC storage, but I use a USB stick on my laptop, and I pray my bank never puts my financial records out in the cloud. Selecting the right storage (and wire) for the job at hand can be one of a system administrator’s most interesting problems to solve. As for the name – that is often what happens in committees…

Q. As a best practice for Windows servers, disable hardware acceleration features in NICs (TOE etc.)? Are any NIC features valuable given modern multicore CPUs?

A. Yes. Typically the only reason to disable TOE is that multiple or virtual TCP/IP stacks are going to be using the same NIC. TSO, LRO and jumbo frames will benefit any OS that can take advantage of them.

Q. What is the advantage of iSCSI when compared with NVMe?

A. NVMe and iSCSI are very different protocols. NVMe started life as a direct attach protocol to communicate to native PCIe devices (not even outside the box). iSCSI was a network protocol from day one. iSCSI has to deal with the potential for long network induced delays, and complex out of order error recovery issues. NVMe operates over an interlocked bus, and as such, does not have those issues.

But NVMe is now being extended over fabrics. NVMe over a RoCE V1 transport will be limited to a single data center network (since there is no IP routing), while NVMe over a RoCE V2 transport or an iWARP transport will have the same routing capabilities that iSCSI has. When it comes to the raw command set, they are very similar (but there are some differences). SCSI is a more full-featured command set than NVMe – it has been developed over a span of more than 25 years, and has developed solutions for all the problems that have been discovered during that time span. NVMe has a more limited (or more focused) command set (for example, there are no tape commands in the NVMe command set). iSCSI is available today, as is direct attach NVMe, but NVMe over Fabrics is still in the development phase (the specification is expected to be available the first week of June, 2016). NVMe products will take some time to mature and to develop solutions for the problems they have not discovered yet. Another example of this is the ability to support shared storage – it existed on day one in iSCSI, but did not exist in the first NVMe specification. That capability has since been added to NVMe over Fabrics, and it was done using a SCSI-compatible method (to make it easier for host S/W that already performs this function).

There is a large community working to develop NVMe over Fabrics. As memory-based storage devices get cheaper and the solution space matures, NVMe will become more attractive.

Q. How often do iSCSI installations provide encryption of data in flight? How: IPsec, IKEv2-SCSI + ESP-SCSI, etc.?

A. Rarely. More often than not, if in-flight data security is needed, it will be run on an isolated network. Well under 100% of installations are 100% compliant. VMware never qualified IPsec with iSCSI and didn’t have any obvious switch to turn it on. Side note: we standards guys can be overly picky about words. Since the question is “provide,” the answer is that 100% of compliant installations PROVIDE encryption (IPsec V2 – see above); however, in practice, installations that require that type of security typically run on isolated networks rather than turn on encryption.

Q. How do multiple independent applications inside the same initiator map to iSCSI sessions to the same target? E.g., iSCSI session one-to-one with application?

A. There is no relationship between applications and sessions. When an iSCSI initiator discovers a target, the initiator logs in and establishes a session. If iSCSI MCS (multi connection session) is being used, multiple TCP connections may be established and used in parallel to process operations for that session.

Applications send reads and writes to the operating system. Those IO requests make their way through the file system and caching layers into the device driver. The device driver issues the IO request to the device (over the iSCSI session) and retains information about that IO. When a completion is received from a device (the WRITE command or READ command completed), it is matched up with the request. That completion status (success or error) is passed back through the operating system (file system, etc.) to the application. So it is the responsibility of the device driver to mux/demux the requests from all the applications out over the iSCSI session and track the responses as the operations are completed.

When an operating system is using MPIO (multi-pathing), the device driver may create multiple sessions between the initiator and the target. This is where operating system MPIO policies such as round-robin, shortest queue, LRU, etc. come into play. In this case, the MPIO driver will send an IO operation to the device using what it considers to be the most appropriate path (based on the selected policy). But again, there is no relationship between the application and the path used for IO (any application can have its IO sent via any path).

Today, MPIO is used more commonly than MCS.
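
As a toy illustration of the round-robin policy mentioned above (the types and names here are invented for the example, not any operating system's MPIO interface), a path selector simply cycles through the usable sessions:

```c
/* Toy illustration of a round-robin MPIO path selector. The types and
 * function are illustrative only, not any operating system's MPIO API. */
#include <stddef.h>

struct iscsi_path {
    int session_id;   /* one iSCSI session per path to the target */
    int usable;       /* nonzero if the path is currently up */
};

/* Pick the next usable path in round-robin order, or NULL if none. */
struct iscsi_path *rr_select(struct iscsi_path *paths, size_t npaths,
                             size_t *cursor)
{
    for (size_t i = 0; i < npaths; i++) {
        struct iscsi_path *p = &paths[(*cursor + i) % npaths];
        if (p->usable) {
            *cursor = (*cursor + i + 1) % npaths;
            return p;
        }
    }
    return NULL;
}
```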

Q. Will Microsoft iSCSI implement iSER?

A. This is a question for Microsoft or for an iSER-capable NIC vendor that provides Microsoft drivers.

Q. Zadara has some iSER deployments using Linux and VMware clients going to the Zadara cloud storage.

A. There’s an answer, all by itself.

Q. In the case of iWARP, the TCP layer takes care of out-of-order IP packet receptions. What layer does the out-of-order management of packets in RoCE?

A. RoCE headers contain a 24 bit “Packet Sequence Number” that is used to validate the required ordering and detect lost packets. As such, ordering still occurs, just in a different way.
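
As a rough sketch of how a receiver can use that 24-bit PSN, the snippet below does the comparison with arithmetic wrapping modulo 2^24; it is illustrative only, not RoCE's exact receiver state machine.

```c
/* Sketch of using a 24-bit Packet Sequence Number to detect lost or
 * stale packets. The 24-bit width mirrors the PSN field described
 * above; the classification logic is illustrative only. */
#include <stdint.h>

#define PSN_MASK 0xFFFFFFu   /* PSN is 24 bits, so arithmetic wraps mod 2^24 */

/* Returns 0 if 'psn' is the expected next packet, >0 if packets appear
 * to have been lost (a gap), <0 if this is an old/duplicate packet. */
int psn_check(uint32_t expected, uint32_t psn)
{
    uint32_t diff = (psn - expected) & PSN_MASK;
    if (diff == 0)
        return 0;
    /* Treat differences in the lower half of the PSN space as a gap,
     * and the upper half as a stale (already-seen) packet. */
    return (diff < (PSN_MASK >> 1)) ? 1 : -1;
}
```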

Q. Correction: RoCE is over Ethernet packets and is not routable. RoCEv2 is the one over UDP/IP and *is* routable.

A. You are correct. RoCE is not routable by IP. RoCE transmits raw Ethernet frames with just Ethernet MAC headers and no IP headers, and as such, it is not routable by IP. RoCE V2 puts the information into UDP packets (with appropriate IP headers), and therefore it is routable by IP.
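
The difference is easiest to see in the two header stacks. The sketch below is descriptive only; the sizes shown are minimum header lengths, and UDP destination port 4791 is the port assigned to RoCEv2.

```c
/* Illustrative view of why RoCEv2 is IP-routable while RoCEv1 is not.
 * The layering is descriptive only; sizes are minimum header lengths. */
#include <stdint.h>

#define ROCEV2_UDP_DPORT 4791    /* IANA-assigned UDP port for RoCEv2 */

struct rocev1_frame {            /* RoCEv1: layer 2 only, not IP-routable */
    uint8_t eth_hdr[14];         /* Ethernet MAC header (EtherType 0x8915) */
    uint8_t ib_bth[12];          /* InfiniBand Base Transport Header */
    /* payload + ICRC follow */
};

struct rocev2_packet {           /* RoCEv2: routable by IP */
    uint8_t eth_hdr[14];         /* Ethernet MAC header */
    uint8_t ip_hdr[20];          /* IPv4 (or IPv6) header */
    uint8_t udp_hdr[8];          /* UDP header, destination port 4791 */
    uint8_t ib_bth[12];          /* same InfiniBand transport header */
    /* payload + ICRC follow */
};
```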

Q. How prevalent is iSER today in deployment? And what are some of the typical applications that leverage iSER?

A. Not terribly prevalent today, but higher speed Ethernet might drive more adoption, due to the CPU savings demonstrated.

Next Live Webcast: NFS 101

Need a primer on NFS? On March 23, 2016, the Ethernet Storage Forum (ESF) will present a live Webcast, “What is NFS? An NFS Primer.” The popular and ubiquitous Network File System (NFS) is a standard protocol that allows applications to store and manage data on a remote computer or server. NFS provides two services: a network part that connects users or clients to a remote system or server, and a file-based view of the data. Together these provide a seamless environment that masks the differences between local files and remote files.

At this Webcast, Alex McDonald, SNIA ESF Vice Chair, will provide an introduction to and overview of NFS, geared for technologists and tech managers interested in understanding:

  • NFS history and development
  • The facilities and services NFS provides
  • Why NFS rose in popularity to dominate file based services
  • Why NFS continues to be important in the cloud

As always, the Webcast will be live and Alex and I will be on hand to answer your questions. Register today. Alex and I look forward to hearing from you on March 23rd.

How Ethernet RDMA Protocols iWARP and RoCE Support NVMe over Fabrics

NVMe (Non-Volatile Memory Express) over Fabrics is of tremendous interest among storage vendors, flash manufacturers, and cloud and Web 2.0 customers. Because it offers efficient remote and shared access to a new generation of flash and other non-volatile memory storage, it requires fast, low latency networks, and the first version of the specification is expected to take advantage of RDMA (Remote Direct Memory Access) support in the transport protocol.

Many customers and vendors are now familiar with the advantages and concepts of NVMe over Fabrics but are not familiar with the specific protocols that support it. Join us on January 26th for this live Webcast that will explore and compare the Ethernet RDMA protocols and transports that support NVMe over Fabrics and the infrastructure needed to use them. You’ll hear:

  • Why NVMe Over Fabrics requires a low-latency network
  • How the NVMe protocol is mapped to the network transport
  • How RDMA-capable protocols work
  • Comparing available Ethernet RDMA transports: iWARP and RoCE
  • Infrastructure required to support RDMA over Ethernet
  • Congestion management methods

The event is live, so please bring your questions. We look forward to answering them.

Next Webcast: The 2015 Ethernet Roadmap for Networked Storage

The ESF is excited to announce our next live Webcast, “The 2015 Ethernet Roadmap for Networked Storage.”

For over three decades, Ethernet has advanced through simple “powers-of-ten” speed increases, and this model has served the industry well. Ethernet is changing in big ways, and the Ethernet Alliance has captured the latest changes in the 2015 Ethernet Roadmap.

On June 30th at 10:00 a.m. PT, an expert panel comprised of Scott Kipp, President of the Ethernet Alliance; David Chalupsky, Chair of the IEEE P802.3bq/bz Task Forces and the Ethernet Alliance BASE-T Subcommittee; and myself will present the Ethernet Alliance’s 2015 Ethernet Roadmap for the networking technology that underlies most future networked storage.

SNIA has focused on protocols and usage models and more or less just takes Ethernet for granted.  The biggest technology disruption in the storage space is the emergence into the mainstream of Non-Volatile Memory (NVM), FLASH in particular.  NVM increasingly moves system bottlenecks from the storage subsystem to the network.  Developments in NVM — most recently 3D FLASH — assure that the cost per GB will continue aggressive declines and demand for bandwidth will go up.  NVM will become more prevalent, making the roadmap for Ethernet increasingly more important to the storage networking community.

This will be a live and interactive session. I encourage you to register now and bring your questions for our experts. I hope to see you on June 30th.

SNIA ESF Leadership Welcomes Chad Hintz

The ESF continues our busy schedule hosting informative Webcasts, writing and publishing articles and participating at industry conferences. 2015 also brings a change in ESF leadership. I’d like to welcome Chad Hintz of Cisco as our newest ESF board member. Chad has been elected as chair of our Storage over Ethernet Special Interest Group (SIG).

The Storage over Ethernet SIG is focused on a growing trend among modern data centers to deploy consolidated Ethernet networks as the primary network infrastructure for all LAN and storage traffic. Technologies such as Data Center Bridging (DCB), Fibre Channel over Ethernet (FCoE) among others, offer organizations a robust environment to support mixed workloads with each using the most appropriate protocol (including NFS, SMB and iSCSI) over a shared Ethernet physical transport. The Storage over Ethernet SIG offers educational and thought leadership materials related to these technologies and the business value they offer to organizations of all sizes. Especially appreciated by our audience is that this information comes from SNIA and thus is vendor-neutral.

Chad brings a wealth of expertise to ESF. He is a Technical Solutions Architect focusing on designing enterprise solutions for customers around data center technologies. He holds 3 CCIEs in routing and switching, security and storage and has held certifications from Novell, VMware, and Cisco. We’re confident his expertise and passion for Ethernet Storage will be a big asset to our group.

We are looking forward to having Chad on our team to help guide the many activities we have planned this year. Other members of the 2015 board include myself as ESF chair, Alex McDonald (NetApp), and Mike Jochimsen (Emulex).

As I mentioned, the ESF is busy creating vendor-neutral educational content on Ethernet-connected storage networking technologies. I encourage you to check out some of our recent and upcoming content:

Upcoming Webcast – Visions for Ethernet Connected Drives

On-demand Webcast – Benefits of RDMA in Accelerating Ethernet Storage Connectivity

Article – Cloud File Services

Article – Weave Your Cloud with a Data Fabric

Benefits of RDMA in Accelerating Ethernet Storage Q&A

At our recent live Webcast “Benefits of RDMA in Accelerating Ethernet Storage Connectivity,” experts from Emulex, Intel and Microsoft had an insightful discussion on the ways RDMA is having an impact on Ethernet storage. The live event was attended by nearly 200 people and feedback was overwhelmingly positive, with several attendees thanking us for our vendor-neutral presentation and one attendee commenting that it was “Probably the most clearly comprehensible yet comprehensive webinar I’ve attended in some time.” If you missed the Webcast, it’s now available on demand. We did not have time to get to everyone’s questions, so as promised, below are answers to all of them. If you have additional questions, please ask them in the comments section of this blog and we’ll get back to you as soon as possible.

Q. Is RDMA over RoCEv2 in production?

A. The IBTA released the RoCEv2 specification in September 2014. In order to support that specification, changes may be required across the RDMA stack, including firmware, drivers and operating systems. Schedules for implementation of that specification will vary by operating system. For example, the OpenFabrics Alliance (OFA) has not yet released an OpenFabrics Enterprise Distribution (OFED) version that implements that standard, although it is in process now. Once OFA completes their OFED stack implementation, the Linux distribution vendors will then incorporate and support the updated OFED stack. Implementations provided prior to full OFA and distro vendor support would be preliminary, potentially incompatible with the OFED release, and would require confirmation by the distro vendor with regard to the nature and level of support they would be providing.

Q. I would have liked a list of Windows applications that take advantage of SMB Direct – both in a Hyper-V host or bare metal.

A. In Windows, any file-based application can make use of SMB3 and SMB Direct due to the native file-based programming interface support. No application changes are required. For certain enterprise applications such as Hyper-V and SQL Server, SMB3 is officially supported, and more information can be found in the product catalog at www.microsoft.com.

Q. Are there any particular benefits in using one network protocol over another for SMB Direct/RDMA (iWARP vs. RoCE vs. IB)?

A. There are no hard and fast rules; any adapter or protocol can be suitable for many scenarios. Of the Ethernet-based protocols we considered in today’s webcast:

  • iWARP offers the benefit of operation over TCP with its reliability and routability, well-suited to a broad range of installed infrastructure.
  • RoCE offers a lightweight, efficient protocol when a DCB-enabled switched fabric is available. RoCE, however, is not routable.
  • RoCEv2 offers similar properties to RoCE, with the possibility to scale to larger routed and DCB-enabled fabrics.

Q. Who are the vendors offering iWARP capable RNICs?

A. Chelsio Communications has production iWARP adapters today, and both Intel and Qlogic have publicly committed to future iWARP controllers.

Q. How much testing has been done with SMB3, and in particular SMB direct, over WAN connections?

A. The SMB2 protocol was originally designed to adapt to WAN scenarios, and supports a credit-based management of large amounts of data to be outstanding, to make best use of WAN-type long pipes. The SMB3 protocol retains these design attributes, and the SMB Direct protocol also supports similar deep pipelining. The iWARP protocol, being layered on standard TCP, is well suited to such deployments, and RoCE WAN adapters are potentially available. Please contact the respective technology vendors for information on any available testing results.
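
As a toy model of that credit-based flow control (the names and structure are invented for illustration and are not the SMB2/SMB3 wire format), a client may only have as many requests outstanding as it holds credits, and the server grants more credits with each response to keep a long pipe full:

```c
/* Toy model of credit-based flow control: the client may only have as
 * many requests outstanding as it holds credits, and the server grants
 * more credits with each response to keep a long (WAN) pipe full.
 * Names and structure are illustrative, not the SMB2/SMB3 wire format. */
struct credit_state {
    unsigned int credits;      /* requests we are currently allowed to send */
    unsigned int outstanding;  /* requests sent but not yet answered */
};

/* Returns 1 if a request may be sent now, 0 if we must wait for credits. */
int can_send(const struct credit_state *s)
{
    return s->credits > 0;
}

void on_send(struct credit_state *s)
{
    s->credits--;
    s->outstanding++;
}

void on_response(struct credit_state *s, unsigned int credits_granted)
{
    s->outstanding--;
    s->credits += credits_granted;   /* server replenishes the window */
}
```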

Q. I’d love a future webcast on RDMA-enabled distributed filesystems.

A. Thanks for the suggestion! We’re always looking for ideas for future webcasts and SNIA-ESF will consider this as a potential follow-on.

Q. Is Live Migration the scenario where “packet size” is 1MB?

A. All SMB Direct scenarios have workloads that range anywhere up to 8MB. For large file copies, most SMB3 clients request from 1MB to 8MB per operation; for Hyper-V live migration, transfers are typically similar during the bulk transfer phase.

Q. SMB3 is being compared to FC for enterprise. If Ethernet based protocols are of interest, wouldn’t FCoE give the same performance as FC (same stack) vs. SMB3?

A. SMB3 with SMB Direct enables many workloads not possible with Fibre Channel over Ethernet, and performance comparisons are therefore difficult. Perhaps another SNIA webcast could investigate this!

Q. Regarding your SMB direct example with lots of small operations, how do you deal with the overhead of registering and unregistering buffers for the RDMA operations?

A. As answered later in the session, the registration and unregistration is not a protocol matter, but in the case of the Windows implementation, it is strictly performed for the specific buffers of each operation, which is critical for security, data integrity, and system protection. The standard “Fast Register Work Request” method is used, and careful implementation has shown that the overhead does not negatively impact performance, even for small I/O (4KB/operation). Check out Jose Barreto’s blog, which contains many benchmark results.
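
For a feel of the per-operation lifecycle being described, here is a conceptual sketch using the libibverbs registration calls. The Windows implementation uses Fast Register Work Requests internally, so treat this as a stand-in for the idea (register only the operation's buffer, then invalidate it on completion), not its actual code path.

```c
/* Conceptual sketch of per-operation buffer registration using the
 * libibverbs API. The Windows SMB Direct implementation uses Fast
 * Register Work Requests rather than these calls, so this only
 * illustrates the register/use/deregister lifecycle. */
#include <stddef.h>
#include <infiniband/verbs.h>

int do_one_io(struct ibv_pd *pd, void *buf, size_t len)
{
    /* Register only this operation's buffer, limiting what the remote
     * peer can touch (important for security and data integrity). */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr)
        return -1;

    /* ... post the RDMA work request that uses mr->lkey / mr->rkey ... */

    /* Invalidate/deregister as soon as the operation completes. */
    return ibv_dereg_mr(mr);
}
```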

Q. But isn’t Live Migration done in 1MB “chunks”? So not “small” I/Os?

A. As answered later in the session, Hyper-V Live Migration is done in several phases. The first phase is the initial bulk copy of memory, done in large chunks; immediately after it, a second phase copies the individual pages that were dirtied by the live-running VM. These operations are typically 4KB. Note: the faster the initial phase goes, the less work there is in the second phase, but in both phases, the faster, the better, and RDMA accelerates both.

Q. Are iSER and iWARP alternatives to one another?

A. iWARP is an RDMA protocol, while iSER is a mapping of iSCSI onto RDMA transports such as iWARP (as well as RoCE and InfiniBand), so they are complementary rather than alternatives.

Q. What’s Intel’s roadmap for RoCE and/or iWARP?

A. Intel is committed to iWARP and plans to incorporate it in future server chipsets and SOCs. See http://www.intel.com/content/www/us/en/ethernet-products/accelerating-ethernet-iwarp-video.html for more information.

Q. Is there any other transport being used, other than IB, to create a reliable transport for RoCEv2? Theoretically, is it possible?

A. RoCE was developed to leverage InfiniBand as much as possible. For that reason, the InfiniBand transport was chosen when the RoCE standard was developed. As the RoCEv2 standard was developed, the underlying InfiniBand network protocol was replaced with IPv4/IPv6 in order to provide layer 3 routability, and UDP was used to provide stateless encapsulation (and indication) of the InfiniBand transport header that was retained. While it may be possible to develop a reliable transport to replace InfiniBand, the RoCE standards body has elected not to go that route as of this writing.

Relentless Advance Of Ethernet – And Ethernet Storage Networking

As one Cisco colleague once said to me, “After the nuclear holocaust, there will be two things left: cockroaches and Ethernet.”  Not sure I like Ethernet’s unappealing company in that statement, but the truth it captures is that Ethernet, now entering its fifth decade (wow!), is ubiquitous and still continuing to advance at a breathtaking pace.  And as it advances, it advances the capabilities of storage networking based on the Ethernet backbone, be it file storage like NFS or SMB or block storage like iSCSI or FCoE.

Most recent evidence of Ethernet’s continuing and relentless evolution is illustrated in the 28 March 2014 announcement from the Ethernet Alliance congratulating the IEEE on formation of their IEEE P802.3bs™ Task Force:

The new group is chartered with the development of the IEEE P802.3bs 400 Gigabit Ethernet (GbE) project, which will define Ethernet Media Access Control (MAC) parameters, physical layer specifications, and management parameters for the transfer of Ethernet format frames at 400 Gb/s. As the leading voice of the Ethernet ecosystem, the Ethernet Alliance is ideally positioned to support this latest move towards standardizing and advancing 400Gb/s technologies through efforts such as the launch of the Ethernet Alliance’s own 400 GbE Subcommittee.

Ethernet is in production today from multiple vendors at 40GbE and supports all storage protocols, including FCoE, at those speeds. Market forecasters expect the first 100GbE adapters to appear in 2015. Obviously, it is too early to forecast when 400GbE will arrive, but the train is assuredly in motion. And support for all the key storage protocols we see today on 10GbE and 40GbE will naturally extend to 100GbE and 400GbE. Jim O’Reilly makes similar points in his recent Information Week article, “Ethernet: The New Storage Area Network,” where he argues, “Ethernet wins on schedule, cost, and performance.”

Beyond raw transport speed, the rich Ethernet infrastructure offers techniques to catapult your performance even beyond the fastest single-pipe speed.  The Ethernet world has established techniques for what is alternately referred to as link aggregation, channel bonding, or teaming.  The levels available are determined by the capabilities provided in system software and what switch vendors will support.  And those capabilities, in turn, are determined by what they respectively see as market demand.  VMware, for example, today will let you bond eight 10GbE channels into a single 80GbE pipe.  And that’s today with mainstream 10GbE technology.

Ethernet will continue to evolve in many different ways to support the needs of the industry.  Serving as a backbone for all storage networking traffic is just one of many such roles for Ethernet.  In fact, precisely because of the increasing breadth of usage models Ethernet supports, it will also continue to offer cost advantages.  The argument here is a very simple volume argument:

[Figure: Total Server-class Adapter and LOM Market Ports (Crehan Research)]

Enough said, except to also note that volume is what funds speed roadmaps.

2013 in Review and the Outlook for 2014 – A SNIA ESF Perspective

Technology continues to advance rapidly. Making sense of it all can be a challenge. At the SNIA Ethernet Storage Forum, we focus on storage technologies and solutions enabled by and associated with Ethernet networks. Last year, we modified the charters of our two Special Interest Groups (SIG) to address topics about file protocols and storage over Ethernet. The File Protocols SIG includes the prior focus on Network File System (NFS) related topics and adds discussions around Server Message Block (SMB/CIFS). We had our first webcast last November on the topic of SMB 3.0 and it was our best attended webcast ever. The Storage over Ethernet SIG focuses on general Ethernet storage topics as well as more information about technologies like FCoE, iSCSI, Data Center Bridging, and virtual networking for storage. I encourage you to check out other articles on these hot topics in this SNIA ESF blog to hear from our member experts as well as guest posts from leading analysts.

2013 was a busy year and we are already kickin’ it in 2014. This should be an exciting year in IT. Data storage continues to be a hot sector especially in the areas of All-Flash and Hybrid arrays. This year, we will expect to see new standards coming out of the T11 committee for Fibre Channel and possibly FCoE as well as progress in high speed Ethernet networks. Lower cost network interconnects will facilitate adoption of high speed networks in the small to midsize business segment. And a new conversation around “Software Defined…” should push a lot of ink in trade rags and other news sources. Oh, and don’t forget about the “Internet of Things”, mobile solutions, and all things Cloud.

The ESF will be addressing the impact on Ethernet storage solutions from these hot technologies. Next month, on February 18th, experts from the ESF, along with industry analysts from Dell’Oro Group will speak to the benefits and best practices of deploying FCoE and iSCSI storage protocols. This presentation “Use Cases for iSCSI and Fibre Channel: Where Each Makes Sense” will be part of an upcoming BrightTalk Summit on Storage Networking. I encourage you to register for this session. Additionally, we will be publishing a couple of white papers on file-based storage and a review of FCoE and iSCSI in storage applications.

Finally, SNIA will be kicking off its first year of the new user conference, Data Storage Innovation Conference. This will be one of the few storage focused user conferences in the market and should be quite interesting.

We’re excited about our growing membership and our plans for 2014. Our goal is to advance application of innovative technologies and we encourage you to send us mail or comment below with topics that are of interest to you.

Here’s to an exciting 2014!

How is 10GBASE-T Being Adopted and Deployed?

For nearly a decade, the primary deployment of 10 Gigabit Ethernet (10GbE) has been using network interface cards (NICs) supporting enhanced Small Form-Factor Pluggable (SFP+) transceivers. The predominant transceivers for 10GbE are Direct Attach (DA) copper, short range optical (10GBASE-SR), and long-range optical (10GBASE-LR). The Direct Attach copper option is the least expensive of the three. However, its adoption has been hampered by two key limitations:

- DA’s range is limited to 7m, and

- because of the SFP+ connector, it is not backward-compatible with existing 1GbE infrastructure using RJ-45 connectors and twisted-pair cabling.

10GBASE-T addresses both of these limitations.

10GBASE-T delivers 10GbE over Category 6, 6A, or 7 cabling terminated with RJ-45 jacks. It is backward-compatible with 1GbE and even 100 Megabit Ethernet. Cat 6A and 7 cables will support up to 100m. The advantages for deployment in an existing data center are obvious. Most existing data centers have already installed twisted pair cabling at Cat 6 rating or better. 10GBASE-T can be added incrementally to these data centers, either in new servers or via NIC upgrades “without forklifts.” New 10GBASE-T ports will operate with all the existing Ethernet infrastructure in place. As switches get upgraded to 10GBASE-T at whatever pace, the only impact will be dramatically improved network bandwidth.

Market adoption of 10GBASE-T accelerated sharply when the first single-chip 10GBASE-T controllers hit production. This integration became possible because of Moore’s Law advances in semiconductor technology, which also enabled the rise of dense commercial switches supporting 10GBASE-T. Integrating the PHY and MAC on a single piece of silicon significantly reduced power consumption. This lower power consumption made fan-less 10GBASE-T NICs possible for the first time. Also, switches supporting 10GBASE-T are now available from Cisco, Dell, Arista, Extreme Networks, and others, with more to come. You can see the early market impact single-chip 10GBASE-T had by mid-year 2012 in this analysis of shipments (in numbers of server ports) from Crehan Research:

[Figure: Server-class Adapter & LOM 10GBASE-T Shipments (Crehan Research)]

Note that Crehan believes that by 2015, over 40% of all 10GbE adapters and controllers sold that year will be 10GBASE-T.

Early concerns about the reliability and robustness of 10GBASE-T technology have all been addressed in the most recent silicon designs. 10GBASE-T meets the bit-error rate (BER) requirements of all the Ethernet and storage over Ethernet specifications. As I addressed in an earlier SNIA-ESF blog, the storage networking market is a particularly conservative one. But there appear to be no technical reasons why 10GBASE-T cannot support NFS, iSCSI, and even FCoE. Today, Cisco is in production with a switch, the Nexus 5596T, and a fabric extender, the 2232TM-E, that support “FCoE-ready” 10GBASE-T. It’s coming – with all the cost-of-deployment benefits of 10GBASE-T.
