pNFS and Future NFSv4.2 Features

In this third and final blog post on NFS (see previous blog posts Why NFSv4.1 and pNFS are Better than NFSv3 Could Ever Be and The Advantages of NFSv4.1) I’ll cover pNFS (parallel NFS), an optional feature of NFSv4.1 that improves the bandwidth available for NFS protocol access, and some of the proposed features of NFSv4.2 – some of which are already implemented in commercially available servers, but will be standardized with the ratification of NFSv4.2 (for details, see the IETF NFSv4.2 draft documents).

Finally, I’ll point out where you can get NFSv4.1 clients with support for pNFS today.

Parallel NFS (pNFS) and Layouts

Parallel NFS (pNFS) represents a major step forward in the development of NFS. Ratified in January 2010 and described in RFC-5661, pNFS depends on the NFS client understanding how a clustered filesystem stripes and manages data. It’s not an attribute of the data, but an arrangement between the server and the client, so data can still be accessed via non-pNFS and other file access protocols.  pNFS benefits workloads with many small files, or very large files, especially those run on compute clusters requiring simultaneous, parallel access to data.

 NFS 3 image 1

Clients request information about data layout from a Metadata Server (MDS), and get returned layouts that describe the location of the data. (Although often shown as separate, the MDS may or may not be standalone nodes in the storage system depending on a particular storage vendor’s hardware architecture.) The data may be on many data servers, and is accessed directly by the client over multiple paths. Layouts can be recalled by the server, as in the case for delegations, if there are multiple conflicting client requests. 

By allowing the aggregation of bandwidth, pNFS relieves performance issues that are associated with point-to-point connections. With pNFS, clients access data servers directly and in parallel, ensuring that no single storage node is a bottleneck. pNFS also ensures that data can be better load balanced to meet the needs of the client.

The pNFS specification also accommodates support for multiple layouts, defining the protocol used between clients and data servers. Currently, three layouts are specified; files as supported by NFSv4, objects based on the Object-based Storage Device Commands (OSD) standard (INCITS T10) approved in 2004, and block layouts (either FC or iSCSI access). The layout choice in any given architecture is expected to make a difference in performance and functionality. For example, pNFS object based implementations may perform RAID parity calculations in software on the client, to allow RAID performance to scale with the number of clients and to ensure end-to-end data integrity across the network to the data servers.

So although pNFS is new to the NFS standard, the experience of users with proprietary precursor protocols to pNFS shows that high bandwidth access to data with pNFS will be of considerable benefit.

Potential performance of pNFS is definitely superior to that of NFSv3 for similar configurations of storage, network and server. The management is definitely easier, as NFSv3 automounter maps and hand-created load balancing schemes are eliminated; and by providing a standardized interface, pNFS ensures fewer issues in supporting multi-vendor NFS server environments.

Some Proposed NFSv4.2 features

NFSv4.2 promises many features that end-users have been requesting, and that makes NFS more relevant as not only an “every day” protocol, but one that has application beyond the data center. As the requirements document for NFSv4.2 puts it, there are requirements for: 

  • High efficiency and utilization of resources such as, capacity, network bandwidth, and processors.
  • Solid state flash storage which promises faster throughput and lower latency than magnetic disk drives and lower cost than dynamic random access memory.

Server Side Copy

Server-Side Copy (SSC) removes one leg of a copy operation. Instead of reading entire files or even directories of files from one server through the client, and then writing them out to another, SSC permits the destination server to communicate directly to the source server without client involvement, and removes the limitations on server to client bandwidth and the possible congestion it may cause.

Application Data Blocks (ADB)

ADB allows definition of the format of a file; for example, a VM image or a database. This feature will allow initialization of data stores; a single operation from the client can create a 300GB database or a VM image on the server.

Guaranteed Space Reservation & Hole Punching

As storage demands continue to increase, various efficiency techniques can be employed to give the appearance of a large virtual pool of storage on a much smaller storage system.  Thin provisioning, (where space appears available and reserved, but is not committed) is commonplace, but often problematic to manage in fast growing environments. The guaranteed space reservation feature in NFSv4.2 will ensure that, regardless of the thin provisioning policies, individual files will always have space available for their maximum extent.

 NFS 3 image 2

While such guarantees are a reassurance for the end-user, they don’t help the storage administrator in his or her desire to fully utilize all his available storage. In support of better storage efficiencies, NFSv4.2 will introduce support for sparse files. Commonly called “hole punching”, deleted and unused parts of files are returned to the storage system’s free space pool.

Obtaining Servers and Clients

With this background on the features of NFS, there is considerable interest in the end-user community for NFSv4.1 support from both servers and clients. Many Network Attached Storage (NAS) vendors now support NFSv4, and in the last 12 months, there has been a flurry of activity and many developments in server support of NFSv4.1 and pNFS.

For NFS server vendors, there are NFSv4.1 and files based, block based and object based implementations of pNFS available; refer to the vendor websites, where you will get the latest up-to-date information.

On the client side, there is RedHat Enterprise Linux 6.4 that includes full support for NFSv4.1 and pNFS (see www.redhat.com), Novell SUSE Linux Enterprise Server 11 SP2 with NFSv4.1 and pNFS based on the 3.0 Linux kernel (see www.suse.com), and Fedora available at fedoraproject.org.

Conclusion     

NFSv4.1 includes features intended to enable its use in global wide area networks (WANs).  These advantages include:

  • Firewall-friendly single port operations
  • Advanced and aggressive cache management features
  • Internationalization support
  • Replication and migration facilities
  • Optional cryptography quality security, with access control facilities that are compatible across UNIX® and Windows®
  • Support for parallelism and data striping

The goal for NFSv4.1 and beyond is to define how you get to storage, not what your storage looks like. That has meant inevitable changes. Unlike earlier versions of NFS, the NFSv4 protocol integrates file locking, strong security, operation coalescing, and delegation capabilities to enhance client performance for data sharing applications on high-bandwidth networks.

NFSv4.1 servers and clients provide even more functionality such as wide striping of data to enhance performance.  NFSv4.2 and beyond promise further enhancements to the standard that increase its applicability to today’s application requirements. It is due to be ratified in August 2012, and we can expect to see server and client implementations that provide NFSv4.2 features soon after this; in some cases, the features are already being shipped now as vendor specific enhancements. 

With careful planning, migration to NFSv4.1 (and NFSv4.2 when it becomes generally available) from prior versions can be accomplished without modification to applications or the supporting operational infrastructure, for a wide range of applications; home directories, HPC storage servers, backup jobs and a variety of other applications.

FOOTNOTE: Parts of this blog were originally published in Usenix ;login: February 2012 under the title The Background to NFSv4.1. Used with permission.

 

New Performance Test Service Launched for Solid State Drives

The SNIA Solid State Storage Initiative (SNIA SSSI) announces a testing service where interested parties may submit their SSD products for testing to the SSS Performance Test Specification.

Drive Requirements

Any mSATA, SATA, SAS and PCIe SSDs can be tested. The tested device must be recognized as a logical device by CentOS 6.3 and must support Purge (via Security Erase, Format Unit, or equivalent proprietary method of Purge).

Available Tests

Testing is based on the SSS PTS version 1.1.  Visit the SSS Performance Test Service page for more information on the tests.

Testing Process

Testing will be conducted by Calypso Systems, a certified SSS PTS testing facility.  Participants must submit two (2) samples of the SSD to be tested and provide prepaid return express shipment bills (FedEx, DHL or UPS). Testing will take approximately 3-4 weeks to complete.

Any failed test, or test that will not complete, will be tested twice and error logs will be provided.  All product test result data will be kept confidential.

Test results are provided in standard SNIA Report Format as specified in the SSS Performance Test Specification.

For more details, contact ptstest@snia.org

10 Gigabit Ethernet – 2H12 Results and 2013 Outlook

Seamus Crehan, President, Crehan Research Inc.

2H12 results

2012 turned out be another very strong growth year for 10 Gigabit Ethernet (10GbE), with the data center switch market and the server-class adapter and LAN-on-Motherboard (LOM) market both growing more than 50%.  Broad long-term trends such as virtualization, convergence, data center network traffic growth, cloud deployments, and price declines were helped further by more specific demand drivers, many of which materialized in the latter half of 2012. These included the adoption of Romley servers, expanded 10GBASE-T product offerings for both switches and servers, 10GbE LOM solutions for volume rack servers (which drive the majority of server shipments), and the public cloud’s migration to 10GbE for mainstream server networking access. (The SNIA Ethernet Storage Forum wrote about many of these in its July 2012 whitepaper titled 10GbE Comes of Age).

However, despite another stellar growth year, 10GbE still remained a minority of the overall data center and server shipment mix (see Figure 1).  

Crehan figure 1

Furthermore, its adoption hit some turbulence in the latter half of 2012, mostly related to the initial high prices and the learning curve associated with the new Modular LOM form-factor, resulting in some inventory issues.  Another drag on 2H12 10GbE growth was the lack of comprehensive 10GBASE-T offerings from many market participants. Although we saw a very significant step up in 10GBASE-T shipments in 2012, limited product offerings throughout much of 2012 capped its adoption at under less than 10% of total 10GbE shipments.

But these 2H12 issues were more than offset by 10GbE entering its next major stage of volume server adoption during this time period.  Crehan Research reported a near-50% increase in 2H12 10GbE results as many public cloud, Web 2.0, and massively scalable data center companies deployed 10GbE servers and server-access data center switches. We believe this is the second of three major stages of mainstream 10GbE server adoption, the first of which was driven by blade servers. The third will be driven by the upgrade of the traditional enterprise segment’s large installed base of 1GbE rack and tower server ports to 10GbE.

2013 expectations

As we move through 2013, Crehan Research expects the following factors to have positive impacts on the 10GbE market, driving it closer to becoming the majority data center networking interconnect:

Better pricing and understanding of Modular LOMs.  Initial pricing on 10GbE Modular LOMs has been relatively high, contributing to slower adoption and inventory issues.  In the past, end customers were given the higher-speed LOM for free for example, during the 1GbE and blade-server 10GbE transitions.  The Modular LOM is a new product form-factor, and it takes time for buyers and sellers to get comfortable with and fully understand it. During 2013, we should see lower pricing for this class of product, driving a higher server attach rate.

Comprehensive 10GBASE-T product offerings. 2013 should finally bring complete 10GBASE-T product offerings from the major server and switch OEMs, helping drive stronger 10GBASE-T adoption and growth. More specifically, we should see more 10GBASE-T LOMs in addition to top-of-rack and end-of-row data center switches. Furthermore, we expect many of these products to be attractively priced, in order to entice the large installed base of 1GBASE-T customers to upgrade to 10GbE.

Higher-speed uplink, aggregation, and core data center switches. Servers and server-access switches likely won’t see volume deployments to 10GbE without robust and cost-effective higher-speed uplink, aggregation, and core networking options. These have now begun to arrive with 40GbE, and we are starting to see a strong ramp for this technology. Crehan Research expects 2013 to bring the advent of many 40GbE data center switches, and foresees all of the major switch vendors rolling out offerings in 2013. In contrast with the early days of 10GbE, 40GbE prices are already close to parity on a bandwidth basis with 10GbE and have settled on a single interface form factor (QSFP), which should propel 40GbE data center switches to a much stronger start than that seen by 10GbE data center switches.

Continued traction of 10GbE for storage applications. We expect that 2013 will see a continuation of the broader adoption of 10GbE as a storage protocol, in both the public cloud and traditional enterprise segments.  Although Fibre Channel remains a very important data center storage networking technology, Fibre Channel switch and Host Bus Adapter (HBA) shipments each declined slightly in 2012 and have seen flat compound annual growth rates over the past four years (see Figure 2). We expect this gradual Fibre Channel decline to continue in 2013 as more customers run Ethernet-based protocols such as NAS, iSCSI and FCoE, especially over 10GbE, for their storage needs and deployments.

Crehan figure 2

Resolving the Confusion around DCB (I Hope)

Storage traffic running over Ethernet-based networks has been around for as long as we have had Ethernet-based networks.  Of course sometimes it is technically not accurate to think of the protocols as fundamentally Ethernet protocols – whilst FCoE, by definition, only runs on Ethernet, iSCSI, SMB, and NFS,  they are, in reality, IP-based storage protocols and whilst most commonly run on Ethernet, could run on any network that supports IP.  That notwithstanding, it is increasingly important to understand the real nature of Ethernet, and in particular, the nature of the new enhancements that we put under the umbrella of Data Center Bridging (DCB).

Although there is a great deal of information around DCB, there is also a lot of confusion where even the best articles miss describing a number of its elements.  As such, with 10GbE ramping now is a good time to try to clarify the reality of what DCB does and does not do.

Perhaps the first and most important point is that DCB is, in reality, a task group in IEEE responsible for the development enhancements to the 802.1 bridge specifications that apply specifically to Ethernet Switching (or as IEEE say Bridging) in the datacenter.  As such, DCB is not in itself a standard nor is the DCB group solely involved in those standards that apply to I/O and network convergence.  The mots recent work of this task group falls into two distinct areas both of which apply to the datacenter ­ one is the now completed standards around network and I/O convergence (802.1Qau,Qaz, Qbb) but the other are those standards that address the impact of server virtualization technology (802.1Qbg, BR, and the now withdrawn Qbh). 

Protocol Tree

Also critical to understand, so that we do not overstate either the limitations of traditional Ethernet nor the advantages of the new standards around I/O and network convergence, is that these new standards build on top of many well understood, well used mature capabilities that already exist within the IEEE Ethernet standards set.  Indeed IMHO, the most important element of this is that the DCB Convergence standards are building on top of the 802.1p capability to specify eight different classes of service through a 3-bit PCP field in the 802.1Q header ­ the VLAN header.  Or to say that in English ­ Ethernet has for some considerable time had the ability to separate traffic into 8 separate categories to ensure that those different categories got different treatment – or more bluntly the fundamentals of I/O and network convergence are nothing new to Ethernet.  Not only that, but the VLAN identification itself can be used to apply QoS on different sets of traffic, as does the fact that we can usually identify different traffic types by the Ethertype or IP socket.

So what is all the fuss about? As much as there were some good convergence capabilities, it was recognized that these could be further enhanced.

802.1Qbb or Priority-based Flow Control (PFC), far from adding into Ethernet a non-existent capability for lossless, simply takes the existing capability for lossless ­ 802.3X ­ and enhances it.  802.3X, when deployed with both RX and TX pause, can give lossless Ethernet as both recognized by many in the iSCSI community as well as by the FCoE specifications.  However, the pause mechanisms apply at the port level, which means giving one traffic class lossless causes blocking of other traffic classes.  All 802.1Qbb does, along with 802.3bd, is allow the pause mechanisms to be applied individually to specific priorities or traffic classes ­ aka pause FCoE or iSCSI or RoCE whilst allowing other traffic to flow. 

PFC

802.1Qaz or ETS (let’s ignore that DCBX is also part of this document and discussed in another SNIA-ESF blog) is not bandwidth allocation to your individual priorities, rather it is the ability to create a group of priorities and apply bandwidth rules to that group.  In English, it adds a new tier to your QoS schedulers so you can now apply bandwidth rules to port, priority or class group, individual priority, and VLAN.  The standard suggests a practice of at least three groups, one for best effort traffic classes, one for PFC-lossless classes, and one for strict priority ­ though it does allow more groups. 

ETS logical view

Last but not least, 802.1Qau or QCN is not a mechanism to provide lossless capabilities.  Where pause and PFC are point-to-point flow control mechanisms QCN allows flow control to be applied by a message from the congestion point all the way back to the source.  Being Ethernet level mechanisms it is of course across multiple hops within a layer 2 domain and so cannot cross either IP routing or FCF FCoE-based forwarding.  If QCN is applied to a non PFC priority, then it would most likely reduce drop by telling the source device to slow down rather than having frames be dropped and allowing the TCP congestion window to trigger slowing down at the TCP level.  If QCN is applied to a PFC priority, then it could reduce back propagation of PFC pause and so congestion propagation within that priority. 

QCN

Although not part of the standards for DCB-based convergence, but mentioned in the standards, devices that implement DCB typically have some form of buffer carving or partitioning such that the different traffic classes are not just on different priorities or classes as they flow through the network, but are being queued in and utilizing separate buffer queues.  This is important as the separated queuing and buffer allocation is another aspect of how fate sharing is limited or avoided between the different traffic classes.  It also makes conversations around microbursts, burst absorption, and latency bubbles all far more complex than before when there was less or no buffer separation.

It is important to remember that what we are describing here are the layer 2 Ethernet mechanisms around I/O and network convergence, QoS, flow control.  These are not the only tools available (or in operation) and any datacenter design needs to fully consider what is happening at every level of the network and server stack ­ including, but not limited to, the TCP/IP layer, SCSI layer, and indeed application layer.  The interactions between the layers are often very interesting ­ but that is perhaps the subject for another blog.

In summary, with the set of enhanced convergence protocols now fully standardized and fairly commonly available on many platforms, along with the many capabilities that exist within Ethernet, and the increasing deployment of networks with 10GbE or above, more organizations are benefiting from convergence – but to do so they quickly find that they need to learn about aspects of Ethernet that in the past were perhaps of less interest in a non-converged world.