• Home
  • About
  •  

    Would You Like Some Rosé with Your iSCSI?

    February 3rd, 2017

    Would you like some rosé with your iSCSI? I’m guessing that no one has ever asked you that before. But we at the SNIA Ethernet Storage Forum like to get pretty colorful in our “Everything You Wanted To Know about Storage But Were Too Proud To Ask” webcast series as we group common storage terms together by color rather than by number.

    In our next live webcast, Part Rosé – The iSCSI Pod, we will focus entirely on iSCSI, one of the most used technologies in data centers today. With the increasing speeds for Ethernet, the technology is more and more appealing because of its relative low cost to implement. However, like any other storage technology, there is more here than meets the eye.

    We’ve convened a great group of experts from Cisco, Mellanox and NetApp who will start by covering the basic elements to make your life easier if you are considering using iSCSI in your architecture, diving into:

    • iSCSI definition
    • iSCSI offload
    • Host-based iSCSI
    • TCP offload

    Like nearly everything else in storage, there is more here than just a protocol. I hope you’ll register today to join us on March 2nd and learn how to make the most of your iSCSI solution. And while we won’t be able to provide the rosé wine, our panel of experts will be on-hand to answer your questions.


    NFS FAQ – Test Your Knowledge

    April 7th, 2016

    How would you rate your NFS knowledge? That’s the question Alex McDonald and I asked our audience at our recent live Webcast, “What is NFS?.” From those who considered themselves to be an NFS expert to those who thought NFS was a bit of a mystery, we got some great questions. As promised, here are answers to all of them. If you think of additional questions, please comment in this blog and we’ll get back to you as soon as we can.

    Q. I hope you touch on dNFS in your presentation

    A. Oracle Direct NFS (dNFS) is a client built into Oracle’s database system that Oracle claims provides faster and more scalable access to NFS servers. As it’s proprietary, SNIA doesn’t really have much to say about it; we’re vendor neutral, and it’s not the only proprietary NFS client out there. But you can read more here if you wish at the Oracle site.

    Q. Will you be talking about pNFS?

    A. We did a series of NFS presentations that covered pNFS a little while ago. You can find them here.

    Q. What is the difference between SMB vs. CIFS? And what is SAMBA? Is it a type of SMB protocol?

    A. It’s best explained in this tutorial that covers SMB. Samba is the open source implementation of SMB for Linux. Information on Samba can be found here.

    Q. Will you touch upon how file permissions are maintained when users come from an SMB or a non-SMB connection? What are best practices?

    A. Although NFS and SMB share some common terminology for security (ACLs or Access Control Lists) the implementations are different. The ACL security model in SMB is richer than the NFS file security model. I touched on some of those differences during the Webcast, but my advice is; don’t expect the two security domains of SMB (or Samba, the open source equivalent) and NFS to perfectly overlap. Where possible, try to avoid the requirement, but if you do need the ability to file share, talk to your NFS server supplier. A Google search on “nfs smb mixed mode” will also bring up tips and best practices.

    Q. How do you tune and benchmark NFSv4?

    A. That’s a topic in its own right! This paper gives an overview and how-to of benchmarking NFS; but it doesn’t explain what you might do to tune the system. It’s too difficult to give generic advice here, except to say that vendors should be relied on to provide their experience. If it’s a commercial solution, they will have lots of experience based on a wide variety of use cases and workloads.

    Q. Is using NFS to provide block storage a common use case?

    A. No, it’s still fairly unusual. The most common use case is for files in directories. Object and block support are relatively new, and there are more NFS “personalities” being developed, see our ESF Webcast on NFSv4.2 for more information. 

    Q. Can you comment about file locking issues over NFS?

    A. Locking is needed by NFS to maintain file consistency in the face of multiple readers and writers. Locking in NVSv3 was difficult to manage; if a server failed or clients went AWOL, then the lock manager would be left with potentially thousands of stale locks. They often required manual purging. NFSv4 simplifies that by being a stateful protocol, and by integrating the lock management functions and employing timeouts and state, it can manage client and server recovery much more gracefully. Locks are, in the main, automatically released or refreshed after a failure.

     Q. Where do things like AFS come into play? Above NFS? Below NFS? Something completely different?

    A. AFS is another distributed file system, but it is not POSIX compliant. It influenced but is not directly related to NFS. Its use is relatively small; SMB and NFS dominate. Wikipedia has a good overview.

    Q. As you said NFSv4 can hide some of the directories when exporting to clients. Can this operation hide different folders for different clients?

    A. Yes. It’s possible to maintain completely different exports to expose or hide whatever directories on the server you wish. The pseudo file system is built separately for each server export. So you can have export X with subdirectories A B and C; or export Y with subdirectories B and C only.

    Q. Similar to DFS-N and DFS-R in combination, if a user moves to a different location, does NFS have a similar methodology?

    A. I’m not sure what DFS-N and DFS-R do in terms of location transparency. NFS can be set up such that if you can contact a particular server, and if you have the correct permissions, you should be able to see the same exports regardless of where the client is running.

    Q. Which daemons should be running on server side and client side for accessing filesystem over NFS?

    A. This is NFS server and client specific. You need to look at the documentation that comes with each.

    Q. Regarding VMware 6.0. Why use NFS over FC?

    A. Good question but you’ll need to speak to VMware to get that question answered. It depends on the application, your infrastructure, your costs, and the workload.


    A Deep Dive into the SNIA Storage Developer Conference – The File Systems Track

    September 9th, 2015

    SNIA Storage Developer Conference (SDC) 2015 is two weeks away, and the SNIA Technical Council is finalizing a strong, comprehensive agenda of speakers and sessions. Wherever your interests lie, you’re sure to find experts and topics that will expand your knowledge and fuel your professional development!SDC15_WebHeader3_999x188

    For the next two weeks, SNIA on Storage will highlight exciting interest areas in the 2015 agenda. If you have not registered, you need to! Visit www.storagedeveloper.org to see the four day overview and sign up.

    Year after year, the File Systems Track at SDC provides in depth information on the latest technologies, and 2015 is no exception. The track kicks off with Vinod Eswaraprasad, Software Architect, Wipro, on Creating Plugin Modules for OpenStack Manila Services. He’ll discuss his work on integrating a multi-protocol NAS storage device to the OpenStack Manila service, looking at the architecture principle behind the scalability and modularity of Manila services, and the analysis of interface extensions required to integrate a typical NAS head.

    Ankit Agrawal and Sachin Goswami of TCS will discuss How to Enable a Reliable and Economic Cloud Storage Solution by Integrating SSD with LTFS. They will share views on how to integrate SSD as a cache with a LTFS tape system to transparently deliver the best benefits for Object Base storage, and talk about the potential challenges in their approach and best practices that can be adopted to overcome these challenges.

    Jakob Homan, Distributed Systems Engineer, Microsoft, will present Apache HDFS: Latest Developments and Trends. He’ll discuss the new features of HDFS, which has rapidly developed to meet the needs of enterprise and cloud customers, and take a look at HDFS at implementations and how they address previous shortcomings of HDFS.

    James Cain, Principal Software Architect, Quantel Limited, will discuss a Pausable File System, using his own implementation of an SMB3 server (running in user mode on Windows) to demonstrate the effects of marking messages as asynchronously handled and then delaying responses in order to build up a complete understanding of the semantics offered by a pausable file system.

    Ulrich Fuchs, Service Manager, CERN, will talk about Storage Solutions for Tomorrow’s Physics Projects, suggesting possible architectures for tomorrow’s storage implementations in this field, and showing results of first performance tests done on various solutions (Lustre, NFS, Block Object storage, GPFS ..) for typical application access patterns.

    Neal Christiansen, Principal Development Lead, Microsoft, will present Support for New Storage Technologies by the Windows Operating System, describing the changes being made to the Windows OS, its file systems, and storage stack in response to new evolving storage technologies.

    Richard Morris and Peter Cudhea of Oracle will discuss ZFS Async Replication Enhancements, exploring design decisions around enhancing the ZFS send and ZFS receive commands to transfer already compressed data more efficiently and to recover from failures without re-sending data that has already been received.

    J.R. Tipton, Development Lead, Microsoft, will discuss ReFS v2: Cloning, Projecting, and Moving Data File Systems, presenting new abstractions that open up greater control for applications and virtualization, covering block projection and cloning as well as in-line data tiering.

    Poornima Gurusiddaiah and Soumya Koduri of Red Hat will present Achieving Coherent and Aggressive Client Caching in Gluster, a Distributed System, discussing how to implement file system notifications and leases in a distributed system and how these can be leveraged to implement a client side coherent and aggressive caching.

    Sriram Rao, Partner Scientist Manager, Microsoft, will present Petabyte-scale Distributed File Systems in Open Source Land: KFS Evolution, providing an overview of OSS systems (such as HDFS and KFS) in this space, and describing how these systems have evolved to take advantage of increasing network bandwidth in data center settings to improve application performance as well as storage efficiency.

    Richard Levy, CEO and President, Peer Fusion, will discuss a High Resiliency Parallel NAS Cluster, including resiliency design considerations for large clusters, the efficient use of multicast for scalability, why large clusters must administer themselves, fault injection when failures are the nominal conditions, and the next step of 64K peers.

    Join your peers as well – register now at www.storagedeveloper.org. And stay tuned for tomorrow’s blog on Cloud topics at SDC!


    NFS 4.2 Q&A

    May 28th, 2015

    We received several great questions at our What’s New in NFS 4.2 Webcast. We did not have time to answer them all, so here is a complete Q&A from the live event. If you missed it, it’s now available on demand.

    Q. Are there commercial Linux or windows distributions available which have adopted pNFS?

    A. Yes. RedHat RHEL6.2, Suse SLES 11.3 and Ubuntu 14.10 all support the pNFS capable client. There aren’t any pNFS servers on Linux so far; but commercial systems such as NetApp (file pNFS), EMC (block pNFS), Panasas (object pNFS) and maybe others support pNFS servers. Microsoft Windows has no client or server support for pNFS.

    Q. Are we able to prevent it from going back to NFS v3 if we want to ensure file lock management?

    A. An NFSv4 mount (mount -t nfs4) won’t fall back to an nfs3 mount. See man mount for details.

    Q. Can pNFS metadata servers forward clients to other metadata servers?

    A. No, not currently.

    Q. Can pNfs provide a way similar to synchronous writes? So data’s instantly safe in at least 2 locations?

    A. No; that kind of replication is a feature of the data servers. It’s not covered by the NFSv4.1 or pNFS specification.

    Q. Does hole punching depend on underlying file system in server?

    A. If the underlying server supports it, then hole punching will be supported. The client & server do this silently; a user of the mount isn’t aware that it’s happening.

    Q. How are Ethernet Trunks formed? By the OS or by the NFS client or NFS Server or other?

    A. Currently, they’re not! Although trunking is specified and is optional, there are no servers that support it.

    Q. How do you think vVols could impact NFS and VMware’s use of NFS?

    A. VMware has committed to supporting NFSv4.1 and there is currently support in vSphere 6. vVols adds another opportunity for clients to inform the server with IO hints; it is an area of active development.

    Q. In pNFS, does the callback call to the client must come from the original-called-to metadata server?

    A. Yes, the callback originates from the MDS.

    Q. Is hole punched in block units?

    A. That depends on the server.

    Q. Is there any functionality like SMB continuous availability?

    A. Since it’s a function of the server, and much of the server’s capabilities are unspecified in NFSv4, the answer is – it depends. It’s a question for the vendor of your server.

    Q. NFS has historically not been used in large HPC cluster environments for cluster-wide storage, for performance reasons. Do you see these changes as potentially improving this situation?

    A. Yes. There’s much work being done on the performance side, and the cluster parallelism that pNFS brings will have it outperform NFSv3 once clients employ more of its capabilities.

    Q. Speaking of the Amazon adoption for NFSv4.0. Do you have insight / guess on why Amazon did not select NFSv4.1, which has a lot more performance / scalability advantages over NFSv4.0?

    A. No, none at all.


    New Webcast: Cloud File Services: SMB/CIFS and NFS…in the Cloud

    September 18th, 2014

    Imagine evaporating your existing file server into the cloud with the same experience and level of control that you currently have on-premises. On October 1st, ESF will host a live Webcast that introduces the concept of Cloud File Services and discusses the pros and cons you need to consider.

    There are numerous companies and startups offering “sync & share” solutions across multiple devices and locations, but for the enterprise, caveats are everywhere. Register now for this Webcast to learn:

    • Key drivers for file storage
    • Split administration with sync & share solutions and on-premises file services
    • Applications over File Services on-premises (SMB 3, NFS 4.1)
    • Moving to the cloud: your storage OS in a hyperscalar or service provider
    • Accommodating existing File Services workloads with Cloud File Services
    • Accommodating cloud-hosted applications over Cloud File Services

    This Webcast will be a vendor-neutral and informative discussion on this hot topic. And since it’s live, we encourage your to bring your questions for our experts. I hope you’ll register today and we’ll look forward to having you attend on October 1st 

     


    2013 in Review and the Outlook for 2014 – A SNIA ESF Perspective

    January 28th, 2014

    Technology continues to advance rapidly. Making sense of it all can be a challenge. At the SNIA Ethernet Storage Forum, we focus on storage technologies and solutions enabled by and associated with Ethernet Networks. Last year, we modified the charters of our two Special Interest Groups (SIG) to address topics about file protocols and storage over Ethernet. The File Protocols SIG includes the prior focus on Network File System (NFS) related topics and adds discussions around Server Message Block (SMB / CIFS). We had our first webcast last November on the topic of SMB 3.0 and it was our best attended webcast ever. The Storage over Ethernet SIG focuses on general Ethernet storage topics as well as more information about technologies like FCoE, iSCSI, Data Center Bridging, and virtual networking for storage. I encourage you to check out other articles on these hot topics in this SNIAESF blog to hear from our member experts as well as guest posts from leading analysts.

    2013 was a busy year and we are already kickin’ it in 2014. This should be an exciting year in IT. Data storage continues to be a hot sector especially in the areas of All-Flash and Hybrid arrays. This year, we will expect to see new standards coming out of the T11 committee for Fibre Channel and possibly FCoE as well as progress in high speed Ethernet networks. Lower cost network interconnects will facilitate adoption of high speed networks in the small to midsize business segment. And a new conversation around “Software Defined…” should push a lot of ink in trade rags and other news sources. Oh, and don’t forget about the “Internet of Things”, mobile solutions, and all things Cloud.

    The ESF will be addressing the impact on Ethernet storage solutions from these hot technologies. Next month, on February 18th, experts from the ESF, along with industry analysts from Dell’Oro Group will speak to the benefits and best practices of deploying FCoE and iSCSI storage protocols. This presentation “Use Cases for iSCSI and Fibre Channel: Where Each Makes Sense” will be part of an upcoming BrightTalk Summit on Storage Networking. I encourage you to register for this session. Additionally, we will be publishing a couple of white papers on file-based storage and a review of FCoE and iSCSI in storage applications.

    Finally, SNIA will be kicking off its first year of the new user conference, Data Storage Innovation Conference. This will be one of the few storage focused user conferences in the market and should be quite interesting.

    We’re excited about our growing membership and our plans for 2014. Our goal is to advance application of innovative technologies and we encourage you to send us mail or comment below with topics that are of interest to you.

    Here’s to an exciting 2014!


    The Advantages of NFSv4.1

    February 19th, 2013

    In a previous blog post Why NFSv4.1 and pNFS are Better than NFSv3 Could Ever Be, some of the issues with NFSv3 that made it difficult to implement as a WAN based or data center wide protocol were discussed. The question then becomes; why not move to NFSv4 instead of NFSv4.1? Isn’t that a bigger leap from NFSv3?

    Well, practical experience and some issues with NFSv4 made NFSv4.1 a necessity; for one, it introduces the key concept of sessions, and provides a foundation for pNFS (parallel NFS) which we’ll discuss in a later blog post. And all the features of NFSv4 were carried over into NFSv4.1, since it was a minor version update; there’s little more to do to take advantage of NFSv4.1, so that’s where your focus evaluation and implementation should be.

    TCP for Transport

    NFSv3 supports both TCP (Transmission Control Protocol) and UDP (User Datagram Protocol), and UDP is sometimes employed (for those applications that support it) because it is perceived to be lightweight and faster in comparison to TCP.

    The downside of UDP is that it’s connectionless (that is, stateless) and an unreliable protocol. There is no guarantee that the datagrams will be delivered in any given order to the destination host — or even delivered at all — so applications must be specifically designed to handle missing, duplicate or incorrectly ordered data. UDP is also not a good network citizen; there is no concept of congestion or flow control, and no ability to apply quality of service (QoS) criteria.

    The NFSv4 specification requires that any transport used provides congestion control. The easiest way to do this is via TCP. By using TCP, NFSv4 clients and servers are able to adapt to known frequent spikes in unreliability on the Internet; and retransmission is managed in the transport layer instead of in the application layer, greatly simplifying applications and their management on a shared network.

    NFSv4 also introduces strict rules about retries over TCP in contrast to the complete lack of rules in NFSv3 for retries over TCP. As a result, if NFSv3 clients have timeouts that are too short, NFSv3 servers may drop requests. NFSv4 uses the timers that are built into the connection-oriented transport.

    Network Ports

    To access an NFS server, an NFSv3 client must contact the server’s portmapper to find the port of the mountd server. It then contacts the mount server to get an initial file handle, and again contacts the portmapper to get the port of the NFS server. Finally, the client can access the NFS server.

    This creates problems for using NFS through firewalls, because firewalls typically filter traffic based on well-known port numbers. If the client is inside a firewalled network, and the server is outside the network, the firewall needs to know what ports the portmapper, mountd and nfsd servers are listening on. The mount server can listen on any port, so telling the firewall what port to permit is not practical. While the NFS server usually listens on port 2049, sometimes it does not. While the portmapper always listens on the same port (111), many firewall administrators, out of excessive caution, block requests to port 111 from inside the firewalled network to servers outside the network. As a result, NFSv3 is not practical to use through firewalls. (Aside from which, without security, it’s risky too.)

    NFSv4 uses a single port number by mandating the server will listen on port 2049. There are no “auxiliary” protocols like statd, lockd and mountd required as the mounting and locking protocols have been incorporated into the NFSv4 protocol. This means that NFSv4 clients do not need to contact the portmapper, and do not need to access services on floating ports.

    As NFSv4 uses a single TCP connection with a well-defined destination TCP port, it traverses firewalls and network address translation (NAT) devices with ease, and makes firewall configuration as simple as configuration for HTTP servers.

    Mounts and Automounter

    The automounter daemons and the utilities on different flavors of UNIX and Linux are capable of identifying different NFS versions. However, using the automounter will require at least port 111 to be permitted through any firewall between server and client, as it uses the portmapper.

    This is undesirable if you are extending the use of NFSv4 beyond traditional NFSv3 environments, so in preference the widely available “mirror mount” facility can be used. It enhances the behavior of the NFSv4 client by creating a new mountpoint whenever it detects that a directory’s fsid differs from that of its parent and automatically mounts filesystems when they are encountered at the NFSv4 server .

    This enhancement does not require the use of the automounter and therefore does not rely on the content or propagation of automounter maps, the availability of NFSv3 services such as mountd, or opening firewall ports beyond the single port 2049 required for NFSv4.

    Internationalization Support; UTF-8

    Yes, those funny characters outside of US-ASCII are supported. In a welcome recognition that it set no longer provides the descriptive capabilities demanded by languages with larger alphabets or those that use an extensive range of non-Roman glyphs, NFSv4 uses UTF-8 for file names, directories, symlinks and user and group identifiers. As UTF-8 is backwards compatible with 7 bit encoded ASCII, any names that are 7 bit ASCII will continue to work.

    Compound RPCs

    Latency in a wide area network (WAN) is a perennial issue, and is very often measured in tenths of a second to seconds. NFS uses Remote Procedure Calls (RPCs) to undertake all its communication with the server, and although the payload is normally small, meta-data operations are largely synchronous and serialized. Operations such as file lookup (LOOKUP), the fetching of attributes (GETATTR) and so on, make up the largest percentage by count of the average traffic load on NFS.

    This mix of a typical NFS set of RPC calls in versions prior to NFSv4 requires each RPC call is a separate transaction over the wire. NFSv4 avoids the expense of single RPC requests and the attendant latency issues and allows these calls to be bundled together. For instance, a lookup, open, read and close can be sent once over the wire, and the server can execute the entire compound call as a single entity. The effect is to reduce latency considerably for multiple operations.

    Delegations

    Servers are employing ever more quantities of RAM and flash technologies, and very large caches in the orders of terabytes are not uncommon. Applications running over NFSv3 can’t take advantage of these caches unless they have specific application support. With increasing WAN latencies doing every IO over the wire introduces significant delay.

    NFSv4 allows the server to delegate certain responsibilities to the client, a feature that allows caching locally where the data is being accessed. Once delegated, the client can act on the file locally with the guarantee that no other client has a conflicting need for the file; it allows the application to have locking, reading and writing requests serviced on the application server without any further communication with the NFS server. To prevent deadlocking conditions, the server can recall the delegation via an asynchronous callback to the client should there be a conflicting request for access to the file from a different client.

    Migration, Replicas and Referrals

    For broader use within a datacenter, and in support of high availability applications such as databases and virtual environments, copying data for backup and disaster recovery purposes, or the ability to migrate it to provide VM location independence are essential. NFSv4 provides facilities for both transparent replication and migration of data, and the client is responsible for ensuring that the application is unaware of these activities. An NFSv4 referral allows servers to redirect clients from this server’s namespace to another server; it allows the building of a global namespace while maintaining the data on discrete and separate servers.

    Sessions

    Perhaps one of the most significant features of NFSv4.1 is the introduction of stateful sessions. Sessions bring the advantages of correctness and simplicity to NFS semantics. In order to improve on the correctness of NFSv4, NFSv4.1 sessions introduce “exactly-once” semantics.
    Servers maintain one or more session states in agreement with the client; they maintain the server’s state relative to the connections belonging to a client. Clients can be assured that their requests to the server have been executed, and that they will never be executed more than once.

    Sessions extend the idea of NFSv4 delegations, which introduced server-initiated asynchronous callbacks; clients can initiate session requests for connections to the server. For WAN based systems, this simplifies operations through firewalls.

    Security

    An area of great confusion, many believe that NFSv4 requires the use of strong security. The NFSv4 specification simply states that implementation of strong RPC security by servers and clients is mandatory, not the use of strong RPC security. This misunderstanding may explain the reluctance of users from migrating to NFSv4 due to the additional work in implementing or modifying their existing Kerberos security.

    Security is increasingly important as NFSv4 makes data more easily available over the WAN. This feature was considered so important by the IETF NFS working group that the security specification using Kerberos v5 was “retrofitted” to the NFSv2 and NFSv3 specifications.

    Graphic for Advantages of NFS

    Although access to an NFS filesystem without strong security such as provided by Kerberos is possible, across a WAN it should really be considered only as a temporary measure. In that spirit, it should be noted that NFSv4 can be used without implementing Kerberos security. The fact that it is possible does not make it desirable! A fuller description of the issues and some migration considerations can be found in the SNIA White Paper “Migrating from NFSv3 to NFSv4”.

    Many of the practical issues faced in implementing robust Kerberos security in a UNIX environment can be eased by using a Windows Active Directory (AD) system. Windows uses the standard Kerberos protocol as specified in RFC 1510; AD user accounts are represented to Kerberos in the same way as accounts in UNIX realms. This can be a very attractive solution in mixed-mode environments.

    In the next post, we’ll discuss one of the primary features of NFSv4.1; pNFS, or parallelized NFS, and some of the new work being done in support of NFSv4.2.
    FOOTNOTE: Parts of this blog were originally published in Usenix ;login: February 2012 under the title The Background to NFSv4.1. Used with permission.


    The Advantages of NFSv4.1

    February 19th, 2013

    In a previous blog post Why NFSv4.1 and pNFS are Better than NFSv3 Could Ever Be, some of the issues with NFSv3 that made it difficult to implement as a WAN based or data center wide protocol were discussed. The question then becomes; why not move to NFSv4 instead of NFSv4.1? Isn’t that a bigger leap from NFSv3?

    Well, practical experience and some issues with NFSv4 made NFSv4.1 a necessity; for one, it introduces the key concept of sessions, and provides a foundation for pNFS (parallel NFS) which we’ll discuss in a later blog post. And all the features of NFSv4 were carried over into NFSv4.1, since it was a minor version update; there’s little more to do to take advantage of NFSv4.1, so that’s where your focus evaluation and implementation should be.

    TCP for Transport

    NFSv3 supports both TCP (Transmission Control Protocol) and UDP (User Datagram Protocol), and UDP is sometimes employed (for those applications that support it) because it is perceived to be lightweight and faster in comparison to TCP.

    The downside of UDP is that it’s connectionless (that is, stateless) and an unreliable protocol. There is no guarantee that the datagrams will be delivered in any given order to the destination host — or even delivered at all — so applications must be specifically designed to handle missing, duplicate or incorrectly ordered data. UDP is also not a good network citizen; there is no concept of congestion or flow control, and no ability to apply quality of service (QoS) criteria.

    The NFSv4 specification requires that any transport used provides congestion control. The easiest way to do this is via TCP. By using TCP, NFSv4 clients and servers are able to adapt to known frequent spikes in unreliability on the Internet; and retransmission is managed in the transport layer instead of in the application layer, greatly simplifying applications and their management on a shared network.

    NFSv4 also introduces strict rules about retries over TCP in contrast to the complete lack of rules in NFSv3 for retries over TCP. As a result, if NFSv3 clients have timeouts that are too short, NFSv3 servers may drop requests. NFSv4 uses the timers that are built into the connection-oriented transport.

    Network Ports

    To access an NFS server, an NFSv3 client must contact the server’s portmapper to find the port of the mountd server. It then contacts the mount server to get an initial file handle, and again contacts the portmapper to get the port of the NFS server. Finally, the client can access the NFS server.

    This creates problems for using NFS through firewalls, because firewalls typically filter traffic based on well-known port numbers. If the client is inside a firewalled network, and the server is outside the network, the firewall needs to know what ports the portmapper, mountd and nfsd servers are listening on. The mount server can listen on any port, so telling the firewall what port to permit is not practical. While the NFS server usually listens on port 2049, sometimes it does not. While the portmapper always listens on the same port (111), many firewall administrators, out of excessive caution, block requests to port 111 from inside the firewalled network to servers outside the network. As a result, NFSv3 is not practical to use through firewalls. (Aside from which, without security, it’s risky too.)

    NFSv4 uses a single port number by mandating the server will listen on port 2049. There are no “auxiliary” protocols like statd, lockd and mountd required as the mounting and locking protocols have been incorporated into the NFSv4 protocol. This means that NFSv4 clients do not need to contact the portmapper, and do not need to access services on floating ports.

    As NFSv4 uses a single TCP connection with a well-defined destination TCP port, it traverses firewalls and network address translation (NAT) devices with ease, and makes firewall configuration as simple as configuration for HTTP servers.

    Mounts and Automounter

    The automounter daemons and the utilities on different flavors of UNIX and Linux are capable of identifying different NFS versions. However, using the automounter will require at least port 111 to be permitted through any firewall between server and client, as it uses the portmapper.

    This is undesirable if you are extending the use of NFSv4 beyond traditional NFSv3 environments, so in preference the widely available “mirror mount” facility can be used. It enhances the behavior of the NFSv4 client by creating a new mountpoint whenever it detects that a directory’s fsid differs from that of its parent and automatically mounts filesystems when they are encountered at the NFSv4 server .

    This enhancement does not require the use of the automounter and therefore does not rely on the content or propagation of automounter maps, the availability of NFSv3 services such as mountd, or opening firewall ports beyond the single port 2049 required for NFSv4.

    Internationalization Support; UTF-8

    Yes, those funny characters outside of US-ASCII are supported. In a welcome recognition that it set no longer provides the descriptive capabilities demanded by languages with larger alphabets or those that use an extensive range of non-Roman glyphs, NFSv4 uses UTF-8 for file names, directories, symlinks and user and group identifiers. As UTF-8 is backwards compatible with 7 bit encoded ASCII, any names that are 7 bit ASCII will continue to work.

    Compound RPCs

    Latency in a wide area network (WAN) is a perennial issue, and is very often measured in tenths of a second to seconds. NFS uses Remote Procedure Calls (RPCs) to undertake all its communication with the server, and although the payload is normally small, meta-data operations are largely synchronous and serialized. Operations such as file lookup (LOOKUP), the fetching of attributes (GETATTR) and so on, make up the largest percentage by count of the average traffic load on NFS.

    This mix of a typical NFS set of RPC calls in versions prior to NFSv4 requires each RPC call is a separate transaction over the wire. NFSv4 avoids the expense of single RPC requests and the attendant latency issues and allows these calls to be bundled together. For instance, a lookup, open, read and close can be sent once over the wire, and the server can execute the entire compound call as a single entity. The effect is to reduce latency considerably for multiple operations.

    Delegations

    Servers are employing ever more quantities of RAM and flash technologies, and very large caches in the orders of terabytes are not uncommon. Applications running over NFSv3 can’t take advantage of these caches unless they have specific application support. With increasing WAN latencies doing every IO over the wire introduces significant delay.

    NFSv4 allows the server to delegate certain responsibilities to the client, a feature that allows caching locally where the data is being accessed. Once delegated, the client can act on the file locally with the guarantee that no other client has a conflicting need for the file; it allows the application to have locking, reading and writing requests serviced on the application server without any further communication with the NFS server. To prevent deadlocking conditions, the server can recall the delegation via an asynchronous callback to the client should there be a conflicting request for access to the file from a different client.

    Migration, Replicas and Referrals

    For broader use within a datacenter, and in support of high availability applications such as databases and virtual environments, copying data for backup and disaster recovery purposes, or the ability to migrate it to provide VM location independence are essential. NFSv4 provides facilities for both transparent replication and migration of data, and the client is responsible for ensuring that the application is unaware of these activities. An NFSv4 referral allows servers to redirect clients from this server’s namespace to another server; it allows the building of a global namespace while maintaining the data on discrete and separate servers.

    Sessions

    Perhaps one of the most significant features of NFSv4.1 is the introduction of stateful sessions. Sessions bring the advantages of correctness and simplicity to NFS semantics. In order to improve on the correctness of NFSv4, NFSv4.1 sessions introduce “exactly-once” semantics.
    Servers maintain one or more session states in agreement with the client; they maintain the server’s state relative to the connections belonging to a client. Clients can be assured that their requests to the server have been executed, and that they will never be executed more than once.

    Sessions extend the idea of NFSv4 delegations, which introduced server-initiated asynchronous callbacks; clients can initiate session requests for connections to the server. For WAN based systems, this simplifies operations through firewalls.

    Security

    An area of great confusion, many believe that NFSv4 requires the use of strong security. The NFSv4 specification simply states that implementation of strong RPC security by servers and clients is mandatory, not the use of strong RPC security. This misunderstanding may explain the reluctance of users from migrating to NFSv4 due to the additional work in implementing or modifying their existing Kerberos security.

    Security is increasingly important as NFSv4 makes data more easily available over the WAN. This feature was considered so important by the IETF NFS working group that the security specification using Kerberos v5 was “retrofitted” to the NFSv2 and NFSv3 specifications.

    Graphic for Advantages of NFS

    Although access to an NFS filesystem without strong security such as provided by Kerberos is possible, across a WAN it should really be considered only as a temporary measure. In that spirit, it should be noted that NFSv4 can be used without implementing Kerberos security. The fact that it is possible does not make it desirable! A fuller description of the issues and some migration considerations can be found in the SNIA White Paper “Migrating from NFSv3 to NFSv4”.

    Many of the practical issues faced in implementing robust Kerberos security in a UNIX environment can be eased by using a Windows Active Directory (AD) system. Windows uses the standard Kerberos protocol as specified in RFC 1510; AD user accounts are represented to Kerberos in the same way as accounts in UNIX realms. This can be a very attractive solution in mixed-mode environments.

    In the next post, we’ll discuss one of the primary features of NFSv4.1; pNFS, or parallelized NFS, and some of the new work being done in support of NFSv4.2.
    FOOTNOTE: Parts of this blog were originally published in Usenix ;login: February 2012 under the title The Background to NFSv4.1. Used with permission.