Take Our 10GBASE-T Quick Poll

I’ve gotten some interesting feedback on my recent 10GBASE-T blog, “How is 10GBASE-T Being Adopted and Deployed.” It’s prompted us at the ESF to learn more about your 10GBASE-T plans. Please let us know by taking our 3-question poll. I’ll share the results in a future blog post.

Note: There is a poll embedded within this post, please visit the site to participate in this post's poll.

How DCB Makes iSCSI Better

A challenge with traditional iSCSI deployments is the non-deterministic nature of Ethernet networks. When Ethernet networks only carried non-storage traffic, lost data packets were not a big issue because they would simply be retransmitted. However, as we layered storage traffic over Ethernet, lost packets became unacceptable: storage traffic is far less forgiving than non-storage traffic, and retransmissions introduced I/O delays that storage cannot tolerate. In addition, traditional Ethernet had no mechanism to assign priorities to classes of I/O.

A new solution was therefore needed, short of building a separate Ethernet network just to carry iSCSI storage traffic. Data Center Bridging (DCB) is that solution.

The DCB standard is a key enabler of effectively deploying iSCSI over Ethernet infrastructure. The standard provides the framework for high-performance iSCSI deployments with key capabilities that include:
- Priority Flow Control (PFC)—enables “lossless Ethernet,” providing a consistent stream of data between servers and storage arrays. It prevents dropped frames and maximizes network efficiency. PFC also helps optimize SCSI communication and minimizes the effects of TCP retransmission, making iSCSI flows more reliable.
- Quality of Service (QoS) and Enhanced Transmission Selection (ETS)—support protocol priorities and allocation of bandwidth for iSCSI and IP traffic.
- Data Center Bridging Capabilities eXchange (DCBX) — enables automatic network-based configuration of key network and iSCSI parameters.

With DCB, iSCSI traffic is more balanced over high-bandwidth 10GbE links. From an investment protection perspective, the ability to carry iSCSI and LAN IP traffic over a common network makes it possible to consolidate iSCSI storage area networks with traditional IP LAN networks.

There is one other key component needed for iSCSI over DCB. It is part of the DCBX standard and is called the TCP Application Type-Length-Value, or simply “TLV.” TLV allows the DCB infrastructure to apply unique ETS and PFC settings to specific sub-segments of the TCP/IP traffic. Switches do this by identifying the sub-segments based on the TCP socket or port identifier included in the TCP/IP frame. In short, TLV directs servers to place iSCSI traffic on available PFC queues, separating storage traffic from other IP traffic. PFC also eliminates data retransmission and supports a consistent data flow with low latency. IT administrators can leverage QoS and ETS to assign bandwidth and priority to iSCSI storage traffic, which is crucial for supporting critical applications.
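
To make the classification idea concrete, here is a minimal, hypothetical Python sketch of how an Application Priority (TLV) entry, PFC, and ETS settings fit together. The priority numbers and bandwidth shares are illustrative assumptions, not values from any particular switch; the only protocol-specific fact used is that iSCSI runs over TCP port 3260.

    # Illustrative model of DCB settings on a converged 10GbE port.
    # Priorities, bandwidth shares, and the classify() helper are assumptions
    # for illustration only; real switches exchange this via DCBX.

    ISCSI_TCP_PORT = 3260  # well-known iSCSI target port

    # Application Priority table: map traffic (here, by TCP port) to an 802.1p priority.
    app_priority_table = {ISCSI_TCP_PORT: 4}

    # PFC: which priorities are lossless (pause-enabled).
    pfc_enabled = {4}

    # ETS: guaranteed bandwidth share per priority group (percent).
    ets_bandwidth = {4: 40, 0: 60}  # e.g. 40% for storage, 60% for other IP traffic

    def classify(tcp_dst_port, default_priority=0):
        """Return (priority, lossless) for a TCP flow, as a switch might."""
        priority = app_priority_table.get(tcp_dst_port, default_priority)
        return priority, priority in pfc_enabled

    if __name__ == "__main__":
        print(classify(3260))   # iSCSI -> (4, True): lossless, 40% guaranteed
        print(classify(80))     # other IP traffic -> (0, False)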

Therefore, depending on your overall datacenter environment, running iSCSI over DCB can improve:
- Performance by ensuring a consistent stream of data, resulting in “deterministic performance” and the elimination of packet loss that can cause high latency
- Quality of service through allocation of bandwidth per protocol for better control of service levels within a converged network
- Network convergence

For more information on this topic or technologies discussed in this blog, please visit some of our other blog articles:
- “What Up with DCBX?” and “iSCSI over DCB: Reliability and predictable performance,” or check out the IEEE website on DCB

VN2VN: “Ethernet Only” Fibre Channel over Ethernet (FCoE) Is Coming

The completion of a specification for FCoE (T11 FC-BB-5, 2009) held great promise for unifying storage and LAN traffic over a single Ethernet network, and now we are seeing the benefits. With FCoE, Fibre Channel protocol frames are encapsulated in Ethernet packets. To achieve the high reliability and “lossless” characteristics of Fibre Channel, Ethernet itself has been enhanced by a series of IEEE 802.1 specifications collectively known as Data Center Bridging (DCB). DCB is now widely supported in enterprise-class Ethernet switches. Several major switch vendors also support the capability known as Fibre Channel Forwarding (FCF), which can de-encapsulate/encapsulate the Fibre Channel protocol frames to allow, among other things, the support of legacy Fibre Channel SANs from an FCoE host.

 
The benefits of unifying your network with FCoE can be significant, with total cost of ownership reductions in the range of 20-50% depending on the details of the deployment. This is significant enough to start the ramp of FCoE, as SAN administrators have seen the benefits and successful Proofs of Concept have demonstrated reliability and delivered performance. However, the economic benefits of FCoE can be even greater than that. And that’s where VN2VN — as defined in the final draft T11 FC-BB-6 specification — comes in. This spec completed final balloting in January 2013 and is expected to be published this year. The code has been incorporated in the Open FCoE code (www.open-fcoe.org). VN2VN was demonstrated at the Fall 2012 Intel Developer Forum in two demos by Intel and Juniper Networks, respectively.

 
“VN2VN” refers to Virtual N_Port to Virtual N_Port in T11-speak, but the concept is simply “Ethernet Only” FCoE. It allows discovery and communication between peer FCoE nodes without requiring a legacy FCoE SAN fabric (FCF). The Fibre Channel protocol frames remain encapsulated in Ethernet packets from host to storage target and from storage target to host. The only switch requirement is support for DCB. FCF-capable switches and their associated licensing fees are expensive; a VN2VN deployment of FCoE could save 50-70% relative to the cost of an equivalent Fibre Channel storage network. It’s these compelling potential cost savings that make VN2VN interesting, and they could significantly accelerate the ramp of FCoE. SAN admins are famously conservative, but cost savings this large are hard to ignore.

 
An optional feature of FCoE is security support through Fibre Channel over Ethernet (FCoE) Initialization Protocol (FIP) snooping. FIP snooping, a switch function, can establish firewall filters that prevent unauthorized network access by unknown or unexpected virtual N_Ports transmitting FCoE traffic. In BB-5 FCoE, this requires FCF capabilities in the switch. Another benefit of VN2VN is that it can provide the security of FIP snooping, again without the requirement of an FCF.

 
Technically, what VN2VN brings to the party is a new T11 FIP discovery process that enables two peer FCoE nodes, say a host and a storage target, to discover each other and establish a virtual link. As part of this new discovery process they work cooperatively to determine unique FC_IDs for each other. This is in contrast to the BB-5 method, where nodes need to discover and log in to an FCF to be assigned FC_IDs. A VN2VN node can then log in to a peer node and establish a logical point-to-point link with standard fabric login (FLOGI) and port login (PLOGI) exchanges.

VN2VN also has the potential to bring the power of Fibre Channel protocols to new deployment models; the most exciting of these is disaggregated storage. With VN2VN, a rack of diskless servers could access a shared storage target with very high efficiency and reliability. Think of this as “L2 DAS”: the immediacy of Direct Attached Storage over an L2 Ethernet network, but with storage disaggregated from the servers so it can be managed and serviced on a much more scalable model. The future of VN2VN is bright.

The Advantages of NFSv4.1

In a previous blog post, Why NFSv4.1 and pNFS are Better than NFSv3 Could Ever Be, we discussed some of the issues with NFSv3 that made it difficult to deploy as a WAN-based or data-center-wide protocol. The question then becomes: why not move to NFSv4 instead of NFSv4.1? Isn’t that a bigger leap from NFSv3?

Well, practical experience and some issues with NFSv4 made NFSv4.1 a necessity; for one, it introduces the key concept of sessions and provides a foundation for pNFS (parallel NFS), which we’ll discuss in a later blog post. All the features of NFSv4 were carried over into NFSv4.1, since it was a minor version update, so there’s little extra work needed to take advantage of NFSv4.1; that’s where your evaluation and implementation focus should be.

TCP for Transport

NFSv3 supports both TCP (Transmission Control Protocol) and UDP (User Datagram Protocol), and UDP is sometimes employed (for those applications that support it) because it is perceived to be lightweight and faster in comparison to TCP.

The downside of UDP is that it’s connectionless (that is, stateless) and an unreliable protocol. There is no guarantee that the datagrams will be delivered in any given order to the destination host — or even delivered at all — so applications must be specifically designed to handle missing, duplicate or incorrectly ordered data. UDP is also not a good network citizen; there is no concept of congestion or flow control, and no ability to apply quality of service (QoS) criteria.

The NFSv4 specification requires that any transport used provides congestion control. The easiest way to do this is via TCP. By using TCP, NFSv4 clients and servers are able to adapt to known frequent spikes in unreliability on the Internet; and retransmission is managed in the transport layer instead of in the application layer, greatly simplifying applications and their management on a shared network.

NFSv4 also introduces strict rules about retries over TCP in contrast to the complete lack of rules in NFSv3 for retries over TCP. As a result, if NFSv3 clients have timeouts that are too short, NFSv3 servers may drop requests. NFSv4 uses the timers that are built into the connection-oriented transport.

Network Ports

To access an NFS server, an NFSv3 client must contact the server’s portmapper to find the port of the mountd server. It then contacts the mount server to get an initial file handle, and again contacts the portmapper to get the port of the NFS server. Finally, the client can access the NFS server.

This creates problems for using NFS through firewalls, because firewalls typically filter traffic based on well-known port numbers. If the client is inside a firewalled network, and the server is outside the network, the firewall needs to know what ports the portmapper, mountd and nfsd servers are listening on. The mount server can listen on any port, so telling the firewall what port to permit is not practical. While the NFS server usually listens on port 2049, sometimes it does not. While the portmapper always listens on the same port (111), many firewall administrators, out of excessive caution, block requests to port 111 from inside the firewalled network to servers outside the network. As a result, NFSv3 is not practical to use through firewalls. (Aside from which, without security, it’s risky too.)

NFSv4 uses a single port number by mandating that the server listen on port 2049. There are no “auxiliary” protocols like statd, lockd and mountd required, as the mounting and locking protocols have been incorporated into the NFSv4 protocol. This means that NFSv4 clients do not need to contact the portmapper and do not need to access services on floating ports.

As NFSv4 uses a single TCP connection with a well-defined destination TCP port, it traverses firewalls and network address translation (NAT) devices with ease, and makes firewall configuration as simple as configuration for HTTP servers.
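
As a quick illustration of how little there is to open up, here is a minimal Python sketch that simply verifies TCP reachability of port 2049 from a client’s point of view. The hostname is a placeholder, not a real server.

    # Minimal reachability check for the single NFSv4 port (TCP 2049).
    # The hostname below is a placeholder; substitute your own NFS server.
    import socket

    def nfsv4_port_reachable(host, port=2049, timeout=3.0):
        """Return True if a TCP connection to host:port succeeds."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    if __name__ == "__main__":
        print(nfsv4_port_reachable("nfs-server.example.com"))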

Mounts and Automounter

The automounter daemons and the utilities on different flavors of UNIX and Linux are capable of identifying different NFS versions. However, using the automounter will require at least port 111 to be permitted through any firewall between server and client, as it uses the portmapper.

This is undesirable if you are extending the use of NFSv4 beyond traditional NFSv3 environments, so the widely available “mirror mount” facility is preferable. It enhances the behavior of the NFSv4 client by creating a new mountpoint whenever it detects that a directory’s fsid differs from that of its parent, automatically mounting filesystems as they are encountered at the NFSv4 server.

This enhancement does not require the use of the automounter and therefore does not rely on the content or propagation of automounter maps, the availability of NFSv3 services such as mountd, or opening firewall ports beyond the single port 2049 required for NFSv4.

Internationalization Support: UTF-8

Yes, those funny characters outside of US-ASCII are supported. In a welcome recognition that the US-ASCII character set no longer provides the descriptive capabilities demanded by languages with larger alphabets or those that use an extensive range of non-Roman glyphs, NFSv4 uses UTF-8 for file names, directories, symlinks, and user and group identifiers. As UTF-8 is backwards compatible with 7-bit encoded ASCII, any names that are 7-bit ASCII will continue to work.

Compound RPCs

Latency in a wide area network (WAN) is a perennial issue, and is very often measured in tenths of a second to seconds. NFS uses Remote Procedure Calls (RPCs) to undertake all its communication with the server, and although the payload is normally small, meta-data operations are largely synchronous and serialized. Operations such as file lookup (LOOKUP), the fetching of attributes (GETATTR) and so on, make up the largest percentage by count of the average traffic load on NFS.

In versions prior to NFSv4, each RPC call in this typical mix is a separate transaction over the wire. NFSv4 avoids the expense of issuing single RPC requests, and the attendant latency, by allowing these calls to be bundled together. For instance, a lookup, open, read and close can be sent over the wire once, and the server can execute the entire compound call as a single entity. The effect is to reduce latency considerably for multiple operations.
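
A rough back-of-the-envelope sketch in Python of why this matters over a WAN; the 100 ms round-trip time and the four-operation sequence are assumptions for illustration, not measured values:

    # Illustrative latency arithmetic: individual RPC round trips vs. one compound RPC.
    # The RTT and operation list are assumed values for illustration only.

    RTT_SECONDS = 0.100  # assumed WAN round-trip time
    operations = ["LOOKUP", "OPEN", "READ", "CLOSE"]

    # Pre-NFSv4: one round trip per operation.
    separate_rpc_latency = len(operations) * RTT_SECONDS

    # NFSv4: the same operations bundled into a single compound request.
    compound_rpc_latency = 1 * RTT_SECONDS

    print(f"separate RPCs: {separate_rpc_latency:.1f} s")   # 0.4 s
    print(f"compound RPC:  {compound_rpc_latency:.1f} s")   # 0.1 s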

Delegations

Servers are employing ever greater quantities of RAM and flash technologies, and very large caches on the order of terabytes are not uncommon. Applications running over NFSv3 can’t take advantage of these caches unless they have specific application support. With increasing WAN latencies, doing every I/O over the wire introduces significant delay.

NFSv4 allows the server to delegate certain responsibilities to the client, a feature that allows caching locally where the data is being accessed. Once delegated, the client can act on the file locally with the guarantee that no other client has a conflicting need for the file; it allows the application to have locking, reading and writing requests serviced on the application server without any further communication with the NFS server. To prevent deadlocking conditions, the server can recall the delegation via an asynchronous callback to the client should there be a conflicting request for access to the file from a different client.
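
A toy Python model of the delegation lifecycle described above; the class and method names are invented for illustration and do not correspond to any real NFS client or server API:

    # Toy model of an NFSv4 delegation: grant, local use, recall on conflict.
    # Class and method names are illustrative only.

    class DelegationServer:
        def __init__(self):
            self.delegations = {}  # filename -> client holding the delegation

        def open(self, client, filename):
            holder = self.delegations.get(filename)
            if holder is None:
                # No conflict: delegate the file so the client can cache and lock locally.
                self.delegations[filename] = client
                client.delegated.add(filename)
            elif holder is not client:
                # Conflicting access from another client: recall via an asynchronous callback.
                holder.recall(filename)
                del self.delegations[filename]

    class Client:
        def __init__(self, name):
            self.name, self.delegated = name, set()

        def recall(self, filename):
            # Flush locally cached state back to the server, then give up the delegation.
            self.delegated.discard(filename)
            print(f"{self.name}: delegation for {filename} recalled")

    server = DelegationServer()
    a, b = Client("clientA"), Client("clientB")
    server.open(a, "/data/report.db")   # clientA gets the delegation
    server.open(b, "/data/report.db")   # conflict -> clientA's delegation is recalled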

Migration, Replicas and Referrals

For broader use within a datacenter, and in support of high-availability applications such as databases and virtual environments, the ability to copy data for backup and disaster recovery, or to migrate it to provide VM location independence, is essential. NFSv4 provides facilities for both transparent replication and migration of data, and the client is responsible for ensuring that the application is unaware of these activities. An NFSv4 referral allows a server to redirect clients from its own namespace to another server, enabling the construction of a global namespace while maintaining the data on discrete, separate servers.

Sessions

Perhaps one of the most significant features of NFSv4.1 is the introduction of stateful sessions. Sessions bring the advantages of correctness and simplicity to NFS semantics. In order to improve on the correctness of NFSv4, NFSv4.1 sessions introduce “exactly-once” semantics.
Servers maintain one or more session states in agreement with the client; these sessions maintain the server’s state relative to the connections belonging to that client. Clients can thus be assured that their requests to the server have been executed, and that they will never be executed more than once.

Sessions extend the idea of NFSv4 delegations, which introduced server-initiated asynchronous callbacks; clients can initiate session requests for connections to the server. For WAN-based systems, this simplifies operations through firewalls.

Security

An area of great confusion is that many believe NFSv4 requires the use of strong security. The NFSv4 specification simply states that implementation of strong RPC security by servers and clients is mandatory, not its use. This misunderstanding may explain some users’ reluctance to migrate to NFSv4, given the additional work involved in implementing or modifying their existing Kerberos security.

Security is increasingly important as NFSv4 makes data more easily available over the WAN. This feature was considered so important by the IETF NFS working group that the security specification using Kerberos v5 was “retrofitted” to the NFSv2 and NFSv3 specifications.

[Graphic: Advantages of NFS]

Although access to an NFS filesystem without strong security such as that provided by Kerberos is possible, across a WAN it should really be considered only as a temporary measure. In that spirit, it should be noted that NFSv4 can be used without implementing Kerberos security. The fact that it is possible does not make it desirable! A fuller description of the issues and some migration considerations can be found in the SNIA White Paper “Migrating from NFSv3 to NFSv4”.

Many of the practical issues faced in implementing robust Kerberos security in a UNIX environment can be eased by using a Windows Active Directory (AD) system. Windows uses the standard Kerberos protocol as specified in RFC 1510; AD user accounts are represented to Kerberos in the same way as accounts in UNIX realms. This can be a very attractive solution in mixed-mode environments.

In the next post, we’ll discuss one of the primary features of NFSv4.1: pNFS, or parallelized NFS, and some of the new work being done in support of NFSv4.2.
FOOTNOTE: Parts of this blog were originally published in Usenix ;login: February 2012 under the title The Background to NFSv4.1. Used with permission.


Free Booklet: How SSD Controllers Maximize SSD Life

SNIA’s SSSI has introduced a new booklet: How Controllers Maximize SSD Life. This 20-page volume, which can be downloaded for free as a PDF from the SNIA website, is a compilation of a series of blog posts on the SSD Guy blog.

The booklet explains the tricks SSD controller designers use to extend NAND flash life far beyond the limits posed by NAND endurance specifications of 10,000 or fewer erase/write cycles.

The series was written by this blogger with important help from companies like Intel, SMART Storage, Marvell, and Calypso Systems.

Hard copies are available through the SNIA Solid State Storage Initiative.

What Up with DCBX?

I guess this is a blog that could either be very short or very long… The full name of the protocol, Data Center Bridging Capability eXchange (DCBX), basically tells you all you need to know, or maybe nothing at all. At its simplest, DCBX does what it says on the tin: in effect it is used as the DCB auto-negotiation capability, making sure that the data center network is correctly and consistently configured. Technically you can debate whether it is an auto-negotiation protocol or not, but in reality that’s how it is actually used.

Now it is important to note that there are many misconceptions around DCB itself. Let’s remember that DCB is actually a task group within IEEE responsible for many separate standards – basically anything for Ethernet (or, as IEEE says, bridging) that is assumed to be specific to the data center. Currently those standards and protocols fall into two areas: those related to I/O convergence (PFC, ETS, QCN, DCBX) and those related to server virtualization (Virtual Ethernet Port Aggregator, or VEPA, and others). So in essence the intent of DCBX is to help two adjacent devices share information about how these protocols are, or need to be, configured. DCBX does this by leveraging good old LLDP – just as PFC, ETS and QCN leverage 802.1p. What is particularly nice is that DCBX not only allows the simple exchange of information about the DCB protocols themselves but also about how upper-level protocols might want to use the DCB layer.

This brings us nicely to a very critical point – like most things in this area, DCBX works purely at the link level, allowing a pair of connected ports (node to switch, or switch to switch) to exchange their specific port configuration. This matters in a multi-hop environment: every link may successfully complete its DCBX negotiation, but unless some higher-level intelligence (you) ensures that things are set correctly on each and every link, you may still not be meeting the needs of an end-to-end traffic flow. Even in a simple case of device-switch-switch-device, I could have Fibre Channel over Ethernet (FCoE) negotiated on the first device-switch and last switch-device connections, and nothing configured on the intermediate switch-switch connection – and the two FCoE end points would happily talk to each other thinking they have end-to-end lossless connectivity. In a more complex scenario, remember that many L2/L3 switches can not only route between L2 domains but also reclassify traffic from one 802.1p priority to another. For this reason it is often simpler to use DCB to support eight independent forwarding planes across the data center, as this means we can configure all ports pretty much identically. I believe the relevant term for trying to be too clever here is ‘here be dragons’.

Anyone who has spent a little time with DCB or FCoE will know that DCBX doesn’t just help at the level of the layer 2 protocols; it also helps at the level of the actual upper-level protocol we care about. It is well known that DCBX can carry specific exchanges to ensure the correct configuration of DCB to support FCoE, and many people may be aware that it can do the same for iSCSI as well. Far less well known is that these two examples of setting up DCB for upper-level protocols are in fact just that – examples. DCBX has a generic Application type-length-value (TLV) format whereby you can specify what you would like for any upper-level protocol that can be identified by either Ethertype or IP socket. Thus DCBX, like the rest of DCB, has been carefully architected to support the full, broad needs of I/O and network convergence, not just the needs of storage convergence. As a protocol, DCBX allows you to have an NFS Application TLV, an SMB Application TLV, an RDMA over Converged Ethernet (RoCE) Application TLV, an iWARP Application TLV, an SNMP Application TLV, and so on.
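
As a concrete illustration of that generality, here is a small Python sketch of what a set of Application TLV entries might look like, keyed either by Ethertype or by TCP port. The priority values are assumptions; the Ethertype for FCoE (0x8906) and the well-known ports for iSCSI (3260) and NFS (2049) are standard values.

    # Illustrative Application Priority (TLV) entries: an upper-level protocol is
    # identified either by Ethertype or by TCP port and mapped to an 802.1p priority.
    # The priority assignments below are examples only.

    app_tlvs = [
        {"selector": "ethertype", "value": 0x8906, "protocol": "FCoE",  "priority": 3},
        {"selector": "tcp_port",  "value": 3260,   "protocol": "iSCSI", "priority": 4},
        {"selector": "tcp_port",  "value": 2049,   "protocol": "NFS",   "priority": 2},
    ]

    def priority_for(ethertype=None, tcp_port=None, default=0):
        """Look up the 802.1p priority advertised for a given frame or flow."""
        for entry in app_tlvs:
            if entry["selector"] == "ethertype" and entry["value"] == ethertype:
                return entry["priority"]
            if entry["selector"] == "tcp_port" and entry["value"] == tcp_port:
                return entry["priority"]
        return default

    print(priority_for(ethertype=0x8906))  # FCoE -> 3
    print(priority_for(tcp_port=2049))     # NFS  -> 2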

A final and very practical point that any article on DCBX needs to cover is that we are in an evolving world, and there are multiple different, and indeed incompatible, versions of DCBX available. Just reviewing the common DCB equipment available today, you need to consider DCBX 1.0, as used by pre-standards FCoE products; DCBX 1.01, sometimes referred to as the Converged Enhanced Ethernet (CEE) or baseline version, as found most commonly on shipping products today; and IEEE DCBX as actually defined in the standards (physically contained mostly within the ETS standard). It is also important to note that while some products have mechanisms to automatically discover and select which version of DCBX to use, there is in fact no standard for such mechanisms. In this case the term is, I assume, ‘caveat emptor – buyer beware’.

All that said, maybe I should have started this blog by reminding everyone that the I/O convergence parts of DCB are not just about allowing storage traffic to be mixed with non-storage traffic without fate-sharing problems, but about collapsing multiple separate networks into a single network. I believe the average server is said to have about six NICs today? As such, in the 10GbE-and-up Ethernet world, the full capabilities of DCBX really are a critical enabler for simplifying the operation of the modern converged, virtualized data center.

January 29 NVM Summit At the SNIA Symposium Brings Experts Together

The January 29th Summit on Non-Volatile Memory - in San Jose, California as part of the SNIA Winter Symposium - delivers an excellent, comprehensive one-day deep dive on all the issues you need to consider about this technology, which has changed the ways storage devices can be used. Join 150 of your colleagues with products, strategies, or just an interest in NVM who have already signed up for this complimentary event. Speakers from companies leading the way in NVM will offer critical insights into NVM and the future of computing in an exciting day-long agenda:

  • Keynotes from Mark Peters, Senior Analyst, ESG on the Storage Industry Landscape and David Alan Grier, President, The Computer Society on The Future of Computing with NVM Inflection Point
  • Industry Analyst Perspectives from Jeff Janukowicz, Research Director, IDC
  • Presentations from:
    • Andy Rudoff, Senior Software Engineer, Intel on the problems being solved
    • Ric Wheeler, Manager, Software Engineering, Red Hat on Linux and NVM
    • Dr. Garret Swart, Database Architect, Oracle on killer apps benefiting from this new architecture
    • Jim Pinkerton, Partner Architect, Microsoft  on design considerations when implementing NVM
    • Steven Peters, Principal Engineer, LSI  on what’s nice to have in this new stack
    • Danny Cobb, CTO, EMC on the workings of subsystem speeds and feeds
    • Kaladhar Vorguranti, Technical Director, NetApp on tools for performance modeling and measuring

Remember, this Summit is COMPLIMENTARY to attend but you must register to guarantee your seat at www.snia.org/nvmsummit-reg.   See you there!

Introducing SNIA’s Workload I/O Capture Program

SNIA’s Solid State Storage Initiative (SSSI) recently rolled out its new Workload I/O Capture Program, or WIOCP, a simple tool that captures software applications’ I/O activity by gathering statistics on workloads at the user level (IOPS, MB/s, response times, queue depths, etc.).

The WIOCP helps users to identify “Hot Spots” where storage performance is creating bottlenecks.  SNIA hopes that users will help the association to collect real-use statistics on workloads by uploading their results to the SNIA website.

Using this information SNIA member companies will be able to improve the performance of their solid state storage solutions, including SSDs and flash storage arrays.

How it Works

The WIOCP software is a safe and thoroughly-tested tool which runs unobtrusively in the background to constantly capture a large set of SSD and HDD I/O metrics that are useful to both the computer user and to SNIA.

Users simply enter the drive letters for the drives for which I/O operation metrics are to be collected. The program does not record anything that might be sensitive, including details of your actual workload (for example, files you’ve accessed). Results are presented in clear and accessible report formats.

How can WIOCP Help You?

Users can collect (and optionally display in real time) information reflecting their current environment and operations with the security of a tool delivered with digital authentication for their protection.

The collected I/O metrics will provide information useful to evaluate an SSD system environment.

Statistics from a wide range of applications will be collected, and can be used with the SSS Performance Test Specification to help users determine which SSD should  perform best for them.

How can Your Participation Help SNIA and the SSSI?

The WIOCP provides unique, raw information that can be analyzed by SNIA’s Technical Work Groups (TWGs) including the IOTTA TWG to gain insights into workload characteristics, key performance metrics, and SSD design tradeoffs.

The collected data from all participants will be aggregated and made publicly available for download and analysis. No personally identifiable information is collected – participants will benefit from this information pool without compromising their privacy or confidentiality.

Downloading the WIOCP

Help SNIA get started on this project by clicking HERE and using the “Download Key Code”: SSSI52kd9A8Z.

The WIOCP tool will be delivered to your system with a unique digital signature.  The tool only takes a few minutes to download and initialize, after which users can return to the task at hand!

If you have any questions or comments, please contact: SSSI_TechDev-Chair@SNIA.org

Object Storage is a Big Deal (and Ethernet Matters)

A significant challenge in managing large amounts of data (or Big Data) is a lack of what I like to call “total data awareness”. It’s a situation where you know (or suspect) that you have data – you just can’t find it. When you think about many current IT environments, they are often not built for total data awareness. This starts with core elements of the IT infrastructure, such as file systems. Traditional file systems and access methods were not designed to store hundreds of millions or billions of files in a single namespace. This leads to admins storing data in multiple file systems, multiple shares, complex directory structures – not because the data should be logically organized in that way, but simply because of limitations in file system architectures. This issue becomes even more pressing when data sits in multiple locations, maybe even across on-premise and off-premise, cloud-based storage.

Is object-based storage the answer?

Think about how you find data on your computer. Do you navigate complex directory structures, trying to remember the file name of the file that hopefully has the data you are looking for – or have you moved on and just use search tools like Spotlight? Imagine you have hundreds of millions of files, scattered across dozens or hundreds of sites. How about just searching across these sites and immediately finding the data you are looking for? With object storage technology you have the ability to store data in objects, along with metadata that describes the object. Now you can just search for your data based on metadata tags (like a filename – or even better an account number and document type) – as well as manage data based on policies that leverage that metadata.
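
A minimal Python sketch of the idea: each object carries user-defined metadata, and retrieval is a search over that metadata rather than a walk of a directory tree. The field names (account_number, doc_type, site) are invented for illustration.

    # Toy illustration of object storage: data plus searchable metadata,
    # with no directory hierarchy. Field names are examples only.

    objects = [
        {"id": "obj-001", "data": b"...", "metadata": {"account_number": "12345", "doc_type": "invoice",  "site": "nyc"}},
        {"id": "obj-002", "data": b"...", "metadata": {"account_number": "12345", "doc_type": "contract", "site": "lon"}},
        {"id": "obj-003", "data": b"...", "metadata": {"account_number": "99999", "doc_type": "invoice",  "site": "sfo"}},
    ]

    def search(**criteria):
        """Return objects whose metadata matches all of the given tag=value pairs."""
        return [o for o in objects
                if all(o["metadata"].get(k) == v for k, v in criteria.items())]

    # Find all invoices for account 12345, wherever they are stored.
    for obj in search(account_number="12345", doc_type="invoice"):
        print(obj["id"], obj["metadata"]["site"])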

However, this often means that you have to consider interfacing with your storage system through APIs, as opposed to NFS and CIFS – so your applications need to support whatever API your storage vendor offers.

CDMI to the rescue?

Today, storage vendors often use proprietary APIs. This means that application vendors would have to support a plethora of APIs from a number of different vendors, leading to a lack of commitment from application vendors to support more innovative, object-based storage architectures.

A key path to solving this issue is to leverage technologies and standards that have been specifically developed to provide this idea of a single namespace for billions of data sets, spanning locations and even managed services that might reside off-premise.

Relatively new on the standards side is CDMI (http://www.snia.org/cdmi), the Cloud Data Management Interface. CDMI is a standard developed by SNIA (http://www.snia.org), the Storage Networking Industry Association, with heavy involvement from a number of leading storage vendors. CDMI not only introduces a standard interface to ingest and retrieve data into and out of a large-scale repository, it also enables applications to easily manage this repository and where the data sits.

CDMI is the new NFS

Forgive the provocation, but when it comes to creating and managing large, distributed content repositories it quickly becomes clear that NFS and CIFS are not ideally suited for this use case. This is where CDMI shines, especially with an object-based storage architecture behind it that was built to support multi-petabyte environments with billions of data sets across hundreds of sites and accommodates retention policies that can reach to “forever”.

CDMI and NFS have something in common – Ethernet

One of the key commonalities between CDMI and NFS is that they both are ideally suited to be deployed in an Ethernet infrastructure. CDMI, specifically, is a RESTful HTTP interface, so it runs on standard Ethernet networks. Even for object storage deployments that don’t support CDMI, practically all of these multi-site, long-term repositories support HTTP (and thus Ethernet) through proprietary APIs based on REST or SOAP.
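
For a flavor of what that looks like on the wire, here is a hedged Python sketch of a CDMI-style object create and read using the requests library. The endpoint URL, container name and credentials are placeholders, and exact headers can vary by server and CDMI version; the content type and version header follow the pattern defined in the CDMI specification.

    # Sketch of creating and reading a data object over CDMI's RESTful HTTP interface.
    # The endpoint, credentials and container name are placeholders for illustration.
    import requests

    BASE = "https://cdmi.example.com/cdmi"          # placeholder endpoint
    HEADERS = {
        "X-CDMI-Specification-Version": "1.0.2",    # version header per the CDMI spec
        "Content-Type": "application/cdmi-object",
        "Accept": "application/cdmi-object",
    }
    AUTH = ("user", "secret")                        # placeholder credentials

    # Create (PUT) a data object with its value and user metadata in one request.
    body = {
        "value": "hello object world",
        "metadata": {"account_number": "12345", "doc_type": "invoice"},
    }
    r = requests.put(f"{BASE}/mycontainer/hello.txt", json=body, headers=HEADERS, auth=AUTH)
    r.raise_for_status()

    # Read (GET) it back; the response carries value and metadata as JSON.
    r = requests.get(f"{BASE}/mycontainer/hello.txt", headers=HEADERS, auth=AUTH)
    print(r.json().get("value"), r.json().get("metadata"))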

Why does this matter?

Ethernet infrastructure is a great foundation to run any number of workloads, including access to data that sits in large, multi-site content repositories that are based on object storage technologies. So if you are looking at object storage, chances are that you will be able to leverage existing Ethernet infrastructure.