SNIA’s Solid State Storage Initiative Advances the Industry at Flash Memory Summit

A classic case of SNIA Solid State Storage Initiative (SSSI) member collaboration for industry advancement was on display in the SSSI booth’s NVDIMM-N demonstration at the Flash Memory Summit (FMS) 2015. Under the direction of SSSI Chair Jim Ryan, and coordinated by NVDIMM SIG co-chairs Arthur Sainio and Jeff Chang and TechDev Committee chair Eden Kim, the SSSI was able to add NVDIMM-N storage performance to the Summary Performance Comparison by Storage Class chart in its marketing collateral.

[Figure: 2015 Summary Performance Comparison by Storage Class chart, including NVDIMM-N]

Five SSSI member companies – AgigA Tech, Calypso, Micron, SMART Modular, and Viking Technology – collaborated over a four-week period on the introduction of a new NVDIMM-N storage performance demonstration. While it is rare to have potential competitors collaborate in such a fashion, NVDIMM-N storage represents a new paradigm for super fast, low-latency, high IO/watt storage solutions. The NVDIMM-SIG has taken a leadership position by evangelizing the technology and developing the industry infrastructure necessary for large-scale deployment.

This collaboration highlighted a classic blend of technical, marketing and industry association cooperation.

In the weeks leading up to FMS, the NVDIMM-SIG planned for an in-booth demonstration of the NVDIMM-N storage modules. To pave the way for universal adoption, the team worked together to dial in the Intel Open Source block IO development driver to meet the standards of the SNIA Performance Test Specification (PTS). An added goal was inclusion of NVDIMM-N modules as a new line item on the Summary Performance Comparison by Storage Class chart which lists PTS performance for various storage technologies. Under the guidance of NVDIMM-SIG, a rush project was instigated to get NVDIMM-N performance data tested to the PTS for the trade show.

Micron took the lead by lending a Supermicro server with Micron NVDIMM-N to Calypso for testing. Calypso then installed CTS test software on the server to allow full testing to the PTS. Viking and SMART Modular contributed by helping dial in the drivers and by sending their own modules to cross-reference with the Micron modules. The test plan consisted of several test iterations on single, dual and finally quad modules, using each of the vendor-contributed modules.

The early single and dual module tests ran into repeatability and stability issues. NVDIMM-SIG consulted with Intel on the nuances of the Intel block IO driver while Calypso continued testing. The team successfully completed a test run that met the PTS steady state requirements on the quad module in time to release data for the show.

We had a solid demonstration at the SNIA SSSI Flash Memory Summit Booth on NVDIMM-N Performance complete with marketing collateral available for review and a handout. NVDIMM-SIG members responded to the many questions and interest in the NVDIMM-N storage technology.

[Image: FMS booth]

“Once again,” said SSSI Chair Jim Ryan, “we can see the value and benefit of SNIA SSSI to its members, the SNIA educational community and the NVDIMM industry. I believe this is a great case study in how we all can contribute and benefit from working within the SSSI for the betterment of individual companies, market development and the Solid State Storage industry at large.” SSSI provides educational and marketing materials free of charge on its public website, while SNIA SSSI members may join the NVDIMM-SIG and other SSSI committees. Anyone interested in finding out more about the SSSI or any of its many committees can visit http://www.snia.org/sssi.

OpenStack File Services Options

How can OpenStack consume and control file services appropriate to High Performance Compute (HPC) in a cloud and multi-tenanted environment? Find out on September 22nd when SNIA Cloud hosts a live Webcast and examines two approaches to integration.

One approach is to have OpenStack manage the storage infrastructure services using Cinder, Nova and Neutron to provide HPC Filesystem as a Service.

A second option is to use Manila file services for OpenStack to control the HPC file system deployment and manage the exports, etc. This part also looks at the in-progress creation of the Lustre Manila driver and its current status.

I hope you’ll join Alex McDonald and me as we discuss the pros and cons of each approach. Register today and

Storage Performance Benchmarking Q&A

Our recent SNIA-ESF webcast, “Storage Performance Benchmarking: Introduction and Fundamentals,” really struck a chord! It was our most highly rated and well-attended webcast to date, with more than 300 people at the live event. If you missed it, it’s now available on-demand. Thanks again to my colleagues, Ken Cantrell and Mark Rogov, who did an outstanding job explaining the basics and setting the stage for our next webcast in this series, “Storage Performance Benchmarking: Part 2,” on October 21st. Mark your calendar! Our audience had many great questions and there just wasn’t enough time to answer them all, so as promised, here are answers to all the questions we received.

Q. Can you explain the difference between MiB and MB?

A. The difference lies in the way that bytes are counted. A megabyte (MB) is defined in decimal (base-10), and a mebibyte (MiB) in binary (base-2). There is a similar relationship between kilobytes (KB, base-10) and kibibytes (KiB, base-2). So, if you begin with a single byte:

1 kilobyte (KB) = 1000 bytes (B)

1 kibibyte (KiB) = 1024 bytes (B)

1 megabyte (MB) = 1000 KB = 1000 * 1000 bytes = 1,000,000 bytes

1 mebibyte (MiB) = 1024 KiB = 1024 * 1024 bytes = 1,048,576 bytes

The distinction is important because very few applications or vendors make use of the MiB notation and instead use the MB notation to refer to binary numbers. If vendor or tool A is using MB to refer to base-10 measurements, and vendor or tool B is using MB to refer to base-2 measurements, it may falsely appear that the two vendors or tools are performing differently (or the same), when in fact the difference (or similarity) is simply because they are labeling two different things (MB and MiB) as the same (MB).

Telecommunications and networking generally measure and report in MB. Disk drive manufacturers tend to label storage in terms of MB. Many operating systems report storage labeled as MB, but calculated as MiB. Storage performance (whether from an application’s view, or the storage vendor’s own tools) is generally measured in binary (MiB/s), but reported in decimal (MB/s).
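To make the labeling problem concrete, here is a minimal Python sketch that converts the same byte rate between the two units. The function names and the 500 MiB/s figure are illustrative assumptions, not values from the webcast.

```python
# Unit sizes in bytes: decimal (base-10) versus binary (base-2).
MB = 1000 * 1000
MIB = 1024 * 1024


def mib_to_mb(mib: float) -> float:
    """Convert a value in MiB (binary) to MB (decimal)."""
    return mib * MIB / MB


def mb_to_mib(mb: float) -> float:
    """Convert a value in MB (decimal) to MiB (binary)."""
    return mb * MB / MIB


# Hypothetical case: a tool measures 500 MiB/s but labels it "500 MB/s".
measured_mib_per_s = 500
print(f"{measured_mib_per_s} MiB/s = {mib_to_mb(measured_mib_per_s):.1f} MB/s")
```

In this made-up case, a tool that reports true decimal MB/s would show about 524 for the same workload, which could look like a 5% performance difference when there is none.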

Q. I disagree with the conflation of the terms “IOPS” and “throughput.” Throughput generally refers to overall concept of aggregate performance. IOPS measure throughput at a given request size, which most clients assume to be small blocks (~512B-4K). Many clients would say that bandwidth, measured in MB/s or GB/s, is a measure of throughput @ large block size (>~128K). Performance typically varies significantly @ different block sizes. So clients need to understand that all 3 of these concepts differ.

A. The SNIA dictionary equates throughput to IOPS, which is why we listed it as a common alternative for IOPS.  However, the problem with “throughput” is in the industry’s lack of agreement on what this term means. See the examples below, where sometimes it clearly refers to IOPS and other times to MB/s.  This is why, in the end, we recommend you simply don’t use the term at all, and use either IOPS or MB/s.

Throughput = MB/s

Techterms.com: “Throughput refers to how much data can be transferred from one location to another in a given amount of time. It is used to measure the performance of hard drives and RAM, as well as Internet and network connections.”

Merriam-Webster: “The amount of material, data, etc., that enters and goes through something (such as a machine or system)”

Throughput = IOPS

SNIA Dictionary: “Throughput:  [Computer System] the number of I/O requests satisfied per unit time.   Expressed in I/O requests/second, where a request is an application request to a storage subsystem to perform a read or write operation.”

Wikipedia:  “When used in the context of communication networks, such as Ethernet or packet radio, throughput or network throughput is the rate of successful message delivery over a communication channel. The data these messages belong to may be delivered over a physical or logical link, or it can pass through a certain network node. Throughput is usually measured in bits per second (bit/s or bps), and sometimes in data packets per second (p/s or pps) or data packets per time slot.”

TechTarget:  “Throughput is a term used in information technology that indicates how many units of information can be processed in a set amount of time.”

Throughput = Either

Dictionary.com:  “1. The rate at which a processor can work expressed in instructions per second or jobs per hour or some other unit of performance. 2. Data transfer rate.”

Q. So, did I hear it right? IOPS and Throughput are the same and thus can be used interchangeably?

A. Please see definitions above.

Q. The difference between FE/BE is the RAID level + write percentage = overhead IOPS. It is different at times due to RAID or any virtualization at the storage level, like if you have RAID 5, every front-end write would have 4 back-end writes. What happens if the throughput at the back end of a controller is more than at the front end?

A. The difference between Front End (FE) and Back End (BE) is generally the vantage point. One of the objectives of the presentation was to show that when comparing two results (or systems) the IOPS must be measured at the same point. BE IOPS depend, in part, on the protection overhead and on the software engine’s ability to counteract it. But instead of defining why there is a difference, we aimed at exposing that such a difference exists. The “whys” will be addressed in future webcasts.

Q. Also, another point that would dictate which system to choose is the storage capacity each has to offer. Very good compare & contrast illustration.

A. Agreed, although capacity itself has several facets: raw capacity, or the sum of the manufacturer-advertised capacities of all internal drives; usable capacity, or the amount of data that could be written on one system after all formatting and partitioning; and logical capacity, or the amount of data that external hosts could write onto the system. Logical may be different from usable based on some data reduction services that a system could offer.

Q. Why is there a lower limit on your acceptable latency band? Storage can be too fast for your business needs?

A. We had some discussions looking at whether you show it as a cap, or a band. In many cases, customers will have a target latency cap, but allow some level of exceptions (either % of time it is exceeded, or % by which it is exceeded). That initially suggested a soft cap and a hard cap. One of us also has seen customers that highly value consistent behavior. In a service provider context, providing “too fast” performance could actually build a downstream customer expectation that will cause satisfaction issues if performance later degrades, even if it degrades to levels within the original targets.  You could certainly see this more generally as a cap though, which Ken stated verbally in the presentation. The band represents not only the cap, but also variance.

Q. What is an “OP”? An “Operation Per”? Shouldn’t this be “$/OPS” or “dollars per operation per second”?

A. This question was raised during the verbal Q&A. An “OP” in our context was any of the potential protocol-level operations. For example, in an NFSv3 context, this could be a CREATE, SETATTR, GETATTR, or LOOKUP operation. The number of these that occur in a second is what we referred to as OPS. There is certainly potential confusion here, and we struggled for consistency as we put the presentation together.

For example, sometimes we (and others) refer to operations per second as “op/s” instead of OPS, or use “Ops” or “OPs” or “ops” to mean “more than one operation” instead of “operations per second.” We chose to use OPS because it is consistent with IOPS, which we see frequently in this space, and are trying to push others towards this terminology for consistency.

Q. More IOPS in the same amount of time seems to imply that the system with more IOPS is performing each IO in less time than the other system. So how do we explain a system that can do more IOPS than another system but with a higher response time for each IO than the other system?

A. Excellent question! Two different angles on a reply follow …

First angle:

A simple IOP count doesn’t include the size of the IOP, or the work that the storage controller must perform to process it. Therefore comparing just the IOPS does not provide a good comparison: what if system 1 was doing large IOs (64KiB or more) and system 2 was doing small IOs (4KiB or less)? To avoid that, most people fix the IO size or run controlled tests changing the IO size in steps and then compare the arrays.

The point that we are making with the webcast is that IOPS alone are not enough for a comparison; one must look beyond that into IO size, response time and above all business requirements.
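As a rough illustration of why IO size matters, the following Python sketch computes the bandwidth implied by an IOPS figure at a fixed IO size. The two systems and their numbers are hypothetical, chosen only to show the effect.

```python
KIB = 1024
MIB = 1024 * KIB


def bandwidth_mib_per_s(iops: float, io_size_bytes: int) -> float:
    """Bandwidth implied by an IOPS figure at a fixed I/O size."""
    return iops * io_size_bytes / MIB


# Hypothetical systems: the numbers are made up for illustration.
system1 = bandwidth_mib_per_s(iops=5_000, io_size_bytes=64 * KIB)
system2 = bandwidth_mib_per_s(iops=20_000, io_size_bytes=4 * KIB)

print(f"System 1: 5,000 IOPS at 64 KiB = {system1:.1f} MiB/s")
print(f"System 2: 20,000 IOPS at 4 KiB = {system2:.1f} MiB/s")
```

Here the system with four times the IOPS moves only a quarter of the data per second, so neither number alone says which system is "faster" for a given workload.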

Second angle:

This is an excellent question, and there are at least two answers.

To begin, we need to explain the relationship between IOPS and response time.

We generally talk about response time (latency) in terms of seconds or milliseconds. We say things like “a response time of 5ms.” But this is actually a shortcut. It is really “seconds per I/O” or “milliseconds per I/O.” Also, remember that in “IOPS” the “p” stands for “per”, so this is “IOs per second.” So, ignoring the conversions between “seconds” and “milliseconds”, the relationship between IOPS and response time is easy…they are inverses of each other:

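IOPS = I/Os per second, and response time = seconds per I/O

So:

IOPS = 1 / (response time in seconds)

Implies:

Response time in seconds = 1 / IOPS

Example:

A response time of 10ms is 0.010 seconds per I/O, so IOPS = 1 / 0.010 = 100

Going the other way, 100 IOPS gives a response time of 1 / 100 = 0.010 seconds = 10ms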

Given this:

The first answer to your question is tied to idle time. Consider the following simple example.

System 1:

For System 1, assume that the storage array service time is the only significant contributor to latency.

Client issues an I/O

Storage array services the I/O in 10ms

As soon as the client sees the response, it immediately issues a new I/O

Process continues, with all I/Os serviced by the storage array in 10ms each

How many IOPS is system 1 doing?  Measured at the client, or the storage array, it is doing 100 IOPS with an average response time of 10ms.  Remember the relationship between IOPS and response time from the intro above.

System 2:

For System 2, assume that both the client and storage array contribute to latency.

Client issues an I/O

Storage array services the I/O in 5ms.

Client waits 5ms before issuing a new I/O.

Process continues, with all I/Os being serviced by the storage array in 5ms each, and the client waiting 5ms between issuing each I/O.

How many IOPS is System 2 doing? If you measure the total number of I/Os completed in a minute, it is completing the same number as System 1 and so doing 100 IOPS. But what is the latency? If you measured the latency at the storage array, it would be 5ms. If you measured at the client, it would depend on whether your measurement was above or below the point where the 5ms delay per I/O was inserted, and would see either 5ms or 10ms.
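The idle-time example can be sketched in a few lines of Python. This is a simplified model that assumes exactly one outstanding I/O at a time; the function name and parameters are ours, not something from the webcast.

```python
def iops_from_cycle(service_ms: float, client_wait_ms: float = 0.0) -> float:
    """IOPS with one outstanding I/O, given per-I/O service and idle time."""
    cycle_ms = service_ms + client_wait_ms  # time from one issue to the next
    return 1000.0 / cycle_ms


# System 1: the array takes 10ms per I/O and the client never idles.
system1_iops = iops_from_cycle(service_ms=10)

# System 2: the array takes 5ms per I/O, then the client idles for 5ms.
system2_iops = iops_from_cycle(service_ms=5, client_wait_ms=5)

print(f"System 1: {system1_iops:.0f} IOPS, array latency 10ms")
print(f"System 2: {system2_iops:.0f} IOPS, array latency 5ms")
```

Both systems complete 100 I/Os per second, yet the array-side latency differs by a factor of two, which is why IOPS and response time have to be read together and measured at a known point.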

The second answer is tied to “concurrency,” an element we didn’t discuss in the webcast.

Concurrency might also be called “parallelism.”  In the examples above, the concurrency was ‘1’. There was only ‘1’ I/O active at a time.

What if we had more though?  Example:

For both systems, assume that the storage array service time is the only significant contributor to latency.

System 1 (same as above)

Client issues an I/O

Storage array services the I/O in 10ms

As soon as the client sees the response, it immediately issues a new I/O

Process continues, with all I/Os serviced by the storage array in 10ms each

How many IOPS is system 1 doing?  Measured at the client, or the storage array, it is doing 100 IOPS with an average response time of 10ms. Remember the relationship between IOPS and response time from the intro above.

System 2:  Increased concurrency

Client issues two I/Os

Storage array receives both and is able to handle both in parallel, but they take 20ms to complete

As soon as the client sees the response, it immediately issues two new I/Os

Process continues, with all I/Os serviced by the storage array in 20ms each, and the client always issuing two at a time

How many IOPS is System 2 doing? In the aggregate, measured at the client or the storage array, it is doing 100 IOPS. What is the response time? This is where the relationship between IOPS and response time introduced before can break down, depending on what you’re really asking for, and it really takes some queuing theory to properly explain this. Ignoring the different kinds of queuing models and simplifying for this discussion, at a high level you could say that the average response time is still 10ms. Some reporting tools are likely to do this. That is misleading, however, since each I/O really took 20ms. You’d only see a 20ms response if (a) you actually kept track of the real response times somewhere [which many systems do … they’ll create many different response time buckets and store counts of the # of I/Os with a given response time in a given bucket], or (b) you make use of Little’s Law. Little’s Law would at least give you a better mean response time. We didn’t discuss Little’s Law in the webcast, and I won’t go into much detail of it here (see the Wikipedia article on Little’s Law if you want to know more), but in short, Little’s Law can be rearranged to state that:

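Mean response time = (mean number of I/Os in the system) / (mean throughput in IOPS)

or, in our case above:

Mean response time = 2 I/Os / 100 IOPS = 0.02 seconds = 20ms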
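For readers who want to try this themselves, here is a small Python sketch. Little’s Law is usually written L = λW, where L is the mean number of items in the system, λ is the mean arrival (or completion) rate, and W is the mean time each item spends in the system. Mapping L to concurrency and λ to IOPS is our simplification for this example, and the function name is illustrative.

```python
def mean_response_time_ms(concurrency: float, iops: float) -> float:
    """Little's Law rearranged (W = L / lambda), returned in milliseconds."""
    return concurrency * 1000.0 / iops


# System 1: one I/O outstanding at a time, 100 IOPS.
system1_ms = mean_response_time_ms(concurrency=1, iops=100)

# System 2: two I/Os outstanding at a time, still 100 IOPS in aggregate.
system2_ms = mean_response_time_ms(concurrency=2, iops=100)

print(f"System 1 mean response time: {system1_ms:.0f}ms")
print(f"System 2 mean response time: {system2_ms:.0f}ms")
```

With concurrency taken into account, the same 100 IOPS corresponds to 10ms per I/O on System 1 and 20ms per I/O on System 2, which matches what each I/O actually experienced.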

Q. What is valuable from SPEC’s SFS benchmark in this context?

A. Excellent question. We think very highly of SPEC SFS® 2014 for a variety of reasons, but in this context, a few stick out in particular:

1) The workloads provided by SPEC SFS 2014 are the result of multi-vendor research, actual workload traces, and efforts to target real business use cases. The VDI (virtual desktop infrastructure), SWBUILD (software build), DB (OLTP database) and VDA (video data acquisition) workloads are all distinct workloads with very different I/O patterns, I/O size mixes, and operation mixes. We’ll talk more in another session about these aspects of a workload, but the important element is that each workload is designed to help users understand performance in a specific business context.

2) To encourage folks to think in a business context, SPEC SFS 2014 primarily reports results not in IOPS or MB/s, but in the relevant “business units” at a given response time, along with a measure of overall response time, aptly referred to as ORT (Overall Response Time).  For example, the SWBUILD workload reports load points as “number of concurrent software builds.” MB/s information is still available for those that want to dig in, but what is really important is not the MB/s, but how many software builds, or databases, or virtual desktops, or video streams a system can support, and this is where the reporting focus is.

3) SPEC SFS 2014 recognizes that consistent performance is generally an important element of good performance. SPEC SFS 2014 implements a variety of checks during execution to make sure that each element doing I/O is doing roughly the same amount of I/O and that they are doing the amount of work that was requested of them and aren’t lagging behind.

More information on SPEC SFS 2014 (and SPEC in general) is available from http://www.spec.org

Q. Anything different/special about measuring and testing object storage?

A. Object Storage is a hot topic in the industry right now, but unfortunately fell outside the scope of this fundamentals presentation. We will be examining object storage – and corresponding network protocols – in a later webcast. Stay tuned!

Q. Does Queue depth not matter in response times?

A. We did not define Queue Depth as a term in this webcast because we classified it as an advanced subject. We are planning to address queue depth during a future webcast. In short, the queue depth does play an important role in the performance world, but response time is affected by it only indirectly. When the target Queue is full, the Initiator can’t send any more IOs and waits for the Queue to free up.

The time spent waiting is not response time. Response time measures the time the Target spends processing the IO it was able to receive (place into its queue). However, some scenarios may keep an IO inside a Queue for a long time (due to storage controller doing something else), and thus increase the response time for the waiting IO. In the latter case, increasing the depth of the Queue will increase the response time.

Keep in mind, though, that “It Depends”. It depends on where you measure. For example, a client may have no visibility into what is happening at the storage array. It cares about how long the storage array takes to respond … it doesn’t know or care about the differences between the storage array’s view of its service time (how long the storage array is doing work) and its queuing delay / wait time (how long it is waiting to do work).

As far as the client is concerned, this is all response time. It just wants a response. Most “response time” metrics reported by tools, apps and storage arrays really decompose into a set of true service times and queuing delays, although you may never get to see how those components come together without using other tools.
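The decomposition into queuing delay and service time can be illustrated with a toy model. The Python sketch below assumes a single server that handles I/Os strictly in arrival order with a fixed service time; real arrays are far more complex, and the arrival pattern is made up.

```python
def fifo_response_times(arrivals_ms, service_ms):
    """Return (wait, service, response) per I/O for a single FIFO server."""
    results = []
    server_free_at = 0.0
    for arrival in arrivals_ms:
        start = max(arrival, server_free_at)  # the I/O waits if the server is busy
        wait = start - arrival                # queuing delay
        server_free_at = start + service_ms   # server is busy until this time
        results.append((wait, service_ms, wait + service_ms))
    return results


# Hypothetical burst: four I/Os arrive 2ms apart, each needing 5ms of service.
results = fifo_response_times([0, 2, 4, 6], service_ms=5)
for wait, service, response in results:
    print(f"wait {wait:.0f}ms + service {service:.0f}ms = response {response:.0f}ms")
```

The service time is a constant 5ms, but the response time the client sees grows from 5ms to 14ms across the burst because each I/O queues behind the previous ones.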

Q. Order is important, in addition to mix ratios. For IOs, sequential vs. random (and specifics on randomness) is important. For file operations, where the protocol is stateful, the sequence of operations can strongly impact performance. Both block and file storage can have caches, which behave very differently depending on ordering and working set size. I suggest including the concept of order in a storage benchmarking intro.

A. These are excellent points, but given the amount of time we had, we had to make some hard choices about what to include and what to drop (or defer). This fell on the chopping block. Someone once told me that the best talks are those where you cry over what you’ve been forced to leave on the cutting room floor. There was a lot of good stuff we couldn’t include. I expect that we will address this idea in the planned discussion of workloads.

Q. The “S” in “OPS” and “IOPS” means Seconds and isn’t a pluralization. Some people make a mistake here and use the term “OP” and “IOP” to mean “1 operation/second” or “1 IO/second”. This mistake happens in this presentation, where “$/OP” is used. The numbers suggest what was intended is “# of dollars to get 1 operations/second.” The term “$/OP” can be misinterpreted to mean “dollars per operation,” which is a different but also relevant metric for perishable storage, like flash.

A. Good catch. We tried hard to be consistent in using “OPS” instead of “op/s” or some variant, when we meant “operations per second”.  There is another Q&A that addresses this. Technically, we slipped up in saying “$/OP” instead of “$/OPS” when we meant, as you noted, “Dollars per (operations per second)”. That said, I think you’ll find that nearly everyone makes this shortcut/mistake. That doesn’t make it right, but does mean you should watch out for it.

Q. Isn’t capacity reported in MiB, GiB, etc. and bandwidth in MB/s today?

A. Unfortunately, our impression is that capacity is also reported in decimal, at least in marketing literature. All we were trying to point out is that units of measurement must be consistent at all points. Many people mix decimal and binary numbers, creating ambiguities and errors (remember that at a TiB vs. TB level, the difference is roughly 10%!). One example is a graph that shows IO size in decimal and bandwidth in binary.
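The gap between binary and decimal units widens with each prefix. A short Python sketch (the helper name is ours) shows how it grows:

```python
PREFIXES = ["K", "M", "G", "T"]


def binary_excess_percent(power: int) -> float:
    """Percent by which 1024**power exceeds 1000**power."""
    return (1024 ** power / 1000 ** power - 1) * 100


for power, prefix in enumerate(PREFIXES, start=1):
    excess = binary_excess_percent(power)
    print(f"1 {prefix}iB is {excess:.1f}% larger than 1 {prefix}B")
```

The difference is about 2.4% at the kilo level and climbs to just under 10% at the tera level, so mixing the two on one chart distorts results more as capacities and bandwidths grow.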

Q. Would you also please talk about different SAN protocols and how they differ/impact the performance?

A. There are additional webcasts under development that discuss the storage networking protocols, their performance trade-offs, and how performance can be affected by their use. Due to time constraints we needed to postpone this portion because it deserves more attention than we could provide in this session.

Q. So even though IOPS could be more, the response time could be less…so it’s important to consider both. Is that correct?

A. Absolutely! But just response time is not enough either! You need more load points, and a business objective to truly assess the value of a solution.

New Webcast: Data Center Congestion Control

How do new architectures being deployed in today’s data centers affect IP-based storage? Find out on September 15th in our next SNIA Ethernet Storage Forum live Webcast, “Data Center Congestion Control,” where we will discuss new architectures and a new congestion control mechanism called CONGA. Developed from research done at Stanford, CONGA is a network-based distributed congestion-aware load balancing mechanism. It is being researched for use in next generation data centers to help enhance IP-based storage networks and is becoming available in commercial switches. This Webcast will dive into:

  • A definition of CONGA
  • How CONGA efficiently handles load balancing and asymmetry without TCP modifications
  • CONGA as part of a new data center fabric
  • Spine-Leaf/CLOS architectures
  • Effects of 40G/100G in these architectures
  • The CONGA impact on IP storage networks

Discover the new data center architectures that will support the most demanding applications such as big data analytics and large-scale web services. As always, this Webcast will be live. I encourage you to register today and bring your questions.