
    An FAQ to Make Your Storage System Hum

    May 23rd, 2017

    In our most recent “Everything You Wanted To Know About Storage But Were Too Proud To Ask” webcast series – Part Sepia – Getting from Here to There, we discussed terms and concepts that have a profound impact on storage design and performance. If you missed the live event, I encourage you to check it out on-demand. We had many great questions on encapsulation, tunneling, IOPS, latency, jitter and quality of service (QoS). As promised, our experts have gotten together to answer them all.

    Q. Is there a way to measure jitter?

    A. Jitter can be measured directly as a statistical function of the latency, typically as the Variance or Standard Deviation of the latency. For example, a storage device might show an average latency of 5ms with a standard deviation of 1.5ms. This means roughly 95% of the transactions have a latency between 2ms and 8ms (average latency plus or minus two standard deviations). However, many storage customers measure jitter indirectly by showing the 99.9%, 99.99%, or 99.999% latency. For example, if my storage system has a 99.99% latency of 8ms, it means 99.99% of transactions have latency <=8ms and 1/10,000 of transactions have latency >8ms. Percentile latency is an indirect measure of jitter but often easier to calculate or understand than the actual jitter.
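
    To make this concrete, here is a minimal sketch in Python (using NumPy) that computes both the direct measure (standard deviation) and the indirect measure (percentile latency). The latency samples are synthetic, generated to match the example above rather than recorded from a real device:

        import numpy as np

        # Synthetic per-transaction latencies in milliseconds; a real analysis
        # would use recorded values. Mean 5ms, standard deviation 1.5ms.
        latencies = np.random.normal(loc=5.0, scale=1.5, size=100_000)

        # Direct measure of jitter: standard deviation (or variance) of latency.
        print(f"average latency : {latencies.mean():.2f} ms")
        print(f"jitter (stddev) : {latencies.std():.2f} ms")

        # Indirect measure: percentile latency. 99.99% of transactions complete
        # at or below this value; 1 in 10,000 takes longer.
        print(f"99.99% latency  : {np.percentile(latencies, 99.99):.2f} ms")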

    Q. Can jitter be easily characterized for storage, media, and networks? How, and what tools are available for doing this?

    A. Jitter is usually easy to measure on a network using standard network monitoring and reporting tools. It may or may not be easy to measure on storage systems or storage media, depending on the tools available (either built-in to the storage OS or using an external management or monitoring tool).  If you can record the latency of each transaction or packet, then it’s easy to calculate and show the jitter using standard statistical measures such as Variance or Standard Deviation of the latency. What most customers do is just measure the 99.9%, 99.99%, or 99.999% latency. This is an indirect measure of jitter but is often much easier to report and understand than the actual jitter.

    Q. Generally, IOPS numbers are published for a particular block size, like an 8k write/read size, but in reality I/O requests could be of mixed sizes. What is your perspective on this?

    A. Most IOPS benchmarks test only one I/O size at a time. Most individual real workloads (for example, databases) also use only one I/O size. It is true that a storage controller or HDD/SSD might need to support multiple workloads simultaneously, each with a different I/O size. While it is possible to run benchmarks with a mix of different I/O sizes, it’s rarely done because there are too many workload combinations to test and publish. Some storage devices do not perform well if they must handle both small random and large sequential workloads simultaneously, so a smart storage controller might assign different workload types to different disk groups.
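
    Because throughput is simply IOPS multiplied by I/O size, a single blended number for a mixed workload can be misleading. The sketch below, with purely hypothetical numbers, shows why:

        # Hypothetical mixed workload: (I/O size in KiB, IOPS achieved at that size).
        mix = [
            (8,    50_000),   # small random I/O, e.g. database pages
            (64,   10_000),   # medium I/O
            (1024,    500),   # large sequential I/O, e.g. backup streams
        ]

        total_iops = sum(iops for _, iops in mix)
        total_mib_s = sum(size_kib * iops / 1024 for size_kib, iops in mix)

        print(f"aggregate IOPS      : {total_iops:,}")
        print(f"aggregate throughput: {total_mib_s:,.0f} MiB/s")
        # The small I/Os dominate the IOPS figure, while the large I/Os are
        # under 1% of the IOPS yet contribute roughly a third of the MiB/s,
        # which is why blended mixed-size numbers are rarely published.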

    Q. One often misconfigured parameter is queue depth. Can you talk about how this relates to IOPS, latency and jitter?

    A. Queue depth indicates how many tasks or I/Os can be lined up for a particular controller, interface, or CPU. Having a higher queue depth ensures the CPU (or controller or interface) always has a new task to do as soon as it finishes its current task(s). This can result in higher IOPS because the CPU is less likely to have idle time between transactions. But it could also increase latency because the CPU is more likely to be multi-tasking and context switching between different tasks or workloads.
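
    One rough way to reason about this relationship (not covered in the webcast, but a standard queueing result) is Little’s Law: achievable IOPS is bounded by queue depth divided by per-I/O latency. A small illustrative sketch in Python:

        # Little's Law: concurrency = throughput x latency, so as an upper bound
        #   IOPS <= queue_depth / average_latency
        def max_iops(queue_depth: int, latency_s: float) -> float:
            return queue_depth / latency_s

        latency_s = 0.0002  # 200 microseconds per I/O (illustrative)
        for qd in (1, 4, 16, 64):
            print(f"QD={qd:>2}: up to {max_iops(qd, latency_s):,.0f} IOPS")

        # Past the device's saturation point, raising queue depth no longer adds
        # IOPS; it only adds queueing time, which shows up as higher latency.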

    Q. Can you please repeat all your examples of tunneling? GRE, MPLS, what others? How can it be IPv4 via IPv6?

    A. VXLAN, LISP, GRE, MPLS, and IPSEC are all examples. Any time you encapsulate one protocol inside another, send it across the network, and decapsulate it at the other end to recover the original frame, that process is tunneling. In the case we showed of IPv6 over IPv4, you take an original IPv6 frame, with its all-IPv6 header of source and destination addresses, and send it over an IPv4-enabled network by encapsulating the IPv6 frame with an IPv4 header – “tunneling” IPv6 over the IPv4 network.
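
    For illustration, here is a minimal sketch of that 6in4 encapsulation using the Scapy packet library in Python. The addresses are documentation-prefix placeholders, and sending the packet typically requires root privileges:

        # Requires Scapy (pip install scapy).
        from scapy.all import IP, IPv6, ICMPv6EchoRequest, send

        # The original IPv6 frame: an ICMPv6 echo request between two IPv6 hosts.
        inner = IPv6(src="2001:db8::1", dst="2001:db8::2") / ICMPv6EchoRequest()

        # Encapsulate it in an IPv4 header. IP protocol number 41 means
        # "IPv6 encapsulated in IPv4", so the tunnel endpoint at 192.0.2.2
        # knows to strip the outer header and forward the inner IPv6 frame.
        outer = IP(src="192.0.2.1", dst="192.0.2.2", proto=41) / inner

        send(outer)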

    Q. I think it’d be possible to configure QoS to a point that exceeds the system capacity. Are there any safeguards on avoiding this scenario?

    A. Some types of QoS allow over-provisioning and others do not. For example a QoS that imposes only maximum limits (and no minimum guarantees) on workloads might not prevent many workloads from exceeding system capacity. If the QoS allows over-provisioning, then you should use system monitoring and alerts to warn you when system capacity has been exceeded, or when any workloads are not getting their minimum guaranteed performance.
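
    As a sketch of what such a safeguard might look like, the admission-control check below (workload names and numbers are hypothetical) compares the sum of minimum guarantees against system capacity:

        # Hypothetical workloads: (name, minimum guaranteed IOPS).
        workloads = [
            ("oltp-db",   40_000),
            ("analytics", 25_000),
            ("backups",   15_000),
        ]
        SYSTEM_CAPACITY_IOPS = 70_000  # measured or vendor-rated capacity

        committed = sum(min_iops for _, min_iops in workloads)
        if committed > SYSTEM_CAPACITY_IOPS:
            print(f"WARNING: guarantees ({committed:,} IOPS) exceed capacity "
                  f"({SYSTEM_CAPACITY_IOPS:,} IOPS); they cannot all be met at once.")
        else:
            print(f"OK: {SYSTEM_CAPACITY_IOPS - committed:,} IOPS of headroom remains.")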

    Q. Is there any research being done on using storage analytics along with artificial intelligence (AI) to assist with QoS?  

    A. There are a number of storage analytics products, both third party and storage vendor specific that help with QoS. Whether any of these tools may be described as using AI is debatable, since we’re in the early days of using AI to do much in the storage arena. There are many QoS research projects, and no doubt they will eventually make their way into commercially available products if they prove useful.

    Q. Are there any methods (measurements) to calculate IOPS/MBps in tier-capable storage? Would it be the wrong metric if we estimate based on a middle tier, for example tier 2 (between 1 and 3)?

    A. This question needs refinement, since tiering is sometimes a cache model rather than a data movement model. And knowing the answer may not actually help! Vendors do have tools (normally internal, since they are quite complex) that can help with the planning of tiered storage.

    By now, we hope you’re not “too proud” to ask some of these storage networking questions. We’ve produced four other webcasts in this “Everything You Wanted To Know About Storage” series to date. They are all available on-demand. And you can register here for our next one on July 6th, where we’ll bring in experts to discuss:

    • Storage APIs and POSIX
    • Block, File, and Object storage
    • Byte Addressable and Logical Block Addressing
    • Log Structures and Journaling Systems

    The Ethernet Storage Forum team and I hope to see you there!



    Too Proud to Ask Webcast Series Continues – Getting from Here to There Pod

    May 4th, 2017
    As part of the SNIA Ethernet Storage Forum’s successful “Everything You Wanted To Know About Storage But Were Too Proud To Ask” series, we’ve discussed numerous topics about storage devices, protocols, and networks. As we examine some of these topics further, we begin to tease out some subtle nuances; subtle, yet important nevertheless. On May 9th we’ll take on the terms and concepts that affect Storage Architectures as a whole in “Everything You Wanted To Know About Storage But Were Too Proud To Ask – Part Sepia – Getting from Here to There.”  Continue Reading...

    Storage Expert Takes on Hyperconverged Questions

    April 17th, 2017
    Last month, we were fortunate enough to have Greg Schulz, analyst and founder of Server Storage IO, as a guest speaker at our SNIA Ethernet Storage Forum webcast, “What Does Hyperconverged Mean to Storage.” If you missed it, it’s now available on-demand. Greg fielded many great questions during the live event, but we didn’t have time to get to them all. So here they are:  Continue Reading...

    Managing Your Computing Ecosystem

    April 12th, 2017

      By George Ericson, Distinguished Engineer, Dell EMC; Member, SNIA Scalable Storage Management Technical Working Group, @GEricson

    Introduction

    This blog is part one of a three-part series recently published on “The Data Cortex”, which represents the thoughts and opinions from members of the CTO Team of Dell EMC’s Data Protection Division.  The author, George Ericson, has been actively participating on the SNIA Scalable Storage Management Technical Working Group which has been developing the SNIA Swordfish storage management specification.  Continue Reading…


    SNIA Ranked #2 for Storage Certifications – and Now You Can Take Exams at 900 Locations Worldwide

    March 29th, 2017

    The SNIA Storage Networking Certification Program (SNCP) provides a strong foundation of vendor-neutral, systems-level credentials that integrate with and complement individual vendor certifications. Its four credentials – SNIA Certified Storage Professional; SNIA Certified Storage Engineer; SNIA Certified Storage Architect; and SNIA Certified Storage Networking Expert  – reflect the advancement and growth of storage networking technologies, and establish a uniform standard by which individual knowledge and skill sets can be evaluated, thereby providing employers in the storage industry with an independent assessment of the individual.  Continue Reading…


    SNIA Swordfish is Swimming Fast – Catch Up Now!

    February 27th, 2017

    If you haven’t caught the updates on SNIA Swordfish™ lately, please read on because it’s swimming fast! The new SNIA specification offers a unified approach to managing storage and servers in environments like hyperscale and cloud infrastructures. SNIA’s Scalable Storage Management Technical Work Group (SSM TWG) just announced completion of Version 1.0.3. The new version reflects specification enhancements in multiple areas plus a User’s Guide, multiple new use cases and a new document section.

    “Because SNIA Swordfish is an extension to DMTF’s (Distributed Management Task Force) open industry Redfish™ standard, it specifies the same RESTful interface and utilizes JavaScript Object Notation and Open Data Protocol to help customers integrate solutions within their existing tool chains,” said Don Deel, Chairman, SNIA Storage Management Initiative. “The SSM TWG members responsible for helping develop SNIA Swordfish represent many of the leading companies in the storage industry today, including Broadcom, Dell EMC, HPE, Intel, Microsoft, NetApp, Nimble Storage and VMware.”
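
    Because the interface is plain JSON over REST, a client can be very simple. The Python sketch below is illustrative only: the host, credentials, and the StorageServices collection URI are assumptions based on our reading of the specification, not a tested client.

        import requests

        BASE = "https://storage.example.com/redfish/v1"   # hypothetical service
        AUTH = ("admin", "password")                      # placeholder credentials

        # The Redfish/Swordfish service root is an ordinary JSON document.
        root = requests.get(BASE, auth=AUTH).json()
        print(root.get("Name"))

        # Swordfish extends the Redfish model with storage-specific collections,
        # e.g. a StorageServices collection (URI assumed from the spec).
        services = requests.get(f"{BASE}/StorageServices", auth=AUTH).json()
        for member in services.get("Members", []):
            print(member["@odata.id"])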

    You can also keep up with the latest Swordfish updates by continually visiting the SNIA Swordfish website. If you’re interested in helping shape the future of storage management by getting involved in the development of SNIA Swordfish, please e-mail storagemanagement@snia.org.


    Clearing Up Confusion on Common Storage Networking Terms

    January 12th, 2017

    Do you ever feel a bit confused about common storage networking terms? You’re not alone. At our recent SNIA Ethernet Storage Forum webcast “Everything You Wanted To Know About Storage But Were Too Proud To Ask – Part Mauve,” we had experts from Cisco, Mellanox and NetApp explain the differences between:

    • Channel vs. Busses
    • Control Plane vs. Data Plane
    • Fabric vs. Network

    If you missed the live webcast, you can watch it on-demand. As promised, we’re also providing answers to the questions we got during the webcast. Between these questions and the presentation itself, we hope it will help you decode these common, but sometimes confusing terms.

    And remember, the “Everything You Wanted To Know About Storage But Were Too Proud To Ask” is a webcast series with a “colorfully-named pod” for each topic we tackle. You can register now for our next webcast: Part Teal, The Buffering Pod, on Feb. 14th.

    Q. Why do we have Fibre and Fiber?

    A. “Fiber optics” is the term for the optical technology used by Fibre Channel fabrics. While a common story is that the “Fibre” spelling came about to accommodate the French (FC is, after all, an international standard), in actuality it was a marketing idea to create a more unique name, and it was decided to use the British spelling – “Fibre”.

    Q. Will OpenStack change all the rules of the game?

    A. Yes. OpenStack is all about centralizing the control plane of many different aspects of infrastructure.

    Q. The difference between control and data plane matters only when we discuss software defined storage and software defined networking, not in traditional switching and storage.

    A. It matters regardless. You need to understand how much each individual control plane can handle, and how many control planes you have from an overall management perspective. In cases where you have too many control planes, SDN and SDS can be a benefit to you.

    Q. As I’ve heard that networks use stateless protocols, would FC do the same?

    A. Fibre Channel has several different Classes, which can be either stateful or stateless. Most applications of Fibre Channel are Class 3, as it is the preferred class for SCSI traffic. A connection between Fibre Channel endpoints is always stateful (as it involves a login process to the Fibre Channel fabric). The transport protocol is augmented by Fibre Channel exchanges, which are managed on a per-hop basis. Retransmissions are handled by devices when exchanges are incomplete or lost, meaning that each exchange is a stateful transmission, but the protocol itself is considered stateless in modern SCSI-transport Fibre Channel.

    iSCSI, as a connection-oriented protocol, creates a nexus between an initiator and a target, and is considered stateful. In addition, SMB, NFSv4, FTP, and TCP are stateful protocols, while NFSv2, NFSv3, HTTP, and IP are stateless protocols.

    Q. Where do CIFS/SMB come into the picture?

    A. CIFS/SMB is part of a network stack. We need to have a separate talk about network stacks and their layers. In this presentation, we were talking primarily about the physical layer of the networks and fabrics. To overly simplify network stacks, there are multiple layers of protocols that run on top of the physical layer. In the case of FC, those protocols include the control plane protocols (such as FC-SW) and the data plane protocols. In FC, the most common data plane protocol is FCP (used by SCSI, FICON, and FC-NVMe). In the case of Ethernet, those protocols also include the control plane protocols (such as TCP/IP) and data plane protocols. In Ethernet, there are many commonly used data plane protocols for storage (such as iSCSI, NFS, and CIFS/SMB).


    Containers, Docker and Storage – An Expert Q&A

    December 19th, 2016

    Containers continue to be a hot topic today as is evidenced by the more than 2,000 people who have already viewed our SNIA Cloud webcasts, “Intro to Containers, Container Storage and Docker“ and “Containers: Best Practices and Data Management Services.” In this blog, our experts, Keith Hudgins of Docker and Andrew Sullivan of NetApp, address questions from our most recent live event.

    Q. What is the major challenge for storage in containerized environment?

    A. Containers move fast. Users can spin up and spin down containers extremely quickly. The biggest challenge in production-bound container environments is simply keeping up with the movement of data.

    Docker Engine does not delete base container images when the container is shut down. Likewise, Registry assumes you’ve got unlimited storage on hand. For containers that push frequent revisions (as would be the case in a continuous delivery environment), that leads to a lot of orphaned container images that can fill up all available storage if left unchecked.

    There are some community-led scripts that will help to keep things in control. That’s the beauty of community-led technology.
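
    As a rough idea of what such a script does, the sketch below uses the Docker CLI’s long-standing dangling-image filter to find and remove orphaned image layers (run something like this only if you understand what it will delete):

        import subprocess

        # List dangling (untagged, orphaned) image IDs.
        ids = subprocess.check_output(
            ["docker", "images", "--filter", "dangling=true", "-q"],
            text=True,
        ).split()

        # Remove each one; images still in use by a container will be skipped
        # with an error, which we tolerate here.
        for image_id in ids:
            subprocess.run(["docker", "rmi", image_id], check=False)

        print(f"removed up to {len(ids)} dangling image(s)")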

    Q. What about the speed of retrieving the data from storage?

    A. That’s where being a solid storage architect comes in. Every storage system has different strengths and weaknesses, so it’s important to engineer your solution to fit your performance goals. Docker containers are running on the main kernel of the host system. IO is not constrained by abstraction, as in the case of virtual machines. Rather, it is constrained more by density – hundreds of containers on a host can push massive IOPS, so you want your pipes fat and data sources close to the host systems.

    Q. Can you expand on moving Docker Volumes from On-Premise bare metal to Cloud Service Providers? Data Migration? Encryption? 

    A. None of these capabilities are built into Docker Engine. We rely on external storage systems to provide those features. Private-to-cloud replication is primarily a feature of software-based companies, like Portworx, Blockbridge, or Hedvig. Encryption and migration are both common features across other companies as well. Flocker from ClusterHQ is a service broker system that provides many bolt-on features for the storage systems they support. You can also use community-supplied services like Ceph to get you there.

    Q. Are you familiar with “Flocker,” which apparently is able to copy persistent data to another container? Can you share your thoughts?

    A. Yes. ClusterHQ (makers of Flocker) provide an API broker that sits between storage engines and Docker (and other dynamic infrastructure providers, like OpenStack), and they also provide some bolt-on features like replication and encryption.

    Q. Is there any sort of feature in the volume plugins that allows a persistent volume to re-connect to a container if the container is moved across multiple hosts?

    A. There’s no feature in plugins to cover that specifically. The plugin API is very simple. In practice, what you would do is write your plugin to expose volumes to Docker Engine on every host where it’s possible to mount that volume. In your container specification, whether it’s a Compose file, DAB file, or what have you, specify the name of your volume. Wherever that unique name is encountered, it will be mounted and attached to the container when it’s re-launched.
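
    As a sketch of that pattern using the Docker SDK for Python (the driver name, volume name, and image are placeholders, and the calls reflect the docker-py 2.x API):

        import docker  # pip install docker

        client = docker.from_env()

        # Create (or re-use) a volume by its unique name. With a plugin driver,
        # the same name resolves to the same backing storage on any host where
        # the plugin is installed. "myplugin" is a hypothetical driver name.
        client.volumes.create(name="appdata", driver="myplugin")

        # Any container that references the volume name gets it mounted; if the
        # container is re-launched on another host, the plugin re-attaches it.
        client.containers.run(
            "postgres",
            volumes={"appdata": {"bind": "/var/lib/postgresql/data", "mode": "rw"}},
            detach=True,
        )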

    If you have more questions on containers, Docker and storage, check out our first Q&A blog: Containers: No Shortage of Interest or Questions.

    I also encourage you to join our Containers opt-in email list. It will be a good way to keep up with all the SNIA Cloud is doing on this important technology.


    No Shortage of Container Storage Questions

    November 29th, 2016

    We covered a lot of ground in our recent SNIA Ethernet Storage Forum webcast, “Current State of Storage in the Container World.” We had a technical discussion on why containers are so compelling, how Docker containers work, persistent shared storage and future considerations for container storage. We received some great questions during the live event, and as promised, here are answers to them all.

    Q. Docker cannot be installed on bare metal and requires a base OS to operate upon, right?

    A. That is correct.

    Q. Does the application code need to be changed so that it can “fit and operate” in a container?

    A. No, the application code does not need to change. The challenge most people face when migrating an application to a container is how to maintain the application’s state. One of the motivations for this webcast was to explain how to allow applications within containers to persist data. Hopefully the Docker Volume construct will meet your needs.

    Q. Seems like containers share one OS/kernel… That suggests that there is just one OS in the “containerized” server… And yet there is still mention of hypervisor (or at least Hyper-V)… Can you clarify? If the containers share an OS, is a hypervisor needed?

    A. You are correct, containers are designed to share a single kernel; therefore a hypervisor is not required to run containers. Having said that, VMware and Microsoft both offer options that run a single container in its own virtual machine (running a minimal operating system).

    Q. Can the Docker Hub be compared to something like the GitHub?

    A. Yes, that is a great analogy. Docker Hub (hub.docker.com) is to container images as GitHub (github.com) is to source code.

    Q. What are the differences between the base and the host image?

    A. If you’re referring to the webcast slides; the box labeled “Base Image” is the first layer in an image. The box labeled “Host OS” is not a layer, but represents the hosting operating system (kernel) that is shared by the containers.

    Q. So there is a separate root per container?

    A. In most cases the image will provide a root; therefore, each container will have a separate root. This is made possible by a kernel feature called namespaces. Alternatively, Docker also allows you to share a directory between the host operating system and any number of containers.

    Q. If Deduplication is enabled on the storage LUNs, won’t that affect the performance of the containers?

    A. Well implemented data reduction features (compression and deduplication) should have little to no effect on performance and should provide significant benefit by reducing the space required to store containers.

    Q. Can you please quickly review the concept of copy-on-write with one or two sentences to boil it down?

    A. How the copy-on-write works depends on whether the driver is file or block based. For the sake of simplicity, let’s assume a file-based implementation. Since the image layers are read-only, we need an area to store the changes that the container has made. This area is the copy-on-write layer. When a process reads a file that has not been modified, the file is read from one of the read only layers. When that file is modified and needs to be written back to disk, the new file is written to the copy-on-write layer as is the metadata that describes the file. The next time this file is read, it is read from copy-on-write layer. The graph driver is responsible for this functionality and varies by implementation.
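
    The toy model below (a deliberately simplified, file-based sketch in Python; real graph drivers differ in many details) illustrates just the read/write flow described above:

        class LayeredFS:
            def __init__(self, image_layers):
                self.image_layers = image_layers  # read-only, bottom to top
                self.cow_layer = {}               # the container's writable layer

            def read(self, path):
                if path in self.cow_layer:        # modified file: read the copy
                    return self.cow_layer[path]
                for layer in reversed(self.image_layers):  # else search top-down
                    if path in layer:
                        return layer[path]
                raise FileNotFoundError(path)

            def write(self, path, data):
                # Copy-on-write: changes land in the writable layer; the
                # read-only image layers are never modified.
                self.cow_layer[path] = data

        fs = LayeredFS([{"/etc/motd": "base"}, {"/app/run.sh": "v2"}])
        print(fs.read("/etc/motd"))    # "base"   -- served from an image layer
        fs.write("/etc/motd", "edited")
        print(fs.read("/etc/motd"))    # "edited" -- now served from the CoW layer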

    Q. Can network locations be used for /data? If yes, how does the Docker Engine manage network authentication for the driver?

    A. Yes, network locations can be used. The best practice is to use the Local Volume Driver, where you can pass in the required authentication via the options (see slide 15). Alternatively, the network location can be mounted on the host operating system and exposed to containers (see slides 21 & 22).
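
    For example, an NFS share can be attached through the Local Volume Driver by passing the mount options (including the server address) through the driver options. The sketch below uses the Docker SDK for Python; the server address and export path are placeholders:

        import docker

        client = docker.from_env()

        # Local Volume Driver backed by an NFS export; the address and any
        # mount flags travel in the "o" option string.
        client.volumes.create(
            name="shared-data",
            driver="local",
            driver_opts={
                "type": "nfs",
                "o": "addr=nfs.example.com,rw",
                "device": ":/export/shared-data",
            },
        )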

    Q. Is this where VAAI like primitives would get implemented?

    A. VAAI defines several in-band primitives.  The Docker Volume plug-in framework is completely out-of-band.  There can be some overlap in features though.  For example, the XCOPY primitive can be used to offload ‘copy jobs’ to an array.  If the vendor chooses to do so, a ‘copy job’ can be offloaded through the Docker Volume plug-in as well.  For example, a plug-in might implement a “clone” option that provides this service.

    Q. Could you share some details about Kubernetes storage ? Persistent volumes and the difference from Docker volumes? Also, what is your perspective of Flocker?

    A. Kubernetes has the concept of persistent storage. This abstraction is also called a volume. In addition, Kubernetes provides a plug-in option as well. The Kubernetes implementation predates the Docker Volume and is currently not compatible.

    Q. Comment on mainframe: IBM runs Linux on zSeries, therefore can run Linux Docker containers.

    A. Thanks, that’s good to know.

    Q. How many operating systems changes on the x86 platform? How many on the mainframe platform? Can x86 architecture run the same code/OS from 40 years ago? Docker on mainframe?

    A. The mainframe architecture has been very solid and consistent for many years.

    Q. What is a big challenge for storage in container environment?

    A. I don’t think storage has a challenge in the container environment. I think, with a properly implemented Docker Volume Plug-in, storage provides a solution to the persistent shared storage need in a container environment.

    Q. Do you ever look into RexRay or VMDK storage drivers?

    A. Yes, these are both examples of Docker Volume plug-in implementations.



    The Next Step for Containers: Best Practices and Data Management Services

    October 25th, 2016

    In our first SNIA Cloud webcast on containers, we provided a solid foundation on what containers are, container storage challenges and Docker. If you missed the live event, it’s now available on-demand. I encourage you to check it out, as well as our webcast Q&A blog.

    So now that we have set the stage and you’ve become acquainted with basic container technologies and the associated storage challenges in supporting applications running within containers in production, we will be back on December 7th. This time we will take a deeper dive into what differentiates this technology from what you are used to with virtual machines. Containers can both complement virtual machines and replace them, as they promise the ability to scale exponentially higher. They can easily be ported from one physical server to another, or from one platform (such as on-premises) to another (such as public cloud providers like Amazon AWS).

    At our December 7th webcast, “Containers: Best Practices and Data Management Services,” we’ll explore container best practices to address the various challenges around networking, security and logging. We’ll also look at what types of applications more easily lend themselves to a microservice architecture versus which applications may require additional investments to refactor/re-architect to take advantage of microservices.

    On December 7th, we’ll be on hand to answer your questions on the spot. I encourage you to register today. We hope you can attend!