
    New Webcast: Hierarchical Erasure Coding: Making Erasure Coding Usable

    May 11th, 2015

    On May 14th the SNIA-CSI (Cloud Storage Initiative) will be hosting a live Webcast “Hierarchical Erasure Coding: Making erasure coding usable.” This technical talk, presented by Vishnu Vardhan, Sr. Manager, Object Storage, at NetApp and myself, will cover two different approaches to erasure coding – a flat erasure code across JBOD, and a hierarchical code with an inner code and an outer code. This Webcast, part of the SNIA-CSI developer’s series, will compare the two approaches on different parameters that impact the IT business and provide guidance on evaluating object storage solutions. You’ll learn:

    • Industry dynamics
    • Erasure coding vs. RAID – Which is better?
    • When is erasure coding a good fit?
    • Hierarchical Erasure Coding – The next generation
    • How hierarchical codes make growth easier
    • Key areas where hierarchical coding is better than flat erasure codes

    Register now and bring your questions. Vishnu and I look forward to answering them.

    Ethernet Connected Drives Webcast Q&A

    April 1st, 2015

    At our recent SNIA ESF Webcast “Visions for Ethernet Connected Drives” Chris DePuy of the Dell’Oro Group discussed potential benefits, use cases, and challenges of Ethernet connected drives. It’s not surprising that we had a lot of questions given that this market is in its infancy. As promised during our live event, here are answers to questions from the audience. If you think of additional questions, please feel free to comment on this blog.

    Q. Will this also mandate new protocols to be used for storage like RDMA?

    A. We did not receive any feedback from the technology companies we surveyed about RDMA specifically, but new protocols very well may be required to make effective and cost-effective use of eDrives. Storage systems offer many capabilities beyond just standard Ethernet networking and new protocols may be required to deliver those as well as new services in this new storage system architecture.

    Q. Is White Box bought primarily by cloud customers?

    A. Yes, in our research, substantially all White Box storage devices are purchased by cloud service providers.

    Q. I may have missed it but aren’t we really talking about the HGST Open Ethernet Drive Architecture and the Seagate Kinetic Open Storage Platform? Both use Ethernet interfaces but HGST puts Debian on each HDD and Seagate has a key-value API for applications to directly write to the HDD. The actual deployment of these Ethernet HDDs would be in Ethernet Layer 2 switched backplanes in a 4U chassis being built by Supermicro, Xyratex (Seagate) and several others.

    A. Given that this was a presentation made to a neutral industry association, we chose not to discuss specific vendors. To answer your questions, yes, we are talking about Ethernet Connected Drives from HGST and Seagate, but we also integrated feedback from other suppliers of related technology, including Toshiba. To your other question, yes, we have seen enclosures with embedded Ethernet switch technology connecting to the Ethernet drives from various other vendors. In our research for this webinar, we have also seen Ethernet switch technology embedded into enclosures that don’t use Ethernet connected drives. These enclosures convert traditional HDD interfaces internally, so the network still sees Ethernet as the outward-facing interface.

    Q. Doesn’t that take space on the drive when you put CPU and more memory?

    A. We asked this question, too, but learned that there is sufficient space to maintain the HDD and all the parts in the same form factors we historically have known.

    Q. What can one implement in these internal processors used in Ethernet drives? For instance can we run erasure codes such as Jerasure or XOR based codes yet do the basic tasks needed for the Ethernet drives?

    A. We did not receive specific feedback during the surveys for this webinar about where one would run erasure coding. Generally, though, the decision will lead to design considerations for which CPU and memory choices would be made for each drive, which in turn would change economics as to whether the overall system is affordable/feasible. Note that doing erasure coding on the drives increases the amount of intelligence required on the drive: for the arithmetic, for the requisite peer-to-peer networking, and for maintaining state information about the other relevant drives needed to complete the erasure codes. New software to manage all this would be required as well.
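    To give a feel for the arithmetic involved, here is a minimal sketch of the simplest XOR-based code: k data fragments plus a single parity fragment, which can repair one lost fragment. The function names and the fragment layout are illustrative assumptions, not taken from Jerasure or from any drive vendor's firmware, and a production code would tolerate more than one failure.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size fragments (zero-padded) plus one XOR parity fragment.

    Hypothetical layout for illustration: fragments 0..k-1 hold data, fragment k holds parity.
    """
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def recover(fragments: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the single missing fragment (marked None) by XOR-ing the survivors."""
    missing = [i for i, frag in enumerate(fragments) if frag is None]
    if len(missing) != 1:
        raise ValueError("single parity can repair exactly one missing fragment")
    survivors = [frag for frag in fragments if frag is not None]
    repaired = list(fragments)
    repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired


# Usage: encode across 4 data drives + 1 parity drive, lose one drive, repair it.
stored = encode(b"object payload", 4)
damaged = list(stored)
damaged[2] = None
restored = recover(damaged)
```

    Even in this toy case, the repairing party has to read every surviving fragment, which illustrates why drive-resident erasure coding implies peer-to-peer traffic and knowledge of where the other fragments live.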

    Q. Can I run Ceph OSD plus erasure coding based on open source Jerasure on the Ethernet connected drive’s internal ARM processor?

    A. We did not receive specific feedback during the surveys for this webinar about where one would run erasure coding. Generally, though, the decision will lead to design considerations for which CPU and memory choices would be made for each drive, which in turn would change economics as to whether the overall system is affordable/feasible.

    Q. Erasure coding is more complex compared to RAID, how do I implement erasure coding with Ethernet drives?

    A. We did not receive specific feedback during the surveys for this webinar about where or how one would run erasure coding.

    Q. Do the economics include the cost of the Ethernet ports? If so, are you assuming unmanaged or managed Ethernet ports?

    A. In the slides, we portrayed a simplistic capital spending model that considered just servers and hard drives. In reality, there are many other factors that play into both CAPEX and OPEX comparisons between conventional and Ethernet Connected Drive architectures. Examples include the cost differential between using Ethernet switching versus traditional HDD interfaces and how much memory and CPU is needed to support a particular use case.

    Q. How does the increased number of network ports needed influence this price equation?

    A. In the slides, we portrayed a simplistic capital spending model that considered just servers and hard drives. In reality, there are many other factors that play into both CAPEX and OPEX comparisons between conventional and Ethernet Connected Drive architectures. Examples include the cost differential between using Ethernet switching versus traditional HDD interfaces and how much memory and CPU is needed to support a particular use case.

    Q. I’m confused how Power and Cooling could be saved. If you need X number of drives to store data then you would need the same number of drives in the connected drive model wouldn’t you? Perhaps more if the e-drives lack efficiency features?

    A. The general point is that proponents of Ethernet Connected Drives argue there won’t be a need for storage-oriented servers, and so the savings would result from there being fewer of them consuming power.

    Q. I guess the protocol would change commanding the drives?

    A. There is no single approach that has been agreed upon. During the presentation, we said there are multiple technical approaches, one of which includes using Key Value APIs, and the other is to install an Operating System onto each drive that could run whatever you want on it.
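    As a rough illustration of the first approach, the sketch below models a drive that is addressed by key rather than by block number. The class name, methods, and capacity accounting are hypothetical and do not reproduce any vendor's actual API; a real drive would speak a network protocol and persist to media rather than to an in-memory dictionary.

```python
from typing import Dict, Optional


class KeyValueDrive:
    """Hypothetical stand-in for a drive addressed by key rather than by block."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self._store: Dict[bytes, bytes] = {}

    def used_bytes(self) -> int:
        """Space consumed by keys and values (an assumed, simplified accounting)."""
        return sum(len(k) + len(v) for k, v in self._store.items())

    def put(self, key: bytes, value: bytes) -> None:
        # Space freed by overwriting an existing key counts toward the new value.
        old = self._store.get(key)
        freed = len(key) + len(old) if old is not None else 0
        needed = len(key) + len(value)
        if self.used_bytes() - freed + needed > self.capacity_bytes:
            raise IOError("drive full")
        self._store[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        """Return the stored value, or None if the key is unknown."""
        return self._store.get(key)

    def delete(self, key: bytes) -> bool:
        """Remove a key; report whether it existed."""
        return self._store.pop(key, None) is not None


# Usage: the application talks to the drive directly, with no file system in between.
drive = KeyValueDrive(capacity_bytes=1024)
drive.put(b"photo-0001", b"...jpeg bytes...")
payload = drive.get(b"photo-0001")
```

    The design point is that the application, not a storage server, decides which key lives on which drive, which is why applications would need to be rewritten for this model.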

    Q. Are Ethernet connected drives JBODs on Ethernet?

    A. Yes, that is the way we view it, too. Sometimes they are even called “eBODs,” where the traditional JBOD controller is replaced with an Ethernet switch.

    Q. How is data protected, i.e., with RAID or another mechanism?

    A. In our surveys, we learned that the most common method would be to leverage erasure coding that is commonly associated with object oriented storage systems.

    Q. How will photonics impact this concept?

    A. Photonics is involved in data center Ethernet for higher speed communications. In our surveys, we did not encounter a single instance of a vendor discussing photonics at the Ethernet Connected Drive. For HDDs, 1GbE provides more than enough bandwidth for the drive.

    Q. Are the servers today connecting the storage just dumb boxes that expose storage? Don’t they do processing as well? With Ethernet drives we’re removing that computational node it seems.

    A. This is a very good point. Today’s conventional storage systems have significant computing capabilities – we think these could be used to do computing as well as performing storage-oriented tasks as they do primarily today. We expect that in the future, the servers that are packaged in external storage systems will be organized in a way that allows customers to run storage functions as well as more traditional purposes that would allow us to just call them ‘servers.’ In fact, there are several startups that are popularizing this idea.

    Q. When it comes to HDD manufacturers there are only three left…WD (HGST), Seagate (Samsung) and Toshiba. When it comes to SSD or flash drives there are more manufacturers. Seagate is using a dual Serial Gigabit Media Independent Interface (SGMII) on its Kinetic HDDs. What other ways are there to do Ethernet on an HDD?

    A. We did not receive any feedback from the technology companies we surveyed about this topic. Note that SNIA recently started an “Object Drive Technical Work Group” to help drive standards for Ethernet-connected drives. If this topic is of interest, we encourage you to join that TWG.

    Q. Have you seen any indication of a ratio between CPU power and Memory vs. the size of the storage? What is the typical White Box? EG Intel (version?) Memory (in GB?) Storage (in TB?)

    A. The use cases we presented are based on vendor-supplied viewpoints that implicitly incorporate the answers to your question, but don’t specifically address it. What we learned is that in these use cases, there is an assumed positive TCO savings, but not every vendor agrees with these calculations – again without providing specifics like the ones you are asking about.

    Q. How can you eliminate the object servers? You still need that functionality somewhere if you ever hope to find the data again, or protect it… You may move away from dedicated Object servers but that code has to run somewhere thus saying they are eliminated is wrong…

    A. This is a very good point. The use cases offered to us suggest that this code would either reside in the Ethernet Connected Drive, or on the server running an application itself, or both. This is why we made the point that the applications would have to be re-written to take advantage of the proposed new architecture.

    Q. Is the cost of Ethernet HDDs expected to be the same as current HDDs and why?
    Ethernet HDDs have more processing capabilities so shouldn’t they cost more (is that 10% more?)

    A. Correct. If more components were added to an otherwise identical HDD, then the cost would be greater. This is central to one of the main dissenting views we learned about during the survey process. It does raise the question as to whether it makes sense to deliver underlying HDDs that are NOT identical to traditional HDDs to offset costs somehow – maybe with lower speeds – or whether these Ethernet Connected Drives would be sold at lower margins by the HDD vendors.

    Q. Do Server power TCO numbers take account of lower power consumption of next generation servers as indicated by Intel?

    A. We do not know what version of servers was used in these vendor-supplied TCO calculations.

    Q. If you are planning to offload processing to the processor on the HDD then you are assuming that the HDD vendors will expose those drives for user access – is there any evidence of this?

    A. There is no single approach that has been agreed upon, and therefore no single answer to this question. During the presentation, we said there are multiple technical approaches, one of which includes using Key Value APIs, and the other is to install an Operating System onto each drive that could run whatever you want on it.

    Q. How is redundancy handled on eHDD based appliance… aka a drive fails?

    A. The custom-built software would presumably be developed to handle this. And obviously, the eHDD has to add enough CPU and memory to manage all this — which of course adds cost.

    Q. It seems that with the CPUs on each drive, the archive, object or whatever the application would need to be rewritten to support this specific method of parallel processing. Is anyone doing this now?

    A. During the survey process, we learned that many applications were being ported to this environment, some of which apparently do take advantage of parallel computing. Given we were planning to immediately divulge information to the public, we were not presented with details.

    Q. What is nearline storage?

    A. This is the term used by some of the technology companies we surveyed. As it was described to us, it refers to a more traditional storage system you might see in an enterprise, where many drives are stopped (not rotating) and are turned on when a request comes in.

    Q. Why are analytics specifically optimized for Ethernet attached storage devices – the presenter seems to anticipate that processing can be pushed onto the drive, and if this is the case why can’t other drive interfaces do this – PCIe attached storage should be even more amenable for this.

    A. The presenter was sharing views compiled by the responses of various technology companies during a series of interviews conducted before this webcast. Analytics is a large, growing industry today and exists without Ethernet Connected Drives. Some of the companies surveyed offered the view that putting processing capabilities into each HDD may enhance the overall system’s performance.

    Q. Can the presenter comment on the value of scale-out for E-Drives, versus legacy SAN scale out?

    A. Some of the technology companies interviewed by the presenter suggested that systems based on Ethernet Connected Drives may scale to larger capacities than traditional architectures on the basis that the storage-oriented servers no longer present an impediment to scaling.

    Q. Just as object storage addresses RAID smart drives could provide the meta data needed by the swift controllers to do deduplication, or the controller may do deduplication as a pre-process or post process like we have seen on NetApp or Data Domain evolve over years.
    If we use optic connections the port density issue is resolved and this end up looking like something from 2001 (the movie) correct?

    A. Photonics is involved in data center Ethernet for higher speed communications. In our surveys, we did not encounter a single instance of a vendor discussing photonics at the Ethernet Connected Drive. As noted above, 1GbE is more than sufficient for eHDDs.

    Q. FYI…48TB Capacity Kinetic Storage Appliance $5000.00 street price
    White Box 2U Dual Xeon storage server with 48TB RAW…$8000 street price

    A. Thank you for sharing! You may have noticed we did not mention specific vendors during the presentation – perhaps others viewing your question will take note of your viewpoint.

    Q. To the extent that hyperscale cloud environments have servers with open sockets or slots for direct attach storage of drives, how are there financial savings to connect through Ethernet instead of direct attach? Will servers of the future remove these slots and sockets? Are there other cluster wide benefits with regards to performance for data accessed directly through the network instead of through the server with the local storage, when the data is accessed by a large number of servers?

    A. Hyperscalers are buying storage-related hardware at a fraction of the price that systems OEMs are selling them for mainly because they do not demand software that enterprises value so much – they leverage open source and make their own for their very specific needs. If you look at the slide about the ‘White Box Effect’ in the presentation, you get a sense for just how much less they pay – or anyone else who buys a White Box pays – but make no mistake about it, these devices don’t do much unless you integrate them into a working system intended to store and safely retain data. To answer your question, we observe that these hyperscalers are such large customers of components and systems that they could choose to request custom hardware designs with customized specifications – more of this kind of interface, fewer of that kind, etc. As an analogy, in the networking industry, one of the largest buyers of the underlying network technology like processors, Ethernet interfaces and optics are the handful of hyperscalers – and in fact these customers are larger than most vendors.

    Q. Why would each drive not know about other drives storage? How does this differ from existing storage servers?

    A. In the traditional storage architecture, a central system is involved. The dissenting viewpoint we received from some of the technology companies we interviewed was a counterpoint that may exist only under certain design scenarios. Our view is that if a system is designed with the goal in mind to make each drive aware of each other’s contents, then that is technically possible of course. But at a cost, as you add CPU, memory, and software to do this.

    Q. I can see flash and Wi-Fi Ethernet connected drives providing Internet of Things storage for values that can be harvested independent of when the value was stored. Thus getting a low power system that could live off of USB type power or power over Ethernet being why corporations would look at this.

    A. I think the point you are making is that flash consumes very little power, right? This revolutionary technology (let’s just say non-volatile memory, to keep it general) is causing all kinds of disruptive changes in the storage industry, and as costs come down for NVM, all kinds of different scenarios become possible.

    Q. Cost model might need to include a simpler lower cost local server with the Ethernet drive clusters by adding a cost item to the left side of their equation, comments?

    A. Agreed – the equation we provided was simplistic and could be expanded to include many other terms and other simultaneous equations as well. We just thought that providing it would frame the discussion on the slide instead of just saying it verbally.

    Q. Obviously, it will be higher, but how do you envision this changing Ethernet bandwidth requirements? Will Ethernet connected drives only become a reality once 40, 25, 100 Gb becomes the mainstream for Ethernet networks?

    A. Network bandwidth needs will be a function of how the servers interact with the drives – I can see scenarios where traffic might be kept more local, or where asking each drive for ‘the answer’ instead of ‘all of its data’ so it can be processed in a server might actually undercut your premise that traffic increases. The point I’m getting to is that it depends on what applications these Ethernet Connected Drives are used for. Nevertheless, the old networking adage (all available bandwidth installed will be consumed) has not yet been repealed publicly that I’m aware of.

    Q. With Ethernet connected drives, are we still stuck with the fundamental issue that HDDs are transactionally inefficient, so that, while the concept is novel, the basic drive remains the bottleneck unless its transactional efficiency improves?

    A. We think HDDs will co-exist with Flash/NVM for a very long time. Some very smart engineers are working to make this co-existence increasingly efficient, taking into account the strengths and weaknesses of both storage media.



    New Webcast: Visions For Ethernet Connected Drives

    February 20th, 2015

    Mark your calendar for March 25th as SNIA-ESF, together with the Dell’Oro Group, will be hosting a live Webcast, “Visions for Ethernet Connected Drives.” The arrival of mass-storage services, the emergence of analytics applications and the adoption of object storage by the cloud-services industry have provided an impetus for new storage hardware architectures. One such underlying hardware technology is the Ethernet connected hard drive, which is in early stages of availability.

    Please join us on March 25th to hear Chris DePuy, Vice President of Dell’Oro Group share findings from interviews with storage-related companies, including those selling hard drives, semiconductors, peripherals and systems, as he will present some common themes uncovered, including:

    • What system-level architectural changes may be needed to support Ethernet connected drives
    • What capabilities may emerge as a result of the availability of these new drives
    • What part of the value chain spends the time and money to package working solutions

    We will also present some revenue and unit statistics about the storage systems and hard drive markets and will discuss potential market scenarios that may unfold as a result of the object storage and Ethernet connected drive trends.

    I’ll be hosting the event and together with Chris, taking your questions. I hope you’ll join us.


    OpenStack Cloud Storage Q&A

    January 21st, 2015

    More than 300 people have seen our Webcast “OpenStack Cloud Storage.” If you missed it, it’s now available on demand. It was a great session with a lot of questions from attendees. We did not have time to address them all – so here is a complete Q&A. If you think of any others, please comment on this blog. Also, mark your calendar for January 29th when the SNIA Cloud Storage Initiative will continue its Developers Tutorial Series with a live Webcast on OpenStack Manila.

    Q. Is it correct to say that one can use OpenStack on any vendor’s hardware?

    A. Servers, yes, assuming the hardware is supported by Linux. Block storage requires a driver, and not all vendor systems have Cinder drivers.

    Q. Is there any OpenStack investigation and/or development in the storage networking area?

    A. Cinder includes support for FC and iSCSI. As of Icehouse, the FC support also includes auto-zoning. 

    Q. Is there any monetization going on around OpenStack, like we see for distros of Linux?

    A. Yes, there are already several commercial distributions available.

    Q. Is erasure code needed to get a positive business case for Swift, when compared with traditional storage systems?

    A. It is a way to reduce the cost of replication. Traditional storage systems typically already have erasure coding, in the form of RAID. Systems without erasure coding end up using more storage to achieve the same level of protection due to their use of 3-way replication.
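    The storage cost difference can be made concrete with a little arithmetic. The sketch below compares raw capacity consumed per unit of usable data; the 10+4 and 8+2 layouts are illustrative assumptions, not parameters of Swift or of any particular product.

```python
def replication_overhead(copies: int) -> float:
    """Raw bytes stored per usable byte with N full copies."""
    return float(copies)


def erasure_overhead(k: int, m: int) -> float:
    """Raw bytes stored per usable byte with k data fragments and m parity fragments."""
    return (k + m) / k


def raw_capacity_tb(usable_tb: float, overhead: float) -> float:
    """Raw capacity needed to hold usable_tb of data at the given overhead."""
    return usable_tb * overhead


# 3-way replication survives the loss of 2 copies at 3.0x raw capacity.
three_way = replication_overhead(3)
# An assumed 10+4 erasure code survives the loss of any 4 fragments at 1.4x.
ec_10_4 = erasure_overhead(10, 4)
# An assumed RAID 6-style 8+2 layout survives 2 drive losses at 1.25x.
raid6_8_2 = erasure_overhead(8, 2)

for name, overhead in [("3x replication", three_way),
                       ("EC 10+4", ec_10_4),
                       ("RAID 6 (8+2)", raid6_8_2)]:
    print(f"{name}: {overhead:.2f}x raw, "
          f"{raw_capacity_tb(100, overhead):.0f} TB raw for 100 TB usable")
```

    Under these assumptions, 100 TB of usable data needs roughly 300 TB of raw capacity with 3-way replication but only about 140 TB with a 10+4 code, which is the saving the answer refers to.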

    Q. Is erasure code currently implemented in the current Swift release?

    A. No, it is a separate development stream, which has not been merged yet.

    Q. Any limitation on the number of objects per container or total number of objects per Swift cluster?

    A. Technically there are no limits. However, in practice, the fact that the containers are implemented using SQLite limits their size to a million or maybe a few million objects per container. However, due to the way that Swift partitions its metadata, each user can also have millions of containers, and there can be millions of users. So practically speaking, the total system can support an unlimited number of objects.

    Q. What are some of the technical reasons for an enterprise to select Swift vs. Amazon S3? In other words, are they pretty much direct alternatives, or does each have its own preferred use cases?

    A. They are more or less direct alternatives. There are some minor differences, but they are made for the same purpose. That said, S3 is only available from Amazon. There are some S3 compatible systems, but most of those also support Swift. Swift, on the other hand, is available open source or from multiple vendors. So if you want to run it in your own data center, or in a public cloud other than Amazon, you probably want Swift.

    Q. If I wanted to play around with OpenStack, Cinder, and Swift in a lab environment (or in my basement), what do I need and how do I get started?

    A. openstack.org is the best place to start. The “devstack” distribution is also good for playing around.

    Q. Will you be showing any features for Kilo?

    A. The “Futures” I showed will likely be Kilo features, though the final decision of what will be in Kilo won’t happen until just before release.

    Q. Are there any plans to implement data encryption in Cinder?

    A. I believe some of the back ends can support encryption already. Cinder is really just a provisioning and orchestration layer. Encryption is a data path feature, so it would need to be implemented in the back end.

    Q. Some time back I heard OpenStack Swift is going to come up with block storage as well, any timeline for that?

    A. I haven’t heard this; Swift is object storage.

    Q. The performance characteristics of Cinder block services can vary quite widely. Is there any standard measure proposed within OpenStack to inform Nova or the application about the underlying Cinder block performance characteristics?

    A. Volume types were designed to enable clouds to provide different levels of service. The meaning of these types is up to the cloud administrator. That said, Cinder does expose QoS features like minimum/maximum IOPS.

    Q. Is the hypervisor talking to a cinder volume or to (for example) a NetApp or EMC volume?

    A. The hypervisor talks to a volume the same way it does outside of OpenStack. For example, the KVM hypervisor can talk to volumes through LVM, or can mount SAN volumes directly.

    Q. Which of these projects are most production-ready?

    A. This is a hard question, and depends on your definition of production ready. It’s hard to do much without Nova, Glance, and Horizon. Most people use Cinder too, and Swift has been in production at HP and Rackspace for years. Neutron has a lot of complexity, so some people still use Nova network, but that has many limitations. For toy clouds you can avoid using Keystone, but you need it for a “production” cluster. The best way to get a “production ready” OpenStack is to get a supported commercial distribution.

    Q. Are there any Plugfests?

    A. No, however, the Cinder team has a fairly extensive and continuous integration process that drivers need to pass through. Swift does not because it doesn’t officially “support” any plugins.




    Object Storage 201 Q&A

    October 29th, 2014

    Now available on-demand, our recent live CSI Webcast, “Object Storage 201: Understanding Architectural Trade-Offs,” was a highly-rated event that almost 250 people have seen to date. We did not have time to address all of the questions, so here are answers to them. If you think of additional questions, please feel free to comment on this blog.

    Q. In terms of load balancers, would you recommend a software approach using HAProxy on Linux or a hardware approach with proprietary appliances like F5 and NetScaler?

    A. This really depends on your use case. If you need HA load balancers, or load balancers that can maintain sessions to particular nodes for performance, then you probably need commercial versions. If you just need a basic load balancer, using a software approach is good enough.

    Q. With billions of objects what Erasure Codes are more applicable in the long term? Reed Solomon where code words are very small resulting in many billions of code words or Fountain type codes such as LDPC where one can utilize long code words to manage billions of objects more efficiently?

    A. Tracking erasure code fragments has a higher cost than replication, but the tradeoff is higher HDD utilization. Using rateless coding lowers this overhead because each fragment has equal value. Reed-Solomon requires knowledge of fragment placement for repair.

    Q. What is the impact of having HDDs of varying capacity within the object store?  Does that affect hashing algorithms in any way?

    A. The smallest logical storage unit is a volume. Because scale-out does not stripe volumes, there is no impact. Hashing, being used for location, would not understand volume size, so a separate database is used, on a per-volume basis, to track open space. Hashing algorithms can be modified to suit the underlying disk. The problem is not so much whether they can be designed a priori for the underlying system, but really the rigidity they introduce by tying placement very tightly to topology. That makes failure and exception handling hard.

    Q. Do you think RAID6 is sufficient protection with these types of Object Storage Systems or do we need higher parity based Erasure codes?

    A. RAID6 makes sense for a Direct Attached storage solution where all drives in the RAID Set can maintain sync. Unlike filesystems (with a few exceptions), Scale-Out Object Storage systems are “Storage as a workload” systems that already have protection as part of the system. So the question is what data protection method is used on solution x as opposed to solution y. You must also think about what you are trying to do. Are you trying to protect against a single disk failure, a node failure, or a site failure? For disk failures, RAID is great, but not if you’re trying to protect against node failure or site failure. Site failure is an EC sweet spot, but hard to solve from a deployment perspective.

    Q. Is it possible to brief how this hash function decides the correct data placement order among the available storage nodes?

    A. Take a look at the following links: http://en.wikipedia.org/wiki/Consistent_hashing and https://swiftstack.com/openstack-swift/architecture/
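    For readers who prefer code to prose, here is a minimal sketch of generic consistent hashing with virtual nodes. It is not the ring implementation of Swift or any other product (those may differ in detail); the class name, the node names, and the choice of SHA-256 and 64 virtual nodes are all assumptions made for illustration.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(value: str) -> int:
    """Map a string to a point on the ring (SHA-256 is an arbitrary choice here)."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Generic consistent-hash ring; each node owns several points (virtual nodes)."""

    def __init__(self, nodes: List[str], vnodes: int = 64) -> None:
        self.vnodes = vnodes
        self._owner: Dict[int, str] = {}   # ring point -> node name
        self._points: List[int] = []       # sorted ring points
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def locate(self, key: str) -> str:
        """Return the node owning the first ring point clockwise from the key's hash."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


# Usage: placement is a pure function of the key and the current node set.
ring = HashRing(["node-a", "node-b", "node-c"])
home = ring.locate("photos/2014/cat.jpg")
```

    The property that matters for storage is that removing a node only relocates the keys that node owned; every other key keeps its placement, which limits data movement when the topology changes.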

    Q. What do you consider to be a typical ratio of controller to storage nodes? Is it better to separate the two, or does it make sense to consolidate where a node is both controller and storage?

    A. The flexibility of Scale-Out Object Storage makes these two components independently scalable. The systems we test all have separate controllers and storage nodes so we can test this independence. This is also very dependent on the Object Store technology you use. We know of some object stores that need 1 GB of RAM per TB of data, while there are others that use 1/10 of that. The compute is dependent on whether you are using erasure coding, and which codes. There is no one answer.

    Q. Is the data stored in the Storage depository interchangeable with other vendor’s controller units? For instance, can we load LTO tapes from vendor A’s library to Vendor B’s library and have full access to data?

    A. The data stored in these systems is part of the “Storage as a workload” principle, so the system metadata used to track objects is stored as a function within the controller. I would not expect any content stored to be interchangeable with another system architecture.

    Q. Would you consider the Seagate Kinetic Open Storage Platform a radical architectural shift in how object storage can be done?  Kinetic basically eliminates the storage server, POSIX and RAID or all of the “busy work” that storage servers are involved in today.

    A. Ethernet drives with a key-value interface provide a new approach to designing object storage solutions. It is yet to be seen how compelling they are for TCO and infrastructure availability.

    Q. Will the inherent reduction in blast radius by the move towards Ethernet-interface HDDs be a major driver of the Ethernet HDD in object stores?

    A. Yes. We define blast radius as the set of connected hard drives that become inaccessible when a compute failure occurs. As we lower the number of hard drives connected to each compute node, the blast radius is reduced. For Ethernet drives, you may need redundant Ethernet switches to minimize the blast radius. Blast radius can also be minimized in software with intelligent data placement.

    New Webcast: Object Storage – Understanding Architectural Trade-Offs

    September 30th, 2014

    The Cloud Storage Initiative (CSI) is excited to announce a live Webcast as part of the upcoming BrightTalk Cloud Storage Summit on October 16th, “Object Storage 201: Understanding Architectural Trade-Offs.” It’s a follow-up to the SNIA Ethernet Storage Forum’s “Object Storage 101: Understanding the What, How and Why behind Object Storage Technologies.”

    Object-based storage systems are fast becoming one of the key building blocks for a cloud storage infrastructure. They address some of the shortcomings and provide an alternative to more traditional file- and block-based storage for unstructured data.

    An object storage system must accommodate growth (and yes, the rumors are true – data growth is a huge and accelerating problem), be flexible in its provisioning, support multiple geographies and legal frameworks, and cope with the inevitable issues of resilience, performance and availability.

    Register now for this Webcast. Experts from the SNIA Cloud Storage Initiative will discuss:

    • Object Storage Architectural Considerations
    • Replication and Erasure Encoding for resilience
    • Pros and Cons of Hash Tables and Key-Value Databases
    • And more…

    This is a live presentation, so please bring your questions and we’ll do our very best to answer them. We hope you’ll join us on October 16th for an unbiased, deep dive into the design considerations for object storage systems.


    Object Storage 101 – Questions and Answers

    June 19th, 2014

    At our recent live ESF Webcast, “Object Storage 101,” we talked about the what, how, and why behind object storage technologies. Over 200 people attended the event. If you missed it, it’s now available on-demand. It was an interactive session and we did not have time to address all the questions, so here are answers to them all. If you think of additional questions, please feel free to comment on this blog.

    Q. Would Object Storage be a feasible solution for only the nearline storage tier?

    Typically Yes. If we think about the latency needed for real-time transactions, these are best served using a cache storage tier such as NAND or large arrays of RAM. Object stores are excellent methods to store and retrieve large data sets within single/multiple containers. Note: most systems support offset reads so you don’t need to access an entire object to get to the section of interest.

    Q. Where is the index to find the location of an object that is stored? Is it stored locally or stored distributedly or replicated among each clusters?

    The Index or Metadata of stored objects, if used, is typically replicated throughout the system. Also, if the Metadata is lost, it can typically be rebuilt as a maintenance function.

    Q. How is the object stored/broken up? Aside from being stored by metadata (like name, size, etc) … what is the process of the fragmentation…breaking it up …as described during this erasure coding segment?  Once it’s assigned some unique identifier … ie. an x-ray picture…. how is it addressed? (if not by block/bit/byte/level)?

    Currently, Objects are stored using one of two methods of data protection: Replication or Erasure Coding. Some systems use both. That said, there are several algorithms used today to Erasure Code protect Objects. When using Reed-Solomon methods, you need to specify the number of “Data” fragments and the number of “Parity” fragments that will be created. The size of each “Data” fragment is approximately the Object size divided by the number of “Data” fragments requested. Each “Parity” fragment will be the same size as each of the “Data” fragments. The protected Object size is the sum of the “Data” fragments plus the “Parity” fragments created. Each of these fragments (Data and Parity) is stored on a different server to avoid a single point of failure. The application that creates and accesses the Object is responsible for keeping track of the ID of the Object and the Namespace the ID was stored in. Typically the Application will create the ID; however, when an Application “Puts” an Object using an existing ID, the older stored Object with that same ID is overwritten. Typically, access into an Object Store is through a RESTful Interface using commands like “Put, Get, Delete, List” over HTTP.
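
    The fragment arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the function name ec_layout and its parameters are made up for this example, and rounding the fragment size up (so the last fragment can be padded) is an assumption rather than something any particular object store is known to do.

```python
import math


def ec_layout(object_bytes, data_fragments, parity_fragments):
    """Estimate fragment size and protected size for a data/parity policy."""
    # Each data fragment holds roughly object_size / data_fragments bytes.
    # Rounding up is an assumption: it lets the last fragment be padded
    # to the same length as the others.
    fragment_bytes = math.ceil(object_bytes / data_fragments)
    total_fragments = data_fragments + parity_fragments
    return {
        "fragment_bytes": fragment_bytes,
        "total_fragments": total_fragments,
        # Parity fragments are the same size as data fragments.
        "protected_bytes": fragment_bytes * total_fragments,
    }


# A 1 GB object protected with 8 data and 4 parity fragments.
layout = ec_layout(object_bytes=10**9, data_fragments=8, parity_fragments=4)
print(layout)
```

    With these inputs each of the 12 fragments is about 125 MB, so the protected Object occupies about 1.5 GB of raw capacity.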

    Q. Will Object storage drive network scale—further adoption of 10GE and 40GE or is 1GE enough?

    Yes. If we think about the interconnection between the Control Plane and Data Plane of these systems (Orchestration and Object Storage Devices), the better the connectivity, the higher the performance.

    Q. Is the number of fragments set or configurable?  What are the trade-offs of requiring fewer fragments for recovery besides perhaps processing overhead?  Are there any gotchas to watch out for/consider?

    Yes. Storage policies are configurable. The number of “Parity” fragments defines the data loss risk. The more “Parity” fragments requested, the lower this risk, but the more storage the Object consumes. Eliminating single points of failure is a key consideration. For example, if your Object Storage system has 10 servers, a storage policy using 9 of 12 will place 2 fragments of the Object on each of 2 servers. In this case any single server failure would not cause data loss but may cause higher latency. However, if 3 servers were to fail, you could lose access to your data until the servers were recovered. If the drives of the failed servers were not recovered, then data loss would occur.
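
    To make the 10-server example concrete, here is a rough Python sketch that spreads fragments over servers and asks how many fragments are lost in the worst case. The round-robin placement and both function names are assumptions made for illustration; real object stores choose placement in their own ways.

```python
def fragments_per_server(total_fragments, servers):
    """Round-robin placement (an assumption): fragment count on each server."""
    base, extra = divmod(total_fragments, servers)
    return [base + 1 if i < extra else base for i in range(servers)]


def worst_case_lost(counts, failed_servers):
    """Fragments lost if the most heavily loaded servers are the ones that fail."""
    return sum(sorted(counts, reverse=True)[:failed_servers])


# "9 of 12" policy on 10 servers: 9 data fragments, 3 parity fragments.
data, total, servers = 9, 12, 10
parity = total - data
counts = fragments_per_server(total, servers)
print("fragments per server:", counts)

for failed in range(1, 4):
    lost = worst_case_lost(counts, failed)
    state = "readable" if lost <= parity else "unavailable"
    print(f"{failed} failed server(s): up to {lost} fragments lost, object {state}")
```

    Under this assumed placement a single server failure costs at most 2 fragments, which is within the 3-fragment tolerance of the policy. With more than one failure the outcome depends on which servers fail, so it is worth running the numbers for your own policy and server count.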

    Q. Is erasure encoding used instead of Hash tagging?

    No. Hash Tagging is a method of generating a unique number from a specific input of data; this number is used to find the location of the Object to be stored. Erasure Coding is the method used to create the fragments. So think of the Hash tag as the seed for the address needed to find the fragments.
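
    One way to picture the hash acting as a seed for fragment locations is the sketch below. It is a simplified illustration, not how any specific product works: hashing the namespace and Object ID with SHA-256, and walking the server list from the resulting position, are both assumptions, as are the function and server names.

```python
import hashlib


def placement_seed(namespace, object_id):
    """Derive a deterministic number from the Object's namespace and ID."""
    digest = hashlib.sha256(f"{namespace}/{object_id}".encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big")


def fragment_servers(namespace, object_id, servers, fragment_count):
    """Pick one server per fragment, starting from the hash-derived position."""
    start = placement_seed(namespace, object_id) % len(servers)
    return [servers[(start + i) % len(servers)] for i in range(fragment_count)]


servers = [f"server-{n:02d}" for n in range(12)]  # hypothetical server names
locations = fragment_servers("radiology", "xray-0001", servers, fragment_count=12)
print(locations)
```

    Because the seed depends only on the name, any client that knows the namespace and ID can recompute where the fragments live without consulting a central index.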

    Q. How large are the fragments?

    A rough estimate is the Object size divided by the number of fragments needed to re-hydrate the object (e.g. a 1GByte Object stored using an 8 of 12 policy would have a fragment size of 1GByte/8 ≈ 125MByte).

    Q. What do you see as the requirement for the interconnect between the Object storage arrays/boxes to be? Very large pipes as in multiple 40G links or something lower?

    It depends on the use case or Service Level Objective for the system. If your system design uses a Proxy service and Erasure Coding, then your back end network throughput (the network connecting the Proxy and Object Storage Devices – Storage Servers) will be aggregated (multiplied). In this case the network throughput is based on the number of “Data” fragments being used. If you use Replication, then the back end network throughput will not aggregate. This multiplication factor, if present, is key to an efficient network strategy. In Non-Proxy based Object Storage designs or replication based Object Storage systems, the network strategy will scale with network bandwidth up to the limit of the HDDs’ ability to serve data.

    Q. What about access control and security at the object level?  Is that typically part of the model?

    Typically, access control methods are at the gateway or entry point of a Namespace. The access method used is up to the vendor of the Object Store.

    Q. What is the presentation mode at the host level? i.e. a drive mapping or similar

    Typically the presentation method is a RESTful API via HTTP. This uses “PUT, GET, DELETE, LIST” semantics.
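
    The semantics of those four verbs can be modeled with a small in-memory Python class. This is a toy model only: ToyObjectStore and its method names are invented for this sketch, nothing here goes over HTTP, and real object stores add authentication, metadata and durability. It does show two behaviors mentioned in these answers, namely that a PUT to an existing ID overwrites the older Object and that reads can start at an offset.

```python
class ToyObjectStore:
    """In-memory stand-in for PUT / GET / DELETE / LIST semantics."""

    def __init__(self):
        self._namespaces = {}

    def put(self, namespace, object_id, data):
        # A PUT with an existing ID replaces the older Object.
        self._namespaces.setdefault(namespace, {})[object_id] = bytes(data)

    def get(self, namespace, object_id, offset=0, length=None):
        # Offset reads return a section without fetching the whole Object.
        data = self._namespaces[namespace][object_id]
        end = None if length is None else offset + length
        return data[offset:end]

    def delete(self, namespace, object_id):
        del self._namespaces[namespace][object_id]

    def list_ids(self, namespace):
        return sorted(self._namespaces.get(namespace, {}))


store = ToyObjectStore()
store.put("radiology", "xray-0001", b"first version")
store.put("radiology", "xray-0001", b"second version")  # overwrites
print(store.get("radiology", "xray-0001"))
print(store.get("radiology", "xray-0001", offset=7, length=7))
print(store.list_ids("radiology"))
```

    The application, not the store, is responsible for remembering which namespace and ID it used.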

    Q. Can you explain the differences/similarities between object storage, CDMI and software defined storage?

    Object Storage defines a system (Software + Hardware) to store Objects. CDMI defines a method used to access/connect your application to an Object Storage system. Software Defined Storage describes using standard high volume servers with software for the purpose of storing data.

    Q. Why can’t a traditional approach be used to Object Storage for its durability?

    Traditional storage approaches such as direct attached storage (RAID Sets) do not scale. Once you run out of space, managing additional storage on separate systems becomes the issue.

    Q. Aren’t all types of data going to need the accessibility required by users? For example, isn’t everything going to need to be placed in an object store?

    There is a lot of debate on this issue. The goal of an Object Store is twofold: 1) drive down the cost/Byte and 2) keep content readily accessible.

    Q. How do we avoid losing the Metadata from the data? Also, is there something like sub-meta data, where a small amount of Metadata is contained within the data and the larger Metadata is stored somewhere else?

    Some Object storage systems support Extended File Attributes, a file system feature that allows Applications to store “Metadata” about an Object, which is then bound to the Object within the storage environment. These Extended File Attributes (XATTRs) can be queried separately and can be used by your application as you see fit. The XATTRs are managed by the local file system and accessed through the Object Storage software via the RESTful API using HTTP.

    Q. Is maintaining multiple copies mainly for durability or can it be used for performance enhancement (parallel access), or is that irrelevant?

    Absolutely!  Management of copies/replicas can serve multiple purposes.  Replication across racks, datacenters, geographies, etc. can provide resiliency against failures at those levels.  Replication can also be used to provide object access in close proximity to the requester.  In the X-ray example discussed in the Webcast, we might set up a replica local to the medical practice for the first 90 days, in order to provide a low latency (time to first byte) copy during the initial treatment.  Additional copies can be kept at remote sites in order to provide fault tolerance.

    Q. Is there a standard methodology for migrating from a file-system based methodology to an object store?

    The short answer is no.  In general an application that is currently developed to use file or block based storage will need to be re-architected in order to take advantage of an object storage system/service.  There is, however, a growing category of products referred to as “cloud gateways” that can provide a bridge to object storage by presenting a filesystem to the existing application, while writing and reading via a RESTful API to a backend object storage system/service.

    Q. Is it safe to say that in order to use object storage the application needs to be “object storage aware”? Unlike a traditional storage where the application doesn’t necessarily need to be familiar with the storage or file system since that is handled at a lower layer.

    Yes, however as indicated in the question regarding migration of applications above, it is possible to implement a “cloud gateway” solution that will provide the translation from RESTful API to a CIFS/NFS fileshare, thus not requiring any application changes.  I would disagree with the premise that traditional applications don’t need to be familiar with the underlying storage.  Traditional file-based applications must understand the location (fileserver, folder, filename, etc.) in order to gain access to the appropriate data.

    Q. I’m hearing a lot of ‘what’ and ‘how’ but not so much ‘why’ about object storage. Can we hear some real-world examples of applications in industry today that are running better because of object storage?

    An example of an application running today with object storage behind it, and why:  Web Based Media Asset Management/Distribution.  This particular use case tends to deal with billions of files/objects that can vary in size from very small thumbnail images to massive 4k HD movie files.  The ability to deliver these to multiple platforms (phone, laptop, set top box, etc.) across multiple geographies is something that is well suited for object storage.  Traditional file and/or block based storage environments may hit scale limitations in dealing with the number of files/objects.  In addition, maintaining a single namespace across multiple locations/datacenters is exceedingly complex for storage environments other than object stores.

    Q. Replicating an object two or three times would exponentially increase storage costs, wouldn’t it?  The more copies the higher the costs?

    Certainly more copies would use more storage, and as a result most object stores provide different durability schemes based upon the performance/availability tradeoffs the data owner is willing to make.  Recovering a single object from a replica is significantly faster than rebuilding an object from geo-distributed EC fragments. Also, as discussed in the question above related to replicas to drive performance, replication can serve the purpose of placing objects as close to the consumer as possible, minimizing time to first byte and increasing the overall throughput of an application.
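
    On the cost side of the question, the raw capacity used by replication grows linearly with the number of copies rather than exponentially, and an erasure coded policy uses (data + parity) / data times the Object size. The short Python sketch below compares the two; the function names are illustrative, and it ignores metadata and any padding a real system may add.

```python
def replication_overhead(copies):
    """Raw capacity used per byte stored when keeping full copies."""
    return float(copies)


def erasure_overhead(data_fragments, parity_fragments):
    """Raw capacity used per byte stored under a data/parity policy."""
    return (data_fragments + parity_fragments) / data_fragments


for copies in (1, 2, 3):
    print(f"{copies} copies: {replication_overhead(copies):.2f}x raw capacity")

for data, parity in ((8, 4), (9, 3)):
    ratio = erasure_overhead(data, parity)
    print(f"{data} of {data + parity} erasure coding: {ratio:.2f}x raw capacity")
```

    Three replicas cost 3x the Object size, while an 8 of 12 policy costs 1.5x; the tradeoff is the slower rebuild path described above.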

    Q. If I have an app that access a CIFS share, is there a way to translate it into object store?

    Please see answer to question: “Is there a standard methodology for migrating from a file-system based methodology to an object store?” Short answer: Yes, via a “cloud gateway” product.

    Q. Is there a confluence point of Object and File based storage – specifically in NAS where object storage can be multi-protocol (NFS, and REST)?

    While there are some object storage solutions that provide their own native cloud-gateway capability (NAS protocol to the application, RESTful API to the object store), there are very few that provide a “file/object duality” capability allowing applications to manipulate an object as both an object and a file.

    Ethernet Meets Enterprise Storage – Finally

    May 27th, 2014

    Presumptuous, yes, because Ethernet has been a mainstay in enterprises since its early days over 40 years ago.  It initially grew to prominence as the local area network (LAN) connection in the enterprise. More recent advances have enabled Ethernet to become a standard for mission critical storage connectivity for block, file and object storage in many enterprises.

    Block storage in large enterprises has long been focused on Fibre Channel due to its performance capabilities.   In order to bring the same performance benefits to Ethernet, the IEEE 802.1 Data Center Bridging Task Group proposed a number of new standards to enhance Ethernet reliability.  For example, 802.1Qbb Priority-based Flow Control (PFC) provides a link level flow control mechanism to ensure lossless transmission under congestion, 802.1Qaz Enhanced Transmission Selection (ETS) provides a management framework for prioritized bandwidth, and the Data Center Bridging Exchange Protocol (DCBX) enables these features to be negotiated between neighbors to ensure consistency on the network. Collectively, these and other enhancements have brought those enterprise-class storage networking features to the Ethernet platform.

    In addition, the InterNational Committee for Information Technology Standards (INCITS) T11 Fibre Channel committee developed a specification for Fibre Channel over Ethernet (FCoE) in its FC-BB-5 standard in 2009, which allows the Fibre Channel protocol to run directly on top of Ethernet, eliminating the TCP/IP stack and allowing for efficient performance of the Fibre Channel protocol.  FCoE also depends on the Data Center Bridging standards from IEEE 802.1 in order to ensure the “losslessness” and flow control needed by Fibre Channel.

    An alternative to FCoE, iSCSI, was designed to run over standard Ethernet with TCP/IP and to tolerate the “lossy” aspects of Ethernet.  Its architecture and the additional layers of encapsulation involved can impact latency and performance. However, more recent innovations in iSCSI have enabled it to run over a DCB Ethernet network, which enables iSCSI to inherit some of the enterprise storage features which have always been inherent in Fibre Channel.  For more on this, read last year’s blog “How DCB Makes iSCSI Better” from Allen Ordoubadian.

    In 2013, INCITS submitted the FC-BB-6 standard for review which introduced, among other things, the VN2VN standard.  The VN2VN proposal will allow FCoE to work in a standard DCB switching environment without the presence of a Fibre Channel Forwarder (FCF).  An FCF allows for bridging between servers which are communicating with FCoE and storage devices which are communicating with traditional Fibre Channel.  As DCB switches and FCoE storage become more prevalent, the FC-BB-6 standard will allow for end-to-end FCoE connectivity in either a point to point (P2P) or DCB mesh environment. This will result in lower cost for FCoE environments. Products are beginning to appear which support VN2VN and over the next 18 months it is likely that all major vendors will support it. Check out our ESF Webcast “How VN2VN Will Help Accelerate Adoption of FCoE” for more details.

    The availability of converged network adapters (CNAs) with processing capability allows for offloading storage protocol processing from the host processor, though some CNAs use host-based storage protocol initiators in system software and do selective stateless offloads in the data path.  Both FCoE and iSCSI require the storage protocol to be encapsulated in a frame to be sent across the Ethernet network.  In an enterprise environment, especially a virtual server environment, CPU utilization is tracked closely and target CPU thresholds are often set.  Anything which can minimize spikes in CPU utilization can allow for more workloads to be placed on servers and allows for predictable energy consumption.

    For file storage, Ethernet has traditionally been the connectivity option of choice for file servers used as “shares” for centralized employee document storage. In the 21st century, usage of network attached storage (NAS) with the Network File System (NFS) has increased for enterprise databases and Hadoop clusters, especially with the availability of 10Gb Ethernet.  New features in NFS 4 and later introduced security and stateful protocol support after development of NFS was taken over by the Internet Engineering Task Force (IETF).

    Object storage has been around for nearly 20 years as a repository for storing data as objects which include not only the original file, but also a globally unique identifier and metadata which describes the object and various parameters about the object.  It has been used to store many forms of unstructured data, but found niches in certain areas, such as legal documents with retention policies and archiving photos and videos.  More recently, there seems to be a resurgence in object storage as the amount of unstructured data generated by enterprises continues to skyrocket.  Open source object storage in Ceph and OpenStack is also helping to drive adoption. SNIA ESF is hosting a live Webcast on object storage on June 11, 2014, called “Object Storage 101.” I encourage you to register for this presentation for an unbiased look at the what, how and why of object storage technologies.

    When combined with the advances in link speed, throughput capabilities, latency and input/output operations per second (IOPS) in modern 10Gb/s and 40Gb/s Ethernet, these existing and emerging Ethernet standards and storage architectures are having a profound effect on the viability of Ethernet as an enterprise class storage networking platform.  Vendors and customers are seeing the advantage in one wire, the Ethernet cable, carrying all LAN, WAN and storage traffic.




    New ESF Live Webcast – Object Storage 101

    May 22nd, 2014

    Understanding the what, how and why behind object storage technologies.

    Object storage systems are gaining quite a bit of attention as workloads continue to push scalability and availability limits of massive unstructured data repositories.  For some emerging workloads, object counts are measured by the 100’s of billions and capacities start in petabytes!

    Need a tutorial on object storage? Join us on June 11th at 2:00 p.m. ET, 11:00 a.m. PT for our live Webcast, “Object Storage 101” as we take an unbiased look at the what, how and why behind object storage technologies. In this object storage primer, we’ll cover:

    • What is object storage
    • Where is it being deployed successfully
    • Key attributes of today’s object storage solutions
    • How object storage differs from traditional file or block technologies
    • Common enterprise use-cases and deployment approaches
    • Key considerations before deploying an object store

    This will be a vendor-neutral live and lively discussion. Register now and please bring your questions for our expert panel.


    Object Storage is a Big Deal (and Ethernet Matters)

    January 14th, 2013

    A significant challenge in managing large amounts of data (or Big Data) is a lack of what I like to call “total data awareness”. It’s a situation where you know (or suspect) that you have data – you just can’t find it. When you think about many current IT environments, they are often not built for total data awareness. This starts with core elements of the IT infrastructure, such as file systems. Traditional file systems and access methods were not designed to store hundreds of millions or billions of files in a single namespace. This leads to admins storing data in multiple file systems, multiple shares, complex directory structures – not because the data should be logically organized in that way, but simply because of limitations in file system architectures. This issue becomes even more pressing when data sits in multiple locations, maybe even across on-premise and off-premise, cloud-based storage.

    Is object-based storage the answer?

    Think about how you find data on your computer. Do you navigate complex directory structures, trying to remember the file name of the file that hopefully has the data you are looking for – or have you moved on and just use search tools like Spotlight? Imagine you have hundreds of millions of files, scattered across dozens or hundreds of sites. How about just searching across these sites and immediately finding the data you are looking for? With object storage technology you have the ability to store data in objects, along with metadata that describes the object. Now you can just search for your data based on metadata tags (like a filename – or even better an account number and document type) – as well as manage data based on policies that leverage that metadata.

    However, this often means that you have to consider interfacing with your storage system through APIs, as opposed to NFS and CIFS – so your applications need to support whatever API your storage vendor offers.

    CDMI to the rescue?

    Today, storage vendors often use proprietary APIs. This means that application vendors would have to support a plethora of APIs from a number of different vendors, leading to a lack of commitment from application vendors to support more innovative, object-based storage architectures.

    A key path to solve this issue is to leverage technology and standards that have been specifically developed to provide this idea of a single namespace for billions of data sets and across locations and even managed services that might reside off-premise.

    Relatively new on the standards side you have CDMI (http://www.snia.org/cdmi), the Cloud Data Management Interface. CDMI is a standard developed by SNIA (http://www.snia.org), the Storage Networking Industry Association, with heavy involvement from a number of leading storage vendors. CDMI not only introduces a standard interface to ingest and retrieve data into and out of a large-scale repository, it also enables applications to easily manage this repository and where the data sits.

    CDMI is the new NFS

    Forgive the provocation, but when it comes to creating and managing large, distributed content repositories it quickly becomes clear that NFS and CIFS are not ideally suited for this use case. This is where CDMI shines, especially with an object-based storage architecture behind it that was built to support multi-petabyte environments with billions of data sets across hundreds of sites and accommodates retention policies that can reach to “forever”.

    CDMI and NFS have something in common – Ethernet

    One of the key commonalities between CDMI and NFS is that they both are ideally suited to be deployed in an Ethernet infrastructure. CDMI, specifically, is a RESTful HTTP interface, so it runs on standard Ethernet networks. Even for object storage deployments that don’t support CDMI, practically all of these multi-site, long-term repositories support HTTP (and thus Ethernet) through proprietary APIs based on REST or SOAP.
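
    Since CDMI is plain HTTP, a request can be assembled with nothing more than a standard library. The Python sketch below builds (but does not send) a PUT that would create a data object with metadata. The header names and JSON fields follow my reading of the CDMI specification and should be checked against the version your server implements; the host name, container, object name, metadata keys and the function build_cdmi_put are all hypothetical.

```python
import json
import urllib.request


def build_cdmi_put(base_url, container, name, value, metadata, spec_version="1.0.2"):
    """Assemble an HTTP PUT for a CDMI data object without sending it."""
    body = json.dumps(
        {"mimetype": "text/plain", "metadata": metadata, "value": value}
    ).encode("utf-8")
    headers = {
        "Content-Type": "application/cdmi-object",
        "Accept": "application/cdmi-object",
        "X-CDMI-Specification-Version": spec_version,
    }
    url = f"{base_url}/{container}/{name}"
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


request = build_cdmi_put(
    "http://cdmi.example.com",  # hypothetical endpoint
    "xrays",
    "patient-1234.txt",
    "scan notes",
    {"account": "1234", "doctype": "x-ray"},
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

    Passing the request to urllib.request.urlopen would send it over whatever Ethernet and TCP/IP path already reaches the server, which is the point of the paragraph above.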

    Why does this matter

    Ethernet infrastructure is a great foundation to run any number of workloads, including access to data that sits in large, multi-site content repositories that are based on object storage technologies. So if you are looking at object storage, chances are that you will be able to leverage existing Ethernet infrastructure.