Object Storage 101 – Questions and Answers

At our recent live ESF Webcast, “Object Storage 101,” we talked about the what, how, and why behind object storage technologies. Over 200 people attended the event. If you missed it, it’s now available on demand. It was an interactive session and we did not have time to address all of the questions, so here are answers to them all. If you think of additional questions, please feel free to comment on this blog.

Q. Would Object Storage be a feasible solution for only the nearline storage tier?

Typically, yes. If we think about the latency needed for real-time transactions, those are best served by a cache storage tier such as NAND flash or large arrays of RAM. Object stores are an excellent way to store and retrieve large data sets within single or multiple containers. Note: most systems support offset reads, so you don’t need to access an entire object to get to the section of interest.
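As a hedged illustration of an offset read, the sketch below issues an HTTP Range request against a hypothetical object URL; the host, container, and object names are placeholders, not any specific vendor’s API.

```python
# Minimal sketch of an offset ("range") read from an object store.
# The endpoint below is a placeholder, not a specific product's API.
import urllib.request

OBJECT_URL = "http://objectstore.example.com/xray-images/patient-1234.dcm"

# Request only the second MiB of the object instead of the whole thing.
req = urllib.request.Request(OBJECT_URL)
req.add_header("Range", "bytes=1048576-2097151")

with urllib.request.urlopen(req) as resp:
    # A server that honors range reads answers 206 Partial Content.
    print("HTTP status:", resp.status)
    print("Received", len(resp.read()), "bytes")
```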

Q. Where is the index used to find the location of a stored object kept? Is it stored locally, or is it distributed or replicated among the clusters?

The index or Metadata of stored objects, if used, is typically replicated throughout the system. If the Metadata is lost, it can typically be rebuilt as a maintenance function.

Q. How is the object stored/broken up? Aside from being stored by metadata (like name, size, etc.), what is the process of fragmentation described during the erasure coding segment? Once it’s assigned some unique identifier (e.g., an x-ray picture), how is it addressed, if not at the block/bit/byte level?

Currently, Objects are stored using one of two methods of data protection: Replication or Erasure Coding. Some systems use both. That said, there are several algorithms used today to Erasure Code protect Objects. When using Reed-Solomon methods, you specify the number of “Data” fragments and the number of “Parity” fragments that will be created. The size of each “Data” fragment is roughly the Object size divided by the number of “Data” fragments requested, and each “Parity” fragment is the same size as a “Data” fragment. The protected Object size is the sum of the “Data” fragments plus the “Parity” fragments created. Each of these fragments (Data and Parity) is stored on a different server to avoid a single point of failure. The application that created the Object and will be accessing the Object store is responsible for keeping track of the ID of the Object and the Namespace the ID was stored in. Typically the application creates the ID; however, when an application “Puts” an Object using an existing ID, the older Object stored under that same ID is overwritten. Access into an Object Store is typically via a RESTful interface over HTTP, using commands like PUT, GET, DELETE, and LIST.
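To make the fragment arithmetic above concrete, here is a small back-of-the-envelope sketch. The policy numbers are illustrative only; real systems add padding and alignment overhead.

```python
# Rough sketch of Reed-Solomon style fragment sizing, following the
# description above (illustrative numbers only).
def erasure_code_layout(object_size_bytes: int, data_fragments: int, parity_fragments: int):
    # Each data fragment is roughly the object size divided by the number
    # of data fragments; each parity fragment is the same size.
    fragment_size = object_size_bytes / data_fragments
    protected_size = fragment_size * (data_fragments + parity_fragments)
    return {
        "fragment_size_bytes": fragment_size,
        "total_fragments": data_fragments + parity_fragments,
        "protected_object_size_bytes": protected_size,
        "storage_overhead": protected_size / object_size_bytes,
        "fragment_losses_tolerated": parity_fragments,
    }

# Example: a 1 GByte object stored with an 8-of-12 policy (8 data + 4 parity).
print(erasure_code_layout(1_000_000_000, data_fragments=8, parity_fragments=4))
```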

Q. Will Object storage drive network scale—further adoption of 10GE and 40GE or is 1GE enough?

Yes. If we think about the interconnection between the Control Plane and Data Plane of these systems (Orchestration and Object Storage Devices), the better the connectivity, the higher the performance.

Q. Is the number of fragments set or configurable?  What are the trade-offs of requiring fewer fragments for recovery besides perhaps processing overhead?  Are there any gotchas to watch out for/consider?

Yes, storage policies are configurable. The number of “Parity” fragments defines the data-loss risk: the more “Parity” fragments requested, the lower the risk, but the more storage the Object consumes. Eliminating single points of failure is a key consideration. For example, if your Object Storage system has 10 servers, a storage policy of 9 of 12 will place 2 fragments of an Object on each of 2 servers. In this case any single server failure would not cause data loss but may cause higher latency. However, if three servers were to fail, you would lose access to your data until the servers were recovered; if the drives of the failed servers could not be recovered, data loss would occur.
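The 9-of-12-on-10-servers example can be sketched as follows. The round-robin placement below is an assumption for illustration, not a description of any particular product’s placement algorithm.

```python
# Illustrative placement of 12 fragments across 10 servers (round-robin
# is an assumption here; real systems use placement rings or maps).
from collections import Counter

def place_fragments(num_fragments: int, num_servers: int) -> dict:
    return {frag: frag % num_servers for frag in range(num_fragments)}

placement = place_fragments(12, 10)
per_server = Counter(placement.values())
print("Servers holding 2 fragments:", [s for s, n in per_server.items() if n > 1])

# A 9-of-12 policy tolerates the loss of any 3 fragments, so a single
# server failure is survivable, but losing 3 servers can remove 4 or
# more fragments at once and make the object unreadable until recovery.
```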

Q. Is erasure encoding used instead of Hash tagging?

No. Hash tagging is a method of generating a unique number from a specific input of data; that number is then used to find the location of the stored Object. Erasure Coding is the method used to create the fragments. So think of the hash tag as the seed to the address needed to find the fragments.
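A minimal sketch of the “hash as a seed to the address” idea is shown below, assuming a simple partition-style lookup; real systems typically use consistent hashing rings or placement maps, so this is illustrative only.

```python
# Illustrative only: hash the object ID to pick the partition that holds
# its fragments; not any specific product's algorithm.
import hashlib

def locate_partition(object_id: str, num_partitions: int = 1024) -> int:
    digest = hashlib.md5(object_id.encode("utf-8")).hexdigest()
    # Partitions would then be mapped to storage devices/servers.
    return int(digest, 16) % num_partitions

print(locate_partition("xray/patient-1234/scan-0001"))
```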

Q. How large are the fragments?

A rough estimate is the Object size divided by the number of fragments needed to re-hydrate the object (e.g., a 1 GByte Object stored using an 8 of 12 policy would have a fragment size of 1 GByte / 8 ≈ 125 MByte).

Q. What do you see as the requirement for the interconnect between the Object storage arrays/boxes to be? Very large pipes as in multiple 40G links or something lower?

It depends on the use case or Service Level Objective for the system. If your system design uses a Proxy service and Erasure Coding, then your back-end network throughput (the network connecting the Proxy and the Object Storage Devices, i.e., the storage servers) will aggregate (multiply). In this case the network throughput is based on the number of “Data” fragments being used. If you use Replication, the back-end network throughput will not aggregate. This multiplication factor, if present, is key to an efficient network strategy. In non-Proxy-based or Replication-based Object Storage designs, the network strategy will scale with network bandwidth up to the limitation of the HDDs’ ability to serve data.
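As a rough sketch of that multiplication factor, assuming a design in which the Proxy computes and writes every fragment itself (an assumption for illustration, not a statement about any particular product):

```python
# Back-end bytes generated by a single front-end PUT under erasure coding,
# assuming the proxy writes all data + parity fragments itself.
def backend_bytes_per_put(object_bytes: int, data_fragments: int, parity_fragments: int) -> int:
    fragment_bytes = object_bytes // data_fragments
    return fragment_bytes * (data_fragments + parity_fragments)

# A 1 GByte PUT under an 8-of-12 policy pushes roughly 1.5 GBytes across
# the proxy-to-storage network, so those links need the most headroom.
print(backend_bytes_per_put(1_000_000_000, data_fragments=8, parity_fragments=4))
```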

Q. What about access control and security at the object level?  Is that typically part of the model?

Typically, access control methods are at the gateway or entry point of a Namespace. The access method used is up to the vendor of the Object Store.

Q. What is the presentation mode at the host level? i.e. a drive mapping or similar

Typically the presentation method is a RESTful API via HTTP, using PUT, GET, DELETE, and LIST semantics.
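For illustration, a generic sketch of those PUT/GET/DELETE semantics over HTTP is shown below. The endpoint and namespace are placeholders, and real object stores add authentication headers and their own URL conventions.

```python
# Generic sketch of RESTful object access using only the standard library.
# The base URL and namespace are placeholders, not a vendor's API.
import urllib.request

BASE = "http://objectstore.example.com/my-namespace"

def put_object(object_id: str, data: bytes) -> int:
    req = urllib.request.Request(f"{BASE}/{object_id}", data=data, method="PUT")
    with urllib.request.urlopen(req) as resp:
        return resp.status

def get_object(object_id: str) -> bytes:
    with urllib.request.urlopen(f"{BASE}/{object_id}") as resp:
        return resp.read()

def delete_object(object_id: str) -> int:
    req = urllib.request.Request(f"{BASE}/{object_id}", method="DELETE")
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Usage sketch: put_object("xray-0001", open("scan.dcm", "rb").read())
```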

Q. Can you explain the differences/similarities between object storage, CDMI and software defined storage?

Object Storage describes a system (software plus hardware) used to store Objects. CDMI defines a method used to access/connect your application to an Object Storage system. Software Defined Storage describes using standard high-volume servers with software for the purpose of storing data.

Q. Why can’t a traditional approach be used to Object Storage for its durability?

Traditional storage approaches such as direct attached storage (RAID Sets) do not scale. Once you run out of space, managing additional storage on separate systems becomes the issue.

Q. Aren’t all types of data going to need the accessibility required by users? For example, isn’t everything going to need to be placed in an object store?

There is a lot of debate on this issue. The goal of an Object Store is twofold: 1) drive down the cost per byte and 2) keep content readily accessible.

Q. How do we avoid losing the Metadata from the data? Also, is there something like sub-metadata, where a small amount of Metadata is contained within the data and the larger Metadata is stored somewhere else?

Some Object storage systems support Extended File Attributes, a file system feature that allows applications to store “Metadata” about an Object, which is then bound to the Object within the storage environment. These Extended File Attributes (XATTRs) can be queried separately and used by your application as you see fit. Management of the XATTRs is handled by the local file system, and they are accessed by the Object Storage software via the RESTful API over HTTP.
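For reference, the sketch below exercises the underlying XATTR mechanism directly, using Python’s Linux-only os.setxattr/os.getxattr calls on a local file standing in for a stored object. The attribute names are made up, and the local file system must support user.* attributes (ext4 and XFS do).

```python
# Linux-only sketch of extended file attributes (XATTRs), the mechanism
# some object stores use to bind metadata to a stored object's file.
import os

path = "stored-object.bin"        # stand-in for an object's on-disk file
open(path, "wb").close()

os.setxattr(path, "user.patient_id", b"1234")   # attribute names are illustrative
os.setxattr(path, "user.modality", b"XR")

print(os.listxattr(path))                       # e.g. ['user.patient_id', 'user.modality']
print(os.getxattr(path, "user.patient_id"))     # b'1234'
```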

Q. Is maintaining multiple copies mainly for durability or can it be used for performance enhancement (parallel access), or is that irrelevant?

Absolutely!  Management of copies/replicas can serve multiple purposes.  Replication across racks, datacenters, geographies, etc. can provide resiliency against failures at those levels.  Replication can also be used to provide object access in close proximity to the requester.  In the X-ray example discussed in the Webcast, we might set up a replica local to the medical practice for the first 90 days, in order to provide a low latency (time to first byte) copy during the initial treatment.  Additional copies can be kept at remote sites in order to provide fault tolerance.

Q. Is there a standard methodology for migrating from a file-system based methodology to an object store?

The short answer is no.  In general an application that is currently developed to use file or block based storage will need to be re-architected in order to take advantage of an object storage system/service.  There is, however, a growing category of products referred to as “cloud gateways” that can provide a bridge to object storage by presenting a filesystem to the existing application, while writing and reading via a RESTful API to a backend object storage system/service.

Q. Is it safe to say that in order to use object storage the application needs to be “object storage aware”? Unlike a traditional storage where the application doesn’t necessarily need to be familiar with the storage or file system since that is handled at a lower layer.

Yes, however as indicated in the question regarding migration of applications above, it is possible to implement a “cloud gateway” solution that will provide the translation from RESTful API to a CIFS/NFS fileshare, thus not requiring any application changes.  I would disagree with the premise that traditional applications don’t need to be familiar with the underlying storage.  Traditional file-based applications must understand the location (fileserver, folder, filename, etc.) in order to gain access to the appropriate data.

Q. I’m hearing a lot of ‘what’ and ‘how’ but not so much ‘why’ about object storage. Can we hear some real-world examples of applications in industry today that are running better because of object storage?

An example of an application running today with object storage behind it, and why: Web-based media asset management and distribution. This particular use case tends to deal with billions of files/objects that can vary in size from very small thumbnail images to massive 4K HD movie files. The ability to deliver these to multiple platforms (phone, laptop, set top box, etc.) across multiple geographies is something that is well suited for object storage. Traditional file and/or block based storage environments may hit scale limitations in dealing with the number of files/objects; in addition, maintaining a single namespace across multiple locations/datacenters is exceedingly complex for storage environments other than object stores.

Q. Replicating an object two or three times would exponentially increase storage costs, wouldn’t it?  The more copies the higher the costs?

Certainly more copies would use more storage, and as a result most object stores provide different durability schemes based upon the performance/availability tradeoffs the data owner is willing to make. Recovering a single object from a replica is significantly faster than rebuilding an object from geo-distributed EC fragments. Also, as discussed in the question above about using replicas to drive performance, replication can serve the purpose of placing objects as close to the consumer as possible, minimizing time to first byte and increasing the overall throughput of an application.
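As a rough comparison of the raw capacity trade-off between full replication and an erasure-coded policy (the numbers below are illustrative only):

```python
# Raw capacity overhead: full replication versus an erasure-coded policy.
def overhead_replication(copies: int) -> float:
    return float(copies)                     # e.g. 3 copies -> 3.0x raw capacity

def overhead_erasure(data_fragments: int, parity_fragments: int) -> float:
    return (data_fragments + parity_fragments) / data_fragments

print(overhead_replication(3))               # 3.0x
print(overhead_erasure(9, 3))                # ~1.33x for a 9-of-12 policy
```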

Q. If I have an app that accesses a CIFS share, is there a way to translate it into an object store?

Please see answer to question: “Is there a standard methodology for migrating from a file-system based methodology to an object store?” Short answer: Yes, via a “cloud gateway” product.

Q. Is there a confluence point of Object and File based storage – specifically in NAS where object storage can be multi-protocol (NFS, and REST)?

There are some object storage solutions that provide their own native cloud-gateway capability (NAS protocol to the application, RESTful API to the object store), but there are very few that provide a “file/object duality” capability allowing applications to manipulate an object as both an object and a file.

M.2 Webcast – Get the Latest Info on the New SSD Card Form Factor!

The SNIA Solid State Storage Initiative is partnering with SATA-IO and NVM Express to present a panel of experts from Objective Analysis, Micron, TE Connectivity, Intel, Calypso, and Coughlin Associates to give you the latest information on M.2, the new SSD card form factor. You will leave this webinar with an understanding of the M.2 market, M.2 cards and connection schemes, NVM Express, and M.2 performance; you’ll also be able to ask questions of the experts.

Join us on June 10, 2014 at 10:00 am.  Register at http://snia.org/news_events/multimedia#webcasts!

It’s “All About M.2 SSDs” In a New SSSI Webcast June 10

Interested in M.2, the new SSD card form factor?

The SNIA Solid State Storage Initiative is partnering with SATA-IO and NVM Express to give you the latest information on M.2, the new SSD card form factor.  Join us “live” on Tuesday, June 10, at 10:00 am Pacific time/1:00 pm Eastern time.

Hear from a panel of experts, including Tom Coughlin of Coughlin Associates, Jim Handy of Objective Analysis, Jon Tanguy of Micron, Jaren May of TE Connectivity, David Akerson of Intel, and Eden Kim of Calypso Systems.  You will leave this webinar with an understanding of the M.2 market, M.2 cards and connection schemes, NVM Express, and M.2 performance. You’ll also be able to ask questions of the experts.

You can access this webcast via the Internet at http://snia.org/news_events/multimedia#webcasts

Hear the Latest on UltraDIMM SSDs Monday June 9 at 4:00 pm PDT

Join the SNIA Solid State Storage Initiative for an “open-to-all interested” PCIe SSD Committee call and a discussion on UltraDIMM SSDs.

SSSI member Rob Callaghan of SanDisk will discuss:

* What is an UltraDIMM SSD?
* How is it different from other SSDs?
* How does a Block IO SSD reside on a DIMM channel?
* Do I need to modify/update my BIOS to use it?
* How many do I use, and what is the scalability?
* What kind of performance can I expect to get?
* What is the cost benefit of using this technology?
* What applications will benefit from using an UltraDIMM SSD?

Below are the dial-in and WebEx details.  Eden Kim and the PCIe SSD Committee hope you will join the SSSI for this interesting topic!

UltraDIMM Discussion

Monday June 9, 2014 at 4PM PDT

Log in to:  snia.webex.com Meeting Number: 792 152 928 password:  sssipcie

Dial-in to: Teleconference: 1-866-439-4480 Passcode: 57236696#

Ethernet Meets Enterprise Storage – Finally

Presumptuous, yes, because Ethernet has been a mainstay in enterprises since its early days over 40 years ago.  It initially grew to prominence as the local area network (LAN) connection in the enterprise. More recent advances have enabled Ethernet to become a standard for mission critical storage connectivity for block, file and object storage in many enterprises.

Block storage in large enterprises has long been focused on Fibre Channel due to its performance capabilities. In order to bring the same performance benefits to Ethernet, the IEEE 802.1 Data Center Bridging Task Group proposed a number of new standards to enhance Ethernet reliability. For example, 802.1Qbb Priority-based Flow Control (PFC) provides a link-level flow control mechanism to ensure lossless transmission under congestion, 802.1Qaz Enhanced Transmission Selection (ETS) provides a management framework for prioritized bandwidth, and the Data Center Bridging Exchange protocol (DCBX) enables these features to be used between neighbors to ensure consistency on the network. Collectively, these and other enhancements have brought those enterprise-class storage networking features to the Ethernet platform.

In addition, the International Committee for Information Technology Services (INCITS) T11 Fibre Channel committee developed a specification for Fibre Channel over Ethernet (FCoE) in its FC-BB-5 standard in 2009, which allows the Fibre Channel protocol to run directly on top of Ethernet, eliminating the TCP/IP stack and allowing for efficient performance of the Fibre Channel protocol.  FCoE also depends on the Data Center Bridging standards from IEEE 802.1 in order to ensure the “losslessness” and flow control needed by Fibre Channel.

iSCSI, an alternative to FCoE, was designed to run over standard Ethernet with TCP/IP and to tolerate the “lossy” aspects of Ethernet. Its architecture and the additional layers of encapsulation involved can impact latency and performance. However, more recent innovations in iSCSI have enabled it to run over a DCB Ethernet network, which allows iSCSI to inherit some of the enterprise storage features that have always been inherent in Fibre Channel. For more on this, read last year’s blog “How DCB Makes iSCSI Better” from Allen Ordoubadian.

In 2013, INCITS submitted the FC-BB-6 standard for review which introduced, among other things, the VN2VN standard.  The VN2VN proposal will allow FCoE to work in a standard DCB switching environment without the presence of a Fibre Channel Forwarder (FCF).  An FCF allows for bridging between servers which are communicating with FCoE and storage devices which are communicating with traditional Fibre Channel.  As DCB switches and FCoE storage become more prevalent, the FC-BB-6 standard will allow for end-to-end FCoE connectivity in either a point to point (P2P) or DCB mesh environment. This will result in lower cost for FCoE environments. Products are beginning to appear which support VN2VN and over the next 18 months it is likely that all major vendors will support it. Check out our ESF Webcast “How VN2VN Will Help Accelerate Adoption of FCoE” for more details.

The availability of converged network adapters (CNAs) with processing capability allows for offloading storage protocol processing from the host processor, though some CNAs use host-based storage protocol initiators in system software and do selective stateless offloads in the data path. Both FCoE and iSCSI require the storage protocol to be encapsulated in a frame to be sent across the Ethernet network. In an enterprise environment, especially a virtual server environment, CPU utilization is tracked closely and target CPU thresholds are often set. Anything that can minimize spikes in CPU utilization allows more workloads to be placed on servers and makes energy consumption more predictable.

For file storage, Ethernet has traditionally been the connectivity option of choice for file servers used as “shares” for centralized employee document storage. In the 21st century, usage of network attached storage (NAS) with the Network File System (NFS) has increased for enterprise databases and Hadoop clusters, especially with the availability of 10Gb Ethernet.  New features in NFS 4 and later introduced security and stateful protocol support after development of NFS was taken over by the Internet Engineering Task Force (IETF).

Object storage has been around for nearly 20 years as a repository for storing data as objects, which include not only the original file, but also a globally unique identifier and metadata that describes the object and various parameters about it. It has been used to store many forms of unstructured data, but found niches in certain areas, such as legal documents with retention policies and archives of photos and videos. More recently, there seems to be a resurgence in object storage as the amount of unstructured data generated by enterprises continues to skyrocket. Open source object storage in Ceph and OpenStack is also helping to drive adoption. SNIA ESF is hosting a live Webcast on object storage on June 11, 2014, called “Object Storage 101.” I encourage you to register for this presentation for an unbiased look at the what, how and why of object storage technologies.

When combined with the advances in link speed, throughput, latency and input/output operations per second (IOPS) in modern 10Gb/s and 40Gb/s Ethernet, these existing and emerging Ethernet standards and storage architectures are having a profound effect on Ethernet’s viability as an enterprise-class storage networking platform. Vendors and customers are seeing the advantage of one wire, the Ethernet cable, carrying all LAN, WAN and storage traffic.

New ESF Live Webcast – Object Storage 101

Understanding the what, how and why behind object storage technologies.

Object storage systems are gaining quite a bit of attention as workloads continue to push the scalability and availability limits of massive unstructured data repositories. For some emerging workloads, object counts are measured in the hundreds of billions and capacities start in petabytes!

Need a tutorial on object storage? Join us on June 11th at 2:00 p.m. ET, 11:00 a.m. PT for our live Webcast, “Object Storage 101” as we take an unbiased look at the what, how and why behind object storage technologies. In this object storage primer, we’ll cover:

  • What is object storage
  • Where is it being deployed successfully
  • Key attributes of today’s object storage solutions
  • How object storage differs from traditional file or block technologies
  • Common enterprise use-cases and deployment approaches
  • Key considerations before deploying an object store

This will be a vendor-neutral live and lively discussion. Register now and please bring your questions for our expert panel.

Getting Started with the CDMI Conformance Test Program

Together with our partner, TATA Consultancy Services, we recently had a great live Webcast to launch the Conformance Test Program (CTP) for the SNIA Cloud Data Management Interface (CDMI). CDMI is an ISO/IEC standard that offers end users simplicity and data storage interoperability across a wide range of cloud solutions. Interoperability and portability of data stored in the cloud has become a top IT priority. The CTP tests for conformance against the specification, and provides purchasers of certified cloud storage solutions the assurance that these solutions meet CDMI interoperability standards. Our Webcast is now available on demand. It details the benefits of the CDMI CTP program and explains how any cloud storage vendor or ISV can begin the CTP process. I encourage you to check it out to learn:

  • Key benefits of the CDMI standard for vendors and end users
  • Growing adoption of the CDMI standard
  • The suite of conformance tests required to achieve CDMI CTP certification
  • How to begin the CTP process

In addition to the Webcast replay, I encourage you to check out our CDMI CTP Frequently Asked Questions (FAQ). Getting started is easy. Just fill out the CTP form and you’ll be on your way.  

Help Develop Next Generation SSDs and Win Cool Stuff!

The SNIA Solid State Storage Initiative (SNIA SSSI) is working to better understand disk drive use in everyday computer actions. You can help by participating in our Workload I/O Capture Project – WIOCP – and get rewarded!

The WIOCP captures I/O statistics unobtrusively and without compromising your PC’s performance. No personal data or content is captured – only statistics on the types of data transfers that occur.  This helps the SNIA SSSI and the industry understand what actually takes place with your drive when you use your PC.

Collecting I/O statistics helps computer scientists determine the type of workloads your drive is experiencing.  By capturing statistics from a large number of computer users, designers can optimize both the drive and the host computer system to improve your overall computing experience.  You can be a part of history!

Participate NOW and return one set of statistics to qualify to win a $10 Amazon gift card and to be entered in a drawing for a free Intel 120GB SSD. Submit more sets and increase your chance of winning an SSD!

Go to http://www.snia.org/forums/sssi/wiocp for a FAQ and details on participating.  And see results at http://iotta.snia.org/

The IETF, Consensus and NFSv4

The Internet Engineering Task Force (IETF) is one of the older – and more unusual – internet organizations. It first met in 1986, and has met regularly several times a year since then. The most recent meeting was IETF89, held March 2-7, 2014 in London, and I was fortunate to be in attendance.

What Makes the IETF Unique

What’s unusual about the IETF? From my perspective as someone who spends most of his working day dealing with more traditional standards bodies, two things stand out.

One, (in its own words) “it exists as a collection of happenings, but is not a corporation and has no board of directors, no members, and no dues.” The non-members divide themselves into loosely organized groups that agree on an agenda, discuss the stuff of the internet on mailing lists, generate documents that reflect consensus, and then agree to them as standards.

Two, the London IETF89 meeting was not a conference. The IETF doesn’t do conferences; there are no formal papers given by luminaries or industry experts. There is an agenda, agreed beforehand by consensus (there’s that word again) and then a few short and brief presentations on topics of interest. There are questions from the floor, discussions, and agreement of one form or another. I didn’t see a single formal vote; just that ill-defined and unquantifiable consensus where the outcome is just, well, agreed on.

Why the IETF Works

Revolution! Anarchy! This is unusual for a standards body, and it sounds like a recipe for disaster. But strangely, it isn’t, and from what I saw of the process, I think I see why.

It’s because it’s attended by software and network engineers who see code as the concrete representation of a good idea. They value running code, or stuff that works. That’s a powerful advantage over academic discussions, or codifying and formalizing a good (sometimes not-so-good) idea that no-one has yet implemented or is ever likely to.

Why face to face though? I reckon that even revolutionaries and anarchists need validation and a sense of community, and there was much of that in evidence in the corridors and public spaces outside of the formal meeting. Everyone talks like there’s no tomorrow. Ideas everywhere, grounded in what can be shown to actually work.

I attended, amongst others, the NFSv4 workgroup meetings. The agenda and notes from the meeting give some flavor of this consensus, and I am truly impressed by the process. I’m also thankful that there is some organization; Sorin Faibish (EMC) took notes, Tom Haynes (NetApp) chaired the meeting and kept it moving along, and all in all it was a great illustration of the best the industry can do.

As to the technical content… well, you can read the minutes. There are notes on security discussions led by Andy Adamson, on features proposed for NFSv4.2, and getting an RFC in place that accurately reflects implementations of earlier versions of NFSv4 and more. I’ll be blogging about this and more over the next few months. In the meanwhile, in the spirit of the IETF that favors working code over ideas and the concrete over the abstract, I’ll be presenting “Practical Steps to Implementing pNFS and NFSv4.1” at DSIcon on April 22-24 in Santa Clara, CA. OK, this one’s a conference, and anarchy will be in short supply, but we can still have great discussions and arguments in the corridors and public spaces outside of the formal meetings. I look forward to seeing you there!

Relentless Advance Of Ethernet – And Ethernet Storage Networking

As one Cisco colleague once said to me, “After the nuclear holocaust, there will be two things left: cockroaches and Ethernet.”  Not sure I like Ethernet’s unappealing company in that statement, but the truth it captures is that Ethernet, now entering its fifth decade (wow!), is ubiquitous and still continuing to advance at a breathtaking pace.  And as it advances, it advances the capabilities of storage networking based on the Ethernet backbone, be it file storage like NFS or SMB or block storage like iSCSI or FCoE.

Most recent evidence of Ethernet’s continuing and relentless evolution is illustrated in the 28 March 2014 announcement from the Ethernet Alliance congratulating the IEEE on formation of their IEEE P802.3bs™ Task Force:

The new group is chartered with the development of the IEEE P802.3bs 400 Gigabit Ethernet (GbE) project, which will define Ethernet Media Access Control (MAC) parameters, physical layer specifications, and management parameters for the transfer of Ethernet format frames at 400 Gb/s. As the leading voice of the Ethernet ecosystem, the Ethernet Alliance is ideally positioned to support this latest move towards standardizing and advancing 400Gb/s technologies through efforts such as the launch of the Ethernet Alliance’s own 400 GbE Subcommittee.

Ethernet is in production today from multiple vendors at 40GbE and supports all storage protocols, including FCoE, at those speeds.  Market forecasters expect the first 100GbE adapters to appear in 2015.  Obviously, it is too early to forecast when 400GbE will arrive, but the train is assuredly in motion.  And support for all the key storage protocols we see today on 10GbE and 40GbE will naturally extend to 100GbE and 400GbE.  Jim O’Reilly makes similar points in his recent Information Week article, “Ethernet: The New Storage Area Network,” where he argues, “Ethernet wins on schedule, cost, and performance.”

Beyond raw transport speed, the rich Ethernet infrastructure offers techniques to catapult your performance even beyond the fastest single-pipe speed.  The Ethernet world has established techniques for what is alternately referred to as link aggregation, channel bonding, or teaming.  The levels available are determined by the capabilities provided in system software and what switch vendors will support.  And those capabilities, in turn, are determined by what they respectively see as market demand.  VMware, for example, today will let you bond eight 10GbE channels into a single 80GbE pipe.  And that’s today with mainstream 10GbE technology.

Ethernet will continue to evolve in many different ways to support the needs of the industry.  Serving as a backbone for all storage networking traffic is just one of many such roles for Ethernet.  In fact, precisely because of the increasing breadth of usage models Ethernet supports, it will also continue to offer cost advantages.  The argument here is a very simple volume argument:

Total Server-class Adapter and LOM Market Ports

[Chart: Crehan Research data on total server-class adapter and LOM market ports]

Enough said, except to also note that volume is what funds speed roadmaps.