OpenStack Cloud Storage Q&A

More than 300 people have seen our Webcast “OpenStack Cloud Storage.” If you missed it, it’s now available on demand. It was a great session with a lot of questions from attendees. We did not have time to address them all – so here is a complete Q&A. If you think of any others, please comment on this blog. Also, mark your calendar for January 29th when the SNIA Cloud Storage Initiative will continue its Developers Tutorial Series with a live Webcast on OpenStack Manila.

Q. Is it correct to say that one can use OpenStack on any vendor’s hardware?

A. Servers, yes, assuming the hardware is supported by Linux. Block storage requires a driver, and not all vendor systems have Cinder drivers.

Q. Is there any OpenStack investigation and/or development in the storage networking area?

A. Cinder includes support for FC and iSCSI. As of Icehouse, the FC support also includes auto-zoning. 

Q. Is there any monetization going on around OpenStack, like we see for distros of Linux?

A. Yes, there are already several commercial distributions available.

Q. Is erasure code needed to get a positive business case for Swift, when compared with traditional storage systems?

A. Erasure coding is a way to reduce the storage cost of replication. Traditional storage systems typically already use erasure coding, in the form of RAID. Systems without erasure coding typically rely on 3-way replication, so they end up using more raw storage to achieve the same level of protection.
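To make the trade-off concrete, here is a small sketch of the arithmetic. It compares the raw capacity needed for 3-way replication with that needed for a k+m erasure code. The 10+4 layout is a hypothetical example, not a Swift default.

```python
def replication_overhead(copies: int) -> float:
    """Raw bytes stored per byte of user data when keeping full copies."""
    return float(copies)


def erasure_overhead(data_fragments: int, parity_fragments: int) -> float:
    """Raw bytes stored per byte of user data for a k+m erasure code."""
    return (data_fragments + parity_fragments) / data_fragments


def raw_capacity_tb(usable_tb: float, overhead: float) -> float:
    """Raw disk capacity needed to hold usable_tb of user data."""
    return usable_tb * overhead


# 3-way replication survives the loss of any 2 copies.
print(raw_capacity_tb(100, replication_overhead(3)))
# A hypothetical 10+4 code survives the loss of any 4 fragments.
print(raw_capacity_tb(100, erasure_overhead(10, 4)))
```

For 100 TB of user data, this works out to about 300 TB of raw disk for 3-way replication versus about 140 TB for the 10+4 code, which also tolerates more simultaneous losses. The savings are paid for with extra CPU work for encoding and repair.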

Q. Is erasure code currently implemented in the current Swift release?

A. No, it is a separate development stream, which has not been merged yet.

Q. Any limitation on the number of objects per container or total number of objects per Swift cluster?

A. Technically, there are no limits. In practice, however, containers are implemented using SQLite, which limits each container to a million or perhaps a few million objects. Because of the way Swift partitions its metadata, each user can also have millions of containers, and there can be millions of users. So, practically speaking, the total system can support an unlimited number of objects.
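The per-container limit doesn't cap the whole cluster because Swift places account and container databases on its ring the same way it places objects: by hashing the path and using the top bits of the hash as a partition number. The sketch below is modeled on that scheme. The partition power and hash suffix are hypothetical values, and a real ring also maps each partition to a set of devices.

```python
import hashlib
import struct
from collections import Counter
from typing import Optional

PART_POWER = 8                         # hypothetical: 2**8 = 256 partitions
HASH_PATH_SUFFIX = b"example-secret"   # hypothetical per-cluster secret


def partition_for(account: str, container: Optional[str] = None,
                  obj: Optional[str] = None) -> int:
    """Map an account, container, or object path to a ring partition."""
    segments = [account] + [s for s in (container, obj) if s is not None]
    path = ("/" + "/".join(segments)).encode("utf-8")
    digest = hashlib.md5(path + HASH_PATH_SUFFIX).digest()
    # Keep only the top PART_POWER bits of the first 4 bytes of the digest.
    return struct.unpack_from(">I", digest)[0] >> (32 - PART_POWER)


# Containers from many accounts scatter across partitions, so no single
# container database (or server) has to hold every container's listing.
counts = Counter(partition_for(f"AUTH_user{u}", f"photos{c}")
                 for u in range(50) for c in range(20))
print(len(counts), "partitions used by", sum(counts.values()), "containers")
```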

Q. What are some of the technical reasons for an enterprise to select Swift vs. Amazon S3? In other words, are they pretty much direct alternatives, or does each have its own preferred use cases?

A. They are more or less direct alternatives. There are some minor differences, but they are made for the same purpose. That said, S3 is only available from Amazon. There are some S3-compatible systems, but most of those also support Swift. Swift, on the other hand, is available as open source or from multiple vendors. So if you want to run it in your own data center, or in a public cloud other than Amazon, you probably want Swift.

Q. If I wanted to play around with OpenStack, Cinder, and Swift in a lab environment (or in my basement), what do I need and how do I get started?

A. openstack.org is the best place to start. The “devstack” distribution is also good for playing around.

Q. Will you be showing any features for Kilo?

A. The “Futures” I showed will likely be Kilo features, though the final decision of what will be in Kilo won’t happen until just before release.

Q. Are there any plans to implement data encryption in Cinder?

A. I believe some of the back ends can support encryption already. Cinder is really just a provisioning and orchestration layer. Encryption is a data path feature, so it would need to be implemented in the back end.

Q. Some time back I heard OpenStack Swift is going to come up with block storage as well, any timeline for that?

A. I haven’t heard of this; Swift is object storage.

Q. The performance characteristics of Cinder block services can vary quite widely. Is there any standard measure proposed within OpenStack to inform Nova or the application about the underlying Cinder block performance characteristics?

A. Volume types were designed to enable clouds to provide different levels of service. The meaning of these types is up to the cloud administrator. That said, Cinder does expose QoS features like minimum/maximum IOPS.

Q. Is the hypervisor talking to a cinder volume or to (for example) a NetApp or EMC volume?

A. The hypervisor talks to a volume the same way it does outside of OpenStack. For example, the KVM hypervisor can talk to volumes through LVM, or can mount SAN volumes directly.

Q. Which of these projects are most production-ready?

A. This is a hard question, and depends on your definition of production ready. It’s hard to do much without Nova, Glance, and Horizon. Most people use Cinder too, and Swift has been in production at HP and Rackspace for years. Neutron has a lot of complexity, so some people still use Nova network, but that has many limitations. For toy clouds you can avoid using Keystone, but you need it for a “production” cluster. The best way to get a “production ready” OpenStack is to get a supported commercial distribution.

Q. Are there any Plugfests?

A. No. However, the Cinder team has a fairly extensive continuous integration process that drivers need to pass. Swift does not, because it doesn’t officially “support” any plugins.


OpenStack Manila Webcast – Shared File Services for the Cloud

On January 29th, we continue our Cloud Developer series by hosting a live Webcast on OpenStack Manila, the OpenStack file share service. Manila provides management of file shares (for example, NFS and CIFS) as a core service to OpenStack. Manila currently works with a variety of vendors’ storage products, including NetApp, Red Hat, EMC, and IBM, as well as with the Linux NFS server.

In this Webcast we will:

  • Introduce the Manila file share service
  • Review key Manila concepts
  • Describe the logical architecture of Manila and its API structure
  • Explain what’s new in Juno, the latest release of OpenStack
  • Highlight the roadmap for Manila in the next release, OpenStack Kilo, and beyond

Register now for this live event that we expect will be informative and interactive. I hope you’ll join us.

OpenStack Cloud Storage Webcast Preview

On January 14, 2015, the CSI continues its Developer Tutorial series by hosting a live Webcast on OpenStack Cloud Storage. As you likely know, OpenStack is an open source cloud operating system that provides pools of compute, storage, and networking resources.

OpenStack is currently being developed by thousands of developers from hundreds of companies across the globe, and is the basis of multiple public and private cloud offerings. Register now for this SNIA-CSI Webcast to hear Sam Fineberg, Distinguished Technologist at HP, discuss:

  • Storage aspects of OpenStack including the core projects for block storage (Cinder) and object storage (Swift)
  • Emerging shared file service
  • Common configurations and use cases for these technologies
  • Interaction with the other parts of OpenStack
  • New developments in Cinder and Swift that enable advanced array features, QoS, new storage fabrics, and new types of drives.

I’ll be moderating this live event and Sam and I will be available to answer your specific questions. It should be an informative and interactive session. I hope you’ll join us!

What’s New in the CDMI 1.1 Cloud Storage Standard

On December 2, 2014, the CSI is hosting a Developer Tutorial Webcast “Introducing CDMI 1.1” to dive into all the capabilities of CDMI 1.1.

Register now to join David Slik, Co-Chair, SNIA Cloud Storage Technical Work Group, and me, Alex McDonald, as we explore what’s in this major new release of the CDMI standard, with highlights on what you need to know when moving from CDMI 1.0.2 to CDMI 1.1.

The latest release – CDMI 1.1 –  includes:

  • Enabling support for other popular, industry-supported cloud storage protocols such as OpenStack Swift and Amazon S3
  • A variety of extensions, some part of the core specification and some stand-alone, including a CIMI standard extension, support for immediate queries, an LTFS Export extension, an OVF extension, and multi-part MIME and versioning extensions. A full list can be found here.
  • 100% backward compatibility with the ISO-standardized CDMI 1.0.2, to ensure continuity with existing CDMI systems
  • And more

This event on December 2nd will be live, so please bring your specific questions. We’ll do our best to answer them on the spot. I hope you’ll join us!


Implementing Multiple Cloud Storage APIs

OpenStack Summit Paris

The beauty of cloud storage APIs is that there are so many to choose from. Of course, if you are implementing a cloud storage API for customers to use, you don’t want to have to implement too many of them. But when customers ask for support of a given API, can a vendor survive if it ignores these requests? A strategy many vendors are taking is to support multiple APIs with a single implementation. Besides the Swift API, many support the de facto standard S3 API and the CDMI standard API in their implementation. What is needed for these APIs to co-exist in an implementation? Basic operations are nearly identical between them, but what about semantics that have multiple different expressions, such as metadata?

Mark Carlson, Alex McDonald and Glyn Bowden led the discussion of this at the Paris summit.


For the implementers of a cloud storage solution, it is not just the semantics of the APIs that matter; the authentication and authorization mechanisms related to those APIs need to be supported as well. This is typically done by hosting the required services somewhere on the network and synchronizing them with a back-end directory service.

Multiple APIs

Swift leverages Keystone for authentication, so in order to support Swift clients, you would need to run a Keystone instance on your auth server. If you want to support S3 clients, you need a service that is compatible with Amazon’s Signature Version 4. When creating a client, you might use a common library or proxy to insulate your code from the underlying semantic differences of these APIs; jclouds is such a tool. The latest version of the CDMI API (version 1.1) has capability metadata (like a service catalog) that shows which auth APIs a given cloud supports. This allows a CDMI client to use Keystone, for example, as its auth mechanism while using the standard HTTP-based storage operations and the advanced metadata standards from CDMI. To address the requirements for multiple APIs with the least amount of code duplication, there are some synergies that can be realized.

Storage Operations

  • CRUD – largely determined by the HTTP standard (common code)
  • Headers – API-specific, however (handle in API-specific modules)
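The split between common CRUD code and API-specific header handling can be sketched for user metadata, which each API expresses differently: Swift uses X-Object-Meta-* headers, S3 uses x-amz-meta-* headers, and CDMI carries a metadata object in the JSON request body. The function names and the in-memory store below are hypothetical, a sketch of the pattern rather than any product’s implementation.

```python
import json
from typing import Callable, Dict, Mapping

Headers = Mapping[str, str]


def _prefixed(headers: Headers, prefix: str) -> Dict[str, str]:
    """Collect headers whose (case-insensitive) name starts with prefix."""
    out = {}
    for name, value in headers.items():
        lowered = name.lower()
        if lowered.startswith(prefix):
            out[lowered[len(prefix):]] = value
    return out


def swift_metadata(headers: Headers, body: bytes) -> Dict[str, str]:
    return _prefixed(headers, "x-object-meta-")


def s3_metadata(headers: Headers, body: bytes) -> Dict[str, str]:
    return _prefixed(headers, "x-amz-meta-")


def cdmi_metadata(headers: Headers, body: bytes) -> Dict[str, str]:
    # CDMI carries metadata in the JSON body; names starting with "cdmi_"
    # are reserved for system metadata, so they are skipped here.
    doc = json.loads(body) if body else {}
    return {str(k): str(v) for k, v in doc.get("metadata", {}).items()
            if not str(k).startswith("cdmi_")}


# API-specific modules, selected by whichever front end received the request.
EXTRACTORS: Dict[str, Callable[[Headers, bytes], Dict[str, str]]] = {
    "swift": swift_metadata,
    "s3": s3_metadata,
    "cdmi": cdmi_metadata,
}


def put_metadata(api: str, headers: Headers, body: bytes,
                 store: Dict[str, Dict[str, str]], key: str) -> None:
    """Common write path: every API ends up in one internal representation."""
    store[key] = EXTRACTORS[api](headers, body)
```

Once metadata is normalized this way, the rest of the write path (placement, replication, indexing) can be shared by all three APIs.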

Security Operations

  • Client communication with the auth server (API-specific)
  • Multiple separate services running in the auth server
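On the security side, an S3-compatible auth service has to reproduce the client’s Signature Version 4 computation. The core of that is deriving a signing key from the stored secret through a chain of HMAC-SHA256 operations. The sketch below shows only that derivation and the final comparison. Building the canonical request and the string to sign is omitted, and the function names are hypothetical.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive the Signature Version 4 signing key; date_stamp is YYYYMMDD."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def signature(key: bytes, string_to_sign: str) -> str:
    """Hex-encoded HMAC-SHA256 of the string to sign."""
    return hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_request(secret_key: str, date_stamp: str, region: str,
                   string_to_sign: str, presented: str) -> bool:
    """Server side: recompute the signature from the stored secret and compare."""
    expected = signature(signing_key(secret_key, date_stamp, region, "s3"), string_to_sign)
    # Constant-time comparison, so timing does not reveal a partial match.
    return hmac.compare_digest(expected.encode("utf-8"), presented.encode("utf-8"))
```

Because the key is scoped to a date, region, and service, the server can cache a derived key for the day rather than keeping it as a long-lived credential.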

Looking at two of the interfaces in particular, this chart shows the relationship between the Swift API model and the model from the CDMI standard.

[Chart: Swift API model compared with the CDMI model]

When an object whose name includes one or more “/” characters is stored in a cloud, the model viewed via Swift and the view that CDMI shows are similar. Using CDMI, however, the client has access to additional capabilities to manage each level of “/” containers and subcontainers. CDMI also standardizes a rich set of metadata that is understood and interpreted by the system implementing the cloud.
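To illustrate how the two views line up, this hypothetical helper lists the intermediate CDMI containers implied by a “/”-delimited object name. It assumes names are relative to the enclosing container and follows CDMI’s convention of writing container names with a trailing “/”.

```python
from typing import List


def implied_containers(object_name: str) -> List[str]:
    """List the container paths implied by '/' separators in an object name.

    A flat store (as seen through Swift) keeps only the full name; a CDMI
    view can expose each prefix as a container, written with a trailing '/'.
    """
    segments = object_name.split("/")[:-1]  # drop the final object segment
    return ["/".join(segments[:i]) + "/" for i in range(1, len(segments) + 1)]


print(implied_containers("photos/2014/paris.jpg"))  # ['photos/', 'photos/2014/']
```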

If you are looking for information that compares the Amazon S3 API with the CDMI standard one, there is a white paper available.


The latest version of CDMI (http://www.snia.org/sites/default/files/CDMI_Spec_v1.1.pdf) makes this even easier:

  • Spec text that explicitly forbade (in 1.0) functionality required for S3/Swift integration has been removed from the spec (“/”s may now create intervening CDMI containers)
  • Baseline operations (mostly governed by RFC 2616) are now documented in Clause 6 (pgs. 28-35)
  • CDMI now uses the content type to indicate CDMI-style operations (as opposed to X-CDMI-Specification-Version)
  • Specific authentication is no longer mandatory. CDMI implementations can now use S3 or Swift authentication exclusively, if desired.
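Because CDMI 1.1 signals CDMI-style operations through the content type, a multi-API front end can route requests by inspecting the Content-Type and Accept headers. This is a simplified sketch: the media types listed are the CDMI ones, but the routing rule itself is an assumption, and a conforming server must follow the specification’s more detailed requirements.

```python
from typing import Mapping

CDMI_MEDIA_TYPES = {
    "application/cdmi-object",
    "application/cdmi-container",
    "application/cdmi-domain",
    "application/cdmi-capability",
    "application/cdmi-queue",
}


def _header(headers: Mapping[str, str], name: str) -> str:
    """Case-insensitive header lookup; returns "" when absent."""
    for key, value in headers.items():
        if key.lower() == name:
            return value
    return ""


def is_cdmi_style(headers: Mapping[str, str]) -> bool:
    """Route to CDMI handling if Content-Type or Accept names a CDMI media type."""
    for name in ("content-type", "accept"):
        for item in _header(headers, name).split(","):
            # Strip parameters such as "; charset=utf-8" before comparing.
            media_type = item.split(";", 1)[0].strip().lower()
            if media_type in CDMI_MEDIA_TYPES:
                return True
    return False
```

Requests that fail this check fall through to the plain HTTP (or Swift/S3) handlers, which is what lets one endpoint serve all three APIs.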

CDMI 1.1 now includes a standard means of discovering which authentication methods are available, cdmi_authentication_methods (Data System Metadata, 12.1.3): “If present, this capability contains a list of server-supported authentication methods that are supported by a domain. The following values for authentication method strings are defined:

• “anonymous” – Absence of authentication supported

• “basic” – HTTP basic authentication supported (RFC 2617)

• “digest” – HTTP digest authentication supported (RFC 2617)

• “krb5” – Kerberos authentication supported, using the Kerberos Domain specified in the CDMI domain (RFC 4559)

• “x509” – certificate-based authentication via TLS (RFC 5246)”

The following values are examples of other widely used authentication methods that may be supported by a CDMI server:

• “s3” – S3 API signed header authentication supported

• “openstack” – OpenStack Identity API header authentication supported

Interoperability with these authentication methods are not defined by this international standard. Servers may include other authentication methods not included in the above list. In these cases, it is up to the CDMI client and CDMI server (implementations themselves) to ensure interoperability. When present, the cdmi_authentication_methods data system metadata shall be supported for all domains. 
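On the client side, this capability makes it possible to negotiate an auth method instead of hard-coding one. The sketch below picks the first method in the client’s preference order that the server advertises. The JSON layout (a “capabilities” object holding cdmi_authentication_methods as a list) is an assumption for illustration.

```python
import json
from typing import Optional, Sequence


def choose_auth_method(capability_json: str,
                       client_preference: Sequence[str]) -> Optional[str]:
    """Pick the first method the client prefers that the server advertises.

    Assumes (hypothetically) that the capability document is a JSON object
    whose "capabilities" member holds cdmi_authentication_methods as a list.
    """
    capabilities = json.loads(capability_json).get("capabilities", {})
    offered = set(capabilities.get("cdmi_authentication_methods", []))
    for method in client_preference:
        if method in offered:
            return method
    return None  # no overlap: fall back to out-of-band configuration


doc = json.dumps({"capabilities": {
    "cdmi_authentication_methods": ["basic", "s3", "openstack"]}})
print(choose_auth_method(doc, ["openstack", "s3", "basic"]))  # openstack
```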


Other resources that are available for developers include:

CDMI for S3 Developers

Comparison of S3/Swift functions

Implementation of CDMI filter driver for Swift

Implementation of S3 filter driver for Swift

For the slides from the talk, see snia.org/cloud, which has the SlideShare and PDF links.

Object Storage 201 Q&A

Now available on demand, our recent live CSI Webcast, “Object Storage 201: Understanding Architectural Trade-Offs,” was a highly rated event that almost 250 people have seen to date. We did not have time to address all of the questions, so here are answers to them. If you think of additional questions, please feel free to comment on this blog.

Q. In terms of load balancers, would you recommend a software approach using HAProxy on Linux or a hardware approach with proprietary appliances like F5 and NetScaler?

A. This really depends on your use case. If you need HA load balancers, or load balancers that can maintain sessions to particular nodes for performance, then you probably need commercial versions. If you just need a basic load balancer, using a software approach is good enough.

Q. With billions of objects, which erasure codes are more applicable in the long term? Reed-Solomon, where code words are very small, resulting in many billions of code words, or fountain-type codes such as LDPC, where one can use long code words to manage billions of objects more efficiently?

A. Tracking erasure-code fragments costs more than tracking replicas, but the trade-off is higher HDD utilization. Rateless coding lowers this overhead because each fragment has equal value, whereas Reed-Solomon requires knowledge of fragment placement for repair.

Q. What is the impact of having HDDs of varying capacity within the object store?  Does that affect hashing algorithms in any way?

A. The smallest logical storage unit is a volume. Because scale-out object storage does not stripe volumes, there is no impact. Hashing, which is used for location, does not take volume size into account, so a separate database is used, on a per-volume basis, to track free space. Hashing algorithms can be modified to suit the underlying disks. The problem is not so much whether they can be designed a priori for the underlying system, but rather the rigidity they introduce by tying placement very tightly to topology. That makes failure and exception handling hard.

Q. Do you think RAID6 is sufficient protection with these types of Object Storage Systems or do we need higher parity based Erasure codes?

A. RAID6 makes sense for a direct-attached storage solution where all drives in the RAID set can maintain sync. Unlike filesystems (with a few exceptions), scale-out object storage systems are “storage as a workload” systems that already have protection as part of the system. So the question is what data protection method is used in solution X as opposed to solution Y. You must also think about what you are trying to do. Are you trying to protect against a single disk failure, a node failure, or a site failure? For disk failures, RAID is great, but not for node or site failures. Site failure is an erasure coding sweet spot, but it is hard to solve from a deployment perspective.

Q. Is it possible to brief how this hash function decides the correct data placement order among the available storage nodes?

A. Take a look at the following links: http://en.wikipedia.org/wiki/Consistent_hashing and https://swiftstack.com/openstack-swift/architecture/
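For readers who want the idea in code, here is a generic consistent-hashing ring with virtual nodes, the technique described at the first link. It is not Swift’s implementation (Swift assigns a fixed number of partitions to devices), and the node names and virtual node count are hypothetical.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def _hash(key: str) -> int:
    """64-bit position on the ring, taken from an MD5 digest."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent hashing with virtual nodes (a generic sketch, not Swift's ring)."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 100) -> None:
        points: List[Tuple[int, str]] = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes))
        self._positions = [pos for pos, _ in points]
        self._owners = [node for _, node in points]
        self._node_count = len(set(self._owners))

    def nodes_for(self, key: str, replicas: int = 1) -> List[str]:
        """Walk clockwise from the key's position, collecting distinct nodes."""
        wanted = min(replicas, self._node_count)
        i = bisect.bisect(self._positions, _hash(key))
        result: List[str] = []
        while len(result) < wanted:
            node = self._owners[i % len(self._owners)]
            if node not in result:
                result.append(node)
            i += 1
        return result


before = HashRing(["node-a", "node-b", "node-c", "node-d"])
after = HashRing(["node-a", "node-b", "node-c", "node-d", "node-e"])
keys = [f"/AUTH_demo/photos/img{i}.jpg" for i in range(10000)]
moved = [k for k in keys if before.nodes_for(k) != after.nodes_for(k)]
print(f"{len(moved) / len(keys):.0%} of keys moved")
```

Because the new node only claims positions it inserts into the ring, every key that moves lands on the new node, and only about a fifth of the keys should move when going from four nodes to five. A naive `hash(key) % node_count` scheme would instead reshuffle most keys.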

Q. What do you consider to be a typical ratio of controller to storage nodes? Is it better to separate the two, or does it make sense to consolidate where a node is both controller and storage?

A. The flexibility of scale-out object storage makes these two components independently scalable. The systems we test all have separate controllers and storage nodes, so we can test this independence. The ratio also depends heavily on the object store technology you use. We know of some object stores that need 1 GB of RAM per TB of data, while others use a tenth of that. The compute requirement depends on whether you are using erasure coding, and which codes. There is no one answer.
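The RAM ratios above translate directly into controller counts. The sketch below assumes RAM is the limiting resource and uses a hypothetical 2 PB (2,000 TB) store with 256 GB of RAM per controller. It ignores the extra CPU that erasure coding may require.

```python
import math


def controller_ram_gb(data_tb: float, gb_ram_per_tb: float) -> float:
    """Total metadata/index RAM implied by a per-TB ratio."""
    return data_tb * gb_ram_per_tb


def controllers_needed(data_tb: float, gb_ram_per_tb: float,
                       ram_per_node_gb: float) -> int:
    """Controllers required if RAM is the limiting resource."""
    return math.ceil(controller_ram_gb(data_tb, gb_ram_per_tb) / ram_per_node_gb)


# Hypothetical 2,000 TB store with 256 GB controllers, at the two ratios above.
for ratio in (1.0, 0.1):
    print(ratio, "GB/TB ->", controllers_needed(2000, ratio, 256), "controllers")
```

At 1 GB/TB the store needs about 2,000 GB of RAM (8 controllers); at a tenth of that ratio, a single controller suffices on RAM alone, though availability would still call for more than one.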

Q. Is the data stored in the Storage depository interchangeable with other vendor’s controller units? For instance, can we load LTO tapes from vendor A’s library to Vendor B’s library and have full access to data?

A. The data stored in these systems follows the “storage as a workload” principle: the system metadata used to track objects is stored and managed as a function within the controller. I would not expect any stored content to be interchangeable with another system architecture.

Q. Would you consider the Seagate Kinetic Open Storage Platform a radical architectural shift in how object storage can be done?  Kinetic basically eliminates the storage server, POSIX and RAID or all of the “busy work” that storage servers are involved in today.

A. Ethernet drives with a key-value interface provide a new approach to designing object storage solutions. It remains to be seen how compelling they are in terms of TCO and infrastructure availability.

Q. Will the inherent reduction in blast radius by the move towards Ethernet-interface HDDs be a major driver of the Ethernet HDD in object stores?

A. Yes. We define blast radius as the set of hard drives whose access is affected by a compute failure. As we lower the number of hard drives connected to each compute node, the blast radius is reduced. For Ethernet drives, you may need redundant Ethernet switches to minimize the blast radius. Blast radius can also be minimized with intelligent data placement in software.
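Blast radius can be expressed as the fraction of a cluster’s drives made inaccessible by a single component failure. The figures below (600 drives, 60 per server, a 48-port switch) are hypothetical.

```python
def blast_radius(drives_behind_failed_component: int, total_drives: int) -> float:
    """Fraction of the cluster's drives made inaccessible by one failure."""
    return drives_behind_failed_component / total_drives


TOTAL_DRIVES = 600  # hypothetical cluster size

# Server-attached drives: one storage server failure takes out its 60 drives.
print(f"server with 60 drives: {blast_radius(60, TOTAL_DRIVES):.1%}")
# Ethernet drives: a failure in one drive's embedded compute affects only that drive.
print(f"single Ethernet drive: {blast_radius(1, TOTAL_DRIVES):.2%}")
# A non-redundant switch serving 48 Ethernet drives is also a failure domain.
print(f"unprotected 48-port switch: {blast_radius(48, TOTAL_DRIVES):.1%}")
```

Ethernet drives shrink the compute failure domain to a single drive, but the switch then becomes the component whose failure must be designed around.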

Join SNIA-CSI at the OpenStack Summit

Get the tips needed when implementing multiple cloud storage APIs. The SNIA Cloud Storage Initiative (CSI) is hosting a Birds of a Feather session – Tips to Implementing Multiple Cloud Storage APIs at the OpenStack Summit in Paris on November 5th at 9:00 a.m. Room 212/213.

There are three main object storage APIs today: OpenStack’s Swift (open but not standardized), Amazon’s S3 (proprietary yet a de facto standard) and SNIA’s CDMI (an ISO standard). With three APIs, it might sound expensive or difficult to support all of them, yet not doing so could be costly when customers want innovation, industry-standard solutions and interoperability in your product.

What about the similarities and differences between the APIs, and can they be reconciled? Can these APIs be effectively and efficiently implemented in a single product? I hope you’ll join us at this session to learn about and discuss various ways to cope with this situation. You will discover best practices and tips on how to implement these three protocols in your cloud storage solution.

Register now. I look forward to seeing you on November 5th at the OpenStack Summit.


New Webcast: Object Storage – Understanding Architectural Trade-Offs

The Cloud Storage Initiative (CSI) is excited to announce a live Webcast as part of the upcoming BrightTALK Cloud Storage Summit on October 16th: “Object Storage 201: Understanding Architectural Trade-Offs.” It’s a follow-up to the SNIA Ethernet Storage Forum’s “Object Storage 101: Understanding the What, How and Why behind Object Storage Technologies.”

Object-based storage systems are fast becoming one of the key building blocks for a cloud storage infrastructure. They address some of the shortcomings and provide an alternative to more traditional file- and block-based storage for unstructured data.

An object storage system must accommodate growth (and yes, the rumors are true: data growth is a huge and accelerating problem), be flexible in its provisioning, support multiple geographies and legal frameworks, and cope with the inevitable issues of resilience, performance and availability.

Register now for this Webcast. Experts from the SNIA Cloud Storage Initiative will discuss:

  • Object Storage Architectural Considerations
  • Replication and Erasure Encoding for resilience
  • Pros and Cons of Hash Tables and Key-Value Databases
  • And more…

This is a live presentation, so please bring your questions and we’ll do our very best to answer them. We hope you’ll join us on October 16th for an unbiased, deep dive into the design considerations for object storage systems.


What the CSI is Up to at SDC

The SNIA Storage Developer Conference (SDC) is less than a week away. We’re looking forward to the conference and in particular want to make note of some exciting news and events that pertain to work the CSI is doing to promote standards that will increase the adoption, interoperability and portability of data stored in the cloud.

SDC Conference session: Introducing CDMI v1.1 – Tuesday, September 16th, 1:00 p.m., by David Slik. This session introduces the new CDMI 1.1 and provides an overview of the capabilities the Technical Work Group has added to the standard, and what CDMI implementers need to know when moving from CDMI 1.0.2 to CDMI 1.1.

Cloud Interoperability Plugfest – Participants at the 12th Cloud Interoperability Plugfest will be testing the interoperability of their cloud storage interfaces based on CDMI. We always have a large showing of CDMI implementations at this event, but are also looking for implementations of Amazon S3, and OpenStack Swift, Cinder and Manila interfaces.

It’s not too late to register for this Plugfest. Find out how here.

SDC 2014 is going to be exciting and educational. It’s “one stop shopping” for IT professionals who focus on the tools, technologies and developments needed for understanding and implementing efficient data storage, management and security. The CSI hopes to see you there.


Getting Started with the CDMI Conformance Test Program

Together with our partner, Tata Consultancy Services, we recently had a great live Webcast to launch the Conformance Test Program (CTP) for the SNIA Cloud Data Management Interface (CDMI). CDMI is an ISO/IEC standard that offers end users simplicity and data storage interoperability across a wide range of cloud solutions. Interoperability and portability of data stored in the cloud have become a top IT priority. The CTP tests for conformance against the specification, and provides purchasers of certified cloud storage solutions the assurance that these solutions meet CDMI interoperability standards. Our Webcast is now available on demand. It details the benefits of the CDMI CTP program and explains how any cloud storage vendor or ISV can begin the CTP process. I encourage you to check it out to learn:

  • Key benefits of the CDMI standard for vendors and end users
  • Growing adoption of the CDMI standard
  • The suite of conformance tests required to achieve CDMI CTP certification
  • How to begin the CTP process

In addition to the Webcast replay, I encourage you to check out our CDMI CTP Frequently Asked Questions (FAQ). Getting started is easy. Just fill out the CTP form and you’ll be on your way.