New Cloud Storage Meme – “Enterprise DropBox”

In a number of recent presentations on cloud storage, I have started by asking the audience “how many of you use DropBox?” In some rooms, more than half of the hands go up. Of course, my next question is “does your corporate IT department know about this?” Sheepish grins abound.

DropBox has been responsible for a significant fraction of the growth in the number of Amazon S3 objects; that’s where the files end up when you drop them onto that icon on your laptop, smartphone or tablet. However, if that file is a corporate document, who is in charge of making sure that the data and its storage meet corporate policies for protection, privacy, retention and security? Nobody.

Thus there is growing interest in bringing that data back in-house, on premises, so that the enterprise can enforce its business policies on it. This trending meme has been termed “Enterprise DropBox”. The basic idea is to offer an equivalent service and set of applications that let employees store their corporate documents where the IT department can manage them.

Is this “Private Cloud”? Well, yes, in that it uses capitalized corporate storage equipment. But it also sits “at the edge” of the corporate network so that employees can reach it wherever they happen to be. In reality, Enterprise DropBox needs to be part of an overall Bring Your Own Device (BYOD) strategy that enables frictionless innovation and collaboration for employees.

Who are likely to be the players in this space? Virtualization vendors such as Citrix (with its ShareFile acquisition) and VMware (with its Project Octopus initiative) look to be the first movers, along with startups such as Oxygen Cloud. Interestingly, the major storage vendors have not yet picked up on this.

Digging into how this works, you find that every vendor has a storage cloud with an HTTP-based object storage interface that is then exposed to the internet over secure protocols. Each interface is just different enough from the others that there is no interoperability. In addition, each vendor develops, maintains and distributes its own set of client “apps” for operating systems, smartphones and tablets. A key feature is integrating authentication and authorization with the corporate LDAP directory, both for security and to reduce administrative overhead. Support for quotas and departmental chargeback is essential.
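To make the quota and chargeback requirement concrete, here is a minimal Python sketch. It assumes a hypothetical user-to-department mapping standing in for what would really be looked up in the corporate LDAP directory, along with made-up quotas and an illustrative rate; it is not meant to describe how any particular vendor's product works.

```python
from collections import defaultdict

# Hypothetical stand-in for an LDAP lookup: user id -> department.
USER_DEPARTMENT = {"alice": "engineering", "bob": "engineering", "carol": "finance"}

# Hypothetical departmental quotas (GiB) and an illustrative monthly rate.
DEPARTMENT_QUOTA_GIB = {"engineering": 100, "finance": 50}
RATE_PER_GIB_MONTH = 0.10


def chargeback_report(usage_gib):
    """Roll per-user usage up to departments, pricing it and flagging overruns."""
    totals = defaultdict(float)
    for user, gib in usage_gib.items():
        totals[USER_DEPARTMENT[user]] += gib

    report = {}
    for dept, used in totals.items():
        report[dept] = {
            "used_gib": used,
            "charge": round(used * RATE_PER_GIB_MONTH, 2),
            "over_quota": used > DEPARTMENT_QUOTA_GIB[dept],
        }
    return report


report = chargeback_report({"alice": 60, "bob": 45, "carol": 10})
print(report["engineering"])  # {'used_gib': 105.0, 'charge': 10.5, 'over_quota': True}
```

Rolling usage up by department rather than by user is what makes the directory integration matter: the same identity data drives both access control and billing.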

Looking down the road, this proliferation of proprietary clients and interfaces is already causing headaches for the poor device user, who may have several of these apps on each device (all maxed out at their “free” limit). The burden on vendors is the cost of developing and maintaining all those applications across all those different devices and operating systems. We’ve seen this before, in the early days of the Windows ecosystem. You used to have to purchase a separate FTP client for early Windows installations. Want NFS? That was a separate client purchase and install. Of course, those standard protocol clients are now built into operating systems everywhere, and nobody thinks twice about it.

The same thing will eventually work itself out in the smart device category as well, but not until a standard protocol emerges that all the applications can use (as FTP and NFS did in the Windows case). The SNIA’s Cloud Data Management Interface (CDMI) is poised to meet this need as its adoption continues to accelerate. CDMI offers a highly secure RESTful HTTP object storage data path with the features that corporate IT departments need to protect and secure data while meeting business policies. It enables each smart device to have a single embedded client for multiple clouds, both public and private. No more proliferation of little icons, each going to a separate cloud.

What will drive this evolution? You, the corporate customer of these vendor offerings. You can simply ask the Enterprise DropBox vendors to “show me CDMI support in your roadmap”. Educate your employees to choose smart devices that support the CDMI standard natively. Only then will market forces compel the vendors to realize that there is no value in locking in their customers. Instead, they can differentiate on the innovation and execution that set them apart from their competitors. Adoption of a standard such as CDMI will actually accelerate the growth of the entire market, as the existing friction between clouds is ground down and smoothed out.

Validating CDMI Features – Metadata Search

Here we go again: another cloud offering announcement that validates an existing standardized feature of CDMI. The new Amazon CloudSearch offering lets you store structured metadata in the cloud and perform queries on it. Amazon missed an opportunity, however, to integrate this with its existing cloud object storage offering. After all, if you already have object storage, why not keep the metadata with the data object instead of splitting it out into a separate cloud?

CDMI lets you put the user metadata directly into the storage object, where it is protected, backed up, archived and retained along with the actual data. CDMI’s rich query functions are then able to find the storage object based on the values of the metadata without talking to a separate cloud offering with a new, proprietary API.

CDMI standardizes a Query Queue that allows the client to create a scope specification (equivalent to a WHERE clause) to find specific objects that match the criteria, and a results specification (equivalent to a SELECT clause) that determines the elements of the object that are returned for each match. Results are placed in a CDMI queue object and can be processed one at a time, or in bulk. This powerful feature allows any storage cloud that has a search feature to expose it in a standard manner for interoperability between clouds.

An example of the metadata associated with a query queue is as follows:

{
     "metadata" : {
          "cdmi_queue_type" : "cdmi_query_queue",
          "cdmi_scope_specification" : [
               {
                    "domainURI" : "== /cdmi_domains/MyDomain/",
                    "parentURI" : "starts /MyMusic",
                    "metadata" : {
                         "artist" : "*Bono*"
                    }
               }
          ],
          "cdmi_results_specification": {
               "objectID" : "",
               "metadata" : {
                    "title" : ""
               }
          }
     }
}
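As a rough illustration of how a client might submit that metadata, the Python sketch below builds (but does not send) an HTTP PUT that creates the query queue. The endpoint URL and queue name are hypothetical, and the headers reflect my understanding of CDMI 1.0.x conventions (an application/cdmi-queue content type plus an X-CDMI-Specification-Version header, with the version value assumed); check what your cloud advertises before relying on them.

```python
import json
import urllib.request

# Hypothetical endpoint and queue name; substitute your own cloud's CDMI root URI.
CDMI_ROOT = "https://cloud.example.com/cdmi"
QUEUE_PATH = "/MyMusic/BonoQuery"

# Query queue metadata, mirroring the example above.
query_queue = {
    "metadata": {
        "cdmi_queue_type": "cdmi_query_queue",
        "cdmi_scope_specification": [
            {
                "domainURI": "== /cdmi_domains/MyDomain/",
                "parentURI": "starts /MyMusic",
                "metadata": {"artist": "*Bono*"},
            }
        ],
        "cdmi_results_specification": {
            "objectID": "",
            "metadata": {"title": ""},
        },
    }
}


def build_create_query_queue_request(root=CDMI_ROOT, path=QUEUE_PATH, body=query_queue):
    """Build (but do not send) a PUT request that creates a query queue object."""
    return urllib.request.Request(
        url=root + path,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/cdmi-queue",
            "Accept": "application/cdmi-queue",
            # Assumed specification version; use the one your cloud supports.
            "X-CDMI-Specification-Version": "1.0.1",
        },
    )


req = build_create_query_queue_request()
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would send it; that step is left out so the sketch runs without a live CDMI server.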


When results are stored in a query queue, each enqueued value consists of a JSON object of MIME-type “application/json”. This JSON object contains the specified values requested in the cdmi_results_specification of the query queue metadata.

An example of a query result JSON object is as follows:

{
     "objectID" : "00007E7F0010EB9092B29F6CD6AD6824",
     "metadata" : {
          "title" : "Vertigo"
     }
}
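To show what the scope and results specifications mean, here is a small local evaluator in Python. It is a simplified sketch, not how any cloud implements CDMI queries: it supports only the "==" and "starts" operators used above, treats "*" in metadata values as a shell-style wildcard via fnmatch (an assumption), requires all conditions within one scope entry to hold, and accepts a match on any entry. The stored objects are hypothetical sample data.

```python
from fnmatch import fnmatchcase


def match_uri(expr, value):
    """Evaluate a simplified URI condition such as '== /a/' or 'starts /a'."""
    op, _, operand = expr.partition(" ")
    if op == "==":
        return value == operand
    if op == "starts":
        return value.startswith(operand)
    raise ValueError(f"unsupported operator in this sketch: {op!r}")


def matches(obj, scope):
    """Every condition within a single scope entry must hold (AND)."""
    for field in ("domainURI", "parentURI"):
        if field in scope and not match_uri(scope[field], obj[field]):
            return False
    for key, pattern in scope.get("metadata", {}).items():
        # Treat '*' as a shell-style wildcard (an assumption of this sketch).
        if not fnmatchcase(obj["metadata"].get(key, ""), pattern):
            return False
    return True


def run_query(objects, scope_spec, results_spec):
    """Return one result per object that matches any scope entry (OR)."""
    results = []
    for obj in objects:
        if any(matches(obj, scope) for scope in scope_spec):
            # Copy only the top-level fields the results specification asks for.
            result = {k: obj[k] for k in results_spec if k != "metadata"}
            if "metadata" in results_spec:
                result["metadata"] = {
                    k: obj["metadata"].get(k) for k in results_spec["metadata"]
                }
            results.append(result)
    return results


# Hypothetical stored objects; the first reuses the objectID from the example above.
objects = [
    {"objectID": "00007E7F0010EB9092B29F6CD6AD6824",
     "domainURI": "/cdmi_domains/MyDomain/", "parentURI": "/MyMusic/U2/",
     "metadata": {"artist": "U2 (Bono)", "title": "Vertigo"}},
    {"objectID": "00007E7F00104BE66AB53A9572F9F51E",
     "domainURI": "/cdmi_domains/MyDomain/", "parentURI": "/MyMusic/Other/",
     "metadata": {"artist": "Someone Else", "title": "Another Song"}},
]
scope_spec = [{"domainURI": "== /cdmi_domains/MyDomain/",
               "parentURI": "starts /MyMusic",
               "metadata": {"artist": "*Bono*"}}]
results_spec = {"objectID": "", "metadata": {"title": ""}}

print(run_query(objects, scope_spec, results_spec))
```

The single result has the same shape as the query result JSON object shown above; on a real CDMI cloud the server, not the client, evaluates the query and enqueues these results.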

Thus if you are using your storage cloud to store music files, for example, all of the metadata for each mp3 object can be stored right along with the object, and CDMI’s powerful query mechanisms can find the files you are interested in without invoking a separate search cloud with disassociated metadata.

Validating CDMI features – Object Expiration

Validating yet another feature of the CDMI standard (see the previous post for an earlier one), Amazon has announced its Object Expiration feature for S3. While this is not a new concept for storage interfaces, it is the first cloud implementation of the capability that I know of. The idea is simply to have the server side of the cloud delete objects on your behalf automatically once the lifecycle of that data has completed.

As part of overall Data Lifecycle Management, object deletion is the most common terminal state for data. CDMI has standardized the interface for this capability in cloud storage with a comprehensive Retention and Hold Management feature (Chapter 17). The granularity of the standard CDMI feature is finer than that of the S3 feature, in that it allows retention and deletion to be set on individual objects. (You could approximate this in S3 by setting the prefix to the full object name, but that doesn’t scale using the header fields that Amazon uses.) The S3 prefix mechanism can scope the expiration policy down to individual “directories” (the forward-slash-terminated parts of object names), and CDMI allows this as well for the semantically equivalent CDMI sub-containers.
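The difference in granularity can be sketched in a few lines of Python. This is a hypothetical model, not S3's lifecycle configuration or CDMI's metadata format: container-level (prefix) policies apply to everything beneath them, with the longest matching prefix winning so that a sub-container can refine its parent's policy, while a setting on an individual object overrides both.

```python
from datetime import datetime, timedelta, timezone


def effective_retention(name, container_policies, object_policies):
    """Return the retention period for an object name, or None if none applies."""
    # Finest granularity: a setting on the individual object wins.
    if name in object_policies:
        return object_policies[name]
    # Otherwise the longest matching container prefix applies, so a
    # sub-container such as "/logs/audit/" can refine its parent "/logs/".
    prefixes = [p for p in container_policies if name.startswith(p)]
    if not prefixes:
        return None
    return container_policies[max(prefixes, key=len)]


def is_expired(name, created, now, container_policies, object_policies):
    """True once the object's effective retention period has fully elapsed."""
    period = effective_retention(name, container_policies, object_policies)
    return period is not None and now >= created + period


# Hypothetical policies for illustration.
container_policies = {"/logs/": timedelta(days=30), "/logs/audit/": timedelta(days=7 * 365)}
object_policies = {"/logs/audit/2011-q4.csv": timedelta(days=10 * 365)}

created = datetime(2011, 1, 1, tzinfo=timezone.utc)
now = datetime(2011, 3, 1, tzinfo=timezone.utc)
print(is_expired("/logs/web/access.log", created, now, container_policies, object_policies))  # True
print(is_expired("/logs/audit/users.csv", created, now, container_policies, object_policies))  # False
```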

Complying with Regulations

Although the ability to delete objects when their lifecycle completes is useful, it is insufficient for complying with regulations such as Sarbanes-Oxley, or for eDiscovery needs during litigation. Most enterprises need to show that the data has not been modified during its lifecycle. In addition, if a subpoena is issued for the data, you DO NOT want the object deleted, even if its retention period has expired; that could cost you millions of dollars in a pending court case.
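The point about holds boils down to a simple rule: expiry alone is not enough to delete. The Python sketch below uses hypothetical field names loosely modeled on CDMI's retention and hold data system metadata (cdmi_retention_period, cdmi_retention_autodelete, cdmi_hold_id); it simplifies the real semantics, which are defined in Chapter 17.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class RetainedObject:
    """Simplified, hypothetical view of an object's retention and hold state."""
    retention_end: datetime        # when the retention period expires
    autodelete: bool = False       # delete automatically once retention ends?
    hold_ids: list = field(default_factory=list)  # active holds, e.g. for a subpoena


def may_modify_or_delete(obj, now):
    """Changes are blocked while the object is under retention or any hold."""
    return now >= obj.retention_end and not obj.hold_ids


def should_autodelete(obj, now):
    """Automatic deletion requires expiry, the autodelete flag, and no holds."""
    return obj.autodelete and may_modify_or_delete(obj, now)


now = datetime(2012, 1, 1, tzinfo=timezone.utc)
expired = datetime(2011, 6, 30, tzinfo=timezone.utc)

print(should_autodelete(RetainedObject(expired, autodelete=True), now))  # True
# A hypothetical litigation hold keeps the object even though retention has expired.
print(should_autodelete(RetainedObject(expired, True, ["case-1234"]), now))  # False
```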

The CDMI standard anticipates that storage clouds will want to offer more robust, full-featured retention and hold management for corporate data, and that a standard means of achieving it will be needed. Take a quick look at Chapter 17 (it’s compact while still being comprehensive) and investigate using the standard way to achieve this function. If you are a cloud vendor trying to emulate the S3 interface, good luck to you: Amazon will continue to expand the definition of what “S3” means (as with this feature), forcing you to constantly modify your cloud’s storage interface to keep up (as well as requiring you to reverse engineer any bugs that exist).

Validating CDMI features – Server Side Encryption

One of the features of many storage systems and even disk drives is the ability to encrypt the data at rest. This protects against a specific threat – the disk drive going out the back door for replacement or repair. So it was only a matter of time before we would see this important feature start to be offered for Cloud Storage as well. Well, today Amazon announced their Server Side Encryption capability for their S3 cloud offering. This feature was anticipated by the CDMI standard interface when it was finalized as a standard back in April 2010.

Standard Server Side Encryption

So, how does CDMI standardize this feature? As usual, it starts with finding out whether the cloud actually supports the feature and which choices are available. In CDMI, this is done through the capabilities resource, a kind of catalog or discovery mechanism. By fetching the capabilities resource for objects, containers, domains or queues, you can tell whether server side encryption of data at rest is available from the cloud offering (yes, this is granular for a reason). The actual capability name is cdmi_encryption (see section 12.1.3). It not only indicates that the cloud can encrypt data at rest, but also lists which algorithms are available for that encryption. The algorithms are expressed in the form ALGORITHM_MODE_KEYLENGTH, where:

“ALGORITHM” is the encryption algorithm (e.g., “AES” or “3DES”).

“MODE” is the mode of operation (e.g., “XTS”, “CBC”, or “CTR”).

“KEYLENGTH” is the key size (e.g., “128”, “192”, or “256”).

So the cloud can offer the user several different algorithms of different strengths and types, or if it only offers a single algorithm (such as the Amazon offering), the cloud storage client can at least understand what that algorithm is.
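Because the algorithm names follow the fixed ALGORITHM_MODE_KEYLENGTH pattern, a client can parse them mechanically. This Python sketch assumes each offered value has exactly three underscore-separated parts; the selection policy (prefer AES, then the longest key) is purely illustrative and not something CDMI prescribes.

```python
from typing import NamedTuple


class EncryptionAlgorithm(NamedTuple):
    algorithm: str   # e.g. "AES" or "3DES"
    mode: str        # e.g. "XTS", "CBC" or "CTR"
    key_length: int  # e.g. 128, 192 or 256


def parse_algorithm(name):
    """Split an ALGORITHM_MODE_KEYLENGTH string; raises ValueError if malformed."""
    algorithm, mode, key_length = name.split("_")
    return EncryptionAlgorithm(algorithm, mode, int(key_length))


def choose_algorithm(offered, preferred="AES"):
    """Pick the offered variant of the preferred algorithm with the longest key."""
    candidates = [parse_algorithm(name) for name in offered]
    candidates = [c for c in candidates if c.algorithm == preferred]
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.key_length)
    return f"{best.algorithm}_{best.mode}_{best.key_length}"


print(choose_algorithm(["AES_CBC_128", "AES_XTS_256", "3DES_CBC_192"]))  # AES_XTS_256
```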

So how does the user tell the cloud that she wants her data encrypted? Amazon does this with a proprietary header, of course, but CDMI does it with standard Data System Metadata that can be placed on any object, container of objects, queue or domain. This metadata item is called cdmi_encryption (see section 16.4), and it simply contains a string whose value is chosen from the list of available algorithms in the corresponding capability. There is also a cdmi_encryption_provided metadata value that tells the client whether or not the cloud is actually encrypting its data.

Lastly, there is a system-wide capability called cdmi_security_encryption (section 12.1.1) that tells the user whether the cloud does server side encryption at all.
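Putting the three pieces together, a client-side check might look like the following Python sketch. It works on already-parsed capability dictionaries rather than live HTTP responses, and it assumes that cdmi_security_encryption is reported as the string "true", that cdmi_encryption is a list of algorithm strings, and that cdmi_encryption_provided names the algorithm actually in use; confirm the exact representations against sections 12.1 and 16.4 before relying on them.

```python
import json


def build_encrypted_object_body(system_caps, object_caps, algorithm, value):
    """Return a JSON body that asks the cloud to encrypt the object at rest."""
    # System-wide capability: does the cloud do server side encryption at all?
    if system_caps.get("cdmi_security_encryption") != "true":
        raise ValueError("this cloud does not offer server side encryption")
    # Per-resource capability: which ALGORITHM_MODE_KEYLENGTH values are offered?
    offered = object_caps.get("cdmi_encryption", [])
    if algorithm not in offered:
        raise ValueError(f"{algorithm} is not offered; choose one of {offered}")
    # Data system metadata requesting the chosen algorithm.
    return json.dumps({"value": value, "metadata": {"cdmi_encryption": algorithm}})


def encryption_confirmed(returned_metadata, algorithm):
    """Check what the cloud reports it is actually doing (assumed representation)."""
    return returned_metadata.get("cdmi_encryption_provided") == algorithm


system_caps = {"cdmi_security_encryption": "true"}
object_caps = {"cdmi_encryption": ["AES_XTS_256", "AES_CBC_128"]}
body = build_encrypted_object_body(system_caps, object_caps, "AES_XTS_256", "hello")
print(body)
```

Checking the capabilities first means the client fails early with a clear message instead of discovering after the fact that its data was stored unencrypted.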

Server side encryption is an important capability for cloud storage offerings to provide, which is why CDMI standardized it in advance of cloud offerings being available. We expect more clouds to offer it in the future, and customers to soon realize that, without CDMI implementations, these offerings lock them in and make it costly to exit that vendor.

Join the Cloud Storage Movement at SNIA’s Winter Symposium 2011

Every year the Storage Networking Industry Association (SNIA) holds a gathering of its members in San Jose to coordinate the work of the various Technical Work Groups, Forums and Initiatives. This year the Symposium will take place January 24th – 27th, 2011 at the Sainte Claire Hotel in San Jose, CA. SNIA opens this Symposium to non-SNIA members who are evaluating membership, so feel free to attend. Please register for the Symposium if you plan to be there in person.

SNIA Cloud Events

The Cloud Storage Technical Work Group (TWG) kicks off a multi-day face-to-face session starting at 1:00pm PT on Monday. We will be discussing the submission of CDMI for international standardization and continuing to work out the scope of the next minor release (1.1) of CDMI. Topics include Federation and NoSQL, among others. Bring your own ideas for how to improve CDMI. The full agenda has been posted publicly.

On Wednesday, the Cloud Storage Initiative will give an overview of its activities at a breakfast session starting at 8:30am. Then at noon on Wednesday, be sure to join us for the 2011 Activities Kickoff presentation in the Grande Ballroom, where we will showcase all of the upcoming activities you will want to be involved with over the next year. This session will be live streamed if you cannot make it in person. Whether you will attend in person or remotely, please register for this update event (in addition to the Symposium registration above).

Wednesday afternoon, the Cloud Storage Initiative meets from 1-5pm (also in the Grande Ballroom). Be sure to join us and help plan the activities for the upcoming year.

Lastly, on Wednesday night there will be a Birds of a Feather (BOF) session for a new group that is forming around Archive and Preservation in the Cloud.

Whereas with Cloud Backup the cloud is simply a repository of backup data, with Cloud Archive and Preservation the cloud is where the active processes occur that ensure long-term retention, preservation and viability of data.
CDMI is uniquely designed to accommodate these needs with the Data System Metadata that it standardizes.
Cloud providers see an opportunity to offer more than just a best-effort storage area, with the promise of becoming the trusted steward of information for the long term.
Additional services such as eDiscovery and automatic format conversion can easily be offloaded to the cloud, reducing costs.

Please join us Wednesday evening from 5:30pm to 7:00pm in the Grande Ballroom for a Birds of a Feather session to kick off the formation of the CSI Archive/Preservation Special Interest Group (SIG). Light refreshments will be provided. If you would like to participate remotely, please use the following call-in information:
Toll Free: 866-244-8528
International: +1-719-457-0816
Passcode: 510843#
Webex: http://snia.webex.com, Meeting Name: Archive and Preservation SIG
Meeting Password: cloud2011

Why not pick one of the “open” APIs instead of CDMI?

There is a post by Jerry Huang, CEO of Gladinet, on the problems with trying to be compatible with Amazon’s S3 API. Jerry suggests looking at OpenStack or a common library instead.

Amazon’s API (as with any cloud vendor’s API) is a moving target for sure, but the main issue is that these APIs are under the change control of a single vendor. It doesn’t matter how “open” the API is (in terms of copyright license), because the vendor can change it to disadvantage a competitor. So if you are a competitor, you would be foolish to use that API as the only interface into your cloud. So what happens? Each cloud vendor releases its own “open” API: similar but slightly different (enough to get around copyright), almost always RESTful, and all doing pretty much the same thing.

So you get the situation we have today: a rapid proliferation of different interfaces that are all pretty much the same. That doesn’t help the poor clients, who have to code to N different interfaces to work with N different clouds. And since these interfaces are rapidly evolving, clients have to keep up with all of the API changes over time.

The Cloud Storage standard CDMI does not have this problem. CDMI is under the change control of a standards body (SNIA) and accommodates requirements from all the cloud storage players in its standardization process. More importantly, it was developed under the SNIA IP policy to help prevent any of the specification author companies from gaming the spec with their Intellectual Property. Thus cloud vendors can pick up the CDMI specification and implement it with confidence. They don’t need to come up with their own API. CDMI also has a standard way to extend the specification for vendor-specific functions that still allows for core compatibility with other vendors. Want to do versioning? There is an example vendor extension in CDMI that shows you how.

From a client-side point of view, Jerry also mentions common libraries. jclouds is a good example of this (for Java), and there are common libraries for other languages as well. While such a library can insulate a client from the many proliferating APIs, keeping it up to date with those APIs is a tough task (just ask Adrian). The sooner the various cloud providers implement the CDMI standard (even alongside their existing APIs), the sooner common libraries like jclouds can maintain a single adapter to a standard API.
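The maintenance argument is easier to see in code. In this hypothetical Python sketch, a common library exposes one storage interface, and because every cloud speaks the same standard API, a single adapter serves them all; supporting another cloud becomes a configuration entry rather than new adapter code. The class names and endpoints are invented, and a dictionary stands in for the HTTP calls so the sketch runs offline.

```python
from typing import Dict, Protocol


class ObjectStore(Protocol):
    """The single interface a common client library exposes to applications."""
    def put(self, path: str, data: bytes) -> None: ...
    def get(self, path: str) -> bytes: ...


class StandardApiAdapter:
    """One adapter for every cloud that speaks the same standard API.

    Only the root URI differs between clouds. A dict stands in for the
    HTTP calls so this sketch runs without a network.
    """

    def __init__(self, root_uri: str) -> None:
        self.root_uri = root_uri
        self._fake_remote: Dict[str, bytes] = {}

    def put(self, path: str, data: bytes) -> None:
        self._fake_remote[self.root_uri + path] = data

    def get(self, path: str) -> bytes:
        return self._fake_remote[self.root_uri + path]


# Hypothetical endpoints: supporting another cloud is a new entry, not new code.
CLOUD_ENDPOINTS = {
    "public": "https://public-cloud.example.com/cdmi",
    "private": "https://storage.corp.example.com/cdmi",
}


def open_store(cloud: str) -> ObjectStore:
    return StandardApiAdapter(CLOUD_ENDPOINTS[cloud])


store = open_store("private")
store.put("/reports/q4.txt", b"quarterly numbers")
print(store.get("/reports/q4.txt"))
```

With N proprietary APIs, the library would instead need N adapter classes, each tracking its own vendor's changes.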

SNIA Cloud Activities for 2010

Given that it’s the middle of summer, it may be hot where you are, but SNIA Cloud activities are also heating up for the remainder of this year, and you don’t want to be left out.

SNIA Summer Symposium

At the end of July every year SNIA hosts a Symposium in San Jose for all the groups. The Cloud Storage TWG will be meeting from Monday afternoon through Thursday morning. The agenda is posted publicly and non-SNIA members are encouraged to attend.

Also at the Symposium on Monday night is a Birds of a Feather (BOF) session where we will demo CDMI and OCCI working together in a common infrastructure. There will be time afterward for details on the implementation and for discussion.

Thursday morning will be a special session to update folks on the SNIA Cloud activities for the remainder of the year. Besides the in person session at the Symposium, the session will also be broadcast as an online Webinar for folks who cannot make it in person. More information and a registration link is available on the SNIA Website.

Storage Developer Conference

September brings the annual Storage Developer Conference (SDC), and this year Cloud is a big part of the agenda. There will be a CDMI Plugfest throughout the week, a Cloud hands-on lab for developers, and Cloud tracks all week, including some big cloud-related keynotes. But *wait*, there’s more. Following SDC, at the same hotel on Thursday, September 23rd, will be the…

SNIA Cloud Burst Event

This is an event that is squarely focused on Cloud Storage and brings together end users, cloud providers and storage vendors for a unique experience including demos, a showcase and in-depth sessions on this part of the overall cloud industry. More information is available on the Cloud Burst page.

Storage Networking World

For the past two SNWs, there has been a Cloud Pavilion that drew great traffic and interest from attendees for the companies that participated. At this fall’s SNW in Dallas, we will repeat this successful program with a limited number of slots. In addition, we will again have a hands-on lab for cloud, which is always well attended (by end users only). If you are looking for a speaking opportunity, please consider sponsoring the cloud summit at SNW, where end users come to learn about the cloud and the offerings that are available.

SNW Europe

Last year SNW Europe was a huge success for the SNIA Cloud participants, with record attendance, up year over year. This year will see an expanded set of cloud activities, including a new Cloud Pavilion and Hands-on Labs. There are a limited number of slots for these, and they will sell out early. A speaking engagement opportunity is included as well.

“Membership has its privileges”

Many of these opportunities are open only to Cloud Storage Initiative (CSI) member companies. The membership fees help to fund these activities for the members and augment the work of the volunteers with paid resources. If you can help get your company involved, please contact Marty Foltyn (marty@bitsprings.com) for more information.