SSSI PCIe SSD Taskforce Enters Final Stretch

With opening day of the Del Mar races in my hometown on Wednesday, it seems only fitting to note that the SSSI PCIe SSD Task Force is rounding the last turn in its informational call schedule.

If you have a stake in this fast-growing technology area, you won’t want to miss the final two calls on July 16 and July 30 at 7:00 p.m. ET / 4:00 p.m. PT.

The July 16 call will feature a talk by Narinder Lall of eASIC on PCIe Controllers and a presentation by Walt Hubis of the SNIA Security Technical Work Group on Security and Removable NVRAM PCIe Storage.

Join the teleconference at 1-866-439-4480, passcode 25478081#, and the WebEx at snia.webex.com, meeting ID 797-289-257, passcode pcie2012.

Finally, if you’ve missed any calls to this point, catch up by visiting http://snia.org/forums/sssi/pcie.

See you at the races!

Live Webcast: 10GbE – Key Trends, Drivers and Predictions

The SNIA Ethernet Storage Forum (ESF) will be presenting a live Webcast on 10GbE on Thursday, July 19th.  Together with my SNIA colleagues, David Fair and Gary Gumanow, we’ll be discussing the technical and economic justifications that will likely make 2012 the “breakout year” for 10GbE.  We’ll cover the disruptive technologies moving this protocol forward and highlight the real-world benefits early adopters are seeing. I hope you will join us!

The Webcast will begin at 8:00 a.m. PT/11:00 a.m. ET. Register now: http://www.brighttalk.com/webcast/663/50385

This event is live, so please come armed with your questions. We’ll answer as many as we can on the spot and include the full Q&A here in a SNIA ESF blog post.

We look forward to seeing you on the 19th!

Impressions from Cisco Live 2012

I attended Cisco Live in San Diego last week and wanted to share some of my impressions of the show.

First of all, the weather was a disappointment. I’m a native Californian (Northern California, of course), and I was looking forward to some sweet weather instead of the cool, overcast conditions. It’s been so nice in Boston that I have been spoiled.

Attendance was huge. I heard something north of 17,000 attendees, though I don’t know whether that counted actual attendees or registrations. Either way, it was a significant number, and I had several engaging conversations with attendees about data center trends and applications, as well as general storage questions.

Presenting at the Intel Booth

My buddies at Intel asked me to make a couple of presentations at their booth, and I spoke on the current status of 10GbE adoption and the value it offers. My two presentations were in the morning of the first two full days of the show. Things didn’t look good when only a few attendees were seated at the time we were about to start. My first impression on seeing the empty seats in the theater was, “the Intel employees had better make a great audience.”

Fortunately, the 20 or so seats filled just as I started with more visitors standing in the back and side. The number of attendees doubled the second day, so maybe I built a reputation.  Yeah, right.

Anyway, let me share just a couple of the ideas from my presentation here:

1) 10GbE is an ideal network infrastructure that offers great flexibility and performance, with the ability to support a variety of workloads and applications. For storage, both block- and file-based protocols are supported, which is ideal for today’s highly virtualized infrastructures.

2) The ability to consolidate data traffic over a shared network promises significant capital and operational benefits for organizations currently supporting data centers with mixed network technologies. These benefits include fewer ports, cables, and components, which means less equipment to purchase, manage, power, and cool. Goodness all around.

3) A couple of applications in particular are making 10GbE especially useful:

  1. Virtualization – high VM density drives increased bandwidth requirements from server to storage
  2. Flash / SSD – flash memory drives increased performance at both the server and the storage, which requires increased bandwidth

After the presentation, I asked for questions and was pleased with the number and quality of questions. Sure, we were giving away swag (Intel t-shirts), but the relevance of the questions was particularly interesting. Many customers were considering deploying converged networks or simply moving from Fibre Channel infrastructures to Ethernet. Some of the questions included: Where would you position iSCSI vs. FCoE? What are the ideal use cases for each? When do you expect to see 40GbE or 100GbE, and for what applications? What about other network technologies, such as InfiniBand?

Interestingly, very few, if any, were planning to move to 16Gb Fibre Channel. Now, this was a Cisco show, so I would expect attendees to be there because they favor Cisco’s message and technology or are in the process of evaluating it. So, given Cisco’s strength and investment in 10GbE, it shouldn’t be a surprise that most attendees at the show, or at least at my presentation, were leaning in that direction. But I didn’t expect it to be so one-sided.

Conclusion

Interest in vendor technology shows is clearly surpassing interest in other industry events, and Cisco Live is no exception. Each Cisco Live event continues to reflect greater customer interest in 10GbE in the data center.

Updated Client Solid State Performance Test Specification Now Available

SNIA’s Solid State Storage Initiative has just released a revised Client SSS Performance Test Specification (PTS-Client) which adds a new write saturation test and refines existing tests.

The Solid State Storage Performance Test Specification (PTS) is a device-level performance test suite for benchmarking and comparing performance among SAS, SATA and PCI Express SSDs.

Revision 1.1 of the PTS-Client updates tests for IOPS, throughput and latency to more accurately reflect the workload conditions under which Client SSDs are used.  The PTS-Client v1.1 also adds a Write Saturation test that measures the initial Fresh-Out-of-Box state of SSDs and their performance evolution as data is randomly written to the device.
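
To give a feel for what a write saturation run measures, here is a toy sketch in Python that simulates the device as an in-memory buffer and tracks how much of it has been written after each pass. This is only an illustration of the Fresh-Out-of-Box-to-steady-state idea; the actual PTS prescribes specific preconditioning, block sizes, and steady-state criteria on real hardware, none of which this reproduces.

```python
import random

random.seed(0)  # deterministic for illustration

def write_saturation_sketch(device_size=1 << 20, block=4096, rounds=8):
    """Toy write-saturation model: write random 4 KiB blocks and record,
    after each round, the fraction of the 'device' ever written.
    A real PTS run records IOPS over time on actual hardware instead."""
    device = bytearray(device_size)
    nblocks = device_size // block
    touched = set()
    coverage = []
    for r in range(rounds):
        for _ in range(nblocks):
            lba = random.randrange(nblocks)      # random write target
            device[lba * block:(lba + 1) * block] = bytes([r % 256]) * block
            touched.add(lba)
        coverage.append(len(touched) / nblocks)  # fraction of device written
    return coverage

cov = write_saturation_sketch()
```

The coverage curve rises quickly at first and then flattens as the device approaches full saturation, which is the behavior the Write Saturation test characterizes on real SSDs.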

Eden Kim, Chair of SNIA’s SSS Technical Working Group, describes the primary updates in PTS-Client v1.1 as adjustments to preconditioning ranges and test boundaries. Taken together, these parameters create a repeatable test stimulus that more accurately reflects the workload characteristics of SSDs used in a single-user environment. The PTS-Client v1.1 also adds an easily understandable description of each test, which helps the user understand the purpose of the test and the test flow, and offers guidance on how to interpret the test results.

Sample test results using the PTS-Client v1.1 have been posted to the SNIA SSSI Understanding PTS Performance webpage.

Full Steam Ahead for SNIA SSSI PCIe Task Force – It’s Not Too Late to Participate!

SSSI’s PCIe SSD Task Force has covered a lot of ground since the inaugural call April 9.  Sixty-five organizations are now participating with 125 members on the email reflector.  The first four calls identified issues, with speakers from the SSSI, Agilent, Calypso, HP, LeCroy, Marvell, Micron, Seagate, Stec, Toshiba, and Virident taking a closer look at standards; discussing a PCIe test hardware RTP refresh; presenting results of a survey on how many IOPS are enough; discussing PCIe test methodology and system integration issues; presenting on 2.5” PCIe form factor; and reviewing SCSI Express, the PCI SIG, and PCIe system and form factor concerns.   All call notes are at http://snia.org/forums/sssi/pcie.

The open meeting roadmap now ramps with calls on the following topics – Big Picture – What Does It All Mean? (June 4); Deployment Strategies/Market Development (June 18); Where Do We Go From Here? (July 2); and Roadmaps and Milestones 2012 (July 16).  All SSSI members are invited to attend – calls are 4:00 pm – 5:30 pm PT and details are at http://snia.org/forums/sssi/pcie.

PCIe SSD Task Force activities will culminate in a PCIe Task Force Face-to-Face Meeting August 20 from 5:30 pm – 7:00 pm at the Flash Memory Summit in Santa Clara CA (www.flashmemorysummit.com). Contact SSSI at pciechair@snia.org if you would like to attend.

Membership in the PCIe SSD Task Force is complimentary, and all current SSSI members are welcome to participate. After July, the Task Force will change format to an SSSI Committee, and companies not already SSSI members will need to join the SNIA and SSSI to participate. For additional information, or to join, please contact the PCIe Task Force Chair at pciechair@snia.org.

New Cloud Storage Meme – “Enterprise DropBox”

In a number of recent presentations on cloud storage, I have started by asking the audience, “How many of you use DropBox?” I have seen rooms where more than half of the hands go up. Of course, the next question I ask is, “Does your corporate IT department know about this?” – sheepish grins abound.

DropBox has been responsible for a significant fraction of the growth in the number of Amazon S3 objects – that’s where the files end up when you drop them into that icon on your laptop, smartphone, or tablet. However, if that file is a corporate document, who is in charge of making sure the data and its storage meet corporate policies for protection, privacy, retention, and security? Nobody.

Thus there is now growing interest in bringing that data back in-house and on premises so that business policies for the data can be enforced. This trending meme has been termed “Enterprise DropBox.” The basic idea is to offer an equivalent service and set of applications that allow corporate IT users to store their corporate documents where the IT department can manage them.

Is this “Private Cloud”? Yes, in the sense that it uses capitalized corporate storage equipment. But it also sits “at the edge” of the corporate network so as to be accessible by employees wherever they happen to be. In reality, Enterprise DropBox needs to be part of an overall Bring Your Own Device (BYOD) strategy to enable frictionless innovation and collaboration for employees.

Who are likely to be the players in this space? Virtualization vendors such as Citrix (with its ShareFile acquisition) and VMware (with its Project Octopus initiative) look to be first movers, along with startups such as Oxygen Cloud. It’s interesting that major storage vendors have not picked up on this yet.

Digging into how this works, you find that every vendor has a storage cloud with an HTTP-based object storage interface that is then exposed to the Internet with secure protocols. Each interface is just different enough that there is no interoperability. In addition, each vendor develops, maintains, and distributes its own set of client “apps” for operating systems, smartphones, and tablets. A key feature is integration of authentication and authorization with the corporate LDAP directory, both for security and to reduce administrative overhead. Support for quotas and departmental chargeback is essential.

Looking down the road, however, this proliferation of proprietary clients and interfaces is already causing headaches for the poor device user, who may have several of these apps on their devices (all maxed out to their “free” limit). The burden on vendors is the development cost of creating and maintaining all those applications on all those different devices and operating systems. We’ve seen this before, however, in the early days of the Windows ecosystem. You used to have to purchase a separate FTP client for early Windows installations. Want NFS? A separate client purchase and install. Of course, now all those standard protocol clients are built into operating systems everywhere. Nobody thinks twice about it.

The same thing will eventually work its way out in the smart device category as well, but not until a standard protocol emerges that all the applications can use (as FTP and NFS did in the Windows case). The SNIA’s Cloud Data Management Interface (CDMI) is poised to meet this need as its adoption continues to accelerate. CDMI offers a RESTful HTTP object storage data path that is highly secure and has the features corporate IT departments need in order to protect and secure data while meeting business policies. It enables each smart device to have a single embedded client for multiple clouds – both public and private. No more proliferation of little icons all going to separate clouds.

What will drive this evolution? You – the corporate customer of these vendor offerings. You can simply ask the Enterprise DropBox vendors to “show me CDMI support in your roadmap.” Educate your employees about choosing smart devices that support the CDMI standard natively. Only then will market forces compel the vendors to realize that there is no value in locking in their customers; instead, they can differentiate on the innovation and execution that separate them from their competitors. Adoption of a standard such as CDMI will actually accelerate the growth of the entire market as the existing friction between clouds gets ground down and smoothed out.

Validating CDMI Features – Metadata Search

Here we go again with an announcement of a cloud offering that validates an existing standardized feature of CDMI. The new Amazon CloudSearch offering lets you store structured metadata in the cloud and perform queries on that metadata. Amazon missed an opportunity, however, to integrate this with its existing cloud object storage offering. After all, if you already have object storage, why not put the metadata with the data object instead of separating it out into a separate cloud?

CDMI lets you put the user metadata directly into the storage object, where it is protected, backed up, archived and retained along with the actual data. CDMI’s rich query functions are then able to find the storage object based on the values of the metadata without talking to a separate cloud offering with a new, proprietary API.
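
As an illustration of storing metadata with the object, here is a sketch of building a CDMI PUT request in Python. The endpoint, object path, and metadata values are hypothetical, and this is not an official CDMI client; it only shows the shape of a data object whose user metadata travels with the data.

```python
import json
import urllib.request

# Hypothetical CDMI endpoint -- substitute a real cloud's URL and credentials.
CDMI_BASE = "https://cloud.example.com/cdmi"

def build_put_request(path, value, metadata):
    """Build a CDMI PUT whose user metadata is stored in the object itself,
    so it is protected, backed up, and retained along with the data."""
    body = json.dumps({
        "mimetype": "text/plain",
        "metadata": metadata,   # user metadata travels with the object
        "value": value,
    }).encode()
    return urllib.request.Request(
        CDMI_BASE + path,
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/cdmi-object",
            "X-CDMI-Specification-Version": "1.0.2",
        },
    )

# urllib.request.urlopen(req) would perform the actual PUT.
req = build_put_request("/MyMusic/song1", "...",
                        {"artist": "Bono", "title": "Vertigo"})
```

Because the metadata lives inside the object body, no separate metadata cloud or second API is involved.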

CDMI standardizes a Query Queue that allows the client to create a scope specification (equivalent to a WHERE clause) to find specific objects that match the criteria, and a results specification (equivalent to a SELECT clause) that determines the elements of the object that are returned for each match. Results are placed in a CDMI queue object and can be processed one at a time, or in bulk. This powerful feature allows any storage cloud that has a search feature to expose it in a standard manner for interoperability between clouds.

An example of the metadata associated with a query queue is as follows:

{
     "metadata" : {
          "cdmi_queue_type" : "cdmi_query_queue",
          "cdmi_scope_specification" : [
               {
                    "domainURI" : "== /cdmi_domains/MyDomain/",
                    "parentURI" : "starts /MyMusic",
                    "metadata" : {
                         "artist" : "*Bono*"
                    }
               }
          ],
          "cdmi_results_specification": {
               "objectID" : "",
               "metadata" : {
                    "title" : ""
               }
          }
     }
}

When results are stored in a query queue, each enqueued value consists of a JSON object of MIME-type “application/json”. This JSON object contains the specified values requested in the cdmi_results_specification of the query queue metadata.

An example of a query result JSON object is as follows:

{
     "objectID" : "00007E7F0010EB9092B29F6CD6AD6824",
     "metadata" : {
          "title" : "Vertigo"
     }
}

Thus, if you are using your storage cloud to store music files, for example, all of the metadata for each MP3 object can be stored right along with the object, and CDMI’s powerful query mechanisms can be used to find the files you are interested in without invoking a separate search cloud with disassociated metadata.
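
To make the WHERE/SELECT analogy concrete, here is a Python sketch that assembles the query-queue metadata shown above and applies one scope entry to candidate objects client-side. The matching logic is illustrative only; in CDMI the server evaluates the scope specification internally when processing the queue.

```python
import fnmatch

def build_query_queue(domain, parent_prefix, metadata_match, result_fields):
    """Assemble query-queue metadata as in the example above:
    scope specification = WHERE clause, results specification = SELECT clause."""
    return {
        "metadata": {
            "cdmi_queue_type": "cdmi_query_queue",
            "cdmi_scope_specification": [{
                "domainURI": "== " + domain,
                "parentURI": "starts " + parent_prefix,
                "metadata": metadata_match,
            }],
            "cdmi_results_specification": {
                "objectID": "",
                "metadata": {f: "" for f in result_fields},
            },
        }
    }

def matches(scope, obj):
    """Toy client-side evaluation of one scope entry against one object."""
    if obj["domainURI"] != scope["domainURI"].removeprefix("== "):
        return False
    if not obj["parentURI"].startswith(scope["parentURI"].removeprefix("starts ")):
        return False
    # Glob-style match on each metadata pattern, e.g. "*Bono*".
    return all(fnmatch.fnmatch(obj["metadata"].get(k, ""), pat)
               for k, pat in scope["metadata"].items())
```

A server hosting the queue would enqueue, for each matching object, a JSON result containing only the fields named in the results specification, as in the objectID/title example above.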

Data Reduction Research Notes

With enterprise data growth rates that, according to IDC, may in some areas exceed 100% year over year, many technical approaches to reducing overall storage needs are being investigated. The following is a short review of the areas in which interesting technical solutions have been implemented. One primary technique that has been receiving a lot of attention is deduplication, which can be divided into many areas. Papers providing deduplication overviews are currently available on the DPCO presentation & tutorial page at http://www.snia.org/forums/dpco/knowledge/pres_tutorials. A new presentation by Gene Nagle (the current chairman of the DPCO) and Thomas Rivera will be posted there soon and will be given at the upcoming Spring 2012 SNW conference.

Other areas that have been investigated involve storage management rather than data reduction itself. This includes implementing storage tiers, as well as adopting newer technologies such as virtual tape libraries and solid state devices to ease the implementation of the various tiers. Here are the areas that seem to have seen quite a bit of activity.

Data reduction areas

• Compression
• Thin Provisioning
• Deduplication, which includes:
  o File deduplication
  o Block deduplication
  o Delta block optimization
  o Application-aware deduplication
  o Inline vs. post-processing deduplication
  o Virtual Tape Library (VTL) deduplication

Storage Tiering

Tiered storage arranges various storage components in a structured organization so that data can be migrated automatically between components that differ significantly in performance and cost. These components vary widely in performance characteristics and throughput, location relative to the servers, overall cost, media types, and other attributes. The policies developed from these parameters to define each tier have significant effects, since they determine the movement of data among the tiers and the resulting accessibility of that data. An overview of storage tiering, called “What’s Old Is New Again,” written by Larry Freeman, is available in this DPCO blog, and he will also be giving a related presentation at the Spring 2012 SNW.
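
A minimal sketch of such a placement policy, in Python, might look like the following. The tier names and age thresholds are made-up assumptions for illustration; real tiering policies weigh many more parameters (throughput, cost, media type, location).

```python
# Illustrative policy: (tier, maximum days since last access). Hot data stays
# on fast, expensive media; cold data migrates to cheap, slower media.
TIER_POLICY = [("ssd", 7), ("disk", 90)]

def tier_for(days_since_access: int) -> str:
    """Pick the storage tier for a piece of data based on access age."""
    for tier, max_age in TIER_POLICY:
        if days_since_access <= max_age:
            return tier
    return "archive"   # anything colder falls through to the archive tier
```

An automated tiering engine would periodically evaluate such a policy and migrate data whose computed tier no longer matches its current placement.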

SSD and Cache Management

Solid state memory has become quite popular, since it offers such high retrieval performance; it can be used both as a much larger cache than was previously practical and as the top level of tiered storage. A good discussion of this is at http://www.informationweek.com/blog/231901631

VTL

Storage presented as a virtual tape library allows integration with current backup software, using various direct attach or network connections such as SAS, Fibre Channel, or iSCSI. A nice overview is at http://searchdatabackup.techtarget.com/feature/Virtual-tape-library-VTL-data-deduplication-FAQ.

Thin Provisioning

Thin provisioning is a storage reduction technology which uses storage virtualization to reduce overall usage; for a brief review, see http://www.symantec.com/content/en/us/enterprise/white_papers/b-idc_exec_brief_thin_provisioning_WP.en-us.pdf

Deduplication Characteristics & Performance Issues

Looking at the overall coverage of deduplication techniques, it appears that file-level deduplication can cover a high percentage of the overall storage, which may offer a simpler and quicker solution for data reduction. Block-level deduplication may introduce bigger performance and support issues, since it adds a layer of indirection and de-linearizes data placement, but it is needed for some files, such as VM and file system images. When deduplicating backup storage, however, these drawbacks may not be severe.

One deduplication technique is sparse file support, where chunks of zeros are not stored but are instead marked in metadata; it is available in NTFS, XFS, and ext4, among other file systems. In addition, the Single Instance Storage (SIS) technique, which replaces duplicate files with copy-on-write links, is useful and performs well.
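
Sparse file support can be observed directly from Python: seeking past the end of a file before writing leaves a hole that consumes no (or very few) allocated blocks on file systems such as ext4 or XFS. Exact block accounting varies by file system, so the numbers here are indicative rather than guaranteed.

```python
import os
import tempfile

# Create a file with a 10 MiB hole followed by one 4 KiB block of real data;
# the hole is tracked only in metadata, so little disk space is allocated.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.seek(10 * 1024 * 1024)     # seek far past EOF -> creates a hole
        f.write(b"x" * 4096)         # only this block is actually written
    st = os.stat(path)
    logical = st.st_size             # logical size: 10 MiB + 4 KiB
    physical = st.st_blocks * 512    # allocated bytes (st_blocks is in 512 B units)
finally:
    os.remove(path)
```

On a sparse-capable file system, `physical` comes out far smaller than `logical`, which is exactly the saving that sparse file support provides for zero-filled regions.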

Source-side deduplication is complex, while storage-side deduplication is much simpler, so implementing deduplication at the storage site rather than at the server may be preferable. In addition, global deduplication in clustered or SAN/NAS environments can be quite complex and may lead to fragmentation, so local deduplication, operating within each storage node, is a simpler solution. It uses a hybrid duplicate-detection model that aims for file-level deduplication and reverts to segment-level deduplication only when necessary. This reduces the global problem to a simple routing issue: incoming files are routed to the node that has the highest likelihood of possessing a duplicate copy of the file, or of parts of the file.
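
The hybrid detection idea can be sketched in a few lines of Python: hash whole files first, and fall back to segment hashes only when the whole-file hash is new. The 4 KiB segment size and SHA-256 hashing are illustrative assumptions, not parameters taken from the papers cited below.

```python
import hashlib

class HybridDedupStore:
    """File-level dedup first; segment-level dedup only for new file contents."""
    SEGMENT = 4096   # illustrative fixed segment size

    def __init__(self):
        self.files = set()     # hashes of whole files already seen
        self.segments = {}     # segment hash -> stored segment bytes

    def store(self, data: bytes):
        """Return ('file-dup', 0) for an exact duplicate file; otherwise
        store only the segments not already present and report how many."""
        fhash = hashlib.sha256(data).hexdigest()
        if fhash in self.files:
            return ("file-dup", 0)         # cheap path: whole file matched
        self.files.add(fhash)
        new_segments = 0
        for i in range(0, len(data), self.SEGMENT):
            seg = data[i:i + self.SEGMENT]
            shash = hashlib.sha256(seg).hexdigest()
            if shash not in self.segments:
                self.segments[shash] = seg  # only novel segments consume space
                new_segments += 1
        return ("stored", new_segments)

store = HybridDedupStore()
```

In a clustered design, the file hash would also serve as the routing key, steering each incoming file to the node most likely to already hold its duplicate.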

See “A Study of Practical Deduplication,” which received the Best Paper award at USENIX FAST 2011: http://www.usenix.org/events/fast11/tech/full_papers/Meyer.pdf. It has references to other papers that discuss various experiments and measurements with deduplication and other data reduction techniques. Also look at the various metrics discussed in “Tradeoffs in Scalable Data Routing for Deduplication Clusters” at http://www.usenix.org/events/fast11/tech/full_papers/Dong.pdf