Data Reduction Research Notes

Enterprise data growth rates may exceed 100% year over year in some areas, according to IDC, so many technical approaches to reducing overall storage needs are being investigated. The following is a short review of the areas in which interesting technical solutions have been implemented. One primary technique that has been receiving a lot of attention is deduplication, which can be divided into several areas. Some papers giving overviews of deduplication are currently available on the DPCO presentation and tutorial page, at http://www.snia.org/forums/dpco/knowledge/pres_tutorials. A new presentation by Gene Nagle (the current chairman of the DPCO) and Thomas Rivera will be posted there soon, and will be presented at the upcoming Spring 2012 SNW conference.

Other investigations have concentrated on storage management rather than on data reduction. This involves implementing storage tiers, as well as adopting newer technologies such as Virtual Tape Libraries and Solid State Devices that ease the implementation of the various tiers. Here are the areas that seem to have seen the most activity.

Data reduction areas

• Compression
• Thin Provisioning
• Deduplication, which includes
o File deduplication
o Block deduplication
o Delta block optimization
o Application Aware deduplication
o Inline vs. post-processing deduplication
o Virtual Tape Library (VTL) deduplication

Storage Tiering

Tiered storage arranges storage components in a structured organization so that data can be migrated automatically between components with significantly different performance and cost. These components vary widely in performance characteristics and throughput, location relative to the servers, overall cost, media type, and other factors. The policies built on these parameters to define each tier have significant effects, since they determine how data moves between tiers and, as a result, how accessible that data is. An overview of storage tiering by Larry Freeman, called “What’s Old Is New Again”, is available in this DPCO blog, and he will also be giving a related presentation at Spring 2012 SNW.
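As an illustration of how tiering policies drive data movement, here is a minimal Python sketch of a hot/warm/cold placement policy. The tier names ("ssd", "sas", "archive"), the access-count and idle-time metrics, and the thresholds are all hypothetical assumptions, not taken from any particular product.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Extent:
    name: str
    accesses_last_day: int  # hypothetical metric: recent access count
    days_since_access: int  # hypothetical metric: idle time in days


def target_tier(extent: Extent, hot_threshold: int = 100, cold_days: int = 30) -> str:
    """Illustrative policy: busy data goes to SSD, idle data to archive, the rest to SAS."""
    if extent.accesses_last_day >= hot_threshold:
        return "ssd"
    if extent.days_since_access >= cold_days:
        return "archive"
    return "sas"


def plan_migrations(current: dict[str, str], extents: list[Extent]) -> list[tuple[str, str, str]]:
    """Return (name, from_tier, to_tier) for every extent the policy wants to move."""
    moves = []
    for extent in extents:
        dest = target_tier(extent)
        if current[extent.name] != dest:
            moves.append((extent.name, current[extent.name], dest))
    return moves


if __name__ == "__main__":
    extents = [
        Extent("db-index", accesses_last_day=500, days_since_access=0),
        Extent("old-logs", accesses_last_day=0, days_since_access=90),
        Extent("home-dirs", accesses_last_day=5, days_since_access=2),
    ]
    current = {"db-index": "sas", "old-logs": "sas", "home-dirs": "sas"}
    # prints [('db-index', 'sas', 'ssd'), ('old-logs', 'sas', 'archive')]
    print(plan_migrations(current, extents))
```

Changing either threshold changes which data lands on which tier without touching the data itself, which is why the policy definitions have such a large effect on accessibility.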

SSD and Cache Management

Solid state memory has become quite popular because of its high retrieval performance. It can be used both to implement much larger caches than before and as the top level of a tiered storage hierarchy. A good discussion of this is at http://www.informationweek.com/blog/231901631

VTL

Presenting storage as a virtual tape library allows integration with existing backup software over various direct-attach or network connections, such as SAS, Fibre Channel, or iSCSI. A nice overview is at http://searchdatabackup.techtarget.com/feature/Virtual-tape-library-VTL-data-deduplication-FAQ.

Thin Provisioning

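Thin provisioning is a storage reduction technology that uses storage virtualization to reduce overall usage: physical capacity is allocated only when data is actually written, rather than being reserved up front for the full advertised size of each volume. For a brief review, see http://www.symantec.com/content/en/us/enterprise/white_papers/b-idc_exec_brief_thin_provisioning_WP.en-us.pdf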
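The allocate-on-write idea behind thin provisioning can be sketched in a few lines of Python. This is a minimal sketch assuming one shared pool and fixed-size blocks; the class names `ThinPool` and `ThinVolume` are hypothetical, and an in-memory dictionary stands in for real disk. Two 1000-block volumes can share an 8-block pool as long as little of either is actually written.

```python
from __future__ import annotations


class ThinPool:
    """Shared physical capacity; blocks are handed out only when first written."""

    def __init__(self, physical_blocks: int) -> None:
        self.free = list(range(physical_blocks))
        self.blocks: dict[int, bytes] = {}  # physical block -> contents (stand-in for disk)

    def allocate(self) -> int:
        if not self.free:
            raise RuntimeError("thin pool exhausted: physical capacity is oversubscribed")
        return self.free.pop()


class ThinVolume:
    """A volume whose advertised (virtual) size can exceed the pool's real capacity."""

    def __init__(self, pool: ThinPool, virtual_blocks: int, block_size: int = 4096) -> None:
        self.pool = pool
        self.virtual_blocks = virtual_blocks
        self.block_size = block_size
        self.mapping: dict[int, int] = {}  # virtual block -> physical block

    def _check(self, vblock: int) -> None:
        if not 0 <= vblock < self.virtual_blocks:
            raise IndexError("block outside the volume's virtual size")

    def write(self, vblock: int, payload: bytes) -> None:
        self._check(vblock)
        if len(payload) > self.block_size:
            raise ValueError("payload larger than one block")
        if vblock not in self.mapping:
            self.mapping[vblock] = self.pool.allocate()  # allocate on first write only
        self.pool.blocks[self.mapping[vblock]] = payload.ljust(self.block_size, b"\0")

    def read(self, vblock: int) -> bytes:
        self._check(vblock)
        phys = self.mapping.get(vblock)
        # Never-written blocks consume no physical space and read back as zeros.
        return bytes(self.block_size) if phys is None else self.pool.blocks[phys]
```

The trade-off is visible in `allocate`: because volumes are oversubscribed, the pool can run out even though every volume still reports free space, so real deployments need capacity monitoring.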

Deduplication Characteristics & Performance Issues

Looking at the overall coverage of deduplication techniques, it appears that file-level deduplication can cover a high percentage of overall storage, which may make it a simpler and quicker route to data reduction. Block-level deduplication may introduce bigger performance and support issues: it adds a layer of indirection and de-linearizes data placement. It is, however, needed for some files, such as VM and file system images. When deduplicating backup storage, these drawbacks may not be severe.
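To make the file-level versus block-level trade-off concrete, here is a small Python sketch that counts the bytes stored under whole-file and fixed-size block deduplication, using SHA-256 fingerprints. The 4 KiB block size, the function names, and the synthetic "VM images" are assumptions for illustration; production systems often use variable-size (content-defined) chunking and must also store the indirection metadata this sketch ignores.

```python
from __future__ import annotations

import hashlib


def file_level_bytes(files: dict[str, bytes]) -> int:
    """Bytes stored when identical whole files are kept only once."""
    unique: dict[bytes, int] = {}
    for data in files.values():
        unique.setdefault(hashlib.sha256(data).digest(), len(data))
    return sum(unique.values())


def block_level_bytes(files: dict[str, bytes], block_size: int = 4096) -> int:
    """Bytes stored when identical fixed-size blocks are kept only once."""
    unique: dict[bytes, int] = {}
    for data in files.values():
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            unique.setdefault(hashlib.sha256(block).digest(), len(block))
    return sum(unique.values())


def make_block(seed: int, block_size: int = 4096) -> bytes:
    """Deterministic, distinct test block (a 32-byte digest repeated to fill the block)."""
    return hashlib.sha256(str(seed).encode()).digest() * (block_size // 32)


if __name__ == "__main__":
    image_a = b"".join(make_block(i) for i in range(4))  # a 4-block "VM image"
    image_b = image_a[: 3 * 4096] + make_block(99)       # same image, last block changed
    files = {"a.img": image_a, "a-copy.img": image_a, "b.img": image_b}
    # prints 49152 32768 20480 (raw, file-level, block-level)
    print(sum(len(d) for d in files.values()), file_level_bytes(files), block_level_bytes(files))
```

The exact copy is caught by both methods, but only block-level deduplication notices that `b.img` differs from `a.img` in a single block, which is why it matters for VM and file system images.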

One related technique, sparse file support, in which runs of zeros are recorded only in metadata rather than stored as data, is available in NTFS, XFS, and ext4, among other file systems. In addition, the Single Instance Storage (SIS) technique, which replaces duplicate files with copy-on-write links, is useful and performs well.
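Applications can take advantage of sparse file support by seeking over runs of zeros instead of writing them, roughly what GNU `cp --sparse=always` does. The Python sketch below is an illustration under assumptions: the 4 KiB block granularity and the function name `sparse_copy` are hypothetical, and whether the skipped ranges really become unallocated holes depends on the file system (on NTFS the file must also be explicitly flagged as sparse, which this sketch does not do).

```python
import os


def sparse_copy(src: str, dst: str, block_size: int = 4096) -> int:
    """Copy src to dst, seeking over all-zero blocks instead of writing them.

    On file systems with sparse file support the skipped ranges can remain
    unallocated holes that still read back as zeros. Returns the number of
    blocks skipped.
    """
    zero_block = bytes(block_size)
    skipped = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while chunk := fin.read(block_size):
            if chunk == zero_block[: len(chunk)]:
                fout.seek(len(chunk), os.SEEK_CUR)  # leave a hole rather than writing zeros
                skipped += 1
            else:
                fout.write(chunk)
        # If the data ends in a hole, extend the file to its full logical size.
        fout.truncate()
    return skipped
```

The logical contents of the copy are identical to the source; only the physical allocation differs.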

Source-side deduplication is complex, while storage-side deduplication is much simpler, so deduplicating at the storage site rather than at the server site may be preferable. Likewise, global deduplication in clustered or SAN/NAS environments can be quite complex and may lead to fragmentation, so local deduplication, operating within each storage node, is a simpler solution. One such approach uses a hybrid duplicate detection model that aims for file-level deduplication and reverts to segment-level deduplication only when necessary. This reduces the global problem to a simple routing problem: each incoming file is routed to the node with the highest likelihood of already holding a duplicate copy of the file, or of parts of it.
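A simplified Python sketch of this kind of hybrid routing: check whole-file fingerprints first, and only if no node holds the whole file, route by the overlap of fixed-size segment fingerprints, breaking ties toward the least-loaded node. The `Node` class, the 4 KiB segment size, and the exhaustive fingerprint comparison are assumptions for illustration; a real cluster would more likely consult a sampled or summarized index than every fingerprint on every node.

```python
from __future__ import annotations

import hashlib

SEGMENT_SIZE = 4096  # assumed fixed segment size


def fingerprint(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def segment_fingerprints(data: bytes) -> set[str]:
    return {fingerprint(data[o:o + SEGMENT_SIZE]) for o in range(0, len(data), SEGMENT_SIZE)}


class Node:
    """A storage node that deduplicates locally and remembers what it holds."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.file_fps: set[str] = set()
        self.segment_fps: set[str] = set()

    def store(self, data: bytes) -> None:
        self.file_fps.add(fingerprint(data))
        self.segment_fps |= segment_fingerprints(data)


def route(data: bytes, nodes: list[Node]) -> Node:
    """Pick the node most likely to already hold this file or parts of it."""
    whole = fingerprint(data)
    for node in nodes:
        if whole in node.file_fps:  # file-level hit: cheapest and best case
            return node
    segments = segment_fingerprints(data)  # fall back to segment-level similarity
    # Most overlapping segments wins; ties go to the node storing the fewest segments.
    return max(nodes, key=lambda n: (len(segments & n.segment_fps), -len(n.segment_fps)))
```

Once a file lands on a node, that node's own local deduplication does the rest, which is what keeps the cluster-wide problem down to a routing decision.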

See “A Study of Practical Deduplication”, which received the Best Paper award at USENIX FAST 2011: http://www.usenix.org/events/fast11/tech/full_papers/Meyer.pdf. It references other papers that describe experiments and measurements with deduplication and other data reduction techniques. The metrics discussed in “Tradeoffs in Scalable Data Routing for Deduplication Clusters” are also worth a look: http://www.usenix.org/events/fast11/tech/full_papers/Dong.pdf

The Future of Flash & SSDs Is Not-So-Bleak

You may have seen articles about a study by a UCSD researcher saying that the future of Flash (and of NAND Flash-based SSDs) is bleak: http://www.networkworld.com/news/2012/021612-ssds-have-a-bleak-future-256255.html?source=NWWNLE_nlt_daily_am_2012-02-20

Well, SSSI member Allyn Malventano of PC Perspectives begs to differ: http://www.pcper.com/reviews/Editorial/NAND-Flash-Memory-Future-Not-So-Bleak-After-All

 

SNIA ESF Sponsored Webinar on Advances in NFSv4

Good news.

The SNIA Ethernet Storage Forum (ESF) will be presenting a live webinar on NFS version 4, including version 4.1 (RFC 5661), as well as a glimpse of what is being considered for version 4.2. The presenter will be Alex McDonald, SNIA NFS SIG co-chair. Gary Gumanow, ESF Board Member, will moderate the webinar.

The webinar will begin at 8am PT / 11am ET. You can register for this BrightTalk-hosted event at http://www.brighttalk.com/webcast/663/41389.

The webinar will be interactive, so feel free to ask the guest speaker questions; they will be addressed live during the webinar. Answers to any questions not addressed live will be posted on the SNIA ESF blog after the event, along with the answers given during the webinar.

So, get registered. We’ll see you on the 29th.

Recommended Reading List on SSDs and Performance

SSSI has developed an extensive library of educational materials about SSD performance and how to use the SSS Performance Test Specifications to measure it.  If you’re new to SSDs or simply want to become more knowledgeable on the subject, we can help.

Below is a list of white papers, presentations, webcasts, and even a video that discuss SSDs, SSD performance and how it should be measured.  The list is in the recommended order of reading / viewing, and ranges from basic overviews to technical details.  Hope you find this useful.

  1. What more logical place to start than Solid State Storage 101?  This white paper talks about SSDs, how they work and how they fit into system architectures.
  2. Another white paper, NAND Flash Solid State Storage for the Enterprise, looks at Flash memory in more detail and how SSD controllers work.
  3. Facing an SSS Decision? Here is How SNIA is Helping Users Evaluate SSS Performance is a presentation that starts to delve into SSD performance and the basic principles of the SSS Performance Test Specification.
  4. The presentation Validating SSS Performance also introduces the SSS PTS, but in additional detail.
  5. The Solid State Storage Performance Test Specification (SSS PTS) White Paper provides an easily understandable introduction to the SSS PTS.
  6. Here’s a video of our own Eden Kim describing the SSS PTS at Storage Visions 2012.
  7. SNIA Solid State Storage Test Specification is a more technical description of the contents of the SSS PTS.
  8. Now that you’ve read all about them, the actual SSS PTS documents can be downloaded here.
  9. And finally, SSSI has put together a webpage on Understanding SSD Performance, which explains the test results generated from the SSS PTS and what they mean to users.

You can find a lot of other informative material related to SSDs on the SSSI Education page.

If you have any questions, comments or requests, please comment on this post or send a message to asksssi@snia.org.

Solid State Storage Contributors Honored at SNIA Symposium

Passionate and dedicated volunteers are vital to the success of SNIA and its programs.  Congratulations to the following SNIA Solid State Storage honorees, selected by the entire SNIA member community, who were recognized for their 2011 contributions!

Volunteer of the Year recognizes an individual who, above all others in 2011, consistently stepped up and helped SNIA achieve something new and groundbreaking or who significantly advanced an existing program. Congratulations to Paul Wassenberg of Marvell, SNIA Solid State Storage Initiative Chair, winner of the 2011 SNIA Volunteer of the Year award for his leadership in Solid State Storage education and outreach on SSSI activities, including the Enterprise and Client Performance Test Specifications and the Understanding SSD Performance Project.

The Industry Impact Honoree recognizes an individual who has significantly advanced a cause for SNIA leading to an impact on the industry or the Association. Congratulations to Eden Kim of Calypso Systems, SNIA Solid State Storage Technical Work Group, winner of the 2011 Industry Impact Award for his leadership on development of the Solid State Storage Enterprise and Client Performance Test Specifications.

The Most Significant Impact by a Technical Work Group award recognizes the SNIA TWG that, above all others in 2011, had members and efforts that consistently stepped up and helped SNIA achieve something new and groundbreaking or that significantly advanced an existing program. Congratulations to the Solid State Storage Technical Work Group, honored in 2011 for their development of the Solid State Storage Enterprise and Client Performance Test Specifications.

Eden Kim, SNIA SSS TWG Chair, receiving "SNIA Industry Impact Award" from Wayne Adams, SNIA Chairman of the Board, and Leo Leger, SNIA Executive Director

Understanding SSD Performance Project

At last week’s Storage Visions conference, SSSI announced the Understanding SSD Performance project, which is intended to educate users on how to use the SSS PTS (Performance Test Specification) to make intelligent decisions about SSD performance. You can find the press release here.

The project outcomes so far include a new webpage at www.snia.org/forums/sssi/pts, a white paper (www.snia.org/forums/sssi/knowledge/education), and a webcast.

Join us for the webcast on January 19 at 11AM Pacific Time by going to www.brighttalk.com/webcast/663/40549.

 

Validating CDMI features – Object Expiration

Amazon has announced an Object Expiration feature for S3, validating yet another feature of the CDMI standard (see the previous post for an earlier one). While not a new concept for storage interfaces, it is the first cloud implementation of this capability that I know of. The idea is simply to have the server side of the cloud delete objects automatically on your behalf once the lifecycle of that data has completed.

As part of overall Data Lifecycle Management, object deletion is the most common terminal state for data. CDMI has standardized the interface for this capability in cloud storage with a comprehensive Retention and Hold Management feature (Chapter 17). The standard CDMI feature is finer-grained than the S3 feature because it allows retention and deletion to be set on individual objects (you could approximate this in S3 with prefix = object name, but that doesn’t scale using the header fields that Amazon uses). The S3 prefix mechanism can scope the expiration policy down to individual “directories” (forward-slash-terminated parts of object names), and CDMI allows the same for the semantically equivalent CDMI sub-containers.
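The difference in granularity can be sketched in Python. The first function evaluates prefix-scoped expiration rules in the spirit of the S3 feature (simplified: S3 itself rounds expiration times up to the following midnight UTC, which is ignored here); the second reads a retention value from each object's own metadata, closer to the per-object CDMI model. The `ExpirationRule` structure and the `retention_days` metadata key are hypothetical, not actual S3 or CDMI field names.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ExpirationRule:
    prefix: str  # "" matches every object; "logs/" scopes the rule to one "directory"
    days: int


def expired_by_prefix(key: str, created: datetime, rules: list[ExpirationRule], now: datetime) -> bool:
    """Prefix-scoped expiration: any matching rule whose age limit has passed expires the object."""
    return any(
        key.startswith(rule.prefix) and now >= created + timedelta(days=rule.days)
        for rule in rules
    )


def expired_per_object(metadata: dict[str, int], created: datetime, now: datetime) -> bool:
    """Per-object retention: the object carries its own (hypothetical) retention_days value."""
    days = metadata.get("retention_days")
    return days is not None and now >= created + timedelta(days=days)
```

Expressing per-object retention with prefix rules requires one rule per object, which is the scaling problem noted above; per-object metadata keeps the policy next to the data it governs.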

Complying with Regulations

Although the ability to delete objects when their lifecycle completes is useful, it is insufficient for complying with regulations such as Sarbanes-Oxley, or for eDiscovery needs during litigation. Most enterprises need to show that the data has not been modified during its lifecycle. In addition, if a subpoena is issued for the data, you DO NOT want the object deleted, even if its retention period has expired; that could cost you millions of dollars in a pending court case…
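Here is a minimal Python sketch of the two checks described here: deletion is permitted only once retention has expired and no hold is in place, and a digest recorded at ingest lets you detect later modification. The `RetainedObject` class and its field names are hypothetical. Note that a digest stored next to the data only detects accidental or casual changes; regulated environments generally rely on write-once (WORM) behavior enforced by the storage system itself.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class RetainedObject:
    data: bytes
    retain_until: datetime
    holds: set[str] = field(default_factory=set)  # e.g. hypothetical legal-hold IDs
    digest: str = field(init=False, default="")

    def __post_init__(self) -> None:
        # Record a fingerprint at ingest so later modification can be detected.
        self.digest = hashlib.sha256(self.data).hexdigest()

    def unmodified(self) -> bool:
        return hashlib.sha256(self.data).hexdigest() == self.digest

    def may_delete(self, now: datetime) -> bool:
        """Deletion requires an expired retention period AND no active hold."""
        return now >= self.retain_until and not self.holds
```

A hold overrides an expired retention period, which is exactly the subpoena case: the object stays until every hold is released.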

The CDMI standard anticipates that storage clouds will want to offer more robust, full-featured retention and hold management for corporate data, and that a standard means of achieving it will be needed. Take a quick look at Chapter 17 (it’s quite compact while being comprehensive) and investigate using the standard way to achieve this function. If you are a cloud vendor trying to emulate the S3 interface, good luck to you: Amazon will continue to expand the definition of what “S3” means (as with this feature), forcing you to constantly modify your cloud’s storage interface to keep up, and requiring you to reverse engineer any bugs that exist.

Holiday Education Before Our New Year Event!

These first weeks of December always fly by with the myriad of tasks and assignments we need to get done before everyone disappears for the holidays. I won’t add to your load, but I will ask you to put two items on your future to-do list.

The SNIA Solid State Storage Initiative just completed two excellent webcasts on solid state storage, a topic that is getting the 2012 buzz. Through my involvement with the SNIA SSSI, I was lucky enough to introduce both webcasts, and they really brought some new perspectives to light.

Put them on your holiday viewing list, and you’ll jumpstart your 2012 education just in time to see the SSSI at Storage Visions, January 8-9, 2012 in Las Vegas.

I would suggest beginning with a session that gives a glimpse into how IT professionals look at solid state drives and high speed memory technologies.  Jim Bagley of Storage Strategies NOW gives a quick background of the solid state drive and high-speed memory market and then dives into solid state storage growth and how businesses are adopting and deploying solid state storage for rapid access to transactional data, the cloud and virtual desktop infrastructures.

This session includes the results of an IT Professionals Adoption Survey, co-sponsored by the SNIA and the Solid State Storage Initiative (SSSI), which presents information on the status of solid state drives (SSDs) and high-speed memory, and that’s where it gets interesting.  You’ll want to compare your organization to these answers from 300+ professionals, and see where you stand with SSD adoption, how SSDs have affected the perception of the value of IT as a business enabler, and which storage applications are deploying SSDs.

The webcast is entitled “Solid State Adoption and Use – a Glimpse into the IT Professional Mind” and you can access it here.

Stay tuned for my next blog entry, where I’ll give you present #2, and maybe even a bonus gift.