Cloud Object Storage – You’ve Got Questions, We’ve Got Answers

The SNIA Cloud Storage Initiative hosted a live Webcast “Cloud Object Storage 101.” Like any “101” type course, there were a lot of good questions. Here they all are – with our answers. If you have additional questions, please let us know by commenting on this blog.

Q. How do you envision the new role of tape (LTO) in this unstructured data growth?

A. Exactly the same way that tape has always played a part; it’s the storage medium that requires no power to store cold data and is cheap per bit. Although it has a limited shelf life, and although we believe that flash will eventually replace it, tape still has a secure and growing role for the foreseeable future.

Q. What are your thoughts on whether object storage can exist outside the bounds of supporting file systems? Block devices directly storing objects using the key as reference and removing the intervening file system? A hierarchy of objects instead of files?

A. All of these things. Objects can be identified by an ID in a flat, non-hierarchical structure; we can impose a hierarchy by key-to-objectID translation; or indeed, an object may contain complete file systems or be treated like a block device. There are really no restrictions on how we can build metadata that describes all these things over the bytes of storage that make up an object.
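As a rough illustration of that key-to-objectID translation (the names below are hypothetical, not any particular product’s API), a completely flat store can present a hierarchy simply by treating the full path as the key:

```python
import hashlib

class FlatObjectStore:
    """A toy flat object store: object ID -> bytes, with no hierarchy at all."""
    def __init__(self):
        self._objects = {}

    def put(self, object_id, data):
        self._objects[object_id] = data

    def get(self, object_id):
        return self._objects[object_id]

def key_to_object_id(key):
    # One possible key-to-objectID translation: hash the full "path", so the
    # store itself stays flat while clients see a hierarchy in the key names.
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

store = FlatObjectStore()
store.put(key_to_object_id("/scans/2016/patient-42/xray-001.dcm"), b"...DICOM bytes...")
print(store.get(key_to_object_id("/scans/2016/patient-42/xray-001.dcm")))
```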

Q. Can you run write-intensive, low-latency apps on object storage, e.g., virtual machines?

A. Yes. Object storage can be built from the same components as other high-performance storage systems; for instance, flash connected via high-bandwidth, low-latency networks. Object stores could even be built over PCIe and NVDIMM.

Q. Is erasure coding (EC) expensive in terms of networking and resources utilization (especially in case of rebuild)?

A. No, that’s one of the advantages of EC. Rebuilds take place by reading data from many disks and writing it to many disks; in a traditional RAID rebuild, the load is concentrated on the one disk that’s being rebuilt.
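A back-of-the-envelope sketch (all numbers invented purely for illustration) of why the rebuild load spreads out:

```python
# Hypothetical erasure-coded cluster, just to show the fan-out arithmetic.
stripe_data = 10      # data fragments per stripe (k)
cluster_disks = 100   # disks the stripes are spread across
failed_disk_tb = 8    # capacity to rebuild, in TB

# Each lost fragment is recomputed from k surviving fragments, and the rebuilt
# fragments are written to spare space scattered across the whole cluster.
read_per_disk_tb = failed_disk_tb * stripe_data / (cluster_disks - 1)
write_per_disk_tb = failed_disk_tb / (cluster_disks - 1)

print(f"~{read_per_disk_tb:.2f} TB read and ~{write_per_disk_tb:.2f} TB written per surviving disk")
# In a traditional RAID rebuild, the full 8 TB is written to the single
# replacement disk, which becomes the bottleneck.
```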

Q. Is there any overhead for small files or object use cases? Do you have a recommended size?

A. Each system will have its own advantages and disadvantages for objects of specific sizes. In general, object stores are designed to store billions of objects, so the number of objects is usually not an issue.

Q. Can you comment on Internet bandwidth limitations on geographically dispersed erasure coded data?

A. Smart caching can make a big difference, but at the end of the day, a geographically EC dispersed object store won’t be faster than a local store. You can’t beat the speed of light.
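A rough worked example (the distance is an assumption chosen for illustration):

```python
# Light in optical fiber travels at roughly 200,000 km/s (about 2/3 of c).
fiber_speed_km_s = 200_000
distance_km = 4_000          # assumed distance to a remote erasure-coded site

one_way_ms = distance_km / fiber_speed_km_s * 1000
round_trip_ms = 2 * one_way_ms

print(f"one-way: ~{one_way_ms:.0f} ms, round trip: ~{round_trip_ms:.0f} ms")
# About 20 ms one way and 40 ms round trip before any switching, protocol, or
# reassembly overhead; a local read completes in a small fraction of that.
```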

Q. The suppliers all claim easy exit strategies from their systems. If we were to use one of the on-premises solutions such as ECS or Cleversafe, and then down the road decide to move off-premises, is the migration/egress typically as easy as claimed?

A. In general, any proprietary interface might lock you in. The SNIA’s CDMI is vendor neutral, and supported by a number of vendors. Amazon’s S3 is a popular and common interface. Ultimately, vendors want your data on their systems – and that means making it easy to get the data from a competing vendor’s system; lock-in is not what vendors want. Talk to your vendor and ask for other users’ experiences to get confirmation of their claims.
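As a minimal sketch of what those two open interfaces look like from an application’s point of view (the endpoint URL, bucket, and object names are made up, and real use needs credentials), the same object write can be expressed either way:

```python
import json
import boto3      # AWS SDK; S3-compatible endpoints accept the same calls
import requests

# S3-style PUT to a hypothetical S3-compatible endpoint.
s3 = boto3.client("s3", endpoint_url="https://objects.example.com")
s3.put_object(Bucket="backups", Key="db/2016-02-01.dump", Body=b"...")

# Roughly equivalent CDMI-style PUT over plain HTTP (path and payload shown
# schematically; the SNIA CDMI specification is the authoritative reference).
requests.put(
    "https://objects.example.com/cdmi/backups/db/2016-02-01.dump",
    headers={
        "Content-Type": "application/cdmi-object",
        "X-CDMI-Specification-Version": "1.1.1",
    },
    data=json.dumps({"mimetype": "application/octet-stream",
                     "value": "...payload..."}),
)
```

Either way, the interface is documented and implemented by multiple vendors, which is what keeps the exit door open.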

Q. Based on factual information, where are you seeing the most common use cases for Object Storage?

A. There are many, and each vendor of cloud storage has particular markets. Backup is a common case, as are systems in the healthcare space that treat data such as scans and X-rays as objects.

Q. NAS filers only scale up not out. They are hard to manage at scale. Why use them anymore?

A. There are many NAS systems that scale out as well as up. NFSv4 supports high degrees of scale-out, and there are file systems like Gluster that provide very large-scale solutions indeed, into the multi-petabyte range.

Q. Are there any specific use cases to avoid when considering object storage?

A. Yes. Many legacy applications will not generate any savings or gains if moved to object storage.

Q. Would you agree with industry statements that 80% of all data written today will NEVER be accessed again; and that we just don’t know WHICH 20% will be read again?

A. Yes to the first part, and no to the second. Knowing which 80% is cold is the trick. The industry is developing smart ways of analyzing data to help ensure that cached data is hot data and that cold data is placed correctly the first time around.
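As a deliberately simplified sketch of that kind of analysis (the age thresholds are invented for illustration, not a recommendation), a placement policy might look like:

```python
import time

DAY = 86400

def tier_for(last_access_epoch, now=None):
    """Toy placement policy: choose a tier from an object's last-access age."""
    if now is None:
        now = time.time()
    age_days = (now - last_access_epoch) / DAY
    if age_days < 7:
        return "flash cache"
    if age_days < 90:
        return "object store (disk)"
    return "cold tier (tape or archive class)"

print(tier_for(time.time() - 3 * DAY))    # flash cache
print(tier_for(time.time() - 400 * DAY))  # cold tier (tape or archive class)
```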

Q. Is there also the possibility to bring “compliance” into object storage? (Thinking about banking, medical, and other sensitive data that needs to be tracked, retained, etc.)

A. Yes. Many object storage vendors provide software to do this.

 

Need a Primer on Cloud Object Storage?

There has been a lot of buzz around cloud object storage recently. But before you get deep into all that cloud object storage can do, it’s good to take a step back and make sure you understand the basics. That’s what the SNIA Cloud Storage Initiative is planning to do on July 14th at our live Webcast “Cloud Object Storage 101.”

Many organizations, like large service providers, have already begun to leverage software-defined object storage to support new application development and DevOps projects. Meanwhile, legacy enterprise companies are in the early stages of exploring the benefits of object storage for their particular business and are searching for how they can use cloud object storage to modernize their IT strategies, store and protect data, while dramatically reducing the costs associated with legacy storage sprawl.

This Webcast will highlight the market trends towards the adoption of object storage, the definition and benefits of object storage, and the use cases that are best suited to leverage an underlying object storage infrastructure.

Join us on July 14th to learn:

  • How to accelerate the transition from legacy storage to a cloud object architecture
  • The benefits of object storage
  • Primary use cases
  • How object storage can enable your private, public or hybrid cloud strategy without compromising security, privacy or data governance

I hope you’ll register today to join my colleague, Nancy Bennis, Director of Alliances at Cleversafe (an IBM company), and me for this tutorial on cloud object storage.

 

 

SNIA Tutorials Highlight Industry Track at USENIX FAST ’16

by Marty Foltyn

SNIA is pleased to present seven of its SNIA Tutorials at the 14th USENIX Conference on File and Storage Technologies (USENIX FAST) on February 24, 2016 in Santa Clara, CA.

SNIA Tutorials are educational materials developed by vendors, training companies, analysts, consultants, and end-users in the storage and information technology industry. SNIA tutorials are presented and used throughout the world at SNIA events and international conferences.

Utilizing VDBench to Perform IDC AFA Testing will be presented by Michael Ault, Oracle Guru, IBM, Inc. This SNIA Tutorial provides procedures, scripts, and examples for running the IDC test framework with the free tool VDBench on all-flash arrays (AFAs), producing a common set of results for comparing multiple AFAs’ suitability for cloud or other network-based storage.

Practical Online Cache Analysis and Optimization will be presented by Carl Waldspurger, Research and Development, CloudPhysics, Inc., and Irfan Ahmad, CTO, CloudPhysics, Inc. After reviewing the history and evolution of miss ratio curve (MRC) algorithms, this SNIA Tutorial examines new opportunities afforded by MRCs to capture valuable information about locality that can be leveraged to guide efficient cache sizing, allocation, and partitioning in order to support diverse goals such as improving performance, isolation, and quality of service.

SMB Remote File Protocol (Including SMB 3.x) will be presented by Tom Talpey, Architect, Microsoft.  This SNIA Tutorial begins by describing the history and basic architecture of the SMB protocol and its operations. The second part of the tutorial covers the various versions of the SMB protocol, with details of improvements over time. The final part covers the latest changes in SMB3, and the resources available in support of its development by industry.

Object Drives: A New Architectural Partitioning will be presented by Mark Carlson, Principal Engineer, Industry Standards, Toshiba.  This SNIA Tutorial discusses the current state and future prospects for object drives. Use cases and requirements will be examined and best practices will be described.

Fog Computing and Its Ecosystem will be presented by Ramin Elahi, Adjunct Faculty, UC Santa Cruz Silicon Valley.  This SNIA Tutorial introduces and describes Fog Computing and discusses how it supports emerging Internet of Everything (IoE) applications that demand real-time/predictable latency (industrial automation, transportation, networks of sensors and actuators).

Privacy vs. Data Protection: The Impact of EU Data Protection Legislation will be presented by Thomas Rivera, Senior Technical Associate, HDS.  This SNIA Tutorial explores the new EU data protection legislation and highlights the elements that could have significant impacts on data handling practices.

Converged Storage Technology will be presented by Liang Ming, Research Engineer, Development and Research, Distributed Storage Field, Huawei.  This SNIA Tutorial discusses the concept of key-value storage, next generation key-value converged storage solutions, and what has been done to promote the key-value standard.

Get Your Registration Discount
As a friend of SNIA, you are eligible for a $75 discount on registration for the technical sessions. Use code 75FAST15SNIA during registration to receive your discount.

FAST ’16 Program
FAST ’16 will kick off with a Keynote Address by Eric Brewer, VP Infrastructure at Google, on “Spinning Disks and Their Cloudy Future”. In addition to the SNIA Industry Track, the 3-day technical sessions program also includes 27 refereed paper presentations.

The full program is available here: https://www.usenix.org/conference/fast16/glance

 

On-Demand Cloud Storage Webcasts Worth Watching

As the SNIA Cloud Storage Initiative (CSI) starts 2016 with a new set of educational programs and webcasts on topics of interest to those developing, implementing, and managing cloud storage, I thought it might be a good time to remind everyone of the vendor-neutral educational work the CSI delivered in 2015.

I’m particularly proud of the work the CSI has done through BrightTalk (a web-based content delivery platform) in producing live hour-long tutorials on a wide variety of subjects.

What you may not know is that these are also recorded, and you can play them back when it’s convenient to you. I know that we have a global audience, and that when we deliver the live version it may be in the middle of your busy working day – or even in the middle of the night.

As part of SNIA, the CSI supports the development of technical storage standards, and that means some of our audience are developers. For those of you who are interested in more technical presentations, we had two developer-focused BrightTalks:

Hierarchical Erasure Coding: Making Erasure Coding Usable

This talk covered two different approaches to erasure coding – a flat erasure code across JBOD, and a hierarchical code with an inner code and an outer code; it compared the two approaches on different parameters that impact the IT business and provided guidance on evaluating object storage solutions.
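As a rough illustration of that trade-off (the code parameters below are made up, not the ones used in the talk), hierarchical coding spends a little extra capacity to keep most repairs local:

```python
# Flat code: one 10+4 erasure code striped across drives in several enclosures.
flat_k, flat_m = 10, 4
flat_overhead = (flat_k + flat_m) / flat_k

# Hierarchical code: an inner 8+2 code inside each enclosure plus an outer
# 4+1 code across enclosures for whole-enclosure failures (assumed values).
inner_k, inner_m = 8, 2
outer_k, outer_m = 4, 1
hier_overhead = ((inner_k + inner_m) / inner_k) * ((outer_k + outer_m) / outer_k)

print(f"flat overhead        : {flat_overhead:.2f}x raw per usable byte")
print(f"hierarchical overhead: {hier_overhead:.2f}x raw per usable byte")
# The hierarchical layout costs more capacity (1.56x vs 1.40x here), but a
# single failed drive is repaired from the surviving drives in the same
# enclosure, so routine rebuild traffic never crosses the inter-enclosure
# network.
```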

Expert Panel: Cloud Storage Initiatives – An SDC Preview

At the 2015 Storage Developer Conference (SDC) we presented on a variety of topics:

  • Mobile and Secure – Cloud Encrypted Objects using CDMI
  • Object Drives: A new Architectural Partitioning
  • Unistore: A Unified Storage Architecture for Cloud Computing
  • Using CDMI to Manage Swift, S3, and Ceph Object Repositories

We discussed how encrypted objects can be stored, retrieved, and transferred between clouds, how Object Drives allow storage to scale up and down by single drive increments, end-user and vendor use cases of the Cloud Data Management Interface (CDMI), and we introduced Unistore – an innovative unified storage architecture that efficiently integrates heterogeneous HDD and SCM devices for Cloud storage systems.

(As an added bonus, all these SDC 2015 presentations and others can be found here http://www.snia.org/events/storage-developer/presentations15.)

OpenStack has had a big year, and the CSI contributed to the discussion with:

OpenStack File Services for High Performance Computing

We looked at how OpenStack can consume and control file services appropriate to High Performance Computing in a cloud and multi-tenanted environment, and investigated two approaches to integration. One approach is to have OpenStack manage the storage infrastructure services using Cinder, Nova, and Neutron to provide HPC Filesystem as a Service. We also reviewed a second option of using Manila file services for OpenStack to control the HPC filesystem deployment and manage the exports, etc. We discussed the development of the Lustre Manila driver and its current progress.

Hybrid clouds were also in the news. We delivered two sessions, specifically targeted at end users looking to understand the technologies:

Hybrid Clouds: Bridging Private & Public Cloud Infrastructures

Every IT consumer is using cloud in one form or another, and just as storage buyers are reluctant to select a single vendor for their on-premises IT, they will choose to work with multiple public cloud providers. But this desirable “many vendor” cloud strategy introduces new problems of compatibility and integration. To provide a seamless view of these discrete storage clouds, Software Defined Storage (SDS) can be used to build a bridge between them. This presentation explored how SDS, with its ability to deploy on different hardware and supporting rich automation capabilities, can extend its reach into cloud deployments to support a hybrid data fabric that spans on-premises and public clouds.

Hybrid Clouds Part 2: Case Study on Building the Bridge between Private & Public

There are significant differences in how cloud services are delivered to various categories of users. The integration of these services with traditional IT operations remains an important success factor but also a challenge for IT managers. The key to success is to build a bridge between private and public clouds. This Webcast expanded on the previous Hybrid Clouds: Bridging Private & Public Cloud Infrastructures webcast where we looked at the choices and strategies for picking a cloud provider for public and hybrid solutions.

Lastly, we looked at some of the issues surrounding data protection and data privacy (no, they’re not the same thing at all!).

Privacy v Data Protection: The Impact of Int’l Data Protection Legislation on Cloud

Governments across the globe are proposing and enacting strong data privacy and data protection regulations by mandating frameworks that include noteworthy changes like defining a data breach to include data destruction, adding the right to be forgotten, mandating the practice of breach notifications, and many other new elements. The implications of this and other proposed legislation on how the cloud can be utilized for storing data are significant. This webcast covered:

  • EU “directives” vs. “regulation”
  • General data protection regulation summary
  • How personal data has been redefined
  • Substantial financial penalties for non-compliance
  • Impact on data protection in the cloud
  • How to prepare now for impending changes

Moving Data Protection to the Cloud: Trends, Challenges and Strategies

This was a panel discussion; we talked about various new ways to perform data protection using the Cloud and many advantages of using the Cloud this way.

You can access all the CSI BrightTalk Webcasts on demand at the SNIA Website. Many of you will also be happy to learn that PDFs of the Webcast slides are also available there.

We had a good 2015, and I’m looking forward to producing more great educational material during 2016. If you have a topic you’d like to see the CSI cover this year, please comment below in this blog. We value input from all.

Thanks for your support and hopefully we’ll see you some time this year at one of our BrightTalk webcasts.

OpenStack File Services for HPC Q&A

We got some great questions during our Webcast on how OpenStack can consume and control file services appropriate for High Performance Computing (HPC) in a cloud and multi-tenanted environment. Here are answers to all of them. If you missed the Webcast, it’s now available on-demand. I encourage you to check it out and please feel free to leave any additional questions at this blog.

Q. Presumably we can use other than ZFS for the underlying filesystems in Lustre?

A. Yes, there are plenty of other filesystems that can be used besides ZFS. ZFS was given as an example of a modern, scale-up filesystem that has recently been integrated, but essentially you can use most filesystem types, with some having more advantages than others. What you are looking for is a filesystem that addresses the weaknesses of Lustre in terms of self-healing and scale-up. Any filesystem that allows you to easily grow capacity while also being capable of protecting itself would be a reasonable choice. Remember, Lustre doesn’t do anything to protect the data itself; it simply distributes objects across the Object Storage Targets.

Q. Are there any other HPC filesystems besides Lustre?

A. Yes, there are, and depending on your exact requirements Lustre might not be appropriate. Gluster is an alternative that some have found slightly easier to manage and that provides some additional functionality. IBM has GPFS, which has been implemented as an HPC filesystem, and other vendors have their own scale-out filesystems too. An HPC filesystem is simply a scale-out filesystem capable of very good throughput at low latency, so under that definition a flash array, or a scale-out NAS appliance with some fast disks, could be considered a high-performance storage platform. It’s important to understand your workload’s characteristics and demands before making the choice, as each system has its pros and cons.

Q. Does “embarrassingly parallel” require bandwidth or latency from the storage system?

A. Depending on the workload characteristics, it could require both. Bandwidth is usually the first demand, though, as data is shipped to the nodes for processing. Obviously, the lower the latency, the faster jobs can start and run, but latency is not critical here because there is limited inter-node communication, which is what normally drives the low-latency demand.

Q. Would you suggest to use Object Storage for NFV, i.e Telco applications?

A. I would for some applications. The problem with NFV is that it actually covers a surprising breadth of applications, some of which have very limited data storage needs. For example, there is little need for storage in a packet-switching environment beyond the OS and binaries needed to stand up the VMs. In this case, object is a very good fit, as it can be easily distributed geographically, ensuring the same networking function is delivered in the same manner. Other applications that require access to filtered data (perhaps billing-based applications or content distribution) would also be good candidates.

Q. I missed something in the middle; please clarify, your suggestion is to use ZFS (on Linux) for the local file system on OSTs?

A. Yes, this was one example, and it is where some work has recently been done in the Lustre community. ZFS affords the OSSs the capability of scaling capacity upwards as well as offering the RAID-like protection and self-healing that come with ZFS. Other filesystems can offer some of those same things, so I am not suggesting it is the only choice.

Q. Why would someone want/need scale-up, when they can scale-out?

A. This can often come down to funding. A lot of HPC environments exist in academic institutions that rely on grant funding and sponsorship to expand their infrastructure. Sometimes it simply isn’t feasible to buy extra servers in order to add capacity, particularly if there is already performance headroom. Rack space, power, and cooling could also be factors, in which case adding drives to cope with bigger workloads might be the only option. You do need to consider whether the additional capacity would also drive the need for better performance, so we can’t just assume that adding disk is enough, but it’s certainly a good option and a requirement I have seen a number of times.

 

Outstanding Keynotes from Leading Storage Experts Make SDC Attendance a Must!

Posted by Marty Foltyn

Tomorrow is the last day to register online for next week’s Storage Developer Conference at the Hyatt Regency Santa Clara. What better incentive to click www.storagedeveloper.org and register than to read about the amazing keynote and featured speakers at this event? I think they’re the best since the event began in 1998! Preview sessions here, and click on the title to download the full description.

Bev Crair, Vice President and General Manager, Storage Group, Intel will present Innovator, Disruptor or Laggard, Where Will Your Storage Applications Live? Next Generation Storage and discuss the leadership role Intel is playing in driving the open source community for software defined storage, server based storage, and upcoming technologies that will shift how storage is architected.

Jim Handy, General Director, Objective Analysis will report on The Long-Term Future of Solid State Storage, examining research of new solid state memory and storage types, and new means of integrating them into highly-optimized computing architectures. This will lead to a discussion of the way that these will impact the market for computing equipment.

Jim Pinkerton, Partner Architect Lead, Microsoft will present Concepts on Moving From SAS connected JBOD to an Ethernet Connected JBOD. This talk examines the advantages of moving to an Ethernet connected JBOD, what infrastructure has to be in place, what performance requirements are needed to be competitive, and the technical issues in deploying and managing such a product.

Andy Rudoff, SNIA NVM Programming TWG, Intel will discuss Planning for the Next Decade of NVM Programming describing how emerging NVM technologies and related research are causing a change to the software development ecosystem. Andy will describe use cases for load/store accessible NVM, some transparent to applications, others non-transparent.

Richard McDougall, Big Data and Storage Chief Scientist, VMware will present Software Defined Storage – What Does it Look Like in 3 Years? He will survey and contrast the popular software architectural approaches and investigate the changing hardware architectures upon which these systems are built.

Laz Vekiarides, CTO and Co-founder, ClearSky Data will discuss Why the Storage You Have is Not the Storage Your Data Needs, sharing some of the questions every storage architect should ask.

Donnie Berkholz, Research Director, 451 Research will present Emerging Trends in Software Development drawing on his experience and research to discuss emerging trends in how software across the stack is created and deployed, with a particular focus on relevance to storage development and usage.

Gleb Budman, CEO, Backblaze will discuss Learnings from Nearly a Decade of Building Low-cost Cloud Storage. He will cover the design of the storage hardware, the cloud storage file system software, and the operations processes that currently store over 150 petabytes and add another 5 petabytes every month.

You could wait and register onsite at the Hyatt, but why? If you need more reasons to attend, check out SNIA on Storage previous blog entries on File Systems, Cloud, Management, New Thinking, Disruptive Technologies, and Security sessions at SDC. See the full agenda and register now for SDC at http://www.storagedeveloper.org.

Cloud Storage Development Challenges – An SDC Preview

This year’s Storage Developer Conference (SDC) is expected to draw over 400 storage developers and professionals. On August 4th, you can get a sneak preview of key cloud topics that will be covered at SDC in this live Webcast, where David Slik and Mark Carlson, Co-Chairs of the SNIA Cloud Technical Work Group, together with Yong Chen, Assistant Professor at Texas Tech University, will discuss:

  • Mobile and Secure – Cloud Encrypted Objects using CDMI
  • Object Drives: A new Architectural Partitioning
  • Unistore: A Unified Storage Architecture for Cloud Computing
  • Using CDMI to Manage Swift, S3, and Ceph Object Repositories

You’ll learn how encrypted objects can be stored, retrieved, and transferred between clouds, how Object Drives allow storage to scale up and down by single drive increments, end-user and vendor use cases of the Cloud Data Management Interface (CDMI), and we’ll introduce Unistore – an innovative unified storage architecture that efficiently integrates heterogeneous HDD and SCM devices for Cloud storage systems.

I’ll be moderating the discussion among this expert panel. It should be an enlightening and lively hour. I hope you’ll register now to join us.

 

New Webcast: Hierarchical Erasure Coding: Making Erasure Coding Usable

On May 14th the SNIA-CSI (Cloud Storage Initiative) will be hosting a live Webcast “Hierarchical Erasure Coding: Making erasure coding usable.” This technical talk, presented by Vishnu Vardhan, Sr. Manager, Object Storage at NetApp, and me, will cover two different approaches to erasure coding: a flat erasure code across JBOD, and a hierarchical code with an inner code and an outer code. This Webcast, part of the SNIA-CSI developer’s series, will compare the two approaches on different parameters that impact the IT business and provide guidance on evaluating object storage solutions. You’ll learn:

  • Industry dynamics
  • Erasure coding vs. RAID – Which is better?
  • When is erasure coding a good fit?
  • Hierarchical Erasure Coding- The next generation
  • How hierarchical codes make growth easier
  • Key areas where hierarchical coding is better than flat erasure codes

Register now and bring your questions. Vishnu and I will look forward to answering them.

Ethernet Connected Drives Webcast Q&A

At our recent SNIA ESF Webcast “Visions for Ethernet Connected Drives” Chris DePuy of the Dell’Oro Group discussed potential benefits, use cases, and challenges of Ethernet connected drives. It’s not surprising that we had a lot of questions given that this market is in its infancy. As promised during our live event, here are answers to questions from the audience. If you think of additional questions, please feel free to comment on this blog.

Q. Will this also mandate new protocols to be used for storage like RDMA?

A. We did not receive any feedback from the technology companies we surveyed about RDMA specifically, but new protocols very well may be required to make effective and cost-effective use of eDrives. Storage systems offer many capabilities beyond just standard Ethernet networking and new protocols may be required to deliver those as well as new services in this new storage system architecture.

Q. Is White Box bought primarily by cloud customers?

A. Yes, in our research, substantially all purchases of White Box storage devices are purchased by cloud service providers.

Q. I may have missed it but aren’t we really talking about the HGST Open Ethernet Drive Architecture and the Seagate Kinetic Open Storage Platform? Both use Ethernet interfaces but HGST puts Debian on each HDD and Seagate has a key-value API for applications to directly write to the HDD. The actual deployment of these Ethernet HDDs would be in Ethernet Layer 2 switched backplanes in a 4U chassis being built by Supermicro, Xyratex (Seagate) and several others.

A. Given this was a presentation made to a neutral industry association, we chose not to discuss specific vendors. To answer your questions: yes, we are talking about Ethernet Connected Drives from HGST and Seagate, but we also integrated feedback from other suppliers of related technology, including Toshiba. To your other question, yes, we have seen enclosures with embedded Ethernet switch technology connecting to the Ethernet drives from various other vendors. In our research for this webinar, we have also seen Ethernet switch technology embedded into enclosures that don’t use Ethernet connected drives; these use systems to convert traditional HDD interfaces, so the network still sees Ethernet as the outward-facing interface.

Q. Doesn’t that take space on the drive when you put CPU and more memory?

A. We asked this question, too, but learned that there is sufficient space to maintain the HDD and all the parts in the same form factors we historically have known.

Q. What can one implement in these internal processors used in Ethernet drives? For instance can we run erasure codes such as Jerasure or XOR based codes yet do the basic tasks needed for the Ethernet drives?

A. We did not receive specific feedback during the surveys for this webinar about where one would run erasure coding. Generally, though, the decision will lead to design considerations for which CPU and memory choices would be made for each drive, which in turn would change the economics as to whether the overall system is affordable/feasible. Note that doing erasure coding on the drives increases the amount of intelligence required on the drive: for the arithmetic, for the requisite peer-to-peer networking, and for maintaining state information about the other drives required for completing the erasure codes. New software to manage all this would be required as well.
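To make the arithmetic part of the question concrete, here is a trivial XOR-parity sketch (pure illustration; it says nothing about what any particular drive’s CPU could actually sustain):

```python
def xor_parity(fragments):
    """Compute a single parity fragment across equal-length data fragments."""
    parity = bytearray(len(fragments[0]))
    for frag in fragments:
        for i, b in enumerate(frag):
            parity[i] ^= b
    return bytes(parity)

data = [b"drive-0!", b"drive-1!", b"drive-2!"]
parity = xor_parity(data)

# Recover a lost fragment by XORing the parity with the survivors.
recovered = xor_parity([parity, data[1], data[2]])
assert recovered == data[0]
# Real systems use Reed-Solomon codes (e.g., via Jerasure) rather than a single
# XOR parity, and would also need the peer-to-peer networking and state
# tracking noted above.
```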

Q. Can I run a Ceph OSD plus erasure coding based on the open source Jerasure library on the Ethernet connected drive’s internal ARM processor?

A. We did not receive specific feedback during the surveys for this webinar about where one would run erasure coding. Generally, though, the decision will lead to design considerations for which CPU and memory choices would be made for each drive, which in turn would change economics as to whether the overall system is affordable/feasible.

Q. Erasure coding is more complex compared to RAID, how do I implement erasure coding with Ethernet drives?

A. We did not receive specific feedback during the surveys for this webinar about where or how one would run erasure coding.

Q. Does the economics assume including the cost of the Ethernet Ports? If so are you assuming unmanaged or managed Ethernet ports?

A. In the slides, we portrayed a simplistic capital spending model that considered just servers and hard drives. In reality, there are many other factors that play into both CAPEX and OPEX comparisons between conventional and Ethernet Connected Drive architectures. Examples include the cost differential between using Ethernet switching versus traditional HDD interfaces and how much memory and CPU is needed to support a particular use case.
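Restated as a toy model (every figure below is our own assumption, not a number from the surveys), the simplistic comparison looks like this, with the port question showing up as explicit per-port and per-drive terms:

```python
# Hypothetical CAPEX comparison: storage servers + conventional HDDs versus
# Ethernet connected drives + switch ports. All prices are invented.
drives = 600
hdd_cost = 250                 # conventional HDD
edrive_premium = 25            # extra CPU/memory/Ethernet on each drive
storage_server_cost = 8_000    # server fronting 60 drives in the old model
switch_port_cost = 30          # per Ethernet port (managed ports cost more)

conventional = drives * hdd_cost + (drives // 60) * storage_server_cost
edrive = drives * (hdd_cost + edrive_premium) + drives * switch_port_cost

print(f"conventional: ${conventional:,}   eDrive: ${edrive:,}")
# The eDrive case only wins when the eliminated storage servers outweigh the
# per-drive premium plus the extra Ethernet ports, which is exactly the
# sensitivity these questions are probing.
```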

Q. How does the increased number of network ports needed influence this price equation?

A. In the slides, we portrayed a simplistic capital spending model that considered just servers and hard drives. In reality, there are many other factors that play into both CAPEX and OPEX comparisons between conventional and Ethernet Connected Drive architectures. Examples include the cost differential between using Ethernet switching versus traditional HDD interfaces, how much memory and CPU is needed to support a particular use case.

Q. I’m confused how Power and Cooling could be saved. If you need X number of drives to store data then you would need the same number of drives in the connected drive model wouldn’t you? Perhaps more if the e-drives lack efficiency features?

A. The general point is that proponents of Ethernet Connected Drives argue there won’t be a need for storage-oriented servers, and so the savings would result from there being fewer of them consuming power.

Q. I guess the protocol would change commanding the drives?

A. There is no single approach that has been agreed upon. During the presentation, we said there are multiple technical approaches, one of which includes using Key Value APIs, and the other is to install an Operating System onto each drive that could run whatever you want on it.
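For the Key Value API approach, the sketch below gives a feel for the programming model (the class and method names are hypothetical, loosely inspired by published key-value drive APIs, and not any vendor’s actual SDK):

```python
class EthernetDriveClient:
    """Hypothetical key-value client for a single Ethernet connected drive."""

    def __init__(self, drive_ip, port=8123):
        self.drive_ip = drive_ip
        self.port = port
        self._kv = {}          # stand-in for the on-drive key-value store

    def put(self, key: bytes, value: bytes) -> None:
        # On a real drive this would be a network round trip to drive_ip:port.
        self._kv[key] = value

    def get(self, key: bytes) -> bytes:
        return self._kv[key]

    def delete(self, key: bytes) -> None:
        del self._kv[key]

# The application (or a library acting on its behalf) talks to drives directly;
# there is no block device and no POSIX file system in the data path.
drive = EthernetDriveClient("192.0.2.17")
drive.put(b"object/0001", b"payload bytes")
print(drive.get(b"object/0001"))
```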

Q. Are Ethernet connected drives JBODS on Ethernet?

A. Yes, that is the way we view it, too. Sometimes they are even called “eBODs,” where the traditional JBOD controller is replaced with an Ethernet switch.

Q. How is data protected, i.e., by RAID or some other mechanism?

A. In our surveys, we learned that the most common method would be to leverage erasure coding, which is commonly associated with object-oriented storage systems.

Q. How will photonics impact this concept?

A. Photonics is involved in data center Ethernet for higher speed communications. In our surveys, we did not encounter a single instance of a vendor discussing photonics at the Ethernet Connected Drive. For HDDs, 1GbE provides more than enough bandwidth for the drive.

Q. Are the servers today connecting the storage just dumb boxes that expose storage? Don’t they do processing as well? With Ethernet drives we’re removing that computational node it seems.

A. This is a very good point. Today’s conventional storage systems have significant computing capabilities – we think these could be used to do computing as well as performing storage-oriented tasks as they do primarily today. We expect that in the future, the servers that are packaged in external storage systems will be organized in a way that allows customers to run storage functions as well as more traditional purposes that would allow us to just call them ‘servers.’ In fact, there are several startups that are popularizing this idea.

Q. When it comes to HDD manufacturers there are only three left…WD (HGST), Seagate (Samsung) and Toshiba. When it comes to SSD or flash drives there are more manufacturers. Seagate is using a dual Serial Gigabit Media Independent Interface (SGMII) on its Kinetic HDDs. What other ways are there to do Ethernet on an HDD?

A. We did not receive any feedback from the technology companies we surveyed about this topic. Note, that SNIA recently started an “Object Drive Technical Work Group” to help drive standards for Ethernet-connected drives. If this topic is of interest, we encourage you to join that TWG.

Q. Have you seen any indication of a ratio between CPU power and Memory vs. the size of the storage? What is the typical White Box? EG Intel (version?) Memory (in GB?) Storage (in TB?)

A. The use cases we presented are based on vendor-supplied viewpoints that implicitly incorporate the answers to your question, but don’t specifically address it. What we learned is that in these use cases there is an assumed positive TCO savings, but not every vendor agrees with these calculations, again without providing specifics like you are asking about.

Q. How can you eliminate the object servers? You still need that functionality somewhere if you ever hope to find the data again, or protect it… You may move away from dedicated Object servers but that code has to run somewhere thus saying they are eliminated is wrong…

A. This is a very good point. The use cases offered to us suggest that this code would either reside in the Ethernet Connected Drive, or on the server running an application itself, or both. This is why we made the point that the applications would have to be re-written to take advantage of the proposed new architecture.

Q. Is the cost of Ethernet HDDs expected to be the same as current HDDs and why?
Ethernet HDDs have more processing capabilities so shouldn’t they cost more (is that 10% more?)

A. Correct. If more components were added to an otherwise identical HDD, then the cost would be greater. This is central to one of the main dissenting views we learned about during the survey process. It does raise the question of whether it makes sense to deliver underlying HDDs that are not identical to traditional HDDs in order to offset costs somehow (maybe with lower speeds), or whether these Ethernet Connected Drives would be sold at lower margins by the HDD vendors.

Q. Do Server power TCO numbers take account of lower power consumption of next generation servers as indicated by Intel?

A. We do not know what version of servers was used in these vendor-supplied TCO calculations.

Q. If you are planning to offload processing to the processor on the HDD then you are assuming that the HDD vendors will expose those drives for user access – is there any evidence of this?

A. There is no single approach that has been agreed upon, and therefore no single answer to this question. During the presentation, we said there are multiple technical approaches, one of which includes using Key Value APIs, and the other is to install an Operating System onto each drive that could run whatever you want on it.

Q. How is redundancy handled on an eHDD-based appliance, e.g., when a drive fails?

A. The custom-built software would presumably be developed to handle this. And obviously, the eHDD has to add enough CPU and memory to manage all this — which of course adds cost.

Q. It seems that with the CPUs on each drive, the archive, object or whatever the application would need to be rewritten to support this specific method of parallel processing. Is anyone doing this now?

A. During the survey process, we learned that many applications were being ported to this environment, some of which apparently do take advantage of parallel computing. Given we were planning to immediately divulge information to the public, we were not presented with details.

Q. What is nearline storage?

A. This is the way it was described to us by some of the technology companies we surveyed, but the meaning is that it represents a more traditional storage system you might see in an enterprise where many drives are stopped (not rotating) and are turned on when a request comes in.

Q. Why are analytics specifically optimized for Ethernet attached storage devices – the presenter seems to anticipate that processing can be pushed onto the drive, and if this is the case why can’t other drive interfaces do this – PCIe attached storage should be even more amenable for this.

A. The presenter was sharing views compiled by the responses of various technology companies during a series of interviews conducted before this webcast. Analytics is a large, growing industry today and exists without Ethernet Connected Drives. Some of the companies surveyed offered the view that putting processing capabilities into each HDD may enhance the overall system’s performance.

Q. Can the presenter comment on the value of scale-out for E-Drives, versus legacy SAN scale out?

A. Some of the technology companies interviewed by the presenter suggested that systems based on Ethernet Connected Drives may scale to larger capacities than traditional architectures on the basis that the storage-oriented servers no longer present an impediment to scaling.

Q. Just as object storage addresses RAID, smart drives could provide the metadata needed by the Swift controllers to do deduplication, or the controller may do deduplication as a pre-process or post-process, as we have seen evolve over the years on NetApp or Data Domain. If we use optical connections, the port density issue is resolved, and this ends up looking like something from 2001 (the movie), correct?

A. Photonics is involved in data center Ethernet for higher speed communications. In our surveys, we did not encounter a single instance of a vendor discussing photonics at the Ethernet Connected Drive. As noted above, 1GbE is more than sufficient for eHDDs.

Q. FYI…48TB Capacity Kinetic Storage Appliance $5000.00 street price
White Box 2U Dual Xeon storage server with 48TB RAW…$8000 street price

A. Thank you for sharing! You may have noticed we did not mention specific vendors during the presentation – perhaps others viewing your question will take note of your viewpoint.

Q. To the extent that hyperscale cloud environments have servers with open sockets or slots for direct attach storage of drives, how are there financial savings to connect through Ethernet instead of direct attach? Will servers of the future remove these slots and sockets? Are there other cluster wide benefits with regards to performance for data accessed directly through the network instead of through the server with the local storage, when the data is accessed by a large number of servers?

A. Hyperscalers are buying storage-related hardware at a fraction of the price that systems OEMs are selling them for mainly because they do not demand software that enterprises value so much – they leverage open source and make their own for their very specific needs. If you look at the slide about the ‘White Box Effect’ in the presentation, you get a sense for just how much less they pay – or anyone else who buys a White Box pays – but make no mistake about it, these devices don’t do much unless you integrate them into a working system intended to store and safely retain data. To answer your question, we observe that these hyperscalers are such large customers of components and systems that they could choose to request custom hardware designs with customized specifications – more of this kind of interface, fewer of that kind, etc. As an analogy, in the networking industry, one of the largest buyers of the underlying network technology like processors, Ethernet interfaces and optics are the handful of hyperscalers – and in fact these customers are larger than most vendors.

Q. Why would each drive not know about other drives storage? How does this differ from existing storage servers?

A. In the traditional storage architecture, a central system is involved. The dissenting viewpoint we received from some of the technology companies we interviewed was a counterpoint that may exist only under certain design scenarios. Our view is that if a system is designed with the goal in mind to make each drive aware of each other’s contents, then that is technically possible of course. But at a cost, as you add CPU, memory, and software to do this.

Q. I can see flash and Wi-Fi Ethernet connected drives providing Internet of Things storage for values that can be harvested independent of when the value was stored. Getting a low-power system that could live off of USB-type power or Power over Ethernet would be why corporations would look at this.

A. I think the point you are making is that flash consumes very little power, right? This revolutionary technology (let’s just say non-volatile memory, to keep it general) is causing all kinds of disruptive changes in the storage industry, and as costs come down for NVM, all kinds of different scenarios become possible.

Q. Cost model might need to include a simpler lower cost local server with the Ethernet drive clusters by adding a cost item to the left side of their equation, comments?

A. Agreed – the equation we provided was simplistic and could be expanded to include many other terms and other simultaneous equations as well. We just thought that providing it would frame the discussion on the slide instead of just saying it verbally.

Q. Obviously, it will be higher, but how do you envision this changing Ethernet bandwidth requirements? Will Ethernet connected drives only become a reality once 40, 25, 100 Gb becomes the mainstream for Ethernet networks?

A. Network bandwidth needs will be a function of how the servers interact with the drives. I can see scenarios where traffic is kept more local, or where asking each drive for ‘the answer’ instead of ‘all of its data’ for processing on a server actually works against your premise that traffic increases. The point I’m getting to is that it depends on what applications these Ethernet Connected Drives are used for. Nevertheless, the old rule that all installed bandwidth will eventually be consumed has not, as far as I’m aware, been publicly repealed.

Q. With Ethernet connected drives, are we still stuck with the fundamental issue that HDDs are transactionally inefficient, and thus, while a novel concept, the basic drive remains the bottleneck unless its transactional efficiency is improved?

A. We think HDDs will co-exist with Flash/NVM for a very long time. Some very smart engineers are working to make this co-existence increasingly efficient, taking into account the strengths and weaknesses of both storage media.

 

 

New Webcast: Visions For Ethernet Connected Drives

Mark your calendar for March 25th as SNIA-ESF, together with the Dell’Oro Group, will be hosting a live Webcast, “Visions for Ethernet Connected Drives.” The arrival of mass-storage services, the emergence of analytics applications and the adoption of object storage by the cloud-services industry have provided an impetus for new storage hardware architectures. One such underlying hardware technology is the Ethernet connected hard drive, which is in early stages of availability.

Please join us on March 25th to hear Chris DePuy, Vice President of the Dell’Oro Group, share findings from interviews with storage-related companies, including those selling hard drives, semiconductors, peripherals, and systems. He will present some of the common themes uncovered, including:

  • What system-level architectural changes may be needed to support Ethernet connected drives
  • What capabilities may emerge as a result of the availability of these new drives
  • What part of the value chain spends the time and money to package working solutions

We will also present some revenue and unit statistics about the storage systems and hard drive markets and will discuss potential market scenarios that may unfold as a result of the object storage and Ethernet connected drive trends.

I’ll be hosting the event and together with Chris, taking your questions. I hope you’ll join us.