The Ins and Outs of a Scale-Out File System Architecture

To meet the increasingly higher demand on both capacity and performance in large cluster computing environments, the storage subsystem has evolved toward a modular and scalable design. The scale-out file system has emerged as one implementation of the trend, in addition to scale-out object and block storage solutions. What are the key principles when architecting a scale-out file system? Find out on February 28th when the SNIA Networking Storage Forum (NSF) hosts The Scale-Out File System Architecture Overview, a live webcast where we will present an overview of scale-out file system architectures. This presentation will provide an introduction to scale-out-file systems and cover: Read More

What the “T” Means in SNIA Cloud Storage Technologies

The SNIA Cloud Storage Initiative (CSI) has had a rebrand; we’ve added a T for Technologies into our name, and we’re now officially the Cloud Storage Technologies Initiative (CSTI). That doesn’t seem like a significant change, but there’s a good reason. Our old name reflected the push to getting acceptance of cloud storage, and that specific cloud storage debate has been won, and big time. One relatively small cloud service provider is currently storing 400PB of clients’ data. Twitter alone consumes 300PB of data on Google’s cloud offering. Facebook, Amazon, AliBaba, Tencent – all have huge data storage numbers. Enterprises of every size are storing data in the cloud. That’s why we added the word “technologies.” The expanded charter and new name reflect the need to support the evolving cloud business models and architectures such as OpenStack, software defined storage, Kubernetes and object storage. Read More

SNIA Storage Developer Conference-The Knowledge Continues

SNIA’s 18th Storage Developer Conference is officially a success, with 124 general and breakout sessions;  Cloud Interoperability, Kinetiplugfest 5c Storage, and SMB3 plugfests; ten Birds-of-a-Feather Sessions, and amazing networking among 450+ attendees.  Sessions on NVMe over Fabrics won the title of most attended, but Persistent Memory, Object Storage, and Performance were right behind.  Many thanks to SDC 2016 Sponsors, who engaged attendees in exciting technology discussions.

For those not familiar with SDC, this technical industry event is designed for a variety of storage technologists at various levels from developers to architects to product managers and more.  And, true to SNIA’s commitment to educating the industry on current and future disruptive technologies, SDC content is now available to all – whether you attended or not – for download and viewing.

20160919_120059You’ll want to stream keynotes from Citigroup, Toshiba, DSSD, Los Alamos National Labs, Broadcom, Microsemi, and Intel – they’re available now on demand on SNIA’s YouTube channel, SNIAVideo.

All SDC presentations are now available for download; and over the next few months, you can continue to download SDC podcasts which combine audio and slides. The first podcast from SDC 2016 – on hyperscaler (as well as all 2015 SDC Podcasts) are available here, and more will be available in the coming weeks.

SNIA thanks all its members and colleagues who contributed to make SDC a success! A special thanks goes out to the SNIA Technical Council, a select group of acknowledged industry experts who work to guide SNIA technical efforts. In addition to driving the agenda and content for SDC, the Technical Council oversees and manages SNIA Technical Work Groups, reviews architectures submitted by Work Groups, and is the SNIA’s technical liaison to standards organizations. Learn more about these visionary leaders at http://www.snia.org/about/organization/tech_council.

And finally, don’t forget to mark your calendars now for SDC 2017 – September 11-14, 2017, again at the Hyatt Regency Santa Clara. Watch for the Call for Presentations to open in February 2017.

Q&A – OpenStack Mitaka and Data Protection

At our recent SNIA Webcast “Data Protection and OpenStack Mitaka,” Ben Swartzlander, Project Team Lead OpenStack Manila (NetApp), and Dr. Sam Fineberg, Distinguished Technologist (HPE), provided terrific insight into data protection capabilities surrounding OpenStack. If you missed the Webcast, I encourage you to watch it on-demand at your convenience. We did not have time to get to all of out attendees’ questions during the live event, so as promised, here are answers to the questions we received.

Q. Why are there NFS drivers for Cinder?

 A. It’s fairly common in the virtualization world to store virtual disks as files in filesystems. NFS is widely used to connect hypervisors to storage arrays for the purpose of storing virtual disks, which is Cinder’s main purpose.

 Q. What does “crash-consistent” mean?

 A. It means that data on disk is what would be there is the system “crashed” at that point in time. In other words, the data reflects the order of the writes, and if any writes are lost, they are the most recent writes. To avoid losing data with a crash consistent snapshot, one must force all recently written data and metadata to be flushed to disk prior to snapshotting, and prevent further changes during the snapshot operation.

Q. How do you recover from a Cinder replication failover?

 A. The system will continue to function after the failover, however, there is currently no mechanism to “fail-back” or “re-replicate” the volumes. This function is currently in development, and the OpenStack community will have a solution in a future release.

 Q. What is a Cinder volume type?

 A. Volume types are administrator-defined “menu choices” that users can select when creating new volumes. They contain hidden metadata, in the cinder.conf file, which Cinder uses to decide where to place them at creation time, and which drivers to use to configure them when created.

 Q. Can you replicate when multiple Cinder backends are in use?

 A. Yes

 Q. What makes a Cinder “backup” different from a Cinder “snapshot”?

 A. Snapshots are used for preserving the state of a volume from changes, allowing recovery from software or user errors, and also allowing a volume to remain stable long enough for it to be backed up. Snapshots are also very efficient to create, since many devices can create them without copying any data. However, snapshots are local to the primary data and typically have no additional protection from hardware failures. In other words, the snapshot is stored on the same storage devices and typically shares disk blocks with the original volume.

Backups are stored in a neutral format which can be restored anywhere and typically on separate (possibly remote) hardware, making them ideal for recovery from hardware failures.

 Q. Can you explain what “share types” are and how they work?

 A. They are Manila’s version of Cinder’s volume types. One key difference is that some of the metadata about them is not hidden and visible to end users. Certain APIs work with shares of types that have specific capabilities.

 Q. What’s the difference between Cinder’s multi-attached and Manila’s shared file system?

A. Multi-attached Cinder volumes require cluster-aware filesystems or similar technology to be used on top of them. Ordinary file systems cannot handle multi-attachment and will corrupt data quickly if attached more than one system. Therefore cinder’s multi-attach mechanism is only intended for fiesystems or database software that is specifically designed to use it.

Manilla’s shared filesystems use industry standard network protocols, like NFS and SMB, to provide filesystems to arbitrary numbers of clients where shared access is a fundamental part of the design.

 Q. Is it true that failover is automatic?

 A. No. Failover is not automatic, for Cinder or Manila

 Q. Follow-up on failure, my question was for the array-loss scenario described in the Block discussion. Once the admin decides the array has failed, does it need to perform failover on a “VM-by-VM basis’? How does the VM know to re-attach to another Fabric, etc.?

A. Failover is all at once, but VMs do need to be reattached one at a time.

 Q. What about Cinder? Is unified object storage on SHV server the future of storage?

 A. This is a matter of opinion. We can’t give an unbiased response.

 Q. What about a “global file share/file system view” of a lot of Manila “file shares” (i.e. a scalable global name space…)

 A. Shares have disjoint namespaces intentionally. This allows Manila to provide a simple interface which works with lots of implementations. A single large namespace could be more valuable but would preclude many implementations.

 

 

Open Source Software-Only Storage – Really.

Virtually any storage solution is more parts software than hardware. Having said this, users don’t care as much about the percentage of hardware vs. software. They want their consumption experience to be easy and fast to start up, with a pay-as-you-grow model and with the ability to scale without limits. So, it should not be a shock that real IT organizations are using software-only on standard servers to deliver storage to their customers. What’s more, this type of storage can be powered by open source.

At the upcoming SNIA Data Storage Innovation Conference, we are looking forward to discussing software-defined storage (SDS) from a user experience perspective with examples of OpenStack Swift providing an engine for building SDS clusters with any mixed combination of standard server and HDD hardware in a way that is simple enough for any enterprise to dynamically scale.

Swift is a highly available, distributed, scalable object store available as open source.  It is designed to handle non-relational (that is, not just simple row-column data) or unstructured data at large scale with high availability and durability.  For example, it can be used to store files, videos, documents, analytics results, Web content, drawings, voice recordings, images, maps, musical scores, pictures, or multimedia. Organizations can use Swift to store large amounts of data efficiently, safely, and cheaply. It scales horizontally without any single point of failure.  It offers a single multi-tenant storage system for all applications, the ability to use low-cost industry-standard servers and drives, and a rich ecosystem of tools and libraries.  It can serve the needs of any service provider or enterprise working in a cloud environment, regardless of whether the installation is using other OpenStack components.

I know what you are thinking, storage is too critical, so it will never work this way. But the same was said >25 years go when using RAID was seen as too risky given solutions would acknowledge writes while the data was in cache prior to being written to disk. The same was also said >15 years ago when VMware was seen as not robust enough to run any manner of demanding or critical application. Replicas and Erasure Codes are analogous to RAID 1 and RAID 5 respectively, and the uniquely as possible distribution of data behind a single namespace abstracts standard hardware like server virtualization.

Interested in hearing more? Come check out my DSI session, “Swift Use Cases with SwiftStack,” where we look forward to sharing how this new type of storage can work, and to suspend your disbelief that this storage can be enterprise-grade.

 

Data Protection and OpenStack Mitaka

Interested in data protection and storage-related features of OpenStack? Then please join us for a live SNIA Webcast “Data Protection and OpenStack Mitaka” on June 22nd. We’ve pulled together an expert team to discuss the data protection capabilities of the OpenStack Mitaka release, which includes multiple new resiliency features. Join Dr. Sam Fineberg, Distinguished Technologist (HPE), and Ben Swartzlander, Project Team Lead OpenStack Manila (NetApp), as they dive into:

  • Storage-related features of Mitaka
  • Data protection capabilities – Snapshots and Backup
  • Manila share replication
  • Live migration
  • Rolling upgrades
  • HA replication

Sam and Ben will be on-hand for a candid Q&A near the end of the Webcast, so please start thinking about your questions and register today. We hope to see you there!

This Webcast is co-sponsored by two groups within the Storage Networking Industry Association (SNIA): the Cloud Storage Initiative (CSI), and the Data Protection & Capacity Optimization Committee (DPCO).

 

On-Demand Cloud Storage Webcasts Worth Watching

As the SNIA Cloud Storage Initiative (CSI) starts our 2016 with a new set of educational programs and webcasts on topics of interest to those developing, implementing & managing cloud storage, I thought it might be a good time to remind everyone of the vendor-neutral educational work the CSI has delivered in 2015.

I’m particularly proud of the work the CSI has done through BrightTalk (a web based content delivery platform) in producing live hour-long tutorials on a wide variety of subjects.

What you may not know is that these are also recorded, and you can play them back when it’s convenient to you. I know that we have a global audience, and that when we deliver the live version it may be in the middle of your busy working day – or even in the middle of the night.

As part of SNIA, the CSI supports the development of technical storage standards; and that means some of our audience are developers. For those of you that are interested in more technical presentations we had two developer focussed BrightTalks:

Hierarchical Erasure Coding: Making Erasure Coding Usable

This talk covered two different approaches to erasure coding – a flat erasure code across JBOD, and a hierarchical code with an inner code and an outer code; it compared the two approaches on different parameters that impact the IT business and provided guidance on evaluating object storage solutions.

Expert Panel: Cloud Storage Initiatives – An SDC Preview

At the 2015 Storage Developer Conference (SDC) we presented on a variety of topics:

  • Mobile and Secure – Cloud Encrypted Objects using CDMI
  • Object Drives: A new Architectural Partitioning
  • Unistore: A Unified Storage Architecture for Cloud Computing
  • Using CDMI to Manage Swift, S3, and Ceph Object Repositories

We discussed how encrypted objects can be stored, retrieved, and transferred between clouds, how Object Drives allow storage to scale up and down by single drive increments, end-user and vendor use cases of the Cloud Data Management Interface (CDMI), and we introduced Unistore – an innovative unified storage architecture that efficiently integrates heterogeneous HDD and SCM devices for Cloud storage systems.

(As an added bonus, all these SDC 2015 presentations and others can be found here http://www.snia.org/events/storage-developer/presentations15.)

OpenStack has had a big year, and the CSI contributed to the discussion with:

OpenStack File Services for High Performance Computing

We looked at how OpenStack can consume and control file services appropriate to High Performance Compute in a cloud and multi-tenanted environment and investigated two approaches to integration. One approach is to have OpenStack manage the storage infrastructure services using Cinder, Nova and Neutron to provide HPC Filesystem as a Service. We also reviewed a second option of using Manila file services for OpenStack to control the HPC File system deployment and manage the exports etc. We discussed the development of the Lustre Manila driver and its current progress.

Hybrid clouds were also in the news. We delivered two sessions, specifically targeted at end users looking to understand the technologies:

Hybrid Clouds: Bridging Private & Public Cloud Infrastructures

Every IT consumer is using cloud in one form or another, and just as storage buyers are reluctant to select single vendor for their on-premises IT, they will choose to work with multiple public cloud providers. But this desirable “many vendor” cloud strategy introduces new problems of compatibility and integration. To provide a seamless view of these discrete storage clouds, Software Defined Storage (SDS) can be used to build a bridge between them. This presentation explored how SDS, with its ability to deploy on different hardware and supporting rich automation capabilities, can extend its reach into cloud deployments to support a hybrid data fabric that spans on-premises and public clouds.

Hybrid Clouds Part 2: Case Study on Building the Bridge between Private & Public

There are significant differences in how cloud services are delivered to various categories of users. The integration of these services with traditional IT operations remains an important success factor but also a challenge for IT managers. The key to success is to build a bridge between private and public clouds. This Webcast expanded on the previous Hybrid Clouds: Bridging Private & Public Cloud Infrastructures webcast where we looked at the choices and strategies for picking a cloud provider for public and hybrid solutions.

Lastly, we looked at some of the issues surrounding data protection and data privacy (no, they’re not the same thing at all!).

Privacy v Data Protection: The Impact Int’l Data Protection Legislation on Cloud

Governments across the globe are proposing and enacting strong data privacy and data protection regulations by mandating frameworks that include noteworthy changes like defining a data breach to include data destruction, adding the right to be forgotten, mandating the practice of breach notifications, and many other new elements. The implications of this and other proposed legislation on how the cloud can be utilized for storing data are significant. This webcast covered:

  • EU “directives” vs. “regulation”
  • General data protection regulation summary
  • How personal data has been redefined
  • Substantial financial penalties for non-compliance
  • Impact on data protection in the cloud
  • How to prepare now for impending changes

Moving Data Protection to the Cloud: Trends, Challenges and Strategies

This was a panel discussion; we talked about various new ways to perform data protection using the Cloud and many advantages of using the Cloud this way.

You can access all the CSI BrightTalk Webcasts on demand at the SNIA Website. Many of you will also be happy to learn that PDFs of the Webcast slides are also available there.

We had a good 2015, and I’m looking forward to producing more great educational material during 2016. If you have a topic you’d like to see the CSI cover this year, please comment below in this blog. We value input from all.

Thanks for your support and hopefully we’ll see you some time this year at one of our BrightTalk webcasts.

Come See SNIA at the Software-Defined Infrastructure Summit

Demand for software-defined infrastructure (SDI) is on the rise, and with good reason. SDI helps data centers meet the challenges of cloud computing, big data/analytics, mobility and social media, in an agile and cost-effective way.  I’m pleased to announce that SNIA will be an active participant at next week’s Software-Defined Infrastructure Summit in Santa Clara, CA, December 1-3.

My colleagues and I at the SNIA Cloud Storage Initiative have organized a “Working with OpenStack” Seminar that kicks off the Summit on Tuesday, December 1.

I will keynote an OpenStack fireside chat along with Chris DePuy, VP, at Dell’Oro Group. We’ll be discussing the SNIA Cloud Data Management Interface (CDMI) and its interface with OpenStack, OpenStack implementations, how standards play, and the future of open source in the 21st century.

My keynote will be accompanied by additional SNIA talks in the Introduction to OpenStack session and the Application Management session:

  • Sam Fineberg, PhD, SNIA Cloud Storage Initiative member and Distinguished Technologist at Hewlett Packard Enterprise Storage, will provide an overview of the storage aspects of OpenStack including the core projects for block storage (Cinder) and object storage (Swift), and the new shared file service (Manila). He’ll cover some common configurations and use cases for these technologies, and discuss how they interact with the other parts of OpenStack.
  • Richelle Ahlvers, SNIA Open Source Task Force member and Principal Storage Management Architect at Avago Technologies, will discuss application integration in OpenStack and how SNIA-developed standards enable cross-vendor management interoperability and help open source projects interoperate with more industry solutions.

Tuesday’s Seminar day will include additional sessions from leaders in OpenStack, Ceph, and Software Defined Storage. SDI Summit days 2 and 3 will provide information on hardware, software, and data center technology and applications of software-defined infrastructure featuring keynotes from IBM, Intel, Red Hat, and VMware, all SNIA member companies.  It’s a must attend event.

SNIA will also be exhibiting at the Summit. Please stop by booth #408 to learn how SNIA standards are used in open source projects including cloud data management, non-volatile memory, self-contained information retention, and storage management. We will also have information on SNIA programs such as membership, certification, conformance testing, and conferences.

SNIA members and colleagues can use the code SPGP to receive a $100 discount on any level of SDI Summit registration. I hope to see you in Santa Clara!

OpenStack Manila – A Q&A on Liberty and Mitaka

Our recent Webcast with OpenStack Manila OpenStack Manila Project Team Lead (PTL), Ben Swartzlander, generated a lot of great questions. As promised, we’ve complied answers for all of the questions that came in. If you think of additional questions, please feel free to comment on this blog. And if you missed the live Webcast, it’s now available on-demand.

Q. Is Hitachi Data Systems contributing to the Manila project?

A. Yes, Hitachi contributed a new driver and also contributed a major new feature (migration) during Liberty. HDS was also active during the Kilo release with a different driver which is unfortunately no longer maintained.

Q. EMC has open sourced ViPER as CopperHD. Do you see any overlap between Manila/Cinder from one side and CopperHD from the other hand?

A. I’m not familiar enough with CoprHD to answer authoritatively, but I understand that there is definitely some overlap between it and Cinder, and I also expect there is some overlap with Manila. Assuming there is some overlap, I think that’s a great thing because competition within open source drives greater quality, and it’s confirmation that there is real demand for what we’re building.

Q. Could Manila be used stand-alone (without OpenStack) to create a fileshare server?

A. Yes, the only OpenStack service Manila depends on is Keystone (for authentication). Running Manila in a stand-alone fashion is a specific use case the team supports.

Q. If we are mapping the snap shot images what is the guarantee for data integrity?

A. Snapshots are typically crash-consistent copies of the filesystem at a point in time. In reality the exact guarantee depends on the backend used, and that’s something we’d like to avoid, so that the snapshot semantics are clear to the user. In the future, backends which cannot meet the crash-consistent guarantee will probably be forced to advertise a different capability so end users are aware of what they’re getting.

Q. Is there manila automation with ansible?

A. As far as I know this hasn’t been done yet.

Q. For kilo deployed in production does it work for all commercial drivers or is there a chart that says which commercial drivers support kilo?

A. The developer doc now has a table which attempts to answer this question. However, the most reliable way to see which drivers are part of the stable/kilo release would be to look at the driver directory of the code. This is an area where the docs need to improve.

Q. Could you explain consistency groups?

A. Consistency groups are a mechanism to ensure that 2 or more shares can be snapshotted in a single operation. Without CGs, you can take 2 snapshots of 2 shares but there is no guarantee that those snapshots will represent the same point in time. CGs allow you to guarantee that the snapshots are synchronized, which makes it possible to use multiple shares together for a single application and to take snapshots of that application’s data in a consistent way.

Q. How is the consistency group in Manila different from Cinder? Is it similar?

A. The designs are very similar. There are some semantic differences in terms of how you modify the membership of the CGs, but the snapshot functionality is identical.

Q. Are you considering pNFS, but I guess this will be hard since it has req. on the client as well?

A. Manila is agnostic to the data protocol so if the backend supports pNFS and Manila is asked to create an NFS share, it may very well get a share with pNFS support. Certainly Manila supports shares with multiple export locations so that on a system with multiple network interfaces, or a clustered system, Manila will tell the clients about all of the paths to the share. In the future we may want Manila to actually know the capabilities of the backends w.r.t. what version of NFS they support so that if a user requires a minimum version we can guarantee that they get that version or get a sensible error if it’s not possible.

Q. Share Replication. In what mode, Async and/or Sync?

A. We plan to support both, and the choice of which is used will be up to the administrator. Communication about which is used and any relevant information like RPO time would be out of band from Manila. The goal of the feature in Manila is to make Manila able to configure the replication relationship, and able to initiate failovers. The intention is for planned failovers to be disruptive but with no data loss, and for unplanned failovers to be disruptive, with data loss corresponding to the RPO that the administrator configured (which would be zero for synchronous replication).

Q. Can you point me to any resources with SNIA available for OpenStack? Where can I download document, videos, etc?

A. You can find several informative OpenStack on-demand Webcasts on the SNIA BrightTalk channel here.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

See SNIA at OpenStack Summit Tokyo

Are you headed to the OpenStack Summit in Tokyo later this month? If so, I encourage you to stop by two “Birds of a Feather” (BoF) sessions I’ll be hosting on behalf of SNIA. Here’s the info on both of them:

Extending OpenStack Swift with S3 and CDMI Interfaces – Tues. Oct. 27th 11:15 a.m.

Cloud application developers using the OpenStack infrastructure are demanding implementations of not just the Swift API, but also the S3 defacto and CDMI standard APIs. Each of these APIs not only offers features in common, but also offers what appear to be unique and incompatible facilities. At this BoF, we’ll discuss how to: Implement a multi-API strategy simply and effectively, sensibly manage the differences between each of the APIs, map common features to each other, take advantage of each of the APIs’ strengths, avoid lowest common denominator implementations

Object Drive Integration with Swift – Thurs. Oct. 29th 9:00 a.m.

With the emergence of disk drives and perhaps solid state drives with Key/Value and other object interfaces, what are the implications on solution architectures and systems built around OpenStack Swift. One approach is termed “PACO” where the Object Node speaks Key/Value to the drive and is hosted with other Swift Services. Are there other approaches to this? Are you developing products or solutions based on Object Drives? Come to this BoF to discuss these issues with fellow developers.

I expect both of these BoFs will be full of lively discussions around standards, emerging technologies, challenges, best practices and more. If you have any questions about these sessions or about work that SNIA is doing, do not hesitate to contact me. I hope to see you in Tokyo!