Author: AlexMcDonald
Q&A on All Things iSCSI
New White Paper: An Updated Overview of NFSv4
Maybe you’ve asked yourself recently; “Hmm, I wonder what’s new in NFSv4?” Maybe (and more likely) you haven’t; but you should.
During the last few years, NFSv4 has become the version of choice for many users, and there are lots of great reasons for making the transition from NFSv3 to NFSv4. Not the least of which is that it’s a relatively straightforward transition.
But there’s more; NFSv4 offers features unavailable in NFSv3. Parallelization, better security, WAN awareness and many other features make it suitable as a file protocol for the next generation of applications. As a proof point, lately we’ve seen new clients of NFSv4 servers beyond the standard Linux client, including support in VMware’s vSphere for virtual machine datastores accessible via NFSv4.
In this updated white paper, An Updated Overview of NFSv4, we explain how NFSv4 is better suited to a wide range of datacenter and high performance compute (HPC) uses than its predecessor NFSv3, as well as providing resources for migrating from v3 to v4.
You’ll learn:
- How NFSv4 overcomes statelessness issues associated with NFSv3
- Advantages and features of NFSv4.1 & NFSv4.2
- What parallel NFS (pNFS) and layouts do
- How NFSv4 supports performant WAN access
We believe this document makes the argument that users should, at the very least, be evaluating and deploying NFSv4 for use in new projects; and ideally, should be using it wholesale in their existing environments. The information in this white paper is meant to be comprehensive and educational and we hope you find it helpful.
If you have questions or comments after reading this white paper, please comment on this blog and we’ll get back to you as soon as possible.
OpenStack Manila – A Q&A on Liberty and Mitaka
Our recent Webcast with OpenStack Manila OpenStack Manila Project Team Lead (PTL), Ben Swartzlander, generated a lot of great questions. As promised, we’ve complied answers for all of the questions that came in. If you think of additional questions, please feel free to comment on this blog. And if you missed the live Webcast, it’s now available on-demand.
Q. Is Hitachi Data Systems contributing to the Manila project?
A. Yes, Hitachi contributed a new driver and also contributed a major new feature (migration) during Liberty. HDS was also active during the Kilo release with a different driver which is unfortunately no longer maintained.
Q. EMC has open sourced ViPER as CopperHD. Do you see any overlap between Manila/Cinder from one side and CopperHD from the other hand?
A. I’m not familiar enough with CoprHD to answer authoritatively, but I understand that there is definitely some overlap between it and Cinder, and I also expect there is some overlap with Manila. Assuming there is some overlap, I think that’s a great thing because competition within open source drives greater quality, and it’s confirmation that there is real demand for what we’re building.
Q. Could Manila be used stand-alone (without OpenStack) to create a fileshare server?
A. Yes, the only OpenStack service Manila depends on is Keystone (for authentication). Running Manila in a stand-alone fashion is a specific use case the team supports.
Q. If we are mapping the snap shot images what is the guarantee for data integrity?
A. Snapshots are typically crash-consistent copies of the filesystem at a point in time. In reality the exact guarantee depends on the backend used, and that’s something we’d like to avoid, so that the snapshot semantics are clear to the user. In the future, backends which cannot meet the crash-consistent guarantee will probably be forced to advertise a different capability so end users are aware of what they’re getting.
Q. Is there manila automation with ansible?
A. As far as I know this hasn’t been done yet.
Q. For kilo deployed in production does it work for all commercial drivers or is there a chart that says which commercial drivers support kilo?
A. The developer doc now has a table which attempts to answer this question. However, the most reliable way to see which drivers are part of the stable/kilo release would be to look at the driver directory of the code. This is an area where the docs need to improve.
Q. Could you explain consistency groups?
A. Consistency groups are a mechanism to ensure that 2 or more shares can be snapshotted in a single operation. Without CGs, you can take 2 snapshots of 2 shares but there is no guarantee that those snapshots will represent the same point in time. CGs allow you to guarantee that the snapshots are synchronized, which makes it possible to use multiple shares together for a single application and to take snapshots of that application’s data in a consistent way.
Q. How is the consistency group in Manila different from Cinder? Is it similar?
A. The designs are very similar. There are some semantic differences in terms of how you modify the membership of the CGs, but the snapshot functionality is identical.
Q. Are you considering pNFS, but I guess this will be hard since it has req. on the client as well?
A. Manila is agnostic to the data protocol so if the backend supports pNFS and Manila is asked to create an NFS share, it may very well get a share with pNFS support. Certainly Manila supports shares with multiple export locations so that on a system with multiple network interfaces, or a clustered system, Manila will tell the clients about all of the paths to the share. In the future we may want Manila to actually know the capabilities of the backends w.r.t. what version of NFS they support so that if a user requires a minimum version we can guarantee that they get that version or get a sensible error if it’s not possible.
Q. Share Replication. In what mode, Async and/or Sync?
A. We plan to support both, and the choice of which is used will be up to the administrator. Communication about which is used and any relevant information like RPO time would be out of band from Manila. The goal of the feature in Manila is to make Manila able to configure the replication relationship, and able to initiate failovers. The intention is for planned failovers to be disruptive but with no data loss, and for unplanned failovers to be disruptive, with data loss corresponding to the RPO that the administrator configured (which would be zero for synchronous replication).
Q. Can you point me to any resources with SNIA available for OpenStack? Where can I download document, videos, etc?
A. You can find several informative OpenStack on-demand Webcasts on the SNIA BrightTalk channel here.
What to Expect from OpenStack Manila Liberty
On October 7, 2015, the SNIA Ethernet Storage Forum is pleased to present its next live Webcast on OpenStack Manila. Manila is the OpenStack file share service that provides the management of file shares (for example, NFS and CIFS) as a core service to OpenStack. Intended to be an open-standards, highly-available and fault-tolerant component of OpenStack, Manila also aims to provide API-compatibility with popular systems like Amazon EC2.
I will be moderating this Webcast, presented by the OpenStack Manila Project Team Lead (PTL), Ben Swartzlander, who will dive into:
- An overview of Manila
- New features that are being delivered for OpenStack Liberty (due October 2015)
- A preview of Makita
With Liberty availability due next month, this information is extremely timely; I encourage you to register now to block your calendar. This will be a live and interactive Webcast, please bring your questions. I look forward to “seeing” you on October 7th.
NFS 4.2 Q&A
We received several great questions at our What’s New in NFS 4.2 Webcast. We did not have time to answer them all, so here is a complete Q&A from the live event. If you missed it, it’s now available on demand.
Q. Are there commercial Linux or windows distributions available which have adopted pNFS?
A. Yes. RedHat RHEL6.2, Suse SLES 11.3 and Ubuntu 14.10 all support the pNFS capable client. There aren’t any pNFS servers on Linux so far; but commercial systems such as NetApp (file pNFS), EMC (block pNFS), Panasas (object pNFS) and maybe others support pNFS servers. Microsoft Windows has no client or server support for pNFS.
Q. Are we able to prevent it from going back to NFS v3 if we want to ensure file lock management?
A. An NFSv4 mount (mount -t nfs4) won’t fall back to an nfs3 mount. See man mount for details.
Q. Can pNFS metadata servers forward clients to other metadata servers?
A. No, not currently.
Q. Can pNfs provide a way similar to synchronous writes? So data’s instantly safe in at least 2 locations?
A. No; that kind of replication is a feature of the data servers. It’s not covered by the NFSv4.1 or pNFS specification.
Q. Does hole punching depend on underlying file system in server?
A. If the underlying server supports it, then hole punching will be supported. The client & server do this silently; a user of the mount isn’t aware that it’s happening.
Q. How are Ethernet Trunks formed? By the OS or by the NFS client or NFS Server or other?
A. Currently, they’re not! Although trunking is specified and is optional, there are no servers that support it.
Q. How do you think vVols could impact NFS and VMware’s use of NFS?
A. VMware has committed to supporting NFSv4.1 and there is currently support in vSphere 6. vVols adds another opportunity for clients to inform the server with IO hints; it is an area of active development.
Q. In pNFS, does the callback call to the client must come from the original-called-to metadata server?
A. Yes, the callback originates from the MDS.
Q. Is hole punched in block units?
A. That depends on the server.
Q. Is there any functionality like SMB continuous availability?
A. Since it’s a function of the server, and much of the server’s capabilities are unspecified in NFSv4, the answer is – it depends. It’s a question for the vendor of your server.
Q. NFS has historically not been used in large HPC cluster environments for cluster-wide storage, for performance reasons. Do you see these changes as potentially improving this situation?
A. Yes. There’s much work being done on the performance side, and the cluster parallelism that pNFS brings will have it outperform NFSv3 once clients employ more of its capabilities.
Q. Speaking of the Amazon adoption for NFSv4.0. Do you have insight / guess on why Amazon did not select NFSv4.1, which has a lot more performance / scalability advantages over NFSv4.0?
A. No, none at all.
New Webcast: Block Storage in the Open Source Cloud called OpenStack
On June 3rd at 10:00 a.m. SNIA-ESF will present its next live Webcast “Block Storage in the Open Source Cloud called OpenStack.” Storage is a major component of any cloud computing platform. OpenStack is one of largest and most widely supported Open Source cloud computing platforms that exist in the market today. The OpenStack block storage service (Cinder) provides persistent block storage resources that OpenStack Nova compute instances can consume.
I will be moderating this Webcast, presented by a core member of the OpenStack Cinder team, Walt Boring. Join us, as we’ll dive into:
- Relevant components of OpenStack Cinder
- How block storage is managed by OpenStack
- What storage protocols are currently supported
- How it all works together with compute instances
I encourage you to register now to block your calendar. This will be a live and interactive Webcast, please bring your questions. I look forward to “seeing” you on June 3rd
Next Webcast: What’s New in NFS 4.2
We’re excited to announce our next ESF Webcast on NFSv4.2. With NFSv4.1 implemented on several commercial NFS systems, an established Linux client and a new pNFS Linux server, there is a continued growth of NFS usage in the IT industry. NFSv4.1, first introduced in 2010, meets many needs in the modern datacenter, but there are still technologies and advanced techniques that NFS developers want to deliver.
Join me and J Metz on April 28th at 10:00 a.m. PT as we’ll cover a brief update of where we are with NFSv4.1 and more detail on the proposed features for NFSv4.2 that are currently being ratified at the IETF. This will be a live, interactive session. Register now and please bring your questions.
If you need a primer on NFS before this event, I encourage you to check our 4-part Webcast mini-series available on demand:
Register today. I hope to see you on April 28th.
Object Storage 101 – Questions and Answers
At our recent live ESF Webcast, “Object Storage 101,” we talked about the what, how, and why behind storage technologies. Over 200 people attended the event. If you missed it, it’s now available on-demand. It was an interactive session and we did not have time to address all the questions, so here are answers to them all. If you think of additional questions, please feel free to comment on this blog.
Q. Would Object Storage be a feasible solution for only the nearline storage tier?
Typically Yes. If we think about the latency needed for real-time transactions, these are best served using a cache storage tier such as NAND or large arrays of RAM. Object stores are excellent methods to store and retrieve large data sets within single/multiple containers. Note: most systems support offset reads so you don’t need to access an entire object to get to the section of interest.
Q. Where is the index to find the location of an object that is stored? Is it stored locally or stored distributedly or replicated among each clusters?
Storage of the Index or Metadata of objects that are stored, if used, typically is replicated throughout the system. Also, if the Metadata is lost, typically, these can be re-built as a maintenance function.
Q. How is the object stored/broken up? Aside from being stored by metadata (like name, size, etc) … what is the process of the fragmentation…breaking it up …as described during this erasure coding segment? Once it’s assigned some unique identifier … ie. an x-ray picture…. how is it addressed? (if not by block/bit/byte/level)?
Currently, Objects are stored using one of two methods of data protection either Replication or Erasure Coding. Some systems use both. That said, there are several algorithms used today to Erasure Code protect Objects. When using Reed-Solomon methods, you need to specify the number of “Data” Fragments and the number of the “Parity” fragments that will be created. The Size of each “Data” fragment is closely related to the Object size divided by the number of “Data” fragments requested. Each “Parity” fragment will be same size of each of the “Data” fragments created. The protected Object size is the sum of the “Data” fragments plus the “Parity” fragments created. Each of these fragments (Data and Parity) is stored on a different server for the purpose of avoiding a single point failure. The application that created the Object that will be accessing the Object store is responsible for keeping track of the ID of the Object and the Namespace the ID was stored in. Typically the Application will create an ID however, when an Application “Puts” an Object using an existing ID, the older stored Object using that same ID is overwritten. Typically, access into an Object Store using a RESTful Interface using commands like “Put, Get, Delete, List” over HTTP.
Q. Will Object storage drive network scale—further adoption of 10GE and 40GE or is 1GE enough?
Yes. If we think about the interconnection between the Control Plane and Data Plane of these systems (Orchestration and Object Storage Devices), better the connectivity the higher the performance.
Q. Is the number of fragments set or configurable? What are the trade-offs of requiring fewer fragments for recovery besides perhaps processing overhead? Are there any gotchas to watch out for/consider?
Yes. Storage policies are configurable. The number of “Parity” fragments defines the data loss risk. The more “Parity” fragments requested the lower this risk but this increases the storage resource needed for the Object. Eliminating single point failures is a key consideration. For example, if your Object Storage system has 10 servers, a storage policy using 9 of 12 will have 2 fragments of this Object located on 2 servers. In this case any single server failure would not cause data loss but may cause higher latency. However, if 3 servers would fail, you would lose access to your data until the servers were recovered. If the drives of the failed servers were not recovered then data loss would occur.
Q. Is erasure encoding used instead of Hash tagging?
No. Hash Tagging is a method of generating a unique number given a specific input of data, this number is used to find the location of the Object to be stored. Erasure Coding is the method used to create the fragments. So think of Hash tag as the seed to the address needed to find the fragments.
Q. How large are the fragments?
A rough estimate is the Object size divided by the number of fragments to re-hydrate the object. (e.g. 1GByte Object stored using a 8 of 12 policy would have a fragment size of 1GByte/8 =~ 125MByte
Q. What do you see as the requirement for the interconnect between the Object storage arrays/boxes to be? Very large pipes as in multiple 40G links or something lower?
It depends on the use case or Service Level Objective for the system. If your system design uses a Proxy service and Erasure Coding, then your back end network throughput (the network connecting the Proxy and Object Storage Devices – Storage Servers) will aggregated (Multiply). In this case the network throughput is based on the number of “Data” fragments being used. If you use Replication, then the back end network throughput will not aggregate. This multiplication factor, if present, is key to an efficient network strategy. In Non-Proxy based Object Storage designs or replication based Object Storage systems the network strategy will scale with network bandwidth to the limitation of the HDDs ability to server data.
Q. What about access control and security at the object level? Is that typically part of the model?
Typically, access control methods are at the gateway or entry point of a Namespace. The access method used is up to the vendor of the Object Store.
Q. What is the presentation mode at the host level? i.e. a drive mapping or similar
Typically presentation methods are a RESTful API via HTTP. This used “PUT, GET, DELETE, LIST” semantics.
Q. Can you explain the differences/similarities between object storage, CDMI and software defined storage?
Object Stooge defined a system (Software + Hardware) to storage Objects. CDMI defends a method used to access/connect your application to an Object Storage system. Software Defined Storage describes using standard high volume servers with software for the purpose of storing data.
Q. Why can’t a traditional approach be used to Object Storage for its durability?
Traditional storage approaches such as direct attached storage (RAID Sets) do not scale. Once you run out of space, managing additional storage on separate systems becomes the issue.
Q. Aren’t all types of data going to need the accessibility required by users? For example, isn’t everything going to need to placed in an object store?
There is a lot of debate on this issue. The goal of an Object Store is two fold. 1) Drive down the cost/Byte and 2) keep content readily accessible.
Q. How to we avoid losing the Metadata from the data? Also, is there something like sub-meta data, where a small amount of Metadata is contained within the data and the larger Metadata is stored somewhere else?
Some Object storage systems support Extended File Attributes, which is a file system feature that allows the Applications to store “Metadata” about an Object which is then bound to the Object within the storage environment. These Extended File Attributes (XATTR’s) can be queried separately and can be used by your application as you see fit. The management of the XATTR’s is handled by the local file system and accessed by the Object Storage software via the RESTful API using HTTP.
Q. Is maintaining multiple copies mainly for durability or can it be used for performance enhancement (parallel access), or is that irrelevant?
Absolutely! Management of copies/replicas can serve multiple purposes. Replication across racks, datacenters, geographies, etc. can provide resiliency against failures at those levels. Replication can also be used to provide object access in close proximity to the requester. In the X-ray example discussed in the Webcast, we might set up a replica local to the medical practice for the first 90 days, in order to provide a low latency (time to first byte) copy during the initial treatment. Additional copies can be kept at remote sites in order to provide fault tolerance.
Q. Is there a standard methodology for migrating from a file-system based methodology to an object store?
The short answer is no. In general an application that is currently developed to use file or block based storage will need to be re-architected in order to take advantage of an object storage system/service. There is, however, a growing category of products referred to as “cloud gateways” that can provide a bridge to object storage by presenting a filesystem to the existing application, while writing and reading via a RESTful API to a backend object storage system/service.
Q. Is it safe to say that in order to use object storage the application needs to be “object storage aware”? Unlike a traditional storage where the application doesn’t necessarily need to be familiar with the storage or file system since that is handled at a lower layer.
Yes, however as indicated in the question regarding migration of applications above, it is possible to implement a “cloud gateway” solution that will provide the translation from RESTful API to a CIFS/NFS fileshare, thus not requiring any application changes. I would disagree with the premise that traditional applications don’t need to be familiar with the underlying storage. Traditional file-based applications must understand the location (fileserver, folder, filename, etc.) in order to gain access to the appropriate data.
Q. I’m hearing a lot of ‘what’ and ‘how’ but not so much ‘why’ about object storage. Can we hear some real-world examples of applications in industry today that are running better because of object storage?
An example of an application running today with object storage behind it, and why: Web Based Media Asset Management/Distribution. This particular use case tends to deal with billions of files/objects that can vary in size from very small thumbnail images to massive 4k HD movie files. The ability to deliver these to multiple platforms (phone, laptop, set top box, etc.) across multiple geographies is something that is well suited for object storage. Traditional file and/or block based storage environments may hit scale limitations in dealing with the number of files/objects, in addition the ability to have a single namespace maintained across multiple locations/datacenters is something that is exceedingly complex for storage environments other than object stores.
Q. Replicating an object two or three times would exponentially increase storage costs, wouldn’t it? The more copies the higher the costs?
Certainly more copies would use more storage, and as a result most object stores provide different durability schemes based upon the performance/availability tradeoffs the data owner is willing to make. Recovering a single object from a replica is significantly faster than rebuilding an object from geo-distributed EC fragments. Also, as discussed in the question above related to replicas to drive performance, replication can serve the purpose of placing objects as close to the consumer as possible, minimizing time to first bye and increasing the overall throughput of an application.
Q. If I have an app that access a CIFS share, is there a way to translate it into object store?
Please see answer to question: “Is there a standard methodology for migrating from a file-system based methodology to an object store?” Short answer: Yes, via a “cloud gateway” product.
Q. Is there a confluence point of Object and File based storage – specifically in NAS where object storage can be multi-protocol (NFS, and REST)?
While there are some object storage solutions that provide their own native cloud-gateway capability (NAS protocol to the application, RESTful API to the object store). There are very few that provide a “file/object duality” capability allowing applications to manipulate an object as both an object and a file.