Our “Storage Life on the Edge” webcast series continued on March 22, 2022, when our expert panelists, Stephen Bates, Bill Martin, Mayank Saxena and Tong Zhang, highlighted several real-world edge use cases, the implications for storage, and the benefits of computational storage standards. You can access the on-demand session and the presentation slides in the SNIA Educational Library. Attendees asked several questions during the live event, but we only had time to get to a handful. As promised, here are answers to all of them.
Q. I have heard NVMe® is developing an open and vendor-neutral standard for computational storage devices. How important do you think standards like this one are for mass adoption of these types of devices on the edge and why?
A. Yes, NVMe is working to develop an architectural model for NVMe-based computational storage devices. The specifics are still under development, but the work will lead to new NVMe commands that pertain to computation. Standards like this are of vital importance to the adoption of computational storage at the edge, since they will lead to a rich ecosystem of software and allow computational storage devices to be sourced from multiple vendors.
Q. Computational storage devices come in three main forms: the computational storage processor, the computational storage drive and the computational storage array. How do you see each of these being deployed on the edge and why?
A. I think we can expect to see all three types of computational storage devices at the edge. Computational storage drives combine the storage of an SSD with compute power. This will be very useful at the edge, where physical space is a very real constraint. That said, computational storage processors will also have a role: they separate the compute element from the storage while providing peer-to-peer communication between the two, and that can be desirable in certain instances. Finally, the computational storage array is appealing because it is a plug-and-play solution for computational storage that can be inserted into a 1U or 2U rack space and consumed via standards-based APIs.
Q. In your experience, what percentage of data at the edge is compressible? Can you provide some examples of edge use cases with a high percentage of compressible data and some with a low percentage, and comment on the specific percentages? How does this affect the capacities of storage devices in these use cases?
A. Except for image and video, most data at the edge tends to be reasonably compressible. Experience shows that we may expect a 2:1 to 4:1 compression ratio in general. Example edge use cases with highly compressible data are time-series data from various IoT devices and most edge database and data analytics systems. Low compressibility is typically caused by the use of special-purpose compression (e.g., JPEG and H.264) before the data is stored. By leveraging this runtime data compressibility, computational storage drives with built-in transparent compression could contribute significantly to lowering the TCO and power consumption of edge infrastructure.
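To make the compressibility contrast concrete, here is a small Python sketch using the standard zlib library. The synthetic data and resulting ratios are purely illustrative, not measurements from real edge workloads: repetitive IoT time-series records compress well, while already-compressed media payloads look like random bytes to a general-purpose compressor.

```python
import json
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Ratio of original size to zlib-compressed size."""
    return len(data) / len(zlib.compress(data))

# Synthetic IoT time-series: repetitive JSON records compress well.
records = [{"sensor": "temp-01", "t": 1647907200 + i, "value": 21.5 + (i % 7) * 0.1}
           for i in range(1000)]
timeseries = json.dumps(records).encode()

# Already-compressed media (e.g., JPEG or H.264 payloads) resembles random
# bytes, so a general-purpose compressor barely shrinks it.
media_like = os.urandom(len(timeseries))

print(f"time-series ratio: {compression_ratio(timeseries):.1f}:1")
print(f"media-like ratio:  {compression_ratio(media_like):.2f}:1")
```

Running this, the time-series payload compresses far better than the 2:1 general expectation, while the media-like payload does not shrink at all, which is why capacity planning depends so heavily on the workload mix.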
Q. Will applications push encrypt/decrypt keys to the computational storage processor? Or is there pre-configuration and storage of keys?
A. Both models can be supported. In the traditional PKI model, keys can be stored in a trusted platform module (TPM) at the edge server following the certificate signing request (CSR) process, tied to a root certificate that can be revoked when needed. There can also be an encryption key per IO, and these keys can be managed and rotated by hosts. Notably, there is a lot of innovation happening in the field of data security that benefits edge security directly, e.g., ICN (Information-Centric Networking). With ICN, data can be secured at the packet level with ephemeral keys, agnostic to the transport protocol. Computational storage can perform such encryption close to the data without involving the host CPU, improving both data protection and performance.
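As a rough illustration of the host-managed, per-IO key model, a unique key can be derived for each IO from a rotatable master secret. The scheme and names below are assumptions for the sketch only; a production design would seal the master key in a TPM and use a vetted KDF and cipher.

```python
import hashlib
import hmac

def derive_io_key(master_key: bytes, epoch: int, lba: int) -> bytes:
    """Derive a unique 256-bit key for one IO from a rotatable master key.

    Bumping `epoch` (e.g., on a rotation schedule) changes every derived
    key without touching the per-IO derivation logic.
    """
    context = f"io-key/epoch={epoch}/lba={lba}".encode()
    return hmac.new(master_key, context, hashlib.sha256).digest()

# Demo master secret; in practice this would be TPM-sealed.
master = hashlib.sha256(b"demo-master-secret").digest()

k1 = derive_io_key(master, epoch=1, lba=0x1000)
k2 = derive_io_key(master, epoch=1, lba=0x1001)
k1_rotated = derive_io_key(master, epoch=2, lba=0x1000)

assert k1 != k2          # every IO gets its own key
assert k1 != k1_rotated  # rotating the epoch changes all keys
```

The appeal of derivation over storage is that the host only has to protect and rotate one secret, while the drive-side engine can compute the per-IO key on the fly.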
Q. Given the heterogeneous nature of edge data and systems, how can computational storage add value?
A. Heterogeneity is indeed central to the nature of the edge, which is very important to understand. At the edge, data is at the heart – everything else is peripheral. There are always two primary things to consider: TCO and compute for data. It is becoming more apparent for edge servers and gateways that data-processing compute should be done at the edge where the data is ingested. Offloading repetitive, data-intensive processing tasks can help reduce cost and improve the ecosystem for protocol processing and governance in these heterogeneous environments.
With this, one can have storage with a standard interface for everyday data-intensive tasks, serving varied use cases, that can be plugged into any compute entity – from a Raspberry Pi to a 1U server in a cloudlet datacenter. That’s powerful.
Q. Would it make sense to use a computational storage drive (CSD) for general-purpose programmable computation at the edge? If yes, would using embedded CPUs inside CSDs be more efficient than using external CPUs?
A. It depends. Compared with external host CPUs, the embedded processors inside computational storage drives tend to be much less powerful and have much less cache memory, so it does not make much sense to off-load purely computation-intensive tasks onto them. However, because a computational storage drive can integrate customized hardware engines for functions like compression, encryption, and data filtering, it still makes sense to off-load programmable computation into the drive when it involves pre- or post-processing that can leverage those customized hardware engines.
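The trade-off above can be sketched as a back-of-the-envelope cost model. All figures below (link bandwidth, engine throughput, filter selectivity) are hypothetical placeholders, not vendor numbers; the point is that in-drive filtering wins when it avoids moving most of the data across the bus, even if the drive's engine scans more slowly than the host could.

```python
def host_filter_time_s(data_gb: float, link_gbps: float = 8.0,
                       host_scan_gbps: float = 4.0) -> float:
    """Ship all data to the host, then filter it there."""
    transfer = data_gb * 8 / link_gbps       # everything crosses the bus
    compute = data_gb * 8 / host_scan_gbps   # host scans the full data set
    return transfer + compute

def csd_filter_time_s(data_gb: float, engine_gbps: float = 6.0,
                      selectivity: float = 0.05, link_gbps: float = 8.0) -> float:
    """Filter inside the drive with a hardware engine; ship only matches."""
    scan = data_gb * 8 / engine_gbps                  # in-drive scan
    transfer = data_gb * selectivity * 8 / link_gbps  # only matches cross the bus
    return scan + transfer

data_gb = 100.0
print(f"host filter: {host_filter_time_s(data_gb):.0f} s")
print(f"CSD filter:  {csd_filter_time_s(data_gb):.0f} s")
```

With these illustrative parameters the in-drive path wins because only 5% of the data ever leaves the drive; as selectivity approaches 100%, the advantage disappears, which matches the guidance that offload pays off for data-reducing pre- and post-processing.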
Q. What could a computational storage drive do to seamlessly contribute to reducing power consumption in edge environments?
A. The low-hanging fruit here is for the computational storage drive to carry out internal transparent lossless data compression. By reducing the data volume through compression, we write a much smaller amount of data into NAND flash memory. Writing data into NAND flash memory is the most energy-consuming operation inside a storage drive, so in-storage transparent compression could seamlessly reduce power consumption.
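A quick back-of-the-envelope calculation illustrates the effect. The per-byte NAND program energy below is a made-up placeholder (real figures vary widely by flash generation and device); the scaling is the point: program energy falls in direct proportion to the compression ratio.

```python
def nand_write_energy_j(bytes_written: float, nj_per_byte: float = 50.0) -> float:
    """Energy to program `bytes_written` into NAND at an illustrative cost/byte."""
    return bytes_written * nj_per_byte * 1e-9

workload_bytes = 1_000_000_000  # 1 GB of host writes
for ratio in (1.0, 2.0, 4.0):
    joules = nand_write_energy_j(workload_bytes / ratio)
    print(f"{ratio:.0f}:1 compression -> {joules:.1f} J of NAND program energy")
```

At a 4:1 ratio, only a quarter of the bytes reach the flash, so the NAND program energy for the workload drops by the same factor, with no change to the host software.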
Remember, this is a series. If you missed the introduction, “Storage Life on the Edge: Managing Data from the Edge to the Cloud and Back,” you can view it on-demand here. I also encourage you to register for the next session in this series on April 27, 2022, “Storage Life on the Edge: Security Challenges,” where our security experts will discuss the multitude of security challenges created by the edge. I hope you’ll join us.