Connector and Cable Differences: SAS-3 vs. SAS-4

By: David Einhorn, SCSI Trade Association Board of Directors; Business Development Manager, North America, Amphenol Corp., June 14, 2022

This blog post examines the differences between SAS-3 and SAS-4 connectors and cables. The new generation of SAS brings multiple upgrades and improvements.

Drive connector
[Note: 24G SAS uses the SAS-4 physical layer, which operates at a baud rate of 22.5 Gb/s.]

The 29-position receptacle and plug connectors used in SAS-4 feature hot-plugging, blind-mating, connector misalignment correction, and a PCB retention mechanism for robust SMT attachment. The connectors are SATA-compliant and available from many suppliers in a range of vertical and right-angle configurations. Typical applications are consistent with previous generations of server and storage equipment, HDDs, HDD carriers, and SSDs.

Read More

Training Deep Learning Models Q&A

The estimated impact of Deep Learning (DL) across all industries cannot be overstated. In fact, analysts predict deep learning will account for the majority of cloud workloads, and training of deep learning models will represent the majority of server applications in the next few years. It’s the topic the SNIA Cloud Storage Technologies Initiative (CSTI) discussed at our webinar “Training Deep Learning Models in the Cloud.” If you missed the live event, it’s available on-demand at the SNIA Educational Library, where you can also download the presentation slides. The audience asked our expert presenters, Milind Pandit from Habana Labs (an Intel company) and Seetharami Seelam from IBM, several interesting questions. Here are their answers:

Q. Where do you think most of the AI will run, especially training? Will it be in the public cloud, on-premises, or both?

[Milind]: It’s probably going to be a mix. There are advantages to using the public cloud, especially because it’s pay-as-you-go. So, when experimenting with new models, new innovations, new uses of AI, and when scaling deployments, it makes a lot of sense. But there are still a lot of data privacy concerns. There are increasing numbers of regulations regarding where data needs to reside physically and in which geographies. Because of that, many organizations are deciding to build out their own data centers, and once they have large-scale training or inference successfully underway, they often find it cost effective to migrate their public cloud deployment into a data center where they can control the cost and other aspects of data management.

[Seelam]: I concur with Milind. We are seeing a pattern of dual approaches. There are some small companies that don’t have the necessary capital, expertise, or teams to acquire GPU-based servers and deploy them. They are increasingly adopting public cloud. We are seeing some decent-sized companies adopting the same approach as well. Keep in mind these GPU servers tend to be very power hungry, so you need the right floor plan, power, cooling, and so forth. So, public cloud definitely gives you easy access and lets you pay for only what you consume. We are also seeing trends where certain organizations have constraints that restrict moving certain data outside their walls. In those scenarios, we are seeing customers deploy GPU systems on-premises. I don’t think it’s going to be one or the other. It is going to be a combination of both, but adopting a common platform technology will help unify the usage model in the public cloud and on-premises.

Q. What is GDR? You mentioned using it with RoCE.

[Seelam]: GDR stands for GPUDirect RDMA. There are at least three ways a GPU on one node can communicate with a GPU on another node:

1. The GPU can use TCP, where GPU data is copied back into the CPU, which orchestrates the communication to the CPU and GPU on the other node. That obviously adds a lot of latency going through the whole TCP protocol stack.
2. Another way is RoCEv2 or RDMA, where CPUs, FPGAs, and/or GPUs talk to each other through industry-standard RDMA channels. You send and receive data without the added latency of traditional networking software layers.
3. A third method is GDR, where a GPU on one node talks to a GPU on another node directly. This is done through network interfaces where the GPUs are talking to each other, again bypassing traditional networking software layers.
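To make these paths concrete, here is a minimal sketch (ours, not from the webinar) of a cross-node all-reduce using PyTorch’s NCCL backend, which selects the transport at runtime among TCP sockets, RDMA verbs (RoCEv2 or InfiniBand), and GPUDirect RDMA where the hardware supports it. The launcher and the NCCL_NET_GDR_LEVEL environment variable are standard PyTorch/NCCL; the cluster layout is an assumption.

```python
# Minimal sketch: a cross-node all-reduce with PyTorch's NCCL backend.
# NCCL chooses the transport at runtime: plain TCP, RDMA verbs (RoCEv2 or
# InfiniBand), or GPUDirect RDMA (GDR) when the NIC and GPU support it.
# The NCCL_NET_GDR_LEVEL environment variable tunes how aggressively GDR
# is used.
import torch
import torch.distributed as dist

def main() -> None:
    dist.init_process_group(backend="nccl")  # reads rank/world size from env
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    # Gradient-style reduction: with GDR active, data moves NIC-to-GPU
    # without being staged through host (CPU) memory first.
    tensor = torch.ones(1024, device="cuda")
    dist.all_reduce(tensor)

    if dist.get_rank() == 0:
        print(f"world={dist.get_world_size()} sum={tensor[0].item()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched on each node with, for example, `torchrun --nnodes=2 --nproc_per_node=8 allreduce.py`; the same script runs over any of the three transports, which is why the choice of TCP, RoCEv2, or GDR shows up as a latency difference rather than a code change.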
Q. When you are talking about RoCE, do you mean RoCEv2?

[Seelam]: That is correct, I’m talking only about RoCEv2. Thank you for the clarification.

Q. Can you comment on storage needs for DL training? Have you considered the use of scale-out cloud storage services for deep learning training? If so, what are the challenges and issues?

[Milind]: The storage needs are 1) massive and 2) dependent on the kind of training that you’re doing (data parallel versus model parallel). With different optimizations, you will need parts of your data to be local in many circumstances. It’s not always possible to do efficient training when data is physically remote and there’s a large latency in accessing it. Some sort of caching infrastructure will be required in order for your training to proceed efficiently. Seelam may have other thoughts on scale-out approaches for training data.

[Seelam]: Yes, absolutely, I agree 100%. Unfortunately, there is no silver bullet to address the data problem with large-scale training. We take a three-pronged approach. Predominantly, we recommend users put their data in object storage, and that becomes the source where all the data lives. Many training jobs, especially those that deal with text data, don’t tend to be huge in size because these are all characters, so we use the object store as a source directly to read the data and feed the GPUs to train. That’s one model of training, but it only works for relatively smaller data sets; they get cached after the first access because you shard them nicely, so you don’t have to go back to the data source many times.

There are other data sets where the data volume is larger. So, if you’re dealing with pictures, video, or these kinds of training domains, we adopt a two-pronged approach. In one scenario we have a distributed cache mechanism where the end users have a copy of the data in the file system, and that becomes the source for AI training. In another scenario, we deployed the system with sufficient local storage and asked users to copy the data into that local storage to use as a local cache. As the AI training continues, once the data is accessed it is cached on the local drive, and subsequent iterations of the data come from that cache. This is much bigger than the local memory: about 12 terabytes of local cache storage compared with 1.5 terabytes of memory. So, we could handle data sets in the 10-terabyte range per node just from the local storage. If they exceed that, then we go to the distributed cache. If the data sets are small enough, then we just use object storage. So, there are at least three different ways, depending on the use case and the model you are trying to train.

Q. In a fully sharded data parallel model, there are three communication calls when compared to DDP (distributed data parallel). Does that mean it needs about three times more bandwidth?

[Seelam]: Not necessarily three times more, but you will use the network a lot more than you would with DDP. In a DDP (distributed data parallel) model you will not use the network at all in the forward pass, whereas in an FSDP (fully sharded data parallel) model you use the network in both the forward pass and the backward pass. In that sense you use the network more, but at the same time, because you don’t have parts of the model within your system, you need to get the model from your neighbors, and that means you will be using more bandwidth. I cannot give you the 3x number; I haven’t seen the 3x, but it’s more than DDP for sure.
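As a rough illustration of why FSDP touches the network in the forward pass while DDP does not, here is a minimal PyTorch sketch; the toy model is a placeholder of our own, but `DistributedDataParallel` and `FullyShardedDataParallel` are the standard PyTorch wrappers.

```python
# Sketch: the same toy model wrapped for DDP versus FSDP.
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def make_model() -> nn.Module:
    # Hypothetical stand-in for a real training model.
    return nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 10)).cuda()

dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# DDP: every rank keeps a full parameter replica, so the forward pass is
# purely local; the network is used once per step to all-reduce gradients.
ddp_model = DDP(make_model())

# FSDP: parameters are sharded across ranks, so each layer must be
# all-gathered from the other ranks in the forward pass and again in the
# backward pass; hence the extra (but not necessarily 3x) traffic.
fsdp_model = FSDP(make_model())
```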
The SNIA CSTI has an active schedule of webinars to help educate on cloud technologies. Follow us on Twitter @SNIACloud and sign up for the SNIA Matters Newsletter so that you don’t miss any.

Web 3.0 – The Future of Decentralized Storage

Decentralized storage is bridging the gap between Web 2.0 and Web 3.0, and its impact on enterprise storage is significant. The topic of decentralized storage and Web 3.0 will be the focus of an expert panel discussion the SNIA Networking Storage Forum is hosting on June 1, 2023, “Why Web 3.0 is Important to Enterprise Storage.” In this webinar, we will provide an overview of enterprise decentralized storage and explain why it is more relevant now than ever before. We will delve into the benefits and demands of decentralized storage and discuss the evolution from on-premises, to cloud, to decentralized storage (cloud 2.0). We will also explore various use cases of decentralized storage, including its role in data privacy and security and the potential for decentralized applications (dApps) and blockchain technology. Read More

It’s A Wrap – But Networking and Education Continue From Our C+M+S Summit!

Our 2023 SNIA Compute+Memory+Storage Summit was a success! The event featured 50 speakers in 40 sessions over two days. Over 25 SNIA member companies and alliance partners participated in creating content on computational storage, CXL™ memory, storage, security, and UCIe™. All presentations and videos are free to view at www.snia.org/cms-summit. “For 2023, the Summit scope expanded to examine how the latest advances within and across compute, memory and storage technologies should be optimized and configured to meet the requirements of end customer applications and the developers that create them,” said David McIntyre, Co-Chair of the Summit.  “We invited our SNIA Alliance Partners Compute Express Link™ and Universal Chiplet Interconnect Express™ to contribute to a holistic view of application requirements and the infrastructure resources that are required to support them,” McIntyre continued.  “Their panel on the CXL device ecosystem and usage models and presentation on UCIe innovations at the package level along with three other sessions on CXL added great value to the event.” Read More

Storage Threat Detection Q&A

Stealing data, compromising data, and holding data hostage have always been the main goals of cybercriminals. Threat detection and response methods continue to evolve as the bad guys become increasingly sophisticated, but for the most part, storage has been missing from the conversation. Enter “Cyberstorage,” a topic the SNIA Cloud Storage Technologies Initiative recently covered in our live webinar, “Cyberstorage and XDR: Threat Detection with a Storage Lens.” It was a fascinating look at enhancing threat detection at the storage layer. If you missed the live event, it’s available on-demand along with the presentation slides. We had some great questions from the live event as well as interesting results from our audience poll questions that we wanted to share here. Q. You mentioned antivirus scanning is redundant for threat detection in storage, but could provide value during recovery. Could you elaborate on that? Read More

Scaling Management of Storage and Fabrics

Composable disaggregated infrastructures (CDI) provide a promising solution to address the provisioning and computational efficiency limitations, as well as the hardware and operating costs, of integrated, siloed systems. But how do we solve these problems in an open, standards-based way? DMTF, SNIA, the OpenFabrics Alliance (OFA), and the CXL Consortium are working together to provide elements of the overall solution, with Redfish® and SNIA Swordfish™ manageability providing the standards-based interface. The OFA is developing an OpenFabrics Management Framework (OFMF) designed for configuring fabric interconnects and managing composable disaggregated resources in dynamic HPC infrastructures using client-friendly abstractions; a sketch of what a Redfish/Swordfish query looks like follows below. Want to learn more? Read More
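To give a flavor of that standards-based interface, here is a minimal, hypothetical sketch of walking a Redfish/Swordfish service over plain HTTP and JSON. The host, credentials, and the presence of a top-level Storage collection are assumptions about a particular service; the `/redfish/v1/` service root and `@odata.id` links are fixed by the Redfish specification.

```python
# Hypothetical sketch: walking a Redfish/Swordfish service with HTTP + JSON.
import requests

BASE = "https://mgmt.example.com"   # hypothetical management endpoint
AUTH = ("admin", "password")        # hypothetical credentials

# Every Redfish (and therefore Swordfish) service exposes this fixed root.
# verify=False is for a lab sketch only; production should validate TLS.
root = requests.get(f"{BASE}/redfish/v1/", auth=AUTH, verify=False).json()
print("Service:", root.get("Name"), "Redfish", root.get("RedfishVersion"))

# Swordfish extends the Redfish storage model; if this service advertises a
# Storage collection at the root, enumerate its members by following the
# @odata.id hyperlinks.
storage = root.get("Storage", {}).get("@odata.id")
if storage:
    collection = requests.get(f"{BASE}{storage}", auth=AUTH, verify=False).json()
    for member in collection.get("Members", []):
        print("Storage subsystem:", member.get("@odata.id"))
```

The same hyperlink-following pattern is what lets a single client discover and compose resources across vendors, which is the point of using Redfish and Swordfish as the management interface for CDI.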

Questions & Answers from our January 2023 Webcast: Storage Trends in 2023 and Beyond

These questions were asked and mostly answered during our webcast, Storage Trends in 2023 and Beyond. Graphics included in this article were shown during the webcast, and several of the questions refer to the data in the charts.

Thank you to our panelists:

Don Jeanette, Vice President, TRENDFOCUS
Patrick Kennedy, Principal Analyst, ServeTheHome
Rick Kutcipal, At-Large Director, SCSI Trade Association and Product Planner, Data Center Solutions Group, Broadcom

Q1: What does the future hold for U.3? (SFF-TA-1001) Was it included in the U.2 numbers?

U.2 should read U.X in the pie chart. There are some shipments out there today, customers are taking it, and it will likely grow, but all the efforts and priorities are really around E3.S, E1.S, and, to some extent, E1.L.

Read More

24G SAS: an Overview of the Technology & Products

Hyperscale and enterprise data centers continue to grow rapidly and to use SAS products as a backbone. Why SAS, and what specific SAS products are helping these data centers to grow? This article briefly discusses the technology evolution of SAS, bringing us to our latest generation of 24G SAS. We will examine recent market data from TRENDFOCUS, underscoring the established and growing trajectory of SAS products. We will highlight our latest plugfest, which smoothed the way for 24G SAS to seamlessly enter the existing data storage ecosystem. Finally, we will help the reader to understand the availability, breadth, and depth of 24G SAS products that are available today, and where you can get those products.

Read More

Survey Says…Here are Data & Cloud Storage Trends Worth Noting

With the continuing move to cloud, application modernization, and related challenges such as hybrid and multi-cloud adoption and regulatory compliance requirements, enterprises must ensure they understand the current data and storage landscape. The SODA Foundation’s annual comprehensive global survey on data and storage trends does just that, providing a detailed look at the intersection of cloud computing, data and storage management, the configuration of environments that end-user organizations are gravitating to, and the priorities of selected capabilities over the next several years. On April 13, 2023, the SNIA Cloud Storage Technologies Initiative (CSTI) is pleased to host SODA in a live webcast, “Top 12 Trends in Data and Cloud Storage,” where SODA members who led this research will share key findings. I hope you will join us for a live discussion and in-depth look at this important research to hear the trends that are driving data and storage decisions, including: Read More

50 Speakers Featured at the 2023 SNIA Compute+Memory+Storage Summit

SNIA’s Compute+Memory+Storage Summit is where architectures, solutions, and community come together. Our 2023 Summit – taking place virtually on April 11-12, 2023 – is the best example to date, featuring a stellar lineup of 50 speakers in 40 sessions covering topics including computational storage real-world applications, the future of memory, critical storage security issues, and the latest on SSD form factors, CXL™, and UCIe™. “We’re excited to welcome executives, architects, developers, implementers, and users to our 11th annual Summit,” said David McIntyre, C+M+S Summit Co-Chair, and member of the SNIA Board of Directors.  “We’ve gathered the technology leaders to bring us the latest developments in compute, memory, storage, and security in our free online event.  We hope you will watch live to ask questions of our experts as they present, and check out those sessions you miss on-demand.” Read More