Storage is an important component underpinning artificial intelligence (AI) and other emerging technologies with similar infrastructure demands, according to Robert Lee, VP and chief architect at Pure Storage, and therefore needs to be included in discussions about such technologies. Lee told ZDNet that significant advancements in technology -- particularly around parallelisation, compute, and networking -- enable new algorithms to apply more compute power against data.

"Historically, the limit to how much data has been able to be processed, the limit to how much insight we've been able to garner from data, has been bottlenecked by storage's ability to keep the compute fed," said Lee, who worked at Oracle before joining Pure Storage in 2013.

"Somewhere around the early 2000s, the hardware part of compute, CPUs, started getting more parallel. It started doing multi-socket architectures, hyper-threading, multi-core. Fast-forward a couple of years beyond that, and applications and software started getting more parallel. Things like distributed computing, scale-out systems, and parallelisation started becoming more prevalent."

Enterprises increasingly realised that building out larger compute clusters does not generate better results, because the additional compute hardware just sits idle behind storage, Lee said. "It wasn't until we came out with FlashBlade that storage was able to keep up with that parallelism," he claimed.

In many ways, the hardware advances that have enabled emerging technologies such as AI to take root are just another form of compute parallelism, Lee added. "We see storage being able to provide that parallelism and that amount of performance and bandwidth as being a key enabler to moving data to compute to drive useful insights," he said. After all, according to Lee, some of the biggest challenges enterprises face today are around amassing large datasets and feeding them into compute for analysis and pattern recognition.
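The bottleneck Lee describes can be sketched with a back-of-envelope model: once the cluster's aggregate data demand exceeds what storage can deliver, adding nodes only lowers per-node utilisation. All numbers below are illustrative assumptions, not Pure Storage figures.

```python
def cluster_utilisation(num_nodes: int,
                        per_node_demand_gbps: float,
                        storage_bandwidth_gbps: float) -> float:
    """Fraction of the cluster's compute that storage can actually feed."""
    demand = num_nodes * per_node_demand_gbps
    # Storage can feed at most its own bandwidth; the rest of the demand idles.
    return min(1.0, storage_bandwidth_gbps / demand)

# A 10-node cluster, each node consuming 2 GB/s, behind a 10 GB/s array:
print(cluster_utilisation(10, 2.0, 10.0))  # 0.5 -> half the cluster sits idle
# Doubling the node count just halves utilisation again:
print(cluster_utilisation(20, 2.0, 10.0))  # 0.25
```

The model makes the point in the article concrete: scaling compute without scaling storage bandwidth cannot improve results.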
"Fundamentally, the more varied data sets that you can provide into AI systems and machine learning and training systems, the better results you're going to get ... in any space, whether it's autonomous driving, natural language processing, facial recognition," he said.

Lee said previous storage systems were designed around the physics of spinning media. "We see this to be [especially] true in the file and object and unstructured space where historically, the performance that has been waiting to be unlocked has actually been trapped behind software," he said.

However, hardware is merely one component of the challenge, Lee said. The challenge of building high-performance storage systems that work with flash media is really one that needs to be solved using software.

"Removing all of the extra components that you find typically within an SSD and directly writing software to work with hardware that is giving us direct access to those flash chips has allowed us to drive much better performance as well as much better longevity and efficiency out of the flash usage," he said. "You need to design software systems, you need to rethink how storage controller software is written for that media. The performance characteristics that we're able to drive out of our products ... there's a delicate dance and tight integration between software that's purpose-built for the media and hardware that's [designed] to accelerate the software."

Without storage hardware and software working effectively in tandem, data cannot be used to full effect. This is especially important because AI systems and many other emerging technologies depend on the effective utilisation of data, according to Lee. He additionally said that if data is replacing oil as the most valuable resource in the world, then data is more like crude oil that needs to be refined.
It's the enterprises that are able to "refine" the data -- by applying a combination of modern compute, storage, networking, and analytics technologies to extract insights out of that data -- that will be able to stay ahead of the game, Lee said.

One of Pure Storage's customers is autonomous driving technology provider Zenuity, a joint venture between Volvo and Autoliv. The enterprise storage company helped Zenuity build a reference architecture and production machine learning pipeline for some of its autonomous driving models. Zenuity deployed FlashBlade units alongside Nvidia DGX training servers and a number of compute nodes to drive this pipeline, which includes the basic setup around data collection and management.

"They need to keep those GPUs fed with data, and FlashBlade is able to offer enough bandwidth and performance to keep those GPUs fed and to keep their machine learning researchers efficient and busy," Lee said.

Earlier this year, Pure Storage announced the launch of a 75-blade all-flash system that operates as one unit, along with a number of software updates. Pure's systems are connected to its Pure1 cloud, which collects 1 trillion data points a day -- more than 7PB of telemetry data -- from thousands of connected arrays. This sensor network provides data to a new Pure1 global dashboard that aggregates information on a storage array fleet.

Pure Storage additionally rolled out Meta, a global predictive intelligence system that can be used to manage, analyse, and support its arrays. The company's VP of Product Matt Kixmoeller previously said Meta is really "an evolution of the IoT platform we built from day one since all of our systems had call home sensors". Meta is also a realisation that machine learning has to do the heavy lifting when it comes to understanding workload performance, with Kixmoeller saying that digesting thousands of measurements to predict workloads is "the perfect problem for machine learning, since AI can run scenarios over and over".
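"Keeping GPUs fed" usually comes down to overlapping storage reads with training compute. The sketch below shows the generic pattern -- a background thread prefetching batches into a bounded queue -- and is an illustrative assumption, not Zenuity's or Pure's actual pipeline.

```python
import queue
import threading

def prefetch(read_batch, num_batches, depth=4):
    """Yield batches produced by a background reader thread.

    While the consumer (e.g. a GPU training step) works on one batch,
    the reader is already pulling the next ones from storage, so I/O
    and compute overlap instead of alternating.
    """
    q = queue.Queue(maxsize=depth)   # bounded: reader can't run away
    SENTINEL = object()

    def reader():
        for i in range(num_batches):
            q.put(read_batch(i))     # blocks if the queue is full
        q.put(SENTINEL)              # signal end of data

    threading.Thread(target=reader, daemon=True).start()
    while (item := q.get()) is not SENTINEL:
        yield item

# Usage with a stand-in "storage read" in place of a real FlashBlade fetch:
batches = list(prefetch(lambda i: f"batch-{i}", num_batches=3))
print(batches)  # ['batch-0', 'batch-1', 'batch-2']
```

The bounded queue depth is the design choice that matters: it trades a little memory for hiding storage latency, which only works if the backing storage can sustain the aggregate read bandwidth.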
"Machine learning is the big use case that's driving flash adoption," he said in June.
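The workload-prediction problem Kixmoeller describes -- digesting telemetry measurements to anticipate load -- can be sketched at its very simplest as fitting a trend to recent samples and projecting forward. Pure1/Meta's real models are not public, so everything here, including the synthetic IOPS numbers, is an assumption for illustration.

```python
def fit_line(ys):
    """Least-squares slope and intercept for evenly spaced samples."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
             / sum((x - mean_x) ** 2 for x in range(n)))
    return slope, mean_y - slope * mean_x

def predict_next(ys):
    """Project the next sample by extrapolating the fitted trend."""
    slope, intercept = fit_line(ys)
    return slope * len(ys) + intercept

# Hourly IOPS samples trending upward (synthetic telemetry):
iops = [1000, 1100, 1250, 1300, 1450]
print(round(predict_next(iops)))  # 1550 -> projected next-hour load
```

A production system would use far richer models over thousands of signals, but the shape of the problem -- many measurements in, a forecast out -- is the same.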