Taking AI and HPC out of the Laboratory – Panasas show the power of automation and hardware abstraction to bring HPC and AI processing within the reach and capability of the Enterprise
The growth in demand for Big Data analytics and Artificial Intelligence (AI) is moving the High-Performance Computing (HPC) needed to process the data and perform the calculations out of the laboratory and into the Enterprise. That brings some challenges with it.
Simply put, the challenge is managing the trade-offs between performance, cost and complexity. Traditionally, HPC systems have been tuned to within an inch of their lives to deliver the particular performance the application required. In a laboratory or research environment, a few users performing large, complex tasks meant that individual, siloed systems were the norm. Now, the power of both Big Data and AI makes it more likely that a wider set of users in enterprise environments will want to use HPC for a range of processing tasks, often smaller and shorter and with smaller data sets, in more of a shared service environment.
However fast and powerful your processors, the bottleneck tends to be your storage sub-systems. A combination of Non-Volatile Memory Express (NVMe) and Solid-State Drive (SSD) storage is seen as the go-to configuration for AI training systems, keeping latency as low as possible so that the GPUs stay busy. More traditional HPC systems have relied on NAS storage systems to deliver the bandwidth and resilience that they need. Panasas have been delivering storage systems designed specifically to meet HPC requirements for around 20 years. Their direct parallel file system overcomes some of the latency issues inherent in NAS storage systems by separating metadata management from storage management and enabling parallel access to all the disk drives that hold the requested data.
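To make the idea of separating metadata from data concrete, here is a minimal Python sketch. It is not Panasas code or any real file system API: the node names, the `METADATA` table and the functions `lookup_layout`, `read_stripe` and `parallel_read` are all hypothetical, and in-memory dictionaries stand in for storage nodes. It only illustrates the general pattern: ask a metadata service where the pieces of a file live, then fetch those pieces from several nodes at once.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for storage nodes: node name -> {stripe id: bytes}.
STORAGE_NODES = {
    "node-a": {0: b"HPC ", 3: b"one "},
    "node-b": {1: b"and "},
    "node-c": {2: b"AI on ", 4: b"system"},
}

# Hypothetical metadata service: file name -> ordered list of (node, stripe id).
METADATA = {
    "results.dat": [
        ("node-a", 0), ("node-b", 1), ("node-c", 2),
        ("node-a", 3), ("node-c", 4),
    ],
}


def lookup_layout(filename):
    """Metadata path: say where each stripe lives, without touching file data."""
    return METADATA[filename]


def read_stripe(location):
    """Data path: fetch one stripe directly from the node that holds it."""
    node, stripe_id = location
    return STORAGE_NODES[node][stripe_id]


def parallel_read(filename, max_workers=4):
    """Look up the layout once, then read all stripes concurrently."""
    layout = lookup_layout(filename)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the order of the layout,
        # so the stripes can simply be joined back together.
        stripes = list(pool.map(read_stripe, layout))
    return b"".join(stripes)


if __name__ == "__main__":
    print(parallel_read("results.dat"))
```

In this toy version the metadata lookup is a single small request, while the bulk of the work is spread across every node that holds part of the file. That is the property the paragraph above describes, whatever the details of a real implementation.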
So far, so good. But how do they handle the very different requirements of AI? From a performance point of view, the Panasas ActiveStor solution has been extensively re-engineered to take advantage of further abstraction of functions into software and to enable the Metadata Director to use NVMe and SSD. This reduces latency, making the system more attractive for AI training.
At Bloor we focus very heavily on how technology vendors help businesses become and remain mutable. It’s all very well providing storage systems that handle both HPC and AI workloads. But if those systems need constant reconfiguration every time the workload changes, the IT operations overhead will become untenable for most enterprises, which then won’t be able to react quickly enough to their changing business environments. So, I am intrigued by Panasas’s development of automatic workload configuration and reconfiguration capabilities. They certainly trumpet this “load it and leave it” capability, and a range of customer testimonials attest to its efficacy.
Naturally, there is a compromise here. I talked up front about individual configurations being tuned to within an inch of their lives for very specific use cases. There is bound to be some loss of performance when you are effectively trying to tune for multiple different use cases. Panasas did accept that view but felt that the performance delta was not big. Indeed, the ability to handle multiple shared AI and HPC workloads in a scale-out system means that there is less need to configure each workload for its own peak, which drives greater utilisation and lower overall costs.
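The arithmetic behind that last point can be sketched in a few lines of Python. The demand figures below are invented purely for illustration (they are not Panasas measurements), and the function names are hypothetical. The sketch compares sizing a separate silo for each workload's own peak with sizing one shared pool for the busiest moment of the combined load.

```python
def siloed_capacity(demand_by_workload):
    """Each silo is sized for its own workload's peak; total is the sum of peaks."""
    return sum(max(series) for series in demand_by_workload.values())


def shared_capacity(demand_by_workload):
    """One shared pool is sized for the peak of the combined demand."""
    combined = [sum(values) for values in zip(*demand_by_workload.values())]
    return max(combined)


def utilisation(demand_by_workload, capacity):
    """Average fraction of the provisioned capacity that is actually used."""
    series = list(demand_by_workload.values())
    total_demand = sum(sum(s) for s in series)
    periods = len(series[0])
    return total_demand / (capacity * periods)


# Illustrative only: bandwidth demand (GB/s) over four time periods.
demand = {
    "ai_training": [8, 2, 1, 6],
    "hpc_simulation": [1, 7, 2, 1],
    "analytics": [1, 1, 5, 2],
}

if __name__ == "__main__":
    silo = siloed_capacity(demand)    # 8 + 7 + 5 = 20
    shared = shared_capacity(demand)  # max(10, 10, 8, 9) = 10
    print(f"siloed: {silo} GB/s at {utilisation(demand, silo):.1%} utilisation")
    print(f"shared: {shared} GB/s at {utilisation(demand, shared):.1%} utilisation")
```

With these made-up numbers the workloads peak at different times, so the shared pool needs half the capacity and runs at twice the utilisation. If the workloads all peaked together, the benefit would disappear, which is why the size of the saving depends on the workload mix.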
I’m sure the HPC and AI purists, working in labs on very large, long-term projects, will view this as dumbing down the technology. But the majority of businesses are desperate to gain the competitive advantages that HPC and AI systems could bring, yet have limited budgets and fairly thin IT resources. For them, the Panasas ActiveStor solution is something they should consider.