Colocation HPC? Why not.
High performance computing (HPC) used to only be within the reach of those with extremely deep pockets. The need for proprietary architectures and dedicated resources meant that everything from the ground up needed to be specially built.
This included the facility the HPC platform ran in – the need for specialised cooling and massive power densities meant that general purpose datacentres were not up to the job. Even where the costs of the HPC platform were just within reach, the extra costs of building specialised facilities counted against HPC being something for anyone who needed that extra bit of ‘oomph’ from their technology platform.
Latterly, however, HPC has moved from highly specialised hardware to more of a commoditised approach. Sure, the platform is not just a basic collection of servers, storage and network equipment, but the underlying components are no longer highly specific to the job.
This more standardised HPC platform, built on commodity CPUs, storage and network components, is within financial reach. That still leaves the small issue of how an organisation can countenance building a dedicated facility for a platform that may be out of date in just a couple of years.
For organisations with a more generic IT platform, colocation has become a major option. Offloading the building and its maintenance has obvious merit, especially for an organisation that is struggling to understand whether its own facility will grow or shrink in the future as equipment densities improve and more workloads move to cloud platforms.
However, the use of colocation for HPC is not so easy. The power, emergency power and cooling requirements of HPC will be beyond all but certain specialist colocation providers.
Hyper-dense HPC equipment needs high power densities, far more than the average colocation facility provides. For example, the power draw of a ‘standard’ platform rarely exceeds 8kW per rack; indeed, the average in colocation facilities is more like 5kW.
Now consider a dense HPC platform with energy needs of, say, 12kW per rack. Can the colocation facility provide that extra power? Will it charge a premium price for routing more power to your system, even before you start using it? Will the multi-cabled power aggregation systems required provide power redundancy, or just more weak links in an important chain?
Also consider the future for HPC. What happens as density increases further? How about 20kW per rack? 30kW? 40kW? Can the colocation facility provider give guarantees that not only will it be able to route enough power to your equipment – but also that it has access to enough grid power to meet requirements?
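To make the scale of the gap concrete, here is a rough sketch of the arithmetic. It uses the 5kW colocation average and the 12kW to 40kW densities quoted above; the rack count and the function name are assumptions made purely for illustration, not figures from any provider.

```python
def extra_power_needed_kw(required_kw_per_rack: float,
                          provided_kw_per_rack: float,
                          racks: int) -> float:
    """Total extra power (kW) that must be routed beyond the standard allocation."""
    # A rack that needs less than the standard allocation has no shortfall.
    shortfall_per_rack = max(0.0, required_kw_per_rack - provided_kw_per_rack)
    return shortfall_per_rack * racks


TYPICAL_COLO_KW_PER_RACK = 5.0  # colocation average quoted in the article
HYPOTHETICAL_RACK_COUNT = 10    # assumption, for illustration only

for density_kw in (12, 20, 30, 40):
    gap = extra_power_needed_kw(density_kw, TYPICAL_COLO_KW_PER_RACK,
                                HYPOTHETICAL_RACK_COUNT)
    print(f"{density_kw} kW/rack: {gap:.0f} kW beyond a standard allocation")
```

Even a modest ten-rack deployment quickly needs several times the power a general-purpose facility would normally set aside for the same floor space, which is why it pays to ask about headroom up front.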
What happens if there is a problem with grid power? With a general colocation facility, there will be some form of immediate failover power supply (generally battery, but sometimes a flywheel or, very rarely, supercapacitors), which is then replaced by auxiliary power from diesel generators. However, such immediate power provision is expensive, particularly when there is a continuous high draw, as is required by HPC. Make sure that the provider not only has an uninterruptible power supply (UPS) and auxiliary power system in place, but that they are also big enough to provide power to all workloads running in the facility at the same time, along with overhead and enough redundancy to deal with any failure within the emergency power supply system itself. Also, make sure that it is not ‘just a bunch of batteries’: look for in-line power systems that smooth out any issues with the mains power, such as spikes, brown-outs and so on.
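As a back-of-envelope illustration of "big enough, plus overhead, plus redundancy", the sketch below counts how many UPS modules would be needed for a given load. The module size, the 20% overhead and the single spare module are all hypothetical values chosen for the example; real sizing depends on the provider's design and should be confirmed with them.

```python
import math


def ups_modules_needed(load_kw: float,
                       module_kw: float,
                       overhead_fraction: float,
                       redundant_modules: int) -> int:
    """Modules required to carry the load plus overhead, plus spare modules."""
    # Capacity that must be available even before redundancy is considered.
    required_kw = load_kw * (1.0 + overhead_fraction)
    base_modules = math.ceil(required_kw / module_kw)
    # Spare modules cover failures inside the emergency power system itself.
    return base_modules + redundant_modules


# Hypothetical example: 20 racks at 12 kW each, 100 kW modules,
# 20% overhead and one spare module.
print(ups_modules_needed(load_kw=20 * 12, module_kw=100,
                         overhead_fraction=0.2, redundant_modules=1))
```

The point of the exercise is not the exact number but the question it prompts: can the provider show that its emergency power still covers the full facility load when one component of that system is out of action?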
Remember that a lot of power also gets turned into heat. Hyper-dense HPC platforms, even where they are using solid state drives instead of spinning disks, will still produce a lot of heat. The facility must be able to remove that heat effectively.
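A rough calculation shows what removing that heat with air alone involves. The sketch below assumes that essentially all of a rack's electrical draw ends up as heat, and uses approximate textbook values for the density and specific heat of air; the 12 K temperature rise across the rack is an assumption for illustration.

```python
AIR_DENSITY_KG_PER_M3 = 1.2          # approximate, near room temperature
AIR_SPECIFIC_HEAT_J_PER_KG_K = 1005  # approximate


def airflow_m3_per_s(heat_kw: float, temp_rise_k: float) -> float:
    """Volume of air per second needed to carry away heat_kw at a given temperature rise."""
    heat_w = heat_kw * 1000.0
    # Heat carried = density * volume flow * specific heat * temperature rise.
    return heat_w / (AIR_DENSITY_KG_PER_M3 * AIR_SPECIFIC_HEAT_J_PER_KG_K * temp_rise_k)


# Compare a typical colocation rack with denser HPC racks (12 K rise assumed).
for rack_kw in (5, 12, 40):
    flow = airflow_m3_per_s(rack_kw, temp_rise_k=12.0)
    print(f"{rack_kw} kW rack: about {flow:.2f} cubic metres of air per second")
```

Because the required airflow scales directly with the power draw, a rack drawing eight times the power needs eight times the air through it for the same temperature rise.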
Taking an old-style approach of volume cooling, where the air filling the facility is kept at a low temperature and sweeps through equipment to remove the heat, which is then extracted outside the facility, is unlikely to be good enough for HPC. Even hot and cold aisles may struggle if the cooling is not engineered well enough.
A colocation facility provider that supports HPC will understand this and will have highly targeted means of applying cooling to equipment where it is most needed.
HPC is moving to a price point where many more organisations can now consider it for their big data, IoT, analysis and other workloads. There are colocation providers out there who specialise in providing facilities that can support the highly specialised needs of an ultra-dense HPC platform. It makes sense to seek these providers out.
Quocirca has written a report on the subject, commissioned by NGD and Schneider. The report is available for free download here: http://www.nextgenerationdata.co.uk/white-papers/new-report-increasing-needs-hpc-colocation-facilities/