Quantum HPC Cloud Platform

Today we are excited to unveil the result of a long development effort to adapt our HPC Portal to the cloud: the Quantum HPC Cloud Platform, powered by AWS cloud computing, lets you deploy computing clusters directly on Amazon Web Services, pre-configured with your favorite CAE/CFD/FEA application.

Among the currently available applications we are focusing mostly on ANSYS and Siemens (previously CD-adapco), and we plan to bring in more applications, and more versions of each, shortly.

The concept is simple: instead of sending batch jobs somewhere on the internet, you gain control of your own private, isolated cluster for a pre-defined period. Choose the instance type (which determines the number of cores and the amount of memory per node), the number of instances, and the application you want on each node, and you get access to your own pre-configured cluster running on Linux.
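Under the hood this maps onto standard AWS provisioning. The snippet below is a minimal illustration, not the platform's actual code, of launching a fixed-size group of compute nodes with boto3; the AMI ID, instance type, node count, and tag values are placeholders.

```python
# Minimal sketch of provisioning a fixed-size cluster on AWS with boto3.
# The AMI ID, instance type, and tags are placeholders for illustration.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-xxxxxxxx",       # placeholder: pre-configured CAE/CFD/FEA image
    InstanceType="c5.18xlarge",   # instance type sets cores and memory per node
    MinCount=4,                   # number of compute nodes requested
    MaxCount=4,
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "cluster", "Value": "my-cfd-cluster"}],
    }],
)

node_ids = [instance["InstanceId"] for instance in response["Instances"]]
print("Launched nodes:", node_ids)
```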

Since you control your own cluster, hardware and jobs are not bound to each other: you can cancel a job and restart it on the same cluster without re-transferring your files, or use the output of a first job to start another one.
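Because the scheduler and the scratch storage persist for the life of the cluster, resubmitting work comes down to issuing new scheduler commands. The sketch below assumes standard PBS Pro qsub/qdel usage; the script and file names are illustrative.

```python
# Sketch of reusing the same cluster for successive jobs.
# Assumes standard PBS Pro qsub/qdel; script names are illustrative.
import subprocess

# Submit a first job; the input files already sit on the cluster's scratch disk.
job1 = subprocess.run(["qsub", "run_case1.pbs"],
                      capture_output=True, text=True, check=True).stdout.strip()

# Cancel it if needed -- the cluster keeps running, so nothing has to be re-uploaded.
subprocess.run(["qdel", job1], check=True)

# Start a second job that reads the output of the first one (e.g. a restart file).
job2 = subprocess.run(["qsub", "run_case2_restart.pbs"],
                      capture_output=True, text=True, check=True).stdout.strip()
print("Resubmitted as", job2)
```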

Deploy and control several clusters

Control and access all your clusters in one place


Job Scheduling

A head node is deployed alongside your cluster (at no additional cost) to handle scheduling and license transfer (more information on this will follow). Each cluster runs PBS Pro to manage jobs and the communication between nodes. You are free to use the cluster for one job spanning all the nodes or for several jobs spread over multiple nodes, as the sketch below illustrates.
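A single PBS Pro job can claim every node of the cluster through its select statement. In this illustrative sketch the node and core counts are placeholders and the solver command is application-specific.

```python
# Illustrative only: write and submit a PBS Pro script spanning all four nodes
# of the cluster. Node/core counts and the solver command are placeholders.
import subprocess
import textwrap

pbs_script = textwrap.dedent("""\
    #!/bin/bash
    #PBS -N cfd_full_cluster
    #PBS -l select=4:ncpus=36:mpiprocs=36
    #PBS -l walltime=24:00:00
    cd $PBS_O_WORKDIR
    # The solver invocation is application-specific; this line is a placeholder.
    mpirun ./my_solver input.case
    """)

with open("full_cluster.pbs", "w") as f:
    f.write(pbs_script)

subprocess.run(["qsub", "full_cluster.pbs"], check=True)
```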

Storage

A local SSD is attached to your cluster as a scratch folder to provide maximum read and write speed during your computations. Two other types of cluster storage are currently under development:

  • A performance mode joining 2 SSDs in a RAID-0 configuration (see the sketch after this list)
  • A dynamic mode allowing the scratch storage to grow dynamically with your data
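As a rough idea of what the performance mode could look like at the operating-system level, the sketch below stripes two instance-store SSDs into a RAID-0 scratch volume; the device names and mount point are assumptions, not the platform's actual layout.

```python
# Hedged sketch of a RAID-0 scratch volume built from two local NVMe SSDs.
# Device names and the mount point are assumptions; requires root privileges.
import subprocess

devices = ["/dev/nvme1n1", "/dev/nvme2n1"]   # assumed instance-store SSDs

# Build the RAID-0 array, format it, and mount it as the scratch folder.
subprocess.run(["mdadm", "--create", "/dev/md0", "--run",
                "--level=0", "--raid-devices=2", *devices], check=True)
subprocess.run(["mkfs.xfs", "/dev/md0"], check=True)
subprocess.run(["mkdir", "-p", "/scratch"], check=True)
subprocess.run(["mount", "/dev/md0", "/scratch"], check=True)
```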

When the cluster is scheduled to be destroyed, the scratch disk is preserved: you keep complete access to the data while no longer being charged for the machines.

Desktop

Currently under development is the ability to deploy a desktop instance alongside your cluster to give you access to the GUI version of your application. CAE applications can all run in batch mode, but sometimes you prefer having visual control over your computation. A desktop instance will leverage the different client-server configurations available, for example the client-server mode of Star-CCM+, with the server running on compute-optimized hardware and the GUI on the 3D-accelerated desktop, or ANSYS Workbench with the RSM. The desktop instance will come in two flavors:

  • A regular, software-rendered desktop
  • A 3D-accelerated desktop using VirtualGL to render OpenGL applications on an NVIDIA GPU

To connect to your desktop, a small package containing TurboVNC will be available to access your session through a secure SSH tunnel, along the lines of the sketch below.
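The connection essentially boils down to forwarding a local port to the desktop instance's VNC display over SSH and pointing the TurboVNC viewer at it. The hostname, user, key file, and display number below are placeholders, not actual platform values.

```python
# Sketch of reaching a remote TurboVNC session through an SSH tunnel.
# Hostname, user, key file, and display number are placeholders.
import subprocess
import time

desktop_host = "ec2-xx-xx-xx-xx.compute.amazonaws.com"   # placeholder address
key_file = "cluster_key.pem"                              # placeholder key

# Forward local port 5901 to VNC display :1 on the desktop instance.
tunnel = subprocess.Popen([
    "ssh", "-i", key_file, "-N",
    "-L", "5901:localhost:5901",
    f"centos@{desktop_host}",
])
time.sleep(3)   # crude wait for the tunnel to come up (illustration only)

# Attach the TurboVNC viewer to the local end of the tunnel.
subprocess.run(["vncviewer", "localhost:5901"])
tunnel.terminate()
```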

A 3D Remote session under TurboVNC with VirtualGL using the GPU


Software

  • Each node runs CentOS 7.3 with the Xfce desktop,
  • Job scheduling is performed by PBS Pro Community Edition,
  • The remote session is handled by the TurboVNC server and the OpenGL acceleration by VirtualGL,
  • The SSH tunnel used to access your desktop and transfer licenses relies on the PowerShell build of OpenSSH.


Cost-control

If you are like me, the idea of leaving your credit card number on a website, ready to be charged at any time, is something you profoundly dislike. It is full of surprises and awful for budget planning.

At Quantum HPC we wanted to do something a little different, which is why we based our billing model on a token system: tokens are purchased beforehand and allocated to a specific cluster. When deploying a cluster, you know exactly how many tokens are required to run until the end of the requested period. Those tokens are then reserved, and you are free to deploy another cluster with your remaining tokens, without the risk of going over budget, forgetting to shut down a machine, or running out of tokens. If you decide to destroy your cluster before the end of the allocated time, the unused balance is automatically returned to your token account.
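The accounting behind this is straightforward. The sketch below is purely illustrative: the class, the token-per-node-hour rate, and the method names are assumptions of ours, not the platform's actual API.

```python
# Illustrative token accounting: reserve up front, refund the unused balance.
# The rate and the class are assumptions, not the platform's actual API.
from dataclasses import dataclass

TOKENS_PER_NODE_HOUR = 1   # assumed rate, for the example only

@dataclass
class Account:
    tokens: int

    def reserve_cluster(self, nodes: int, hours: int) -> int:
        """Reserve tokens up front for the whole requested period."""
        cost = nodes * hours * TOKENS_PER_NODE_HOUR
        if cost > self.tokens:
            raise ValueError("not enough tokens for the requested period")
        self.tokens -= cost
        return cost

    def destroy_early(self, reserved: int, nodes: int, hours_used: int) -> None:
        """Return the unused balance when a cluster is destroyed early."""
        used = nodes * hours_used * TOKENS_PER_NODE_HOUR
        self.tokens += reserved - used

account = Account(tokens=500)
reserved = account.reserve_cluster(nodes=4, hours=48)     # 192 tokens reserved
account.destroy_early(reserved, nodes=4, hours_used=30)   # 72 tokens refunded
print(account.tokens)                                      # 380
```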

This pattern can seem counterproductive for CAE applications, since it is often difficult to estimate how much time a computation needs to converge, but you can simply request more time than necessary and destroy the cluster when you see fit. More capabilities to automate cluster management will be added soon, such as automatically destroying a cluster at the end of a computation or automatically extending the allocated time.

Preview Phase

We are currently entering a preview phase with selected clients and hope to open the platform to public access in the near future. Do not hesitate to contact us for more information. See below how easy it is to deploy a new cluster on the cloud with our platform:
