Overview

One of the benefits of using the cloud to handle large computing workloads is virtually limitless scalability. In Microsoft Azure, you can deploy a set of networked virtual machines (VMs) that form a high-performance computing (HPC) cluster in a matter of minutes. If you need more computing power than the cluster can provide, you can scale up by creating a cluster with larger and more capable virtual machines (more cores, more RAM, etc.), or you can scale out by creating a cluster with more nodes.

In this lab, you will create a Linux cluster consisting of three virtual machines, or nodes — one master node and two worker nodes — and run a Python script on it to convert a batch of color images to grayscale. You will get first-hand experience deploying HPC clusters in Azure as well as managing and using the nodes in a cluster. And you will learn how easy it is to bring massive computing power to bear on problems that require it when you use the cloud.

To distribute the workload among all the nodes and cores in the cluster, the Python code that you will run to convert the images uses the Simple Linux Utility for Resource Management, also known as the SLURM Workload Manager or simply SLURM. SLURM is a free and open-source job scheduler for Linux that excels at distributing heavy computing workloads across clusters of machines and processors. It is used on more than half of the world's largest supercomputers and HPC clusters, and it enjoys widespread use in the research community for jobs that require significant compute resources.
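
To make the idea of spreading a batch across workers concrete, here is a minimal Python sketch that deals a list of image file names out to a fixed number of workers in round-robin order. It is only an illustration of the general idea, not the lab's script and not how SLURM schedules jobs internally. The function name `partition_round_robin`, the file names, and the worker count of 4 are all hypothetical.

```python
from typing import List, Sequence


def partition_round_robin(items: Sequence[str], num_workers: int) -> List[List[str]]:
    """Split items into num_workers lists, dealing them out one at a time."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    # Worker i receives items i, i + num_workers, i + 2 * num_workers, ...
    return [list(items[i::num_workers]) for i in range(num_workers)]


# Hypothetical batch of ten image files and a hypothetical worker count
# (for example, two worker nodes with two cores each).
images = [f"image{n:02d}.jpg" for n in range(10)]
chunks = partition_round_robin(images, 4)

for worker_id, chunk in enumerate(chunks):
    print(worker_id, chunk)
```

Round-robin assignment keeps the chunk sizes within one item of each other, which is a reasonable default when every image takes roughly the same time to convert. In the lab itself, SLURM takes care of placing the work on the available nodes and cores.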

Objectives

In this hands-on lab, you will learn how to:

  • Create a SLURM cluster using an Azure Resource Manager template
  • Copy local resources to a SLURM cluster
  • Remote into a SLURM cluster
  • Run jobs on a SLURM cluster
  • Start and stop nodes in a SLURM cluster
  • Use Azure Resource Manager to delete a SLURM cluster

Prerequisites

The following are required to complete this hands-on lab:

Resources

Download the zip file containing the resources used in this lab, then copy its contents into a folder on your hard disk.


Exercises