Current revision as of 10:52, 4 April 2024

Cluster Lovelace - Instituto de Física UFRGS

The cluster is located at Instituto de Física da UFRGS, in Porto Alegre.

Management Committee


The cluster is managed by professors representing the fields of Astronomy, Theoretical Physics, and Experimental Physics, in addition to an IT department employee from the Physics Institute.

Astronomy: Rogério Riffel

Theoretical Physics: Leonardo Brunnet

Experimental Physics: Pedro Grande

IT employee: Gustavo Feller

Users Committee


Users have two channels for communication/discussion: 

1) The fis-linux-if@grupos.ufrgs.br mailing list

2) Direct messages to the IT department via the email fisica-ti@ufrgs.br.

Infrastructure

Management Software

Job queues and task scheduling are controlled by the Slurm Workload Manager (https://slurm.schedmd.com/).


The number of jobs per user is controlled on demand.

Number of users on 1/24/2023: 150

Account request: mail to fisica-ti@ufrgs.br

Hardware in lovelace nodes

CPU: Ryzen (32 and 2*24 cores) + AMD 16 cores
RAM: 64 GB each
GPU: Three nodes with NVIDIA CUDA
Storage: Dell storage, 12 TB
Inter-node connection: Gigabit

Installed Software

OS: Debian 12 
Basic packages installed:
gcc
gfortran
python: torch, numba
julia
conda
compucel3d
espresso
gromacs
lammps
mesa
openmpi
povray
quantum-espresso
vasp

Rules for scheduling, access control, and usage of the research infrastructure

Online scheduling

The cluster is accessible over the UFRGS virtual private network (VPN, https://www1.ufrgs.br/CatalogoServicos/servicos/servico?servico=3178) through the server lovelace.if.ufrgs.br.

To access through a unix-like system use:

ssh <user>@lovelace.if.ufrgs.br

Under Windows, you may configure WinSCP to connect to the address lovelace.if.ufrgs.br.

If you are not registered, request registration by sending an email to fisica-ti@ufrgs.br

Using software in the cluster

To run a program in a cluster job, the program must:

1. Already be installed

OR

2. Be copied to the user's home directory

Ex:

scp my_program <user>@lovelace.if.ufrgs.br:~/

If you are compiling your program in the cluster, one option is to use gcc.

Ex:

scp -r source-code/ <user>@lovelace.if.ufrgs.br:~/
ssh <user>@lovelace.if.ufrgs.br
cd source-code
gcc main.c funcoes.c

This generates the file a.out, which is the executable.

Once the program is available by method 1 or 2, it can be executed in the cluster through a JOB.

Note: If you run your executable without submitting it as a JOB, it runs on the server, not on the nodes. This is not recommended, since the server's computational capabilities are limited and you will slow it down for everyone else.

Creating and executing a Job

Slurm manages jobs, and each job represents a program or task being executed.

To submit a new job, you must create a script file describing the requirements and characteristics of the job.

A typical submission script is shown below:

Ex: job.sh

#!/bin/bash
#SBATCH -n 1 # Number of tasks, one CPU each by default (despite the leading #, these SBATCH lines are read by Slurm!)
#SBATCH -N 1 # Number of nodes to be allocated (not every option is required; disable one by commenting it with ##)
#SBATCH -t 0-00:05 # Execution time limit (D-HH:MM)
#SBATCH -p long # Partition to submit to
#SBATCH --qos qos_long # QOS
  
# Your program execution commands
./a.out

For the --qos option, use the partition name with the "qos_" prefix:

partition: short -> qos: qos_short -> limit: 2 weeks

partition: long -> qos: qos_long -> limit: 3 months
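
As a sketch of the mapping above, the following writes a job script for the short partition. The file name job_short.sh, the job name my_test, the one-hour limit, and the log file pattern are illustrative assumptions; -J and -o are standard sbatch options, and %j expands to the job ID.

```shell
# Write a job script for the "short" partition (file name is an assumption)
cat > job_short.sh <<'EOF'
#!/bin/bash
#SBATCH -J my_test          # Job name shown by squeue (hypothetical name)
#SBATCH -n 1                # Number of tasks
#SBATCH -N 1                # Number of nodes
#SBATCH -t 0-01:00          # Time limit (D-HH:MM), well below the 2-week limit
#SBATCH -p short            # Partition
#SBATCH --qos qos_short     # QOS: partition name with the "qos_" prefix
#SBATCH -o slurm-%j.out     # Log file; %j expands to the job ID

./a.out
EOF
```

The script can then be submitted with sbatch job_short.sh.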

If your job runs on a GPU, request the "generic resource" gpu:

#!/bin/bash 
#SBATCH -n 1 
#SBATCH -N 1
#SBATCH -t 0-00:05 
#SBATCH -p long 
#SBATCH --qos qos_long # QOS 
#SBATCH --gres=gpu:1
  
# Your program execution commands:
./a.out

To request a specific GPU model:

#SBATCH --constraint="gtx970"
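
Putting the two GPU options together, a complete script might look like the sketch below. The file name job_gpu.sh is an assumption, and so is the availability of nvidia-smi on the GPU nodes; the constraint line is optional and should be dropped if any GPU will do.

```shell
# Write a GPU job script (file name job_gpu.sh is an assumption)
cat > job_gpu.sh <<'EOF'
#!/bin/bash
#SBATCH -n 1
#SBATCH -N 1
#SBATCH -t 0-00:05
#SBATCH -p long
#SBATCH --qos qos_long
#SBATCH --gres=gpu:1             # One GPU as a generic resource
#SBATCH --constraint="gtx970"    # Optional: a specific GPU model

# Print the GPUs visible to the job (assumes nvidia-smi exists on the GPU nodes)
nvidia-smi

./a.out
EOF
```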

To submit the job, execute:

sbatch job.sh
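
On success, sbatch prints a line of the form "Submitted batch job <id>". The sketch below shows one way to keep that ID for later commands; the sample line and the number 12345 are placeholders standing in for real sbatch output.

```shell
# Extract the job ID from the line that sbatch prints on success.
# On the cluster, replace the sample line with: submit_output=$(sbatch job.sh)
submit_output="Submitted batch job 12345"   # placeholder sample output
job_id="${submit_output##* }"               # keep the text after the last space
echo "Job ID: $job_id"
```

The stored ID can then be passed to other commands, for example squeue -j "$job_id" or scancel "$job_id".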

Useful commands

To list jobs:

 squeue

To list all jobs running in the cluster now:

 sudo squeue

To cancel a job:

 scancel [job_id]

To list available partitions:

 sinfo

To list the GPUs in the nodes:

 sinfo -o "%N %f"

To list the characteristics of all nodes:

 sinfo -Nel
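
A few more standard Slurm commands can be wrapped in small shell helpers, for example in ~/.bashrc. The function names below are hypothetical, and jobhistory assumes that job accounting is enabled on the cluster.

```shell
# Small helpers around standard Slurm commands (function names are hypothetical)
myjobs()     { squeue -u "$USER"; }        # list only your own jobs
jobinfo()    { scontrol show job "$1"; }   # full details of one job
jobhistory() { sacct -j "$1" --format=JobID,JobName,State,Elapsed; }  # assumes accounting is enabled
```

For example, jobinfo 12345 shows the details of the job with ID 12345.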
