Aura
The aura.fi.muni.cz server is available to FI staff and PhD students for longer-term, more demanding, or GPU computations. For study or research purposes, FI staff can request access for others via unix@fi.muni.cz. Access to the Aura server is only possible from the MU network.
Hardware configuration
The Aura server is built on the Asus RS720A-E11-RS24U platform in the following configuration:
- two 64-core AMD EPYC 7713 2.0 GHz processors (128 physical cores and 256 threads in total)
- 2 TiB DDR4 RAM 3200 MHz
- 10 Gbps Ethernet connection
- 2 SATA SSDs with 960 GB capacity in RAID 1
- 2 × 6 TB NVMe drives in RAID 1
- 2 NVIDIA A100 80 GB PCIe GPU cards with NVLink
- Red Hat Enterprise Linux operating system
See also the blog post introducing this server.
How to work on compute servers
We recommend you also familiarize yourself with the general information about running computations.
Run long-running processes (an hour or more) at a reduced priority (in the range 10-19, 19 being the lowest), for example nice ./your_program or nice -n 15 ./your_program.
To change the priority of an already running process, you can use the renice command, but beware that a process may be running multiple threads and renicing a single process ID may change the priority of only one thread. For example, you can get a listing of all the threads of your processes, including their priority, as follows:
ps x -Lo pgid,pid,tid,user,nice,tty,time,args
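To lower the priority of all of your processes and their threads at once, you can target the whole process group instead of a single PID. A minimal sketch; the pgid value is a placeholder taken from the listing above:
# lower the priority of an entire process group (pgid taken from the ps listing)
renice -n 19 -g <pgid>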
You can run short-term processes or interactive debugging of your programs with normal priority.
If your process does not adhere to the priority constraint and uses a large amount of computing power, all your processes will be set to the lowest priority 19 to prevent other users from being constrained. Repeated or more serious violations of this rule may result in temporary disabling of your faculty account.
Memory limitations using systemd
The upper limit of memory usage on the system can be determined using the command below. When this limit is exceeded, the OOM mechanism is triggered and attempts to terminate an appropriate process.
systemctl show -p MemoryMax user-$UID.slice
However, you can create your own systemd scope in which a more stringent (lower) limit on usable memory can be set:
systemd-run --user --scope -p MemoryMax=9G program
The program can also be a command line (e.g. bash). The memory limit applies to it and all its children together. This is different from the ulimit mechanism, where the limit applies to each process separately.
Monitoring the created scope as well as your overall user slice can be useful:
# monitoring of the memory and CPU usage of your processes
systemd-cgtop /user.slice/user-$UID.slice
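It can also help to give the scope a name so it is easier to find and inspect later. A minimal sketch; the unit name myjob and ./your_program are placeholders:
# run the program in a named, memory-limited scope
systemd-run --user --scope --unit=myjob -p MemoryMax=9G ./your_program

# in another terminal: check the scope's current memory consumption
systemctl --user show -p MemoryCurrent myjob.scope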
Resource constraints using ulimit
Resource limiting commands:
# limit available resources
help ulimit
# cap the size of virtual memory to 20000 kB
ulimit -v 20000
# cap the amount of total CPU time to 3600 seconds
ulimit -t 3600
# cap the number of concurrently running threads/processes
ulimit -u 100
The above commands limit the resources of the shell and all its children to the specified values. The limits cannot be raised again; to restore an environment without them, you need to start another, separate shell. Note, however, that the resources set by ulimit apply to each process separately. Thus, if you set the limit to 20 MB of memory and run 10 processes in such an environment, they may allocate a total of 200 MB of memory. If you want to limit the total memory to 20 MB, use systemd-run instead.
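A short sketch of the difference, assuming a hypothetical memory-hungry ./worker program:
# ulimit: each of the two workers gets its own ~20 MB cap (per-process limit)
bash -c 'ulimit -v 20000; ./worker & ./worker & wait'

# systemd-run: both workers together share a single 20 MB cap (per-scope limit)
systemd-run --user --scope -p MemoryMax=20M bash -c './worker & ./worker & wait'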
Specific software
If you need to install libraries or tools for your work, you have several options (besides local compilation):
- if they are part of the distribution (dnf search software-name), you can ask the maintainer to install them,
- you can make a module,
- if it is a Python package, you can ask the maintainer to install it into the python3 module. You can also install it locally using pip/pip3 install --user. If you use virtualenv, conda, etc., we recommend installing the environment into /var/tmp/login (see below for file lifetimes).
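For example, a local Python environment on the fast local volume might look like this (a sketch; xlogin stands for your faculty login and the installed package is just an example):
# create a virtual environment under /var/tmp/<your login>
python3 -m venv /var/tmp/xlogin/venv
source /var/tmp/xlogin/venv/bin/activate
pip install --upgrade pip
pip install numpy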
Disk capacities
For temporary data that should be quickly available locally, two directories are available on the Aura server.
- The directory /tmp is of type tmpfs. Due to its location in RAM, access is very fast, but the data does not persist across server reboots and the capacity is very small. Do not store the results of computations here, lest you cause system-wide problems when it fills up.
- The directory /var/tmp is on a fast NVMe RAID 1 volume.
The advantage of using them, especially for I/O-intensive computations, is also a lower load on the network and on the home and data storage servers.
To use this space, store your data in a directory named after your login. Data that is not accessed (according to atime) is automatically deleted: for /tmp when it is a few days old, for /var/tmp when it is a few months old (see /etc/tmpfiles.d/tmp.conf for the exact settings). Disk quotas do not apply here; however, be considerate of others in your use of space.
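For example (a sketch; xlogin stands for your faculty login and the copied directory is just an illustration):
# create your personal directory on the fast local NVMe volume
mkdir -p /var/tmp/xlogin
# copy input data there and run the I/O-intensive computation locally
cp -r ~/experiment-data /var/tmp/xlogin/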
GPU calculations
The Aura server has two GPU cards, namely the NVIDIA A100 80 GB PCIe.
If you have suggestions about the functionality or about how to work with the GPUs on Aura, we would be happy to hear them.
GPU computations on Aura are currently not limited by the system in any way, so it is important to be considerate of others.
Choosing a card
Unlike in the past, running computations concurrently on a single GPU is no longer outright problematic. If a GPU card is partitioned using MIG (Multi-Instance GPU) technology, it can host several non-interacting virtual GPUs (instances).
Before starting a computation, we need to set the environment variable CUDA_VISIBLE_DEVICES appropriately. To select a suitable value, you can use the information from nvidia-smi or the nvisel tool:
[user@aura ~]$ nvidia-smi
Thu Jan 18 10:48:06 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.10 Driver Version: 535.86.10 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100 80GB PCIe On | 00000000:21:00.0 Off | On |
| N/A 59C P0 236W / 300W | 35141MiB / 81920MiB | N/A Default |
| | | Enabled |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA A100 80GB PCIe On | 00000000:61:00.0 Off | 0 |
| N/A 43C P0 44W / 300W | 4MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| MIG devices: |
+------------------+--------------------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
| | | ECC| |
|==================+================================+===========+=======================|
| 0 3 0 0 | 25MiB / 19968MiB | 28 0 | 2 0 1 0 0 |
| | 0MiB / 32767MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 0 4 0 1 | 17577MiB / 19968MiB | 28 0 | 2 0 1 0 0 |
| | 2MiB / 32767MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 0 6 0 2 | 17514MiB / 19968MiB | 14 0 | 1 0 1 0 0 |
| | 2MiB / 32767MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 0 11 0 3 | 12MiB / 9728MiB | 14 0 | 1 0 1 1 1 |
| | 0MiB / 16383MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
| 0 12 0 4 | 12MiB / 9728MiB | 14 0 | 1 0 0 0 0 |
| | 0MiB / 16383MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 4 0 902451 C ./computation0 17544MiB |
| 0 6 0 902453 C ./computation1 17494MiB |
+---------------------------------------------------------------------------------------+
In the first table we see a listing of the GPU cards and, in particular, in the last field the information about whether MIG is enabled (indicated by the words Enabled and Disabled). In this example, the GPU0 card is partitioned by MIG and the second card, GPU1, is not.
In the second table, we can see the individual GPU instances with their allocated resources. If none of the cards are partitioned, the MIG devices: table is not displayed.
In the last table we see the running computations. In this example, computations are running on the partitioned GPU0 card, on the instances with GI 4 and 6. So we can choose either the unpartitioned GPU1 card, which has 80 GB of memory, or the partitioned GPU0 card, which has free instances with GI 3 (20 GB), 11 (10 GB), and 12 (10 GB).
If we chose the unpartitioned card, we set CUDA_VISIBLE_DEVICES to the GPU index of the card (CUDA_VISIBLE_DEVICES=1). If we chose the partition with GI 11 (MIG Dev 3) on the partitioned GPU0 card, we need to find out its UUID. To do this, use nvidia-smi -L.
[user@aura ~]$ nvidia-smi -L
GPU 0: NVIDIA A100 80GB PCIe (UUID: GPU-309d72fd-b4f8-d6e8-6a66-e3f2253e8540)
MIG 2g.20gb Device 0: (UUID: MIG-ee0daf5f-9543-5e3f-8157-308a15c318b4)
MIG 2g.20gb Device 1: (UUID: MIG-fbb89bfe-6460-508c-ab51-9b961def7e01)
MIG 1g.20gb Device 2: (UUID: MIG-102d7a8b-5941-5275-be02-72ff5819ead4)
MIG 1g.10gb Device 3: (UUID: MIG-c4dc2f6b-2c55-566d-8738-fa8176580fda)
MIG 1g.10gb Device 4: (UUID: MIG-cd46e799-21e5-54d8-b751-f4a3afb52a46)
GPU 1: NVIDIA A100 80GB PCIe (UUID: GPU-04712e69-7356-4de5-f983-84083131460e)
We then set the variable CUDA_VISIBLE_DEVICES to:
CUDA_VISIBLE_DEVICES=MIG-c4dc2f6b-2c55-566d-8738-fa8176580fda
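Putting it together, a run on the chosen MIG instance might look like this (a sketch; ./your_computation is a placeholder for your own program):
# restrict the program to the selected MIG instance and run it at reduced priority
export CUDA_VISIBLE_DEVICES=MIG-c4dc2f6b-2c55-566d-8738-fa8176580fda
nice -n 15 ./your_computation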
Computation monitoring
We can monitor our computation either with the nvidia-smi command or with the interactive tools nvitop or nvtop (a terminal window larger than 80x24 is recommended). Monitoring tools cannot display GPU usage (Util) for partitioned cards and show N/A instead.
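For occasional checks without an interactive tool, periodically refreshing the nvidia-smi output is enough, for example:
# refresh the nvidia-smi overview every 2 seconds (quit with Ctrl+C)
watch -n 2 nvidia-smi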
Changing GPU partitioning
If the existing configuration of MIG instances does not suit you, it can be changed by agreement via unix@fi.muni.cz, if circumstances (other running computations) reasonably allow it.
Container support - Podman
For computations on the Aura server, the Podman tool is also available, providing the same functionality as Docker. Each user is assigned a subuid and subgid range derived from their UID and can thus use rootless containers. The range is 100000 IDs in size and starts at UID*100000. By default, containers are placed in /var/tmp/containers/xlogin. No quotas are currently applied on this volume, so please take care of your files and delete containers and images you no longer need.
Podman and GPU
Unlike a normal container launch, we need to specify which GPU partition we want to use. Instead of its UUID, the partition is specified in the format GPU:Device, where GPU and Device are the numbers from the nvidia-smi -L listing. To use GPU 0 and partition (Device) 4, we add --device nvidia.com/gpu=0:4 as a parameter.
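A quick way to check that the container sees only the selected partition is to run nvidia-smi inside a small CUDA image (a sketch; the exact nvidia/cuda image tag is an assumption and may need adjusting):
# the container should list only the selected MIG instance (GPU 0, Device 4)
podman run --rm --security-opt label=disable --device nvidia.com/gpu=0:4 \
    docker.io/nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi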
Sample
If you would like to use JupyterLab with TensorFlow and GPU support, an example gpu-jupyter image can be found on Docker Hub (it is a large image, so the first run may take a while). On the Aura server you just need to run:
podman run --rm --security-opt label=disable -p 127.0.0.1:11000:8888 \
-v "${PWD}":/home/jovyan/work --device nvidia.com/gpu=0:4 \
cschranz/gpu-jupyter:v1.5_cuda-11.6_ubuntu-20.04_python-only
Beware, a container mapped this way is only reachable from the Aura server itself. For wider availability, you need to create an SSH tunnel, for example ssh -L 8888:localhost:11000 aura. The port can also be mapped using -p 11000:8888; it will then be accessible from the FI network.
The -v "${PWD}":/home/jovyan/work part maps the current working directory to /home/jovyan/work inside the container. JupyterLab uses this directory as its working directory.
We can then verify the functionality using, for example, this code, which prints the number of available GPUs (the expected output is 1):
import tensorflow as tf
# the device type must be given as 'GPU' (uppercase)
print("Number of available GPUs:", len(tf.config.list_physical_devices('GPU')))
Subsequent runs are faster, as the downloaded image is not automatically deleted after the container exits. To delete the leftovers (unused images, containers, ...), we can use the command podman system prune -a.