The sections below outline details about the system hardware, software, and user management. See the usage guide for more information on how to get access to the system and how to use it.
Hardware
The hardware is physically located at HPC2N and is managed by staff employed at that infrastructure. It is a smaller high-performance computing cluster consisting of compute and storage nodes.
Compute
Compute consists of two nodes, each with:
- CPU — 64-core, 2.45 GHz (3.7 GHz boost) AMD EPYC 9534, providing roughly 15 Tflop/s
- GPU — Nvidia H100 with 16896 CUDA cores, providing roughly 120 Tflop/s
- RAM — 768 GB
- Storage — 3.5 TB NVMe
- Network — InfiniBand HDR200
Storage
Storage consists of five nodes totalling 2 petabytes of storage, each with:
- CPU — 8-core, 2.6 GHz Intel Xeon 4509Y
- RAM — 128 GB
- Storage — 2 × 5.8 TB NVMe and 400 TB spinning HDD in RAID6
- Network — InfiniBand HDR200 switch
Compute and storage are both managed by a master node running Ubuntu.
Usage
The cluster caters to researchers and people in the private sector who need to process, analyse, train on, model and use their forestry data, ranging from LiDAR to machine-specific sensor measurements. Examples include:
- Remote and smart forest inventory.
- AI models for autonomous forestry machinery.
- Data-driven models for predicting ground conditions in forestry operations.
Many of these tasks involve a large amount of data that has to be processed. The system therefore relies on cutting-edge software and algorithms implemented by other developers or, if needed, developed by us in-house. Many of these algorithms use parallel processing to handle large volumes of data quickly. In recent years, the parallel processing capacity of GPUs has shown promise for handling large amounts of geodata. However, the comparatively slow transfer between GPU memory and the CPU's working memory remains an issue, especially as data can be distributed across several nodes in the compute chain, requiring different algorithmic solutions depending on the processing requirements. As both compute nodes have powerful CPUs and GPUs, any required algorithm should, in theory, be suitable for our hardware.
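To make the chunked, parallel processing idea concrete, the sketch below splits a synthetic point cloud into chunks and processes them in parallel on the CPU using only NumPy and the standard library. The data and the chunk_ground_stats function are illustrative placeholders, not software installed on the cluster; the same pattern applies to GPU processing, where the cost of moving each chunk to device memory has to be weighed against the speed-up.

```python
# Minimal sketch of CPU-parallel, chunked processing of a large point cloud.
# The data here is synthetic; in practice chunks would be read from files on
# the shared storage. Names like `chunk_ground_stats` are illustrative only.
import numpy as np
from multiprocessing import Pool

def chunk_ground_stats(points: np.ndarray) -> tuple[float, float]:
    """Return (mean, std) of the z coordinate for one chunk of points."""
    z = points[:, 2]
    return float(z.mean()), float(z.std())

def main() -> None:
    rng = np.random.default_rng(0)
    cloud = rng.normal(size=(1_000_000, 3))   # stand-in for LiDAR points (x, y, z)
    chunks = np.array_split(cloud, 64)        # e.g. one chunk per CPU core

    with Pool() as pool:                      # fan the chunks out over the available cores
        stats = pool.map(chunk_ground_stats, chunks)

    print("per-chunk mean/std of z:", stats[:3], "...")

if __name__ == "__main__":
    main()
```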
The storage nodes provide a common storage space where users can offload large amounts of data for their own use or for sharing with other users, either openly or privately, for example as training data for ML models.
At the moment, there is no special system in place to handle the sharing of data between storage projects, but the storage can still be used as a project repository for large amounts of data.
Environment
The master node runs Ubuntu Linux and provides direct access to a partitioned part of the storage nodes, which the user can manipulate directly.
To use the compute nodes, one has to submit jobs (a job is a process running some code on the hardware) through the workload manager SLURM, which is managed by HPC2N; a minimal submission sketch is shown after the list below. Such a manager is necessary in a shared compute system to:
- Precisely specify the hardware needed to run software.
- Specify the software required for the run.
- Distribute compute between users so that cores and compute time are shared efficiently and fairly.
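The following Python sketch illustrates what a submission might look like: it writes a small SLURM batch script and hands it to sbatch. The account name, resource requests and the process_lidar.py script are placeholders rather than actual values for this cluster; real values come from your project allocation and the usage guide.

```python
# A minimal sketch of submitting a job to SLURM from Python. The account name,
# resource requests, and the process_lidar.py script are placeholders.
import subprocess
from pathlib import Path

# Batch script: one task with 16 CPU cores and one GPU for at most one hour,
# charged to a placeholder project account.
BATCH_SCRIPT = """\
#!/bin/bash
#SBATCH --account=hpc2n-example-project
#SBATCH --job-name=lidar-demo
#SBATCH --time=01:00:00
#SBATCH --cpus-per-task=16
#SBATCH --gres=gpu:1

# Load or activate the required software environment here, then run the workload:
python3 process_lidar.py
"""

def submit() -> None:
    script = Path("job.sh")
    script.write_text(BATCH_SCRIPT)
    # sbatch queues the script and prints the assigned job ID.
    subprocess.run(["sbatch", str(script)], check=True)

if __name__ == "__main__":
    submit()
```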
The distribution is determined using a priority system, calculated from how much compute time a project has been promised and how much it has already used. This priority orders jobs in a queue, where the job at the front of the queue starts first.
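As a deliberately simplified illustration of this fair-share idea (not SLURM's actual multifactor priority formula), the sketch below ranks jobs so that projects that have used less of their promised compute time go first; all numbers are made up.

```python
# Toy fair-share ordering: projects that have used less of their promised
# compute time get higher priority. Not SLURM's real priority formula.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    promised_hours: float   # compute time the project was promised
    used_hours: float       # compute time the project has consumed so far

    @property
    def fair_share(self) -> float:
        # 1.0 = nothing used yet, 0.0 = allocation fully consumed (or exceeded).
        return max(0.0, 1.0 - self.used_hours / self.promised_hours)

queue = [
    Job("proj-a-training", promised_hours=10_000, used_hours=9_000),
    Job("proj-b-inventory", promised_hours=5_000, used_hours=1_000),
]

# The highest fair-share value goes first in the queue.
for job in sorted(queue, key=lambda j: j.fair_share, reverse=True):
    print(f"{job.name}: fair-share {job.fair_share:.2f}")
```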
Software
The clusters at HPC2N include a large set of preinstalled software. If a piece of software is not available on the cluster, users can install it themselves or contact HPC2N to have it installed, which typically takes a few days.
We are currently surveying the software usage of the cluster's users to determine what is essential to preinstall. In the meantime, we can help users build portable container images containing their required software, which can be used on the cluster or on other machines.
User management
The clusters at HPC2N are managed through the SUPR NAISS user and project repository. This service provides logins, applications for compute and storage, and support. When applying for a project, there must be a PI (principal investigator) managing the application. With a project in place, group members or employees can be added, each with their own login details but with access to the same shared storage and compute resources. More information can be found in the usage guide.
In the project application, the PI must decide on the compute requirements (core hours per month), storage requirements, typical usage and software requirements. More information is found in the usage guide, but don’t hesitate to contact us if you have any questions about how much to request in the application.
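To give a feel for the core-hour unit, the short sketch below works through a rough monthly estimate; all numbers are made up and should be replaced with figures based on your own typical workload.

```python
# Rough core-hour arithmetic for a project application. All numbers are made
# up; they only show how a monthly request could be estimated.
cores_per_job = 64          # e.g. one full compute-node CPU
hours_per_job = 10          # wall-clock time of a typical run
jobs_per_month = 20

core_hours_per_month = cores_per_job * hours_per_job * jobs_per_month
print(core_hours_per_month)  # 12800 core hours per month
```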
Maintenance and security
The hardware and software installed on the cluster are maintained by the staff at HPC2N, who work specifically to maintain the software and security of the system. User information is handled by SUPR NAISS, adhering to GDPR guidelines.
Users are responsible for uploading and sharing their data in a manner that fits their specific privacy guidelines, e.g. managing storage projects so that only individuals with explicit data access are added to the project.