Here we have an sbatch file that runs PDAL on a single .las or .laz file and writes the result to a file with the same name in the folder output. It will serve as an example for explaining the typical commands in sbatch files.

Submission script

#!/bin/bash
#SBATCH -A hpc2nXXXX-XXX
#SBATCH --reservation=CFL
#SBATCH --job-name=clean_data
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=32G
#SBATCH --time=01:00:00
#SBATCH --mail-type=FAIL,REQUEUE
#SBATCH --mail-user=email@address.com

ml GCC/13.2.0 OpenMPI/4.1.6 PDAL/2.9.0

filename=$(basename "$1")
mkdir -p output

pdal translate "$1" "output/$filename" --json outlier_cleaning_ground_csf.json

Commands explained

Scheduler directives

All of the commands prepended by #SBATCH are directives to the Slurm scheduler.

-A hpc2nXXXX-XXX

Associates the job with a specific project assigned through SUPR.

--reservation=CFL

Runs the job under the CFL reservation, provided the project has been allocated to it.

--job-name=clean_data

A human-readable name shown when checking the job's status.

--ntasks=1

Number of parallel tasks. Since PDAL uses a single core, we only allocate one.

--cpus-per-task=1

How many CPU cores each task may use. Set it higher when running multithreaded programs.

--mem=32G

Total memory requested. Adjust based on input size and PDAL pipeline memory needs.

--time=01:00:00

Maximum wall‑clock runtime. The job is killed if it exceeds this. Choose a safe upper bound; you can request more if needed.

--mail-type=FAIL,REQUEUE

When to send e‑mail alerts (if job fails or is requeued). Useful to keep track of your jobs!

--mail-user=email@address.com

Destination address for the alerts. Replace with your actual address.

Environment Setup

To load PDAL, some additional modules are needed.

ml GCC/13.2.0 OpenMPI/4.1.6 PDAL/2.9.0
ml (or module load) pulls the specified software stack into the environment:
  • GCC 13.2.0 – C/C++ compiler required by many scientific libraries.
  • OpenMPI 4.1.6 – MPI implementation; needed if you later run a distributed PDAL job.
  • PDAL 2.9.0 – The point-cloud processing library you'll invoke.

You don't have to know what GCC and OpenMPI are used for specifically, but PDAL needs them to work.

Extracting filename and making output folder

The following commands extract the filename from the input path and create the folder output; both are needed in the PDAL command below. The variable $1 is simply the first argument passed when submitting the script (explained below).

filename=$(basename "$1")
mkdir -p output
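As a quick illustration (the path below is a made-up example), basename behaves like this:

```shell
# basename strips the directory part of a path, leaving just the file name.
filename=$(basename "data/file1.las")
echo "$filename"            # prints: file1.las

# mkdir -p creates the folder and, unlike plain mkdir, does not fail
# if it already exists (useful when re-running the script).
mkdir -p output
echo "output/$filename"     # prints: output/file1.las
```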

PDAL Command

pdal translate "$1" "output/$filename" \
    --json outlier_cleaning_ground_csf.json

pdal translate – Runs a PDAL pipeline that reads an input point cloud, applies filters, and writes the result.

"$1" – Positional parameter 1, which is our input file. This is explained more below.

"output/$filename" – Destination path. The output/ directory must exist (create it beforehand or add mkdir -p output). $filename comes from the previous step.

--json outlier_cleaning_ground_csf.json – Supplies a JSON pipeline definition, which is explained in the PDAL documentation.
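The pipeline file itself is not shown in this example. Judging by its name, outlier_cleaning_ground_csf.json might look roughly like the sketch below, chaining PDAL's filters.outlier (statistical outlier removal) and filters.csf (cloth-simulation ground classification). The stage names are real PDAL filters, but the parameter values are illustrative, not the actual file:

```json
[
    {
        "type": "filters.outlier",
        "method": "statistical",
        "mean_k": 8,
        "multiplier": 2.5
    },
    {
        "type": "filters.csf"
    }
]
```

With pdal translate, the reader and writer stages do not need to appear in the JSON; they are filled in from the input and output paths given on the command line.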

Running the command

To run the script, go to its folder in a terminal and submit it with

sbatch submit_script.sbatch data/file1.las

where data/file1.las is an example input, e.g. a .las file stored in your storage project.
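The $1 mechanism is plain bash: arguments given after the script name become positional parameters inside it. A minimal standalone demonstration (the script name show_args.sh is made up for this sketch):

```shell
# Write a tiny script that echoes its first positional parameter.
cat > show_args.sh <<'EOF'
#!/bin/bash
echo "first argument: $1"
EOF
chmod +x show_args.sh

# Invoke it the same way sbatch passes the file path to the submission script.
./show_args.sh data/file1.las   # prints: first argument: data/file1.las
```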

Extra: run as a loop

In this case, we run one job that handles one file. However, we often want to run preprocessing on many smaller las-files, and merge them later. To do this, we can run a loop in the terminal that submits a job per file.

for f in data/*.las; do sbatch submit_script.sbatch "$f"; done

This iterates over all files in data ending with .las and submits one job per file. Be sure to test the single command above first to check that everything works; otherwise you might submit many jobs that all fail.
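One way to check the loop before submitting for real is to prefix sbatch with echo, which prints each command instead of executing it (the data folder and file names below are made-up examples):

```shell
# Create a scratch folder with two empty example files.
mkdir -p data
touch data/file1.las data/file2.las

# Prefixing the command with echo prints what would be submitted.
for f in data/*.las; do
    echo sbatch submit_script.sbatch "$f"
done
# prints:
# sbatch submit_script.sbatch data/file1.las
# sbatch submit_script.sbatch data/file2.las
```

Once the printed commands look right, remove the echo to submit the jobs.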
