Here we have an sbatch file that runs PDAL on a single .las or .laz file and writes the result to a file with the same name in the folder output. This will serve as an example to explain typical commands in sbatch files.
Submission script
#!/bin/bash
#SBATCH -A hpc2nXXXX-XXX
#SBATCH --reservation=CFL
#SBATCH --job-name=clean_data
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=32G
#SBATCH --time=01:00:00
#SBATCH --mail-type=FAIL,REQUEUE
#SBATCH --mail-user=email@address.com
ml GCC/13.2.0 OpenMPI/4.1.6 PDAL/2.9.0
filename=$(basename "$1")
mkdir -p output
pdal translate "$1" "output/$filename" --json outlier_cleaning_ground_csf.json
Commands explained
Scheduler directives
All of the lines prepended with #SBATCH are directives to the Slurm scheduler.
-A hpc2nXXXX-XXX
Associates the job with a specific project assigned through SUPR.
--reservation=CFL
Runs the job under the CFL reservation, if the project has an allocation on it.
--job-name=clean_data
Human-readable name shown when checking job status.
--ntasks=1
Number of parallel tasks. Since this PDAL pipeline runs as a single process, we only allocate one.
--cpus-per-task=1
How many CPU cores each task may use. Set this higher when running multithreaded programs.
--mem=32G
Total memory requested. Adjust based on input size and PDAL pipeline memory needs.
--time=01:00:00
Maximum wall‑clock runtime. The job is killed if it exceeds this. Choose a safe upper bound; you can request more if needed.
--mail-type=FAIL,REQUEUE
When to send e‑mail alerts (if job fails or is requeued). Useful to keep track of your jobs!
--mail-user=email@address.com
Destination address for the alerts. Replace with your actual address.
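Once submitted, the effect of these directives can be inspected from the command line. A few common Slurm commands (the job ID 123456 is a placeholder for the ID printed by sbatch):

```shell
squeue -u "$USER"          # list your queued and running jobs (shows the --job-name)
scontrol show job 123456   # full details of a pending/running job, incl. memory and time limits
scancel 123456             # cancel a job you no longer need
sacct -j 123456 --format=JobID,JobName,Elapsed,MaxRSS,State   # accounting after it finishes
```

sacct's MaxRSS column is handy for tuning --mem in later submissions.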
Environment Setup
To load PDAL some extra modules are needed.
ml GCC/13.2.0 OpenMPI/4.1.6 PDAL/2.9.0
ml (or module load) pulls the specified software stack into the environment:
- GCC 13.2.0 – C/C++ compiler required by many scientific libraries.
- OpenMPI 4.1.6 – MPI implementation; needed if you later run a distributed PDAL job.
- PDAL 2.9.0 – The point‑cloud processing library you’ll invoke.
You don’t have to know what GCC and OpenMPI are used for specifically, but they are needed for PDAL to work.
Extracting filename and making output folder
The following commands extract the filename from the input path and create the folder output; both are needed for the PDAL command. The variable $1 is simply the first argument passed when submitting the script (explained below).
filename=$(basename "$1")
mkdir -p output
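As a quick illustration of what these two lines do (the input path here is a hypothetical example, standing in for $1):

```shell
# basename strips the directory part of a path, keeping only the file name.
input="data/tiles/file1.las"   # hypothetical input path, as passed via $1
filename=$(basename "$input")
echo "$filename"               # prints: file1.las

# -p makes mkdir succeed silently if the folder already exists.
mkdir -p output
echo "output/$filename"        # prints: output/file1.las
```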
PDAL Command
pdal translate "$1" "output/$filename" \
--json outlier_cleaning_ground_csf.json
pdal translate – Runs a PDAL pipeline that reads an input point cloud, applies filters, and writes the result.
"$1" – Positional parameter 1, which is our input file. This is explained more below.
"output/$filename" – Destination path. The output/ directory must exist (create it beforehand or add mkdir -p output). $filename comes from the previous step.
--json outlier_cleaning_ground_csf.json – Supplies a JSON pipeline definition, which is explained in the PDAL documentation.
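The contents of outlier_cleaning_ground_csf.json are not shown here, but judging by its name, a pipeline like it might combine PDAL's statistical outlier filter with the Cloth Simulation Filter for ground classification. A minimal sketch (the stage parameters are assumptions for illustration, not the actual file):

```json
[
    {
        "type": "filters.outlier",
        "method": "statistical",
        "mean_k": 8,
        "multiplier": 2.5
    },
    {
        "type": "filters.csf"
    }
]
```

With pdal translate, the input reader and output writer are inferred from the command-line arguments, so the JSON only needs to list the filter stages.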
Running the command
To submit the job, open a terminal in the folder containing the script and run
sbatch submit_script.sbatch data/file1.las
where data/file1.las is an example input file, e.g. a .las file stored in your storage project.
Extra: run as a loop
In this case, we run one job that handles one file. However, we often want to run preprocessing on many smaller las-files, and merge them later. To do this, we can run a loop in the terminal that submits a job per file.
for f in data/*.las; do sbatch submit_script.sbatch "$f"; done
This iterates over all files in data that end with .las and submits one job per file. Be sure to test the single command above first to check that everything works; otherwise you might submit many jobs that all fail.
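A dry run can also help confirm the loop picks up the right files: putting echo in front of sbatch prints each submission command instead of executing it.

```shell
# Dry run: print each submission command instead of executing it.
for f in data/*.las; do
    echo sbatch submit_script.sbatch "$f"
done
```

If data/ contains file1.las and file2.las, this prints one sbatch line per file; once the list looks right, remove the echo to submit for real.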