Minimal YAML Examples

This page gives small working shapes for common tasks.

Replace paths such as POSCAR, /path/to/vasp_std, /path/to/GPUMD/src, /path/to/nep.txt, /path/to/lmp, and MACE model paths with your local files.

Normal use:

pesmaker validate run.yaml
pesmaker next run.yaml

Do not add a workflow field for ordinary runs. PESMaker infers the flow from the sections present in the YAML.

Generate Structures First

Use this when you want to generate structures first and decide how to label them afterwards.

project: 2D_Te_defect

structures:
  - POSCAR

generation:
  output_dir: generated
  supercell: [3, 3, 3]

Run:

pesmaker validate run.yaml
pesmaker next run.yaml

PESMaker writes the generated structures to generated/ and creates run.next.yaml. Edit that file before running the next stage:

project: 2D_Te_defect

labeling:
  engine: vasp
  output_dir: run_vasp_scf
  input_dir: generated
  incar: /path/to/INCAR
  potcar_library: /path/to/VASP/potentials
  command: /path/to/vasp_std

jobs:
  submit_command: sbatch
  cores_cpu: 36
  skip_completed: true
  check_scf_convergence: true
  sub_file: /path/to/sub.sh

Then continue:

pesmaker validate run.next.yaml
pesmaker next run.next.yaml

Generate Then VASP SCF

Use this when generated structures go directly to DFT labeling.

project: direct_scf

structures:
  - POSCAR

generation:
  output_dir: generated
  supercell: [3, 3, 3]

labeling:
  engine: vasp
  output_dir: labeling
  incar: templates/vasp/INCAR
  potcar_library: /path/to/VASP/potentials
  command: /path/to/vasp_std
  dataset_path: train.xyz

jobs:
  submit_command: sbatch
  cores_cpu: 36
  sub_file: templates/sbatch/vasp_cpu_36.sh

To have PESMaker render the launch line as mpirun -np 36 /path/to/vasp_std, put {command} in templates/sbatch/vasp_cpu_36.sh at the point where VASP should run. If the script instead contains a literal mpirun /path/to/vasp_std line, PESMaker leaves it unchanged.
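The placeholder rule above can be pictured with a small Python sketch. This is not PESMaker's implementation; the function name render_submit_script is hypothetical, and taking the core count from jobs.cores_cpu is an assumption based on the example values.

```python
def render_submit_script(template: str, command: str, cores: int) -> str:
    """Replace a {command} placeholder with an MPI launch line.

    A script without the placeholder is returned unchanged, which mirrors
    the behaviour described for a literal mpirun line.
    """
    if "{command}" not in template:
        return template
    return template.replace("{command}", f"mpirun -np {cores} {command}")


# Hypothetical template bodies, for illustration only.
with_placeholder = '#!/bin/bash\ncd "$SLURM_SUBMIT_DIR"\n{command}\n'
with_literal = "#!/bin/bash\nmpirun /path/to/vasp_std\n"

# cores=36 is assumed to correspond to jobs.cores_cpu in the YAML.
rendered = render_submit_script(with_placeholder, "/path/to/vasp_std", 36)
untouched = render_submit_script(with_literal, "/path/to/vasp_std", 36)

print(rendered)
print(untouched)
```

The placeholder form is the more portable choice when the same script has to serve runs with different core counts or VASP binaries.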

Run:

pesmaker validate run.yaml
pesmaker next run.yaml

When pesmaker next prints the submit command, run it. After VASP finishes, run pesmaker next run.yaml again to continue.

GPUMD Sampling, Selection, SCF, Training

Use this when generated structures first seed MD sampling.

project: sampling_training

structures:
  - POSCAR

generation:
  output_dir: generated
  supercell: [3, 3, 3]

sampling:
  engine: gpumd
  output_dir: sampling
  gpumd_dir: /path/to/GPUMD/src
  potential: /path/to/nep.txt
  temperature: "300-1200"
  run_steps: 300000
  selection:
    trajectory_pattern: sampling/**/movie.xyz
    output_dir: selected
    max_count: 200
    min_distance: 0.2
    plot: true

labeling:
  engine: vasp
  output_dir: labeling
  incar: templates/vasp/INCAR
  potcar_library: /path/to/VASP/potentials
  command: /path/to/vasp_std
  dataset_path: train.xyz

training:
  model: nep
  output_dir: training
  dataset: train.xyz
  command: nep

jobs:
  submit_command: sbatch
  cores_cpu: 36
  sub_file:
    sampling: templates/sbatch/gpumd.sh
    labeling: templates/sbatch/vasp_cpu_36.sh
    training: templates/sbatch/nep.sh

Run:

pesmaker validate run.yaml
pesmaker next run.yaml

Then follow the Next block that pesmaker next prints.

LAMMPS-MACE Sampling, Selection, SCF

Use this when generated structures first seed MD sampling with MACE-omat-small, or another MACE MLIAP model, run through LAMMPS.

project: mace_sampling_training

structures:
  - POSCAR

generation:
  output_dir: generated
  supercell: [4, 4, 4]

sampling:
  engine: mace
  output_dir: sampling
  potential: /path/to/mace-omat-0-small.model-mliap_lammps.pt
  run_in: templates/lammps/in.run_mace_npt
  # Set true if your LAMMPS input is fully configured and should be copied
  # without PESMaker placeholder replacement or automatic MACE/NPT edits.
  # preserve_run_in: true
  temperature: "300-1200"
  selection:
    trajectory_pattern: sampling/**/*.lammpstrj
    output_dir: selected
    descriptor_model: /path/to/mace-omat-0-small.model
    min_distance: 0.0
    max_count: 200

labeling:
  engine: vasp
  output_dir: labeling
  incar: templates/vasp/INCAR
  potcar_library: /path/to/VASP/potentials
  command: /path/to/vasp_std
  dataset_path: train.xyz

jobs:
  submit_command: nohup
  sub_file:
    sampling: templates/lammps/lammps.sh
    labeling: templates/sbatch/vasp_cpu_36.sh

templates/lammps/lammps.sh should contain the real LAMMPS command for your machine:

#!/bin/bash
export CUDA_VISIBLE_DEVICES=0
export MACE_TIME=true

mpirun -np 1 /path/to/lmp -k on g 1 -sf kk -pk kokkos newton on neigh half -in in.run_mace_npt

The LAMMPS input template controls NPT/NVT, D3, dump frequency, thermo frequency, and run length. PESMaker only fills {data_file}, {potential}, {elements}, {temperature_start}, {temperature_end}, and {trajectory}.

The recommended workflow is to write and test templates/lammps/in.run_mace_npt yourself for your LAMMPS/MACE build, then let PESMaker render that proven input for every generated structure. Set sampling.preserve_run_in: true if that proven input should be copied verbatim.

See sample-setup for complete MACE templates and links to the MACE/LAMMPS references.
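As a rough illustration of the six placeholders, the Python sketch below fills a short template by plain string replacement. It is not PESMaker's code: the function name, the template lines, the file names, and the splitting of temperature: "300-1200" into a start and end value are all assumptions made for the example, and the template is deliberately incomplete rather than a tested LAMMPS input.

```python
def render_lammps_input(template: str, values: dict[str, str]) -> str:
    """Fill {name} placeholders by plain string replacement.

    str.format is avoided on purpose: LAMMPS inputs may contain ${var}
    references, whose braces str.format would try to interpret.
    """
    rendered = template
    for name, value in values.items():
        rendered = rendered.replace("{" + name + "}", value)
    return rendered


# Illustrative fragment only; units, atom_style, thermo, etc. are omitted.
TEMPLATE = """\
variable nsteps equal 1000
read_data {data_file}
pair_style mliap unified {potential} 0
pair_coeff * * {elements}
fix 1 all npt temp {temperature_start} {temperature_end} 0.1 iso 0.0 0.0 1.0
dump 1 all custom 100 {trajectory} id type x y z
run ${nsteps}
"""

# Assumption: a "300-1200" range maps onto the start and end placeholders.
t_start, t_end = "300-1200".split("-")

rendered = render_lammps_input(
    TEMPLATE,
    {
        "data_file": "structure.data",  # hypothetical file name
        "potential": "/path/to/mace-omat-0-small.model-mliap_lammps.pt",
        "elements": "Te",  # hypothetical element list
        "temperature_start": t_start,
        "temperature_end": t_end,
        "trajectory": "dump.lammpstrj",  # hypothetical file name
    },
)
print(rendered)
```

Everything outside the named placeholders, including the ${nsteps} variable reference, passes through untouched, which is why run length and dump settings stay under the template author's control.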

SCF Setup From Existing Structures

Use this when structures already exist and you only want VASP folders.

project: scf_from_existing

labeling:
  engine: vasp
  input_dir: generated
  output_dir: labeling
  incar: templates/vasp/INCAR
  potcar_library: /path/to/VASP/potentials
  command: /path/to/vasp_std

jobs:
  submit_command: sbatch
  cores_cpu: 36
  sub_file: templates/sbatch/vasp_cpu_36.sh

Run:

pesmaker validate run.yaml
pesmaker next run.yaml

Collect Existing OUTCAR Files

Use this when VASP calculations are already finished.

project: collect_initial_structure

collecting:
  dataset_path: train.xyz
  test_path: test.xyz
  test_data_frames: 0
  include_virial: true

By default this recursively collects every OUTCAR below the current directory. Use explicit patterns only when you want to restrict the collection:

project: collect_existing

collecting:
  outcar_patterns:
    - "1.Te/**/run_vasp_scf/**/OUTCAR"
    - "2.Pb/**/run_vasp_scf/**/OUTCAR"
    - "3.Te-Pd/**/run_vasp_scf/**/OUTCAR"
    - "4.bulk_pristine/**/run_vasp_scf/**/OUTCAR"
  dataset_path: train.xyz

Run:

pesmaker validate collect.yaml
pesmaker next collect.yaml
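Before collecting, it can help to preview which OUTCAR files a pattern would match. The sketch below uses Python's pathlib for that. It is an independent helper, not part of PESMaker; the function name preview_outcars is hypothetical, and PESMaker's own pattern matching may differ in detail from pathlib's glob rules. The demo builds a throwaway directory tree so the result is reproducible.

```python
import tempfile
from pathlib import Path


def preview_outcars(root: Path, patterns: list[str]) -> list[Path]:
    """Return the sorted, de-duplicated files under root matching any pattern."""
    matches = set()
    for pattern in patterns:
        matches.update(p for p in root.glob(pattern) if p.is_file())
    return sorted(matches)


# Demo on a temporary tree; point root at your project directory instead.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in (
        "1.Te/a/run_vasp_scf/001/OUTCAR",
        "9.other/run_vasp_scf/001/OUTCAR",
    ):
        target = root / rel
        target.parent.mkdir(parents=True)
        target.write_text("")

    # Default-style collection: every OUTCAR below root.
    everything = [
        p.relative_to(root).as_posix()
        for p in preview_outcars(root, ["**/OUTCAR"])
    ]
    # Restricted collection using one explicit pattern.
    restricted = [
        p.relative_to(root).as_posix()
        for p in preview_outcars(root, ["1.Te/**/run_vasp_scf/**/OUTCAR"])
    ]

print(everything)
print(restricted)
```

Comparing the two lists shows which calculations an explicit outcar_patterns entry would leave out of train.xyz.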

Training From Existing Dataset

Use this when train.xyz already exists.

project: train_existing

training:
  model: nep
  output_dir: training
  dataset: train.xyz
  command: nep

jobs:
  submit_command: sbatch
  cores_cpu: 36
  sub_file:
    training: templates/sbatch/nep.sh

Run:

pesmaker validate run.yaml
pesmaker next run.yaml