pesmaker select¶
select chooses representative MD frames.
Normal users can let next run this stage after GPUMD has written
movie.xyz.
Use¶
pesmaker select run.yaml
Minimal YAML¶
sampling:
engine: gpumd
potential: /path/to/nep.txt
selection:
trajectory_pattern: sampling/**/movie.xyz
output_dir: selected
min_distance: 0.2
max_count: 200
plot: true
What It Does¶
PESMaker reads trajectory frames, builds one descriptor vector per frame, then uses farthest point sampling to keep frames that spread across descriptor space.
The descriptor backend follows sampling.engine automatically:
gpumd: Calorine calculates NEP descriptors withsampling.potential;mace,lammps-mace, orlammps_mace: ASE'sMACECalculatorcalculates invariant MACE descriptors withsampling.selection.descriptor_model.
The terminal summary prints which model descriptor was used. You do not need
to set sampling.selection.descriptor.
Descriptor inference can take time for a long trajectory. PESMaker prints the descriptor source once, then uses the same per-trajectory progress format for single and multiple trajectory files. For example:
Trajectory selection
Mode : separate_trajectories
Trajectories : 1
Method : FPS
Descriptor : MACE invariant descriptors
Model : /path/to/mace-omat-0-small.model
Device : cuda
Per trajectory : max_count=200, min_distance=0
Output directory : selected
Trajectory : 1/1 /path/to/movie.xyz
Frames : 1501
Progress : [###---------------------------] 151/1501 frame(s) ( 10.1%)
...
Progress : [##############################] 1501/1501 frame(s) (100.0%)
Descriptor matrix: 1501 frame(s) x 512 feature(s)
Selected : 200 of 1501 frame(s)
Selection completed: Selected 200 of 1501 frame(s) from 1 trajectory file(s).
GPUMD prints the same compact block with
Descriptor: GPUMD / Calorine-calculated NEP descriptors and the NEP potential
path. In an interactive terminal the progress bar updates in place. When output
is redirected to a log, PESMaker prints progress at regular intervals instead.
This uses the Python standard library and does not add another package
dependency.
PESMaker suppresses only known upstream MACE/e3nn/PyTorch model-loading compatibility messages, including the automatic model-dtype notice. Other warnings and all calculation errors remain visible.
min_distance and max_count are two stop rules:
min_distance: stop when the next farthest frame is still closer than this distance to the selected set. This avoids choosing very similar structures.max_count: optional cap on how many frames to keep. Omit it if you only want the distance rule to decide how many structures are different enough.
For example, min_distance: 0.2 and max_count: 200 means "keep at most 200
frames, but stop earlier if the remaining frames are too similar." The distance
is measured in descriptor space, not in Angstrom.
For LAMMPS-MACE trajectories, use:
sampling:
engine: mace
selection:
trajectory_pattern: sampling/**/*.lammpstrj
output_dir: selected
descriptor_model: /path/to/native-mace.model
min_distance: 0.0
max_count: 200
The descriptor model must be a native MACE model loadable by
MACECalculator, not the *.model-mliap_lammps.pt file used by LAMMPS.
PESMaker defaults to CUDA and all MACE interaction layers. It requests
invariants_only=True, then averages atom descriptors separately for each
element and concatenates them into one structure descriptor. This matches
MACE's official fine-tuning FPS implementation. Invariant scalar channels are
used because PESMaker applies ordinary Euclidean distance directly; including
equivariant tensor components would require a rotation-aware metric.
Install the required backend for the selected engine:
# GPUMD/NEP descriptor backend
python -m pip install ".[selection]"
# MACE descriptor backend
python -m pip install ".[mace]"
Because MACE and NEP descriptor distances have different numerical scales,
start a new MACE selection with min_distance: 0.0 and use max_count as the
initial limit. Inspect the distance curve before choosing a nonzero threshold.
Separate Trajectory Sampling¶
When trajectory_pattern matches several MD trajectory files, PESMaker samples
each trajectory independently by default. This keeps every initial structure
represented in the selected DFT labeling set.
sampling:
engine: gpumd
potential: /path/to/nep.txt
selection:
trajectory_pattern: MD_run_2D_Pd_551/*/movie.xyz
output_dir: selected
min_distance: 0.004
max_count: 50
Here max_count: 50 means at most 50 frames from each trajectory, not 50
frames total. For example, two matched movie.xyz files can produce up to 100
selected frames.
The terminal output uses the same format as a single trajectory, with one block per matched trajectory:
Trajectory selection
Mode : separate_trajectories
Trajectories : 2
Method : FPS
Descriptor : GPUMD / Calorine-calculated NEP descriptors
Potential : /path/to/nep.txt
Per trajectory : max_count=50, min_distance=0.004
Output directory : selected
Trajectory : 1/2 MD_run_2D_Pd_551/mp-1186427_Pd_temp_300K/movie.xyz
Frames : 1000
Progress : [##############################] 1000/1000 frame(s) (100.0%)
Descriptor matrix: 1000 frame(s) x 35 feature(s)
Selected : 50 of 1000 frame(s)
Trajectory : 2/2 MD_run_2D_Pd_551/mp-2646997_Pd_temp_300K/movie.xyz
Frames : 1000
Progress : [##############################] 1000/1000 frame(s) (100.0%)
Descriptor matrix: 1000 frame(s) x 35 feature(s)
Selected : 50 of 1000 frame(s)
Selection completed: Selected 100 of 2000 frame(s) from 2 trajectory file(s).
For one matched trajectory, the same screen format is used with
Trajectories: 1 and Trajectory: 1/1. The output files stay directly under
selected/ rather than under an extra trajectory subdirectory.
This default is useful when the trajectories come from different initial structures. If several trajectories are replicas of the same structure and you want one global FPS selection over all frames, disable separate sampling:
sampling:
selection:
trajectory_pattern: MD_run_2D_Pd_551/*/movie.xyz
output_dir: selected
separate_trajectories: false
min_distance: 0.004
max_count: 100
Interval Sampling¶
Use interval sampling when you want frames at a fixed trajectory stride and do
not want PESMaker to calculate descriptors. This mode does not require
sampling.potential, Calorine, MACE, min_distance, or a descriptor model.
sampling:
selection:
method: interval
trajectory_pattern: /path/to/XDATCAR
output_dir: selected
interval: 10
This keeps frames 0, 10, 20, .... Optional offset starts from a later frame,
and optional max_count caps the number of kept frames:
sampling:
selection:
method: interval
trajectory_pattern: sampling/**/movie.xyz
output_dir: selected
interval: 20
offset: 5
max_count: 50
The aliases strategy or mode may be used instead of method, and stride,
step, or frame_interval may be used instead of interval.
Outputs¶
selected/
selected.xyz
manifest.jsonl
selection_features.npy
fps_selection.png
selected.xyz is the combined extxyz file containing all selected frames.
PESMaker no longer writes one selected_000000.xyz file per frame. SCF setup
uses selected/manifest.jsonl to split frames from selected.xyz when it
prepares single-point job folders.
The manifest descriptor field records the descriptor family used for FPS:
nep89 for GPUMD runs with a nep89* potential, nep for other GPUMD NEP
potentials, mace for MACE descriptors, and simple for the geometry fallback.
For GPUMD, Calorine is still the calculation backend; it is not written as the
descriptor family.
selection_features.npy is a NumPy array with shape
(number_of_md_frames, descriptor_dimension). Row i is the descriptor vector
used for MD frame i before FPS selection. It is useful for debugging and
post-analysis: you can reload it with numpy.load, reproduce descriptor-space
PCA plots, compare different min_distance values, or check whether your
descriptor separates different trajectory regions.
fps_selection.png is the diagnostic plot. PESMaker uses a seaborn-styled
plot: all MD frames are shown as light points, while selected frames are drawn
smaller on top so you can see how they sit inside the full cloud.
For multiple matched trajectories in the default separate mode, PESMaker writes a subdirectory per trajectory:
selected/
manifest.jsonl
mp-1186427_Pd_temp_300K/
selected.xyz
manifest.jsonl
selection_features.npy
fps_selection.png
mp-2646997_Pd_temp_300K/
selected.xyz
manifest.jsonl
selection_features.npy
fps_selection.png
The top-level selected/manifest.jsonl combines all selected frames and is the
file used by pesmaker scf-setup or pesmaker next for later labeling.
For interval sampling, PESMaker writes only selected.xyz and manifest.jsonl
because no descriptor matrix or FPS diagnostic plot is calculated.
Next Step¶
pesmaker scf-setup run.yaml
If labeling.input_manifest is omitted, next uses
selected/manifest.jsonl after selection.