Paper: Learning to Place Guards by Reinforcement: A Geo-Free Neural Policy for the Vertex-Guard Art Gallery Problem
Authors: Domagoj Ševerdija, Jurica Maltar, Nathan Chappel, Domagoj Matijević
This repository contains the code and data for the paper. We study whether an LSTM pointer-network policy trained by reinforcement learning on the vertex-guard Art Gallery Problem (AGPVG) learns a representation that encodes the underlying geometry — independent of what its decoder expresses. The key findings:
- A pointer-network policy trained geo-free (only vertex coordinates at inference, no visibility oracle) places guards at near-greedy cardinality but leaves a tail of under-covered polygons.
- Freezing the encoder and probing it with a small single-shot SetPredictor classifier closes most of the feasibility gap, in and out of distribution (up to 5× the training range), cutting infeasible polygons by roughly an order of magnitude.
- A no-encoder ablation and a linear probe (ROC-AUC = 0.842) confirm that the representation — not the probe's own capacity — carries the guard-relevant geometry.
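For readers unfamiliar with the metric, the ROC-AUC quoted for the linear probe can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition. It assumes per-vertex binary labels (1 = vertex belongs to the target guard set), which is our reading of the probe task, and the toy labels and scores are made up for illustration, not taken from the paper.

```python
def roc_auc(labels, scores):
    """ROC-AUC as a pairwise ranking statistic.

    Returns the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six vertices, three of them guards.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
auc = roc_auc(labels, scores)  # 8 of the 9 positive/negative pairs are ranked correctly
```

The quadratic loop is fine for a sanity check; a sort-based implementation would be preferable on full evaluation splits.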
AGNet/
├── po_agp.py # PO/BT policy training (main entry point)
├── train_set_predictor.py # SetPredictor probe training
├── eval_set_predictor.py # Evaluation: threshold sweep + iterative refinement
├── eval_checkpoint.py # Evaluate policy checkpoint (CGAL visibility)
├── models.py # PointerNet encoder/decoder architecture
├── set_predictor.py # SetPredictor architecture
├── rewards.py # Coverage + guard-count reward definitions
├── dataset.py # AGPVG dataset loader
├── greedy_agp.py # Classical greedy baseline
├── tools/
│   └── build_ls_trajectories.py # Generate LS-editor training targets
├── configs/ # JSON configs for all experiments
│   ├── po_agp_lstm.json
│   └── set_predictor_train_standard.json
└── data/ # Precomputed LS trajectories and greedy baselines
    ├── ls_trajectories_train.pkl
    ├── ls_trajectories_dev.pkl
    ├── ls_trajectories_test.pkl
    └── ...
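The tree lists greedy_agp.py as the classical greedy baseline. As a rough illustration of what such a baseline usually looks like, here is a minimal greedy set-cover sketch. It assumes visibility has already been reduced to a mapping from each candidate vertex to the set of target regions it sees; the function names, the data layout, and the toy visibility sets are hypothetical and are not taken from the repository, which computes exact visibility with CGAL.

```python
def greedy_guards(visible, targets):
    """Greedy set cover: repeatedly pick the vertex that sees the most
    still-uncovered targets. Ties go to the smallest vertex id."""
    uncovered = set(targets)
    guards = []
    while uncovered:
        best = max(sorted(visible), key=lambda v: len(visible[v] & uncovered))
        gain = visible[best] & uncovered
        if not gain:
            raise ValueError("remaining targets are not visible from any vertex")
        guards.append(best)
        uncovered -= gain
    return guards


def is_feasible(guards, visible, targets):
    """A guard set is feasible when every target is seen by some guard."""
    covered = set().union(*(visible[g] for g in guards))
    return set(targets) <= covered


# Hypothetical hand-written visibility sets (vertex id -> regions it sees).
visible = {
    0: {"a", "b"},
    1: {"b", "c", "d"},
    2: {"d", "e"},
    3: {"a"},
}
targets = {"a", "b", "c", "d", "e"}
guards = greedy_guards(visible, targets)
```

The same `is_feasible` check is one way to flag an under-covered polygon: a guard set whose combined visibility misses at least one target.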
Requirements: Python 3.10+, PyTorch 2.0+, CGAL (for exact visibility at evaluation).
git clone git@github.com:dseverdi/AGNet.git
cd AGNet
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
Exact visibility (used only at evaluation, not training) requires scikit-geometry, the Python bindings for CGAL:
# Recommended: via conda
conda install -c conda-forge scikit-geometry
# Alternative: build from source (see scikit-geometry docs)
The polygon instances come from the Art Gallery Problem Vertex-Guard (AGPVG) benchmark:
Couto et al., Art Gallery Problem Vertex Guard instances https://www.ic.unicamp.br/~cid/Problem-instances/Art-Gallery/AGPVG/
Download and extract the dataset, then configure the path:
cp .env.example .env
# Edit .env and set DATASET_PATH to the extracted AGPVG directory
The data/ directory in this repo already contains precomputed LS-editor trajectories
(training targets for the SetPredictor) and greedy baseline results, so you do not need to
regenerate these to run evaluation.
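Before launching a long run it can help to confirm that DATASET_PATH resolves to a real directory. The snippet below is a small standalone check, not part of the repository: it assumes .env uses plain KEY=VALUE lines and that an exported environment variable should take precedence over the file. How the project's own scripts read .env is not documented here, so treat the helper names as hypothetical.

```python
import os
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_dataset_path(env_file=".env"):
    """Return DATASET_PATH as a Path, or raise if it is unset or missing."""
    value = os.environ.get("DATASET_PATH")  # assumption: the environment wins
    if value is None and Path(env_file).is_file():
        value = read_env_file(env_file).get("DATASET_PATH")
    if not value:
        raise RuntimeError("DATASET_PATH is not set; edit .env first")
    path = Path(value).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"DATASET_PATH does not exist: {path}")
    return path
```

Calling `resolve_dataset_path()` from the repository root either returns the dataset directory or fails with a message that names the problem.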
Download the pretrained checkpoints from the GitHub Releases page:
| File | Description |
|---|---|
| po_agp_best_greedy.pt | PO/BT LSTM pointer-network policy |
| set_predictor_best.pt | SetPredictor probe (standard τ=0.99 targets) |
Place them at checkpoints/v3/po_agp/lstm_bt/po_agp_best_greedy.pt and
checkpoints/set_predictor/standard/set_predictor_best.pt, then:
# Threshold sweep on the dev_test split (reproduces Table 3 in the paper)
python eval_set_predictor.py \
--checkpoint checkpoints/set_predictor/standard/set_predictor_best.pt \
--split dev_test
# Full Pareto sweep across thresholds and inference passes (Tables 4–5)
python eval_set_predictor.py \
--checkpoint checkpoints/set_predictor/standard/set_predictor_best.pt \
--split test \
--iter-passes-sweep 1 2 3 5
python po_agp.py configs/po_agp_lstm.json
# Checkpoint saved to checkpoints/v3/po_agp/lstm_bt/po_agp_best_greedy.pt
The data/ls_trajectories_*.pkl files are already in the repo. To regenerate from scratch:
python tools/build_ls_trajectories.py \
--split train \
--checkpoint checkpoints/v3/po_agp/lstm_bt/po_agp_best_greedy.pt \
--out data/ls_trajectories_train.pkl
# Four seeds as used in the paper: 1234, 11, 22, 33
for SEED in 1234 11 22 33; do
python train_set_predictor.py configs/set_predictor_train_standard.json \
--seed $SEED \
--out-dir checkpoints/set_predictor/standard_seed${SEED}
done
python eval_set_predictor.py \
--checkpoint checkpoints/set_predictor/standard_seed1234/set_predictor_best.pt \
--split dev_test
All experiments are driven by JSON configs in configs/. Key files:
| Config | Purpose |
|---|---|
| configs/po_agp_lstm.json | PO/BT LSTM policy training |
| configs/set_predictor_train_standard.json | SetPredictor probe (τ=0.99 targets) |
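The training commands above pass a JSON config as a positional argument and override it with --seed and --out-dir. A common way to implement that pattern is sketched below. The flag names match the commands in this README, but the config keys `seed` and `out_dir` and the function `load_config` are assumptions made for illustration; the actual schema is whatever the files in configs/ define.

```python
import argparse
import json
from pathlib import Path


def load_config(argv):
    """Load a JSON config and apply optional command-line overrides."""
    parser = argparse.ArgumentParser()
    parser.add_argument("config", type=Path, help="path to a JSON config")
    parser.add_argument("--seed", type=int, default=None)
    parser.add_argument("--out-dir", type=Path, default=None)
    args = parser.parse_args(argv)

    cfg = json.loads(args.config.read_text(encoding="utf-8"))
    # Hypothetical keys: only overwrite what was given on the command line.
    if args.seed is not None:
        cfg["seed"] = args.seed
    if args.out_dir is not None:
        cfg["out_dir"] = str(args.out_dir)
    return cfg
```

Keeping the overrides optional means the same config file can be reused unchanged across the four seeds, with only the command line varying per run.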
@article{severdija2026agnet,
title = {Learning to Place Guards by Reinforcement: A Geo-Free Neural Policy
for the Vertex-Guard Art Gallery Problem},
author = {Ševerdija, Domagoj and Maltar, Jurica and Chappel, Nathan
and Matijević, Domagoj},
journal = {Preprint},
year = {2026}
}
Code: MIT. Dataset: see AGPVG page for terms.