QD-MAPPER: A Quality Diversity Framework to Automatically Evaluate Multi-Agent Path Finding Algorithms in Diverse Maps

Abstract

We use the Quality Diversity (QD) algorithm with Neural Cellular Automata (NCA) to automatically evaluate Multi-Agent Path Finding (MAPF) algorithms by generating diverse maps. Researchers typically evaluate MAPF algorithms on a small set of specific, human-designed maps during the initial stage of algorithm design. However, such fixed maps may not cover all scenarios, and algorithms may overfit to them. To seek further improvements, systematic evaluations on a diverse suite of maps are needed. In this work, we propose Quality-Diversity Multi-Agent Path Finding Performance EvaluatoR (QD-MAPPER), a general framework that uses the QD algorithm to comprehensively characterize the performance of MAPF algorithms by generating maps with patterns. The framework enables fair comparisons between two MAPF algorithms and provides further information for selecting between them and for designing new algorithms. Empirically, we employ this technique to evaluate and compare the behavior of different types of MAPF algorithms, including search-based, priority-based, rule-based, and learning-based algorithms. Through both single-algorithm experiments and comparisons between algorithms, researchers can identify the map patterns on which each MAPF algorithm excels and detect disparities in runtime or success rate between different algorithms.

Introduction

Existing benchmark maps for MAPF algorithms are fixed or human-designed. These maps have several problems: they may not cover all failure modes of a given algorithm, may not sufficiently reveal the pros and cons of different algorithms, and may introduce bias when comparing algorithms.
Our paper applies the layout optimization approach based on the QD algorithm and NCA from previous work[1] and uses it with an alternative goal: generating diverse benchmark maps for MAPF algorithms. We show that the QD-NCA approach can generate diverse maps that are easy or challenging for each MAPF algorithm to solve, and can generate an unbiased map set to automatically compare two MAPF algorithms.

Approach Overview

We adapt the previous work[1] to use CMA-MAE and NCA to generate diverse benchmark maps with the objective and measures computed by running MAPF algorithms.
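To make the role of the objective and measures concrete, the following is a minimal sketch of a grid archive that keeps the best map found in each cell of the measure space. It is a simplified MAP-Elites-style archive, not CMA-MAE itself, and the class name `GridArchive`, the measure ranges, and the bin counts are all hypothetical choices made for illustration. It assumes the objective is maximized.

```python
class GridArchive:
    """Toy grid archive: one elite per cell of the discretized measure space.

    Illustrative only. This is not the archive used by CMA-MAE.
    """

    def __init__(self, ranges, bins):
        self.ranges = ranges  # (low, high) for each measure
        self.bins = bins      # number of cells along each measure
        self.cells = {}       # cell index -> (objective, solution)

    def _index(self, measures):
        idx = []
        for value, (low, high), n in zip(measures, self.ranges, self.bins):
            frac = (value - low) / (high - low)
            # Clamp so out-of-range measures land in the border cells.
            idx.append(min(n - 1, max(0, int(frac * n))))
        return tuple(idx)

    def add(self, solution, objective, measures):
        """Insert the solution if its cell is empty or it beats the incumbent."""
        key = self._index(measures)
        incumbent = self.cells.get(key)
        if incumbent is None or objective > incumbent[0]:
            self.cells[key] = (objective, solution)
            return True
        return False


# Hypothetical measures: (number of obstacles, KL divergence).
archive = GridArchive(ranges=[(0, 100), (0.0, 5.0)], bins=[10, 10])
added_a = archive.add("map_a", objective=1.5, measures=(22, 0.7))
added_b = archive.add("map_b", objective=0.9, measures=(25, 0.9))  # same cell, worse
added_c = archive.add("map_c", objective=2.0, measures=(85, 3.2))  # new cell
```

Here `map_b` falls into the same cell as `map_a` with a lower objective and is rejected, while `map_c` occupies a new cell, so the archive ends up holding two maps with different measure values.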

Figure 1: Overview of our approach, which uses CMA-MAE to optimize diverse NCAs that generate benchmark maps.

Objectives:
One-algorithm experiments:
Average CPU runtime: \(f(x) = t_{\phi}(x), \phi \in \{\text{CBS}, \text{EECBS}, \text{PBS}\}\)
Regularized Success Rate (RSR): \(r_{\phi}(x) = \sum_{i=1}^{N_e} RSR_{\phi}^{(i)}(x), \phi \in \{\text{PIBT}, \text{LTF}\}\)
Two-algorithm experiments:
EECBS & PBS: \(f(x) = |t_{\text{EECBS}}(x) - t_{\text{PBS}}(x)|\)
PIBT & LTF: \(f(x) = |r_{\text{PIBT}}(x) - r_{\text{LTF}}(x)|\)
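The objectives above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the per-instance runtimes and RSR values are taken as given inputs (how each RSR value is computed is not specified here), and the function names and sample numbers are hypothetical.

```python
from statistics import mean


def runtime_objective(runtimes):
    """t_phi(x): average CPU runtime of one algorithm over instances on map x."""
    return mean(runtimes)


def summed_rsr(rsr_values):
    """r_phi(x): sum of the per-instance RSR values over the N_e instances."""
    return sum(rsr_values)


def comparison_objective(score_a, score_b):
    """Two-algorithm objective: absolute gap between the two algorithms' scores."""
    return abs(score_a - score_b)


# Hypothetical measurements on a single map x (seconds).
eecbs_runtimes = [1.0, 2.0, 3.0]
pbs_runtimes = [0.5, 0.5, 2.0]

gap = comparison_objective(
    runtime_objective(eecbs_runtimes),
    runtime_objective(pbs_runtimes),
)
```

Because the two-algorithm objective is an absolute difference, a large value only says that the algorithms behave differently on the map. Which one performs better has to be read from the individual scores.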

Diversity Measures: Number of obstacles & KL divergence of tile pattern distribution
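As a rough sketch of how the two diversity measures could be computed on a grid map, the code below counts obstacle tiles and compares tile pattern distributions with KL divergence. Several details are assumptions made for illustration and are not taken from the text: the map encoding (1 for obstacle, 0 for empty), the 2x2 window size, the comparison against a reference map, and the smoothing constant `eps`.

```python
from collections import Counter
from math import log


def count_obstacles(grid):
    """Number of obstacle tiles (cells equal to 1) in the map."""
    return sum(row.count(1) for row in grid)


def tile_pattern_distribution(grid, k=2):
    """Empirical distribution of k-by-k tile patterns, scanned with stride 1."""
    rows, cols = len(grid), len(grid[0])
    counts = Counter()
    for r in range(rows - k + 1):
        for c in range(cols - k + 1):
            pattern = tuple(tuple(grid[r + i][c:c + k]) for i in range(k))
            counts[pattern] += 1
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()}


def kl_divergence(p, q, eps=1e-6):
    """KL(p || q) over the union of patterns, with additive smoothing."""
    keys = set(p) | set(q)
    ps = {key: p.get(key, 0.0) + eps for key in keys}
    qs = {key: q.get(key, 0.0) + eps for key in keys}
    zp, zq = sum(ps.values()), sum(qs.values())
    return sum(
        (ps[key] / zp) * log((ps[key] / zp) / (qs[key] / zq)) for key in keys
    )


# Two hypothetical 3x3 maps: fully open, and a one-tile-wide corridor.
open_map = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
corridor_map = [[1, 0, 1], [1, 0, 1], [1, 0, 1]]

n_obstacles = count_obstacles(corridor_map)
kl_same = kl_divergence(
    tile_pattern_distribution(corridor_map),
    tile_pattern_distribution(corridor_map),
)
kl_diff = kl_divergence(
    tile_pattern_distribution(corridor_map),
    tile_pattern_distribution(open_map),
)
```

The smoothing term keeps the divergence finite when a pattern appears in one map but not in the other, which is the common case for maps with very different structure.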

One-algorithm experiments setup

CBS performs well on maps with a large portion of connected empty space, and it struggles on maps with long corridors and one-tile entries.

EECBS performs well on maps with more empty space between long obstacle components and on maps with short obstacle components, and it struggles on maps with long corridors and one-tile entries.

PBS performs well on maps with long corridors that have more entry spaces in between and on maps with short obstacle components, and it struggles on maps with long corridors that have one-tile entries in between and on maps with one-entry spaces.

PIBT performs well on maps with large chunks of empty space and struggles on maps with long corridors and one-entry spaces.

LTF performs well on maps with large chunks of empty space and struggles on maps with long corridors and one-entry spaces.

Two-algorithm experiments setup

EECBS outperforms PBS on maps with long but wide corridors and one-entry spaces between adjacent corridors. On the other hand, PBS outperforms EECBS on maps with long corridors with more entries.

PIBT outperforms LTF on maps with more one-entry spaces. On the other hand, LTF outperforms PIBT on maps with more empty space between corridors.

Map Gallery

Below are representative maps for our MAPF algorithms. Each column displays multiple maps for the corresponding algorithm.

CBS

EECBS

PBS

PIBT

LTF

BibTeX

@inproceedings{Qian2026qdmapper,
  author    = {Cheng Qian and Yulun Zhang and Varun Bhatt and Matthew C. Fontaine and Stefanos Nikolaidis and Jiaoyang Li},
  title     = {QD-MAPPER: A Quality Diversity Framework to Automatically Evaluate Multi-Agent Path Finding Algorithms in Diverse Maps},
  booktitle = {Proceedings of International Conference on Autonomous Agents and Multiagent Systems (AAMAS)},
  pages     = {},
  year      = {2026},
  doi       = {https://doi.org/10.65109/}
}