Habitat Navigation Challenge 2023 [4]

Object navigation, not instruction following.

SPL: Success weighted by (inverse) Path Length

$$\mathrm{SPL} = \frac{1}{N} \sum_{i=1}^{N} S_{i} \frac{l_{i}}{\max(p_{i}, l_{i})}$$

where,
$l_{i}$ = length of the shortest path from the agent's starting position to the goal in episode $i$
$p_{i}$ = length of path taken by agent in an episode
$S_{i}$ = binary indicator of success in episode $i$
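
The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the challenge's official evaluation code: the function name `spl` and its argument names are hypothetical, and it assumes every shortest-path length is strictly positive.

```python
from typing import Sequence


def spl(
    successes: Sequence[int],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by (inverse) Path Length over N episodes.

    successes[i]        -> S_i, 1 if episode i succeeded, else 0
    shortest_lengths[i] -> l_i, shortest-path length (assumed > 0)
    path_lengths[i]     -> p_i, length of the path the agent actually took
    """
    if not (len(successes) == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have one entry per episode")
    if len(successes) == 0:
        raise ValueError("need at least one episode")

    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, path_lengths):
        # max(p, l) keeps each term in [0, 1], even if the logged path
        # is reported as slightly shorter than the shortest path.
        total += s * l / max(p, l)
    return total / len(successes)
```

A failed episode contributes zero regardless of path length, and a successful episode is discounted by how much longer the agent's path was than the shortest one.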

RxR-Habitat Challenge - CVPR 2021 Embodied AI Workshop [5]

Instruction following.

Primary

NDTW: Normalized Dynamic Time Warping

“RxR does not have a shortest path prior, we care more about an agent’s ability to follow a path than its ability to reach the specific endpoint of the path.” [5]

nDTW scores how faithfully an agent’s trajectory shadows a reference path. It aligns the two sequences with Dynamic Time Warping, sums the point-wise distances along the optimal alignment, then normalizes by the number of reference points times the goal-success threshold distance before passing the result through a negative exponential. The output is a smooth 0-to-1 similarity value that is order-aware, density-agnostic, and works on either graph or continuous representations; higher means the agent stayed closer to the intended route throughout. (one-paragraph explanation courtesy of o3, adapted from [3])
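
To make the steps concrete, here is a minimal sketch of nDTW computed as $\exp(-\mathrm{DTW}(R, Q) / (|R| \cdot d_{th}))$, where $R$ is the reference path, $Q$ the agent's path, and $d_{th}$ the success threshold. It is not the reference implementation from [3]: the function names are hypothetical, points are assumed to be coordinate tuples, and Euclidean distance stands in for whatever distance the environment defines (for example, shortest-path distance on a navigation graph).

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def dtw(reference: Sequence[Point], query: Sequence[Point]) -> float:
    """Cumulative cost of the optimal DTW alignment (O(|R| * |Q|) time)."""
    n, m = len(reference), len(query)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i reference points
    # with the first j query points.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Assumption: Euclidean distance (math.dist needs Python 3.8+).
            d = math.dist(reference[i - 1], query[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance along the reference only
                cost[i][j - 1],      # advance along the query only
                cost[i - 1][j - 1],  # advance along both
            )
    return cost[n][m]


def ndtw(
    reference: Sequence[Point],
    query: Sequence[Point],
    success_threshold: float,
) -> float:
    """Normalized DTW in (0, 1]; 1.0 means the paths coincide."""
    if not reference or not query:
        raise ValueError("both paths need at least one point")
    normalizer = len(reference) * success_threshold
    return math.exp(-dtw(reference, query) / normalizer)
```

For example, a query that runs parallel to a three-point reference at a constant offset of one unit, with a threshold of one unit, has a DTW cost of 3 and therefore an nDTW of $e^{-1}$.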

Secondary

NE: Navigation Error

Measures the distance between the last node in the predicted path and the last reference path node. [1]

$$\mathrm{NE}(P, R) = d(p_{|P|}, r_{|R|})$$

SR: Success Rate

Measures how often the last node in the predicted path is within a threshold distance $d_{th}$ of the last reference path node. [1]

$$\mathrm{SR}(P, R) = \mathbb{1}[\mathrm{NE}(P, R) \leq d_{th}]$$
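
Navigation error and success rate are both endpoint metrics, so they can be sketched together. The helper names below are hypothetical, paths are assumed to be non-empty sequences of coordinate tuples, and Euclidean distance is an assumption standing in for the environment's own distance function $d$.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]
Path = Sequence[Point]


def navigation_error(predicted: Path, reference: Path) -> float:
    """Distance between the final predicted node and the final reference node."""
    # Assumption: Euclidean distance; a graph environment would use
    # shortest-path distance instead.
    return math.dist(predicted[-1], reference[-1])


def success(predicted: Path, reference: Path, d_th: float) -> int:
    """Indicator: 1 if the agent stopped within d_th of the goal, else 0."""
    return 1 if navigation_error(predicted, reference) <= d_th else 0


def success_rate(episodes: Sequence[Tuple[Path, Path]], d_th: float) -> float:
    """Mean of the per-episode success indicator over (predicted, reference) pairs."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(success(p, r, d_th) for p, r in episodes) / len(episodes)
```

Neither metric looks at the intermediate nodes, which is why they are paired with path-sensitive metrics such as nDTW.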

SPL: Success weighted by (inverse) Path Length

Equipped with a binary definition of episodic success, we conduct $N$ test episodes. In each episode, the agent is tasked with navigating to a goal. Let $\ell_{i}$ be the shortest path distance from the agent’s starting position to the goal in episode $i$, and let $p_{i}$ be the length of the path actually taken by the agent in this episode. Let $S_{i}$ be a binary indicator of success in episode $i$. We define a summary measure of the agent’s navigation performance across the test set as follows: [2]

$$\frac{1}{N} \sum_{i=1}^{N} S_{i} \frac{\ell_{i}}{\max(p_{i}, \ell_{i})}$$
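
As a quick sanity check of the measure, consider a hypothetical test set of $N = 3$ episodes (the numbers are invented for illustration). In the first, the agent succeeds along the shortest path ($\ell_1 = 10$, $p_1 = 10$); in the second, it succeeds but takes twice the shortest distance ($\ell_2 = 10$, $p_2 = 20$); in the third, it fails ($\ell_3 = 10$, $p_3 = 5$, $S_3 = 0$):

$$\frac{1}{3}\left(1 \cdot \frac{10}{\max(10, 10)} + 1 \cdot \frac{10}{\max(20, 10)} + 0 \cdot \frac{10}{\max(5, 10)}\right) = \frac{1}{3}\left(1 + 0.5 + 0\right) = 0.5$$

The plain success rate for the same three episodes would be $2/3$; SPL is lower because the second episode is penalized for its detour.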

SDTW: Success weighted by Normalized Dynamic Time Warping

SDTW folds goal completion into nDTW by multiplying the similarity score by a binary success flag: if the agent ends within the specified success radius, SDTW equals its nDTW; otherwise it is forced to zero. (one-paragraph explanation courtesy of o3, adapted from [3])
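
The gating described above is small enough to sketch directly. The function below is a hypothetical helper, not code from [3]: it assumes the nDTW score and the final distance to the goal have already been computed elsewhere, and only applies the success flag.

```python
def sdtw(ndtw_score: float, final_distance: float, d_th: float) -> float:
    """Success weighted by nDTW.

    ndtw_score     -> precomputed nDTW value for the episode
    final_distance -> distance from the agent's last position to the goal
    d_th           -> success threshold (radius)
    """
    succeeded = final_distance <= d_th
    # A failed episode scores zero no matter how closely it followed the path.
    return ndtw_score if succeeded else 0.0
```

For instance, an episode with an nDTW of 0.8 keeps that score if the agent stops 2 units from the goal under a 3-unit threshold, and scores 0 if it stops 4 units away.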

PL: Path Length

Measures the total length of the predicted path, whose optimal value equals the length of the reference path. [1]

$$\mathrm{PL}(P) = \sum_{i=1}^{|P|-1} d(p_{i}, p_{i+1})$$
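
Path length is the sum of distances between consecutive nodes, which a short sketch makes explicit. The function name is hypothetical, and Euclidean distance between coordinate tuples is an assumption standing in for the environment's distance $d$.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def path_length(path: Sequence[Point]) -> float:
    """Sum of distances between consecutive nodes of the path."""
    # zip(path, path[1:]) yields the |P| - 1 consecutive pairs, so a
    # single-node (or empty) path has length 0.
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))
```

A path through (0, 0), (3, 4), and (3, 0) has two segments of length 5 and 4, for a total of 9.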

References

[1] Stay on the Path: Instruction Fidelity in Vision-and-Language Navigation. https://arxiv.org/abs/1905.12255
[2] On Evaluation of Embodied Navigation Agents. https://arxiv.org/abs/1807.06757
[3] General Evaluation for Instruction Conditioned Navigation using Dynamic Time Warping. https://arxiv.org/abs/1907.05446
[4] Habitat Navigation Challenge 2023. https://aihabitat.org/challenge/2023/
[5] RxR-Habitat Challenge - CVPR 2021 Embodied AI Workshop. https://www.youtube.com/watch?v=YGwHGgD-9gQ

Gustavo Moura's blog

VLN Evaluation
