Habitat Navigation Challenge 2023 [4]
Object navigation. NOT instruction following.
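SPL: Success weighted by (inverse) Path Length
$$\mathrm{SPL} = \frac{1}{N} \sum_{i=1}^{N} S_i \, \frac{\ell_i}{\max(p_i, \ell_i)}$$
where,
$\ell_i$ = length of shortest path between start position and goal for episode $i$
$p_i$ = length of path taken by agent in episode $i$
$S_i$ = binary indicator of success in episode $i$
$N$ = number of test episodes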
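A minimal sketch of the SPL computation, assuming per-episode success flags, shortest-path lengths, and agent path lengths are already available as plain Python sequences. The function name `spl` and its argument names are illustrative, not taken from any challenge toolkit, and the code assumes every shortest-path length is strictly positive.

```python
from typing import Sequence


def spl(successes: Sequence[bool],
        shortest_lengths: Sequence[float],
        path_lengths: Sequence[float]) -> float:
    """Success weighted by (inverse) Path Length, averaged over episodes."""
    if not (len(successes) == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have one entry per episode")
    if len(successes) == 0:
        raise ValueError("need at least one episode")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if success:
            # max(...) caps the ratio at 1 if the logged path is shorter than
            # the shortest path. Assumption: shortest > 0.
            total += shortest / max(taken, shortest)
        # Failed episodes contribute 0 regardless of path length.
    return total / len(successes)


if __name__ == "__main__":
    # Episode 1: optimal success, episode 2: success with a 2x detour,
    # episode 3: failure.
    print(spl([True, True, False], [10.0, 10.0, 10.0], [10.0, 20.0, 12.0]))
```

In the usage example the three episodes contribute 1, 0.5, and 0, so the average is 0.5.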
RxR-Habitat Challenge - CVPR 2021 Embodied AI Workshop [5]
Instruction following.
Primary
nDTW: Normalized Dynamic Time Warping
“RxR does not have a shortest path prior, we care more about an agent’s ability to follow a path than its ability to reach the specific endpoint of the path.” [5]
nDTW scores how faithfully an agent’s trajectory shadows a reference path. It aligns the two sequences with Dynamic Time Warping, sums the point-wise distances along the optimal alignment, then normalizes by the number of reference points and the goal-success threshold distance before passing the result through a negative exponential. The output is a smooth 0-to-1 similarity value that is order-aware, density-agnostic, and works on either graph or continuous representations; higher means the agent stayed closer to the intended route throughout. (one paragraph explanation courtesy of o3, adapted from [3])
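The description above can be sketched roughly as follows. This is an illustration under stated assumptions, not the challenge's evaluation code: paths are lists of coordinate tuples, point-to-point distance is plain Euclidean distance (an environment would more likely use its own shortest-path distance), and the names `dtw`, `ndtw`, and `success_radius` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def dtw(reference: Sequence[Point], query: Sequence[Point]) -> float:
    """Summed point-wise distance along the optimal DTW alignment."""
    n, m = len(reference), len(query)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning the first i reference
    # points with the first j query points.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(reference[i - 1], query[j - 1])  # assumption: Euclidean
            cost[i][j] = d + min(cost[i - 1][j],      # advance reference only
                                 cost[i][j - 1],      # advance query only
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def ndtw(reference: Sequence[Point], query: Sequence[Point],
         success_radius: float) -> float:
    """exp(-DTW / (number of reference points * success radius))."""
    if not reference or not query:
        raise ValueError("both paths must be non-empty")
    return math.exp(-dtw(reference, query) / (len(reference) * success_radius))


if __name__ == "__main__":
    ref = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    shifted = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
    print(ndtw(ref, ref, 3.0))      # identical paths
    print(ndtw(ref, shifted, 3.0))  # parallel path one unit away
```

Repeating a point in the query does not change the score in this sketch, which is one way to see the density-agnostic property.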
Secondary
NE: Navigation Error
measures the distance between the last node in the predicted path and the last reference path node. [1]
SR: Success Rate
measures how often the last node in the predicted path is within a threshold distance of the last reference path node. [1]
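NE and SR as defined here could be computed along these lines. This is a sketch assuming paths are lists of coordinate tuples and that straight-line Euclidean distance stands in for the environment's distance; the function names and the threshold value used in the example are illustrative only.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]
Path = Sequence[Point]


def navigation_error(predicted: Path, reference: Path) -> float:
    """Distance between the final predicted node and the final reference node."""
    # Assumption: Euclidean distance between coordinates.
    return math.dist(predicted[-1], reference[-1])


def success_rate(episodes: Sequence[Tuple[Path, Path]], threshold: float) -> float:
    """Fraction of (predicted, reference) pairs whose NE is within threshold."""
    if not episodes:
        raise ValueError("need at least one episode")
    hits = sum(1 for predicted, reference in episodes
               if navigation_error(predicted, reference) <= threshold)
    return hits / len(episodes)


if __name__ == "__main__":
    reference = [(0.0, 0.0), (4.0, 0.0)]
    near = [(0.0, 0.0), (3.0, 0.0)]  # ends 1 unit from the reference end
    far = [(0.0, 0.0), (4.0, 5.0)]   # ends 5 units from the reference end
    print(navigation_error(near, reference))
    print(success_rate([(near, reference), (far, reference)], threshold=3.0))
```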
SPL: Success weighted by (inverse) Path Length
Equipped with a binary definition of episodic success, we conduct $N$ test episodes. In each episode, the agent is tasked with navigating to a goal. Let $\ell_i$ be the shortest path distance from the agent’s starting position to the goal in episode $i$, and let $p_i$ be the length of the path actually taken by the agent in this episode. Let $S_i$ be a binary indicator of success in episode $i$. We define a summary measure of the agent’s navigation performance across the test set as follows: [2]
$$\mathrm{SPL} = \frac{1}{N} \sum_{i=1}^{N} S_i \, \frac{\ell_i}{\max(p_i, \ell_i)}$$
SDTW: Success weighted by Normalized Dynamic Time Warping
SDTW folds goal completion into nDTW by multiplying the similarity score by a binary success flag: if the agent ends within the specified success radius, SDTW equals its nDTW; otherwise it is forced to zero. (one paragraph explanation courtesy of o3, adapted from [3])
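The gating step is small enough to show directly. A minimal sketch, assuming the nDTW score and the agent's final distance to the goal have already been computed elsewhere; `sdtw` and its parameter names are hypothetical.

```python
def sdtw(ndtw_score: float, distance_to_goal: float, success_radius: float) -> float:
    """Return nDTW for successful episodes and 0.0 otherwise."""
    # Success flag: the agent stopped within the success radius of the goal.
    succeeded = distance_to_goal <= success_radius
    return ndtw_score if succeeded else 0.0


if __name__ == "__main__":
    print(sdtw(0.8, distance_to_goal=1.5, success_radius=3.0))  # success keeps nDTW
    print(sdtw(0.8, distance_to_goal=4.0, success_radius=3.0))  # failure zeroes it
```

An agent that tracks the route closely but stops outside the radius therefore scores zero, however high its nDTW.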
PL: Path Length
measures the total length of the predicted path, which has the optimal value equal to the length of the reference path. [1]
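For a path stored as a list of coordinates, PL is the sum of segment lengths. A short sketch, assuming straight-line (Euclidean) segments between consecutive nodes; `path_length` is an illustrative name.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def path_length(path: Sequence[Point]) -> float:
    """Sum of distances between consecutive nodes (0.0 for a single node)."""
    # Assumption: Euclidean distance between consecutive coordinates.
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


if __name__ == "__main__":
    predicted = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
    reference = [(0.0, 0.0), (3.0, 10.0)]
    # PL is compared against the reference path's own length.
    print(path_length(predicted), path_length(reference))
```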
References
[1] Stay on the Path: Instruction Fidelity in Vision-and-Language Navigation https://arxiv.org/abs/1905.12255
[2] On Evaluation of Embodied Navigation Agents https://arxiv.org/abs/1807.06757
[3] General Evaluation for Instruction Conditioned Navigation using Dynamic Time Warping https://arxiv.org/abs/1907.05446
[4] Habitat Navigation Challenge 2023 https://aihabitat.org/challenge/2023/
[5] RxR-Habitat Challenge - CVPR 2021 Embodied AI Workshop https://www.youtube.com/watch?v=YGwHGgD-9gQ