Autonomous agents increasingly operate in environments shared with other agents whose behavior affects mission success. How can we enable these agents to reason systematically about others' behaviors given only high-level information about their objectives? Can we develop approaches that avoid making unnecessary assumptions about others' decision-making processes while maintaining computational tractability?

Task-Aware Behavior Fields (TAB-Fields)

Autonomous agents operating in adversarial scenarios face a fundamental challenge: while they may know their adversaries' high-level objectives, such as reaching specific destinations within time constraints, the exact policies these adversaries will employ remain unknown. Traditional approaches address this challenge by treating the adversary's state as a partially observable element, leading to a formulation as a Partially Observable Markov Decision Process (POMDP). However, the induced belief-space dynamics in a POMDP require knowledge of the system's transition dynamics, which, in this case, depend on the adversary's unknown policy. Hence, instead of assuming an adversary's policy, we propose characterizing the space of possible adversary behaviors through constraints derived from mission objectives and environmental factors. In this paper, we develop Task-Aware Behavior Fields (TAB-Fields), a representation that systematically captures adversary state distributions over time using principles of maximum entropy. By encoding only what is known—mission constraints and environmental limitations—TAB-Fields enable reasoning about the full range of feasible adversary behaviors without relying on policy assumptions or hand-crafted rewards. We integrate TAB-Fields with standard planning algorithms by introducing TAB-conditioned POMCP, an adaptation of Partially Observable Monte Carlo Planning. Through extensive experiments in simulation with underwater robots and hardware implementations with ground robots, we demonstrate that our approach achieves superior performance compared to baselines that either assume specific adversary policies or neglect mission constraints altogether.
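To make the maximum entropy idea concrete, here is a minimal sketch rather than the paper's formulation. It assumes a small grid world, a hard "reach the goal at exactly the horizon" constraint, and a set of forbidden cells. Under hard constraints alone, the maximum entropy distribution over trajectories is uniform over the feasible ones, so the per-time-step state marginals follow from counting feasible paths forward and backward. The function names, the grid model, and the move set (stay or step to a 4-neighbor) are all illustrative assumptions.

```python
from fractions import Fraction


def neighbors(cell, size, blocked):
    """Yield cells reachable in one step: stay put or move to a 4-neighbor."""
    r, c = cell
    for dr, dc in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < size and 0 <= nc < size and (nr, nc) not in blocked:
            yield (nr, nc)


def max_entropy_marginals(size, start, goal, horizon, blocked=frozenset()):
    """State marginals of the uniform distribution over feasible trajectories.

    A trajectory is feasible if it starts at `start`, never enters a blocked
    cell, and is at `goal` at time `horizon` (hypothetical mission constraint).
    """
    cells = [(r, c) for r in range(size) for c in range(size)
             if (r, c) not in blocked]

    # fwd[t][s]: number of constraint-respecting paths from start to s in t steps.
    fwd = [{cell: 0 for cell in cells} for _ in range(horizon + 1)]
    fwd[0][start] = 1
    for t in range(horizon):
        for cell in cells:
            if fwd[t][cell]:
                for nxt in neighbors(cell, size, blocked):
                    fwd[t + 1][nxt] += fwd[t][cell]

    # bwd[t][s]: number of paths from s at time t that are at goal at the horizon.
    bwd = [{cell: 0 for cell in cells} for _ in range(horizon + 1)]
    bwd[horizon][goal] = 1
    for t in range(horizon - 1, -1, -1):
        for cell in cells:
            bwd[t][cell] = sum(bwd[t + 1][nxt]
                               for nxt in neighbors(cell, size, blocked))

    total = fwd[horizon][goal]
    if total == 0:
        raise ValueError("no trajectory satisfies the constraints")
    return [{cell: Fraction(fwd[t][cell] * bwd[t][cell], total) for cell in cells}
            for t in range(horizon + 1)]


# 3x3 grid, corner to corner in 4 steps while avoiding the center cell.
field = max_entropy_marginals(3, (0, 0), (2, 2), 4, blocked=frozenset({(1, 1)}))
```

In this toy case only two trajectories are feasible (one along each pair of edges), so at the midpoint the field places probability 1/2 on each of the two far corners and zero everywhere else. No adversary policy or reward is assumed at any point.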

Task Description:

Reach target [x,y] after visiting any three different checkpoints, taking no more than 10s between checkpoints, while avoiding the center of the environment.

Teaser: TAB-Fields' performance against an adversarial agent with a given task. The red line represents the adversary's trajectory, while the green line shows the path planned using TAB-Fields. The agent intercepts the adversary just before it reaches the second observable zone, demonstrating the effectiveness of TAB-Fields.

Experimental Videos

These videos demonstrate TAB-Fields' performance in an "interception mission" against an adversary with various high-level objectives. In the videos, the adversarial agent is marked by a white hat and the checkpoints are depicted by green squares on the floor.

Task 1:

Start at the bottom-left corner, sequentially visit Checkpoint 3, Checkpoint 2, and Checkpoint 1 in that order, ensuring each checkpoint is reached within 7 seconds, and eventually finish at the top-right corner.

Task 2:

Start at the top-left corner, ensure you visit at least one checkpoint while always avoiding the center of the environment, and eventually reach the bottom-right corner within 50 seconds.

Task 3:

Start at the top-left corner, ensure you reach Checkpoint 1 exactly at 5 seconds, and eventually visit Checkpoint 2 within 10 seconds.

Task 4:

Start at the top-left corner, ensure that you are in a checkpoint every 10 seconds, and always avoid entering the right half of the environment.

Task 5:

Start at the top-right corner, ensure that exactly every 10 seconds you pass through a checkpoint, include Checkpoint 4 (the central checkpoint) as one of the three checkpoints, and eventually reach the bottom-right corner.

Task 6:

Start at the bottom-left corner, ensure that within 40 seconds you pass through the top-right corner and at least one other checkpoint, and eventually reach the center checkpoint within 80 seconds.
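Task specifications like those above can be read as predicates over timed trajectories. As an illustration only, the sketch below checks Task 2 against a sampled trajectory. The region labels (`"top_left"`, `"center"`, `"checkpoint_1"`, `"bottom_right"`, `"free"`) and the function name are hypothetical, the checkpoint visit is assumed to be allowed at any point in the trajectory, and the check only sees the sampled time stamps, not the motion between them.

```python
def satisfies_task2(trajectory, deadline=50.0):
    """Check a sampled trajectory against Task 2.

    `trajectory` is a time-ordered list of (time_seconds, region_label) pairs.
    Labels are hypothetical names for regions of the environment.
    """
    # The task starts at the top-left corner.
    if not trajectory or trajectory[0][1] != "top_left":
        return False

    visited_checkpoint = False
    reached_goal_in_time = False
    for time_s, region in trajectory:
        # "Always avoid the center": any sample in the center violates the task.
        if region == "center":
            return False
        # "Visit at least one checkpoint".
        if region.startswith("checkpoint"):
            visited_checkpoint = True
        # "Eventually reach the bottom-right corner within 50 seconds".
        if region == "bottom_right" and time_s <= deadline:
            reached_goal_in_time = True
    return visited_checkpoint and reached_goal_in_time


ok = [(0, "top_left"), (12, "checkpoint_1"), (30, "free"), (45, "bottom_right")]
too_late = [(0, "top_left"), (12, "checkpoint_1"), (55, "bottom_right")]
```

Here `satisfies_task2(ok)` holds, while `satisfies_task2(too_late)` fails because the goal is reached after the 50-second deadline.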

Scalability of TAB-Fields: Experiments with underwater robots

The simulation results reveal critical insights about scaling TAB-conditioned policies to higher-dimensional spaces. First, the performance gap between TAB-POMCP and baselines widens as mission complexity increases, particularly in missions with complex temporal dependencies. This suggests that the maximum entropy formulation becomes more valuable precisely when the search space expands. Second, even in the most complex scenarios with multiple interacting constraints (M3, see paper), TAB-POMCP maintains a 3-4x improvement in interception efficiency over methods that make explicit policy assumptions.

The key driver behind this scalability is TAB-Fields' ability to automatically identify and exploit mission-constrained regions of the state space. Rather than maintaining beliefs over the full 6-DOF state space, TAB-POMCP effectively "collapses" the belief to high-probability regions defined by mission constraints. This implicit dimensionality reduction enables efficient planning even as the raw state space grows.
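The belief "collapse" can be pictured with a small particle-based sketch. This is not the TAB-POMCP algorithm itself. It only assumes that a behavior field at a given time step is available as a mapping from (hypothetical) state labels to probabilities, and it draws belief particles from that field, so states ruled out by the mission constraints receive no particles at all.

```python
import random


def collapse_belief(field_t, num_particles, rng):
    """Draw belief particles from a behavior field at one time step.

    `field_t` maps states to probabilities (an assumed representation).
    States with zero probability are excluded from the belief's support.
    """
    support = [(state, prob) for state, prob in field_t.items() if prob > 0]
    if not support:
        raise ValueError("field assigns zero probability to every state")
    states, weights = zip(*support)
    # random.Random.choices samples with replacement, proportionally to weights.
    return rng.choices(states, weights=weights, k=num_particles)


# Toy field: the constraint-violating state "center" has probability zero.
toy_field = {"left_corridor": 0.5, "center": 0.0, "right_corridor": 0.5}
particles = collapse_belief(toy_field, 100, random.Random(0))
```

All 100 particles land in the two corridors, so a planner that simulates from this belief spends no effort on states the mission constraints exclude.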

Authors

Gokul Puthumanaillam, Jae Hyuk Song, Nurzhan Yesmagambet, Shinkyu Park, Melkior Ornik

Citation


@article{puthumanaillam2024tabfieldsmaximumentropyframework,
      title={TAB-Fields: A Maximum Entropy Framework for Mission-Aware Adversarial Planning}, 
      author={Gokul Puthumanaillam and Jae Hyuk Song and Nurzhan Yesmagambet and Shinkyu Park and Melkior Ornik},
      year={2024},
      eprint={2412.02570},
      archivePrefix={arXiv},
      url={https://arxiv.org/abs/2412.02570}, 
}