Perception–Awareness–Decision (PAD)

Abstract

Pipeline: SLAM + VLM + ASR + LLM

Output: Situational-awareness map

Robots working in spaces shared by people need more than geometric mapping: they must recognize people, understand social context, and decide whether to proceed or negotiate passage. We introduce a Perception–Awareness–Decision (PAD) framework that combines SLAM with Vision–Language Models (VLMs), speech recognition, and Large Language Models (LLMs) through an explicit situational-awareness representation. In a corridor-blocking task, PAD improves task success, increases safety margins, and produces behavior participants judged as more socially appropriate than a geometric baseline.

(Text adapted from the paper abstract.)

At a glance

Conditions: P0 / P1 / P2

Participants: 8

PAD separates (1) multimodal perception, (2) situation awareness, and (3) decision-making, enabling a robot to switch between safe replanning and context-grounded dialogue when navigation depends on human cooperation.

P0: SLAM-only baseline (no semantics, no dialogue)
P1: Context-aware conservative interaction
P2: Context-aware assertive interaction
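The three conditions above can be written down as a small configuration table. This is a minimal sketch for illustration only: the `Condition` class, its field names, and the `dialogue_conditions` helper are hypothetical and do not come from the paper's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Condition:
    """One experimental condition (field names are illustrative assumptions)."""
    name: str
    uses_semantics: bool   # semantic context available to the robot
    uses_dialogue: bool    # dialogue with the participant enabled
    style: Optional[str]   # interaction style; None when there is no dialogue


CONDITIONS = {
    "P0": Condition("P0", uses_semantics=False, uses_dialogue=False, style=None),
    "P1": Condition("P1", uses_semantics=True, uses_dialogue=True, style="conservative"),
    "P2": Condition("P2", uses_semantics=True, uses_dialogue=True, style="assertive"),
}


def dialogue_conditions():
    """Return the names of conditions in which the robot may talk to the person."""
    return [c.name for c in CONDITIONS.values() if c.uses_dialogue]
```

Under this encoding, P1 and P2 differ only in `style`, which mirrors the page's description of them as two interaction styles on top of the same context-aware pipeline.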

Framework

Figure: PAD overview

Key idea: explicit awareness layer

**PAD architecture.** Perception fuses LiDAR/SLAM with VLM and language inputs; the situation-awareness layer builds a semantic contextual map; a meta-controller selects between navigation/control and an LLM-driven dialogue module.
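To make the meta-controller's role concrete, the sketch below shows one way a selector could choose between navigation, replanning, and dialogue from a summary of the awareness layer. The `Situation` fields and the decision rule are assumptions made for illustration; the page does not specify the actual selection logic used in PAD.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NAVIGATE = "navigate"
    REPLAN = "replan"
    DIALOGUE = "dialogue"


@dataclass
class Situation:
    """Hypothetical summary of the situational-awareness map."""
    path_blocked: bool
    blocker_is_person: bool
    detour_available: bool


def select_action(s: Situation, dialogue_enabled: bool = True) -> Action:
    """Pick the next behavior (an assumed rule, not the paper's controller)."""
    if not s.path_blocked:
        return Action.NAVIGATE
    if s.detour_available:
        # A geometric detour exists, so no human cooperation is needed.
        return Action.REPLAN
    if s.blocker_is_person and dialogue_enabled:
        # Progress depends on the person moving: ask instead of replanning.
        return Action.DIALOGUE
    return Action.REPLAN
```

Calling `select_action` with `dialogue_enabled=False` approximates a SLAM-only robot, which can only keep replanning when a person blocks the only route.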

Experiment

Task: corridor blocked by participant

Outcome: proceed vs negotiate

Each trial starts with the robot navigating toward a fixed goal. When the participant blocks the corridor, the robot either repeatedly replans (P0) or uses PAD to initiate context-grounded dialogue and then continues navigation (P1/P2).

**Example trial sequence (TPA).** The robot detects blockage, initiates dialogue, verifies clearance, and resumes navigation. (This sequence corresponds to the paper’s illustrated experimental trial.)

**EDA sequence.** Additional example of the interaction and navigation flow under PAD.

**NDA sequence.** Additional example illustrating resolution and continuation to the goal.
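The example trial sequence (detect blockage, initiate dialogue, verify clearance, resume navigation) can be read as a small state machine. The sketch below is one possible encoding; the phase names, the event strings, and the retry transition from verification back to dialogue are illustrative assumptions rather than details taken from the paper.

```python
from enum import Enum, auto


class Phase(Enum):
    NAVIGATING = auto()
    BLOCKED = auto()
    DIALOGUE = auto()
    VERIFYING = auto()
    DONE = auto()


# (current phase, event) -> next phase; event names are hypothetical.
TRANSITIONS = {
    (Phase.NAVIGATING, "blockage_detected"): Phase.BLOCKED,
    (Phase.BLOCKED, "dialogue_started"): Phase.DIALOGUE,
    (Phase.DIALOGUE, "person_responded"): Phase.VERIFYING,
    (Phase.VERIFYING, "path_clear"): Phase.NAVIGATING,
    (Phase.VERIFYING, "still_blocked"): Phase.DIALOGUE,  # assumed retry
    (Phase.NAVIGATING, "goal_reached"): Phase.DONE,
}


def step(phase: Phase, event: str) -> Phase:
    """Advance one step; unknown events leave the phase unchanged."""
    return TRANSITIONS.get((phase, event), phase)


def run(events) -> Phase:
    """Replay a list of events starting from NAVIGATING."""
    phase = Phase.NAVIGATING
    for event in events:
        phase = step(phase, event)
    return phase
```

For instance, `run(["blockage_detected", "dialogue_started", "person_responded", "path_clear", "goal_reached"])` ends in `Phase.DONE`, matching the order of steps described for the example trial.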

Results

Success: P1/P2 = 8/8

Baseline: P0 = 2/8

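Both PAD conditions succeeded in every trial (8/8, versus 2/8 for the SLAM-only baseline) and stopped farther from the participant on average (0.83 m for P1 and 0.79 m for P2, versus 0.70 m for P0). P1 finished faster than P2 (99.95 s vs. 162.21 s on average) and used fewer dialogue turns with longer utterances, while P2 used more turns with shorter messages.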

| Metric | P0 (SLAM-only) | P1 (Conservative) | P2 (Assertive) |
| --- | --- | --- | --- |
| Success rate | 2/8 | 8/8 | 8/8 |
| Re-planning attempts (avg.) | 7.25 | 4.71 | 4.29 |
| Stopping distance (m, avg.) | 0.70 | 0.83 | 0.79 |
| Trial duration (s, avg.) | 105.94 | 99.95 | 162.21 |
| Dialogue turns (avg.) | n/a | 3.4 | 6.2 |
| Tokens per turn (avg.) | n/a | 35.45 | 21.36 |

Metrics reproduced from Table 1 in the paper.
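A few derived quantities follow directly from the table. The sketch below recomputes success rates, the stopping-distance gain over P0, and a rough tokens-per-trial figure. The last value multiplies two averages (turns and tokens per turn), so it only approximates the true per-trial mean, which the page does not report; the dictionary layout and function names are illustrative.

```python
# Values copied from the metrics table; None marks cells with no dialogue.
TABLE = {
    "P0": {"success": (2, 8), "stop_m": 0.70, "turns": None, "tokens_per_turn": None},
    "P1": {"success": (8, 8), "stop_m": 0.83, "turns": 3.4, "tokens_per_turn": 35.45},
    "P2": {"success": (8, 8), "stop_m": 0.79, "turns": 6.2, "tokens_per_turn": 21.36},
}


def success_rate(cond: str) -> float:
    """Fraction of successful trials in a condition."""
    succeeded, total = TABLE[cond]["success"]
    return succeeded / total


def stop_gain_m(cond: str) -> float:
    """Average stopping distance minus the P0 baseline, in meters."""
    return round(TABLE[cond]["stop_m"] - TABLE["P0"]["stop_m"], 2)


def approx_tokens_per_trial(cond: str):
    """Product of average turns and average tokens per turn (an approximation)."""
    row = TABLE[cond]
    if row["turns"] is None:
        return None
    return round(row["turns"] * row["tokens_per_turn"], 1)


for cond in TABLE:
    print(cond, success_rate(cond), stop_gain_m(cond), approx_tokens_per_trial(cond))
```

Under this approximation the two styles produce a similar amount of robot speech per trial (about 120.5 tokens for P1 and 132.4 for P2), so the main difference lies in how that speech is split across turns.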

**Clarity.** Participants rated PAD conditions higher in explanation clarity than the baseline.

**Naturalness.** PAD interaction styles were judged more natural than SLAM-only navigation.

**Comfort/Safety.** Self-reported comfort/safety trends favored PAD-enabled behaviors.

**Preference.** Asked which strategy they would choose for real-life use, participants tended to prefer PAD behaviors over the baseline.

Citation & links

DOI (will activate after publication): 10.1145/3776734.3794394