GitHub - CXU-TRI/FAIL-Detect: Code for RSS 2025 paper "Can We Detect Failures Without Failure Data? Uncertainty-Aware Runtime Failure Detection for Imitation Learning Policies"

FAIL-Detect: Failure Analysis in Imitation Learning – Detecting failures without failure data

Project website: https://cxu-tri.github.io/FAIL-Detect-Website/.
The paper titled "Can We Detect Failures Without Failure Data? Uncertainty-Aware Runtime Failure Detection for Imitation Learning Policies" is accepted at Robotics: Science and Systems (RSS) 2025.
Please direct implementation questions to Chen Xu ([email protected]).

Prerequisite

We base our environment on diffusion_policy. Set up the environment by running

mamba env create -f conda_environment.yaml

Usage

1. Policy training

Tasks: we consider square, transport, tool_hang, and can tasks in robomimic.

Policy backbone: either a diffusion policy or a flow-matching policy. Both policies have the same network architecture and are trained on the same datasets with the same hyperparameters.

Usage: see diffusion_policy/configs_robomimic for the set of configs.

# This trains a flow policy (e.g., on the square task)
python train.py --config-dir=diffusion_policy/configs_robomimic --config-name=image_square_ph_visual_flow_policy_cnn.yaml training.seed=1103 training.device=cuda:0 hydra.run.dir='data/outputs/${name}_${task_name}'

# This trains a diffusion policy (e.g., on the square task)
python train.py --config-dir=diffusion_policy/configs_robomimic --config-name=image_square_ph_visual_diffusion_policy_cnn.yaml training.seed=1103 training.device=cuda:0 hydra.run.dir='data/outputs/${name}_${task_name}'

# For other tasks, replace 'square' with one of ['transport', 'tool_hang', 'can']
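Since only the task name and the policy type change between runs, the commands can be generated instead of edited by hand. The sketch below assumes every task follows the config naming pattern of the two square examples (`image_<task>_ph_visual_<policy>_policy_cnn.yaml`); check `diffusion_policy/configs_robomimic` before relying on it. `build_command` is a hypothetical helper, not part of the repository. Because `save_data.py` in step 2 takes the same overrides, the helper also accepts a `script` argument.

```python
from __future__ import annotations

import shlex

TASKS = ["square", "transport", "tool_hang", "can"]
POLICIES = ["flow", "diffusion"]


def build_command(task: str, policy: str, script: str = "train.py",
                  seed: int = 1103, device: str = "cuda:0") -> list[str]:
    """Return the argv for one train.py (or save_data.py) run.

    Assumption: every task uses the config name pattern of the square examples,
    image_<task>_ph_visual_<policy>_policy_cnn.yaml.
    """
    if task not in TASKS:
        raise ValueError(f"unknown task: {task}")
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    return [
        "python", script,
        "--config-dir=diffusion_policy/configs_robomimic",
        f"--config-name=image_{task}_ph_visual_{policy}_policy_cnn.yaml",
        f"training.seed={seed}",
        f"training.device={device}",
        # Plain (non f-) string: Hydra resolves ${name} and ${task_name} itself.
        "hydra.run.dir=data/outputs/${name}_${task_name}",
    ]


if __name__ == "__main__":
    for task in TASKS:
        for policy in POLICIES:
            # shlex.join single-quotes the ${...} override so a shell won't expand it.
            print(shlex.join(build_command(task, policy)))
```

To launch a run instead of printing it, pass the list to `subprocess.run(cmd, check=True)` from the repository root; a list argument bypasses the shell, so `${name}` and `${task_name}` reach Hydra unexpanded.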

2. Obtain $\{(A_t, O_t)\}$ given a trained policy

Here,

$O_t$ = [embedded visual features, non-visual information (e.g., robot states)].
$A_t$ = the corresponding action in the training data.
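To make the definition concrete, the sketch below builds $O_t$ by concatenating per-step visual embeddings with non-visual state and pairs each row with its action. The array sizes are hypothetical placeholders and `build_observation` is an illustrative helper; `save_data.py` may store these arrays in a different layout.

```python
import numpy as np


def build_observation(visual_features: np.ndarray, robot_state: np.ndarray) -> np.ndarray:
    """Concatenate embedded visual features with non-visual information.

    visual_features: (T, d_vis) per-step embeddings from the policy's visual encoder.
    robot_state:     (T, d_state) non-visual inputs such as proprioceptive state.
    Returns O with shape (T, d_vis + d_state).
    """
    if visual_features.shape[0] != robot_state.shape[0]:
        raise ValueError("feature and state sequences must have the same length")
    return np.concatenate([visual_features, robot_state], axis=-1)


# Hypothetical sizes for illustration only: 10 steps, 64-d embeddings, 9-d state.
rng = np.random.default_rng(0)
O = build_observation(rng.normal(size=(10, 64)), rng.normal(size=(10, 9)))
A = rng.normal(size=(10, 7))  # hypothetical 7-d actions, one per O_t
pairs = list(zip(A, O))       # the dataset {(A_t, O_t)}
print(O.shape, len(pairs))    # (10, 73) 10
```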

# For flow policy (e.g., on the square task)
python save_data.py --config-dir=diffusion_policy/configs_robomimic \
--config-name=image_square_ph_visual_flow_policy_cnn.yaml \
training.seed=1103 training.device=cuda:0 hydra.run.dir='data/outputs/${name}_${task_name}' 

# For diffusion policy (e.g., on the square task)
python save_data.py --config-dir=diffusion_policy/configs_robomimic \
--config-name=image_square_ph_visual_diffusion_policy_cnn.yaml \
training.seed=1103 training.device=cuda:0 hydra.run.dir='data/outputs/${name}_${task_name}'

# For other tasks, replace 'square' with one of ['transport', 'tool_hang', 'can']

3. Train scalar scores given $\{(A_t, O_t)\}$

We give examples using logpZO and RND, which are the best-performing scores. The other baselines are run the same way after switching to the corresponding folder:

cd UQ_baselines/logpZO/ # Or change to /RND/, /CFM/, /NatPN/, /DER/ ...
# flow policy
python train.py --policy_type='flow' --type 'square'
# diffusion policy
python train.py --policy_type='diffusion' --type 'square'
cd ../..

# For other tasks, replace 'square' with one of ['transport', 'tool_hang', 'can']
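For intuition about what an RND score measures, here is a minimal Random Network Distillation-style sketch in NumPy. It is not the code in `UQ_baselines/RND/`: the target is a single fixed random `tanh` layer, the predictor is a linear map fitted by ridge regression instead of a trained neural network, and the inputs are synthetic stand-ins for features built from $(A_t, O_t)$. `RNDScore` is an illustrative name.

```python
import numpy as np


class RNDScore:
    """Simplified Random Network Distillation-style novelty score.

    A fixed, randomly initialised target f(x) = tanh(x W + b) is never trained.
    A predictor g is fitted to f on training inputs only (here by ridge regression
    on [x, 1]). The score ||f(x) - g(x)||^2 tends to grow on unfamiliar inputs.
    """

    def __init__(self, in_dim: int, out_dim: int = 16, ridge: float = 1e-3, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=1.0 / np.sqrt(in_dim), size=(in_dim, out_dim))
        self.b = rng.normal(scale=0.1, size=out_dim)
        self.ridge = ridge
        self.beta = None

    def _target(self, x: np.ndarray) -> np.ndarray:
        return np.tanh(x @ self.W + self.b)

    @staticmethod
    def _design(x: np.ndarray) -> np.ndarray:
        # Append a constant column so the linear predictor has an intercept.
        return np.hstack([x, np.ones((x.shape[0], 1))])

    def fit(self, x_train: np.ndarray) -> "RNDScore":
        X, Y = self._design(x_train), self._target(x_train)
        # Closed-form ridge regression: beta = (X^T X + lambda I)^{-1} X^T Y.
        reg = self.ridge * np.eye(X.shape[1])
        self.beta = np.linalg.solve(X.T @ X + reg, X.T @ Y)
        return self

    def score(self, x: np.ndarray) -> np.ndarray:
        residual = self._target(x) - self._design(x) @ self.beta
        return np.sum(residual ** 2, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x_train = rng.normal(size=(2000, 8))         # stand-in for training features
    x_id = rng.normal(size=(500, 8))             # held-out, same distribution
    x_ood = rng.normal(loc=4.0, size=(500, 8))   # shifted inputs as an OOD stand-in
    rnd = RNDScore(in_dim=8).fit(x_train)
    print(f"mean ID score {rnd.score(x_id).mean():.3f}, "
          f"mean OOD score {rnd.score(x_ood).mean():.3f}")
```

Because the predictor only matches the target where it has seen data, the mean score on the shifted inputs should come out well above the mean on held-out in-distribution inputs.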

4. Run evaluation

cd UQ_test
# --modify=false: in-distribution (ID) evaluation
python eval_together.py --policy_type='flow' --task_name='square' --device=0 --modify=false --num=2000
python eval_together.py --policy_type='diffusion' --task_name='square' --device=0 --modify=false --num=2000

# --modify=true: out-of-distribution (OOD) evaluation
python eval_together.py --policy_type='flow' --task_name='square' --device=0 --modify=true --num=2000
python eval_together.py --policy_type='diffusion' --task_name='square' --device=0 --modify=true --num=2000
cd ..

# For other tasks, replace 'square' with one of ['transport', 'tool_hang', 'can']
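The four evaluation commands differ only in `--policy_type` and `--modify`, so the full ID/OOD sweep over all tasks can be enumerated programmatically. This is an illustrative sketch; `eval_commands` is a hypothetical helper, and `--policy_type=flow` is exactly what the shell passes for `--policy_type='flow'` once it strips the quotes.

```python
import shlex
from typing import Iterator, List, Sequence


def eval_commands(tasks: Sequence[str] = ("square",),
                  policies: Sequence[str] = ("flow", "diffusion"),
                  device: int = 0, num: int = 2000) -> Iterator[List[str]]:
    """Yield eval_together.py argv lists: per task, ID runs first, then OOD runs."""
    for task in tasks:
        for modify in ("false", "true"):  # "false" -> ID, "true" -> OOD
            for policy in policies:
                yield ["python", "eval_together.py",
                       f"--policy_type={policy}", f"--task_name={task}",
                       f"--device={device}", f"--modify={modify}", f"--num={num}"]


if __name__ == "__main__":
    all_tasks = ("square", "transport", "tool_hang", "can")
    for cmd in eval_commands(all_tasks):
        print(shlex.join(cmd))  # run these from inside UQ_test/
```

Each command is meant to run from `UQ_test`, for example `subprocess.run(cmd, cwd="UQ_test", check=True)` when calling from the repository root.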

5. Conformal prediction (CP) band + visualization

cd UQ_test
# flow
python plot_with_CP_band.py # Generate the CP band and make detection decisions
python barplot.py # Generate bar plots

# diffusion
python plot_with_CP_band.py --diffusion_policy # Generate the CP band and make detection decisions
python barplot.py --diffusion_policy # Generate bar plots
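As a rough illustration of how a time-varying conformal band turns scalar scores into decisions, the sketch below computes a per-timestep split-conformal threshold from the scores of successful calibration rollouts and flags a test rollout at the first step where its score exceeds the band. This is a simplified stand-in, not the procedure in `plot_with_CP_band.py`; `cp_band`, `detect`, and `alpha` are illustrative names.

```python
import math

import numpy as np


def cp_band(calib_scores: np.ndarray, alpha: float = 0.05) -> np.ndarray:
    """Per-timestep split-conformal upper threshold.

    calib_scores: (n_rollouts, T) scores from successful calibration rollouts only.
    Returns a (T,) band: at each t, the ceil((n + 1)(1 - alpha))-th smallest score.
    """
    n = calib_scores.shape[0]
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        raise ValueError("need more calibration rollouts for this alpha")
    return np.sort(calib_scores, axis=0)[k - 1]


def detect(scores: np.ndarray, band: np.ndarray):
    """Return the first timestep where a rollout's score exceeds the band, or None."""
    above = np.nonzero(scores > band)[0]
    return int(above[0]) if above.size else None


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    calib = rng.normal(size=(99, 50))   # synthetic scores: 99 successful rollouts, 50 steps
    band = cp_band(calib, alpha=0.1)
    failing = np.zeros(50)
    failing[30:] = 10.0                 # synthetic rollout whose score jumps at t = 30
    print(detect(failing, band))
```

The per-timestep quantile only bounds the false-alarm rate at each step separately; bounding the chance that a successful rollout crosses the band anywhere along the trajectory needs a band built for the whole trajectory, for example by splitting `alpha` across timesteps.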

