Learning Dynamic Tasks on a Large-Scale Soft Robot in a Handful of Trials
Sicelukwanda Zwane, Daniel Cheney, Curtis C. Johnson, Yicheng Luo, Yasemin Bekiroglu, Marc D. Killpack, Marc Peter Deisenroth
Soft robots offer more flexibility, compliance, and adaptability than traditional rigid robots. They are also typically lighter and cheaper to manufacture. However, their use in real-world applications is limited due to modeling challenges and difficulties in integrating effective proprioceptive sensors. Large-scale soft robots (about two meters in length) have greater modeling complexity due to increased inertia and the related effects of gravity. Common strategies for easing these modeling difficulties, such as assuming simplified kinematic or dynamic models, also limit the general capabilities of soft robots and are not applicable to tasks requiring fast, dynamic motion, such as throwing or hammering. To overcome these challenges, we propose a data-efficient Bayesian optimization-based approach for learning control policies for dynamic tasks on a large-scale soft robot. Our approach optimizes the task objective function directly from commanded pressures, without requiring approximate kinematics or dynamics as an intermediate step. We demonstrate the effectiveness of our approach through both simulated and real-world experiments.
Our approach consists of a low-dimensional control policy whose optimal parameters are obtained using Bayesian optimization (BayesOpt). Specifically, we simultaneously model the unknown task objective function using a Gaussian process and find inputs (policy parameters) that maximize this function using an acquisition function.
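The model-then-maximize loop above can be illustrated with a minimal sketch. This is not the paper's implementation: the RBF kernel, the UCB acquisition function, the 1-D toy objective, and all hyperparameters are illustrative assumptions standing in for robot trials.

```python
import numpy as np

def rbf(A, B, ls=1.0):
    # Squared-exponential kernel between two sets of points.
    d = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d / ls**2)

def gp_posterior(X, y, Xs, noise=1e-4):
    # GP posterior mean and variance at test points Xs, given data (X, y).
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)  # prior var = 1
    return mu, np.maximum(var, 0.0)

def objective(x):
    # Hypothetical stand-in for running one robot trial; peak at x = 0.3.
    return -np.sum((x - 0.3) ** 2)

cands = np.linspace(0.0, 1.0, 200)[:, None]      # candidate policy parameters
X = np.array([[0.0], [1.0]])                     # two initial trials
y = np.array([objective(x) for x in X])

for _ in range(20):                              # BayesOpt loop
    mu, var = gp_posterior(X, y, cands)
    ucb = mu + 2.0 * np.sqrt(var)                # UCB acquisition function
    x_next = cands[np.argmax(ucb)]               # next trial to run
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next))

best = X[np.argmax(y)]                           # best parameters found
```

Each loop iteration corresponds to one real or simulated rollout: the GP surrogate is refit, the acquisition function trades off exploring uncertain parameters against exploiting promising ones, and the next trial is executed.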
Standard Bayesian optimization is only practical in low-dimensional settings, i.e., roughly up to 10 input dimensions. However, we still need a policy parameterization that can produce a multi-step action sequence. We construct such a policy by letting the policy parameters be indices into a set of discrete pressure commands, obtained by enumerating all possible commands under a given discretization level P. For example, if P=2, each pressure value can only take on 0 or 1 (the maximum pressure value). We then enumerate all possible discrete command vectors and index them. Finally, Bayesian optimization searches for the best sequence of indices (and hence sequence of actions) to solve the task.
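The indexing scheme above can be sketched in a few lines. The chamber count, horizon, and normalized pressure range here are hypothetical choices for illustration, not the robot's actual configuration.

```python
import itertools
import numpy as np

P = 2           # discretization level: each pressure is 0 or 1
n_chambers = 4  # hypothetical number of pressure chambers
H = 3           # hypothetical policy horizon (commands per sequence)

# Enumerate all P**n_chambers discrete pressure commands and index them.
levels = np.linspace(0.0, 1.0, P)  # normalized pressure values
commands = np.array(list(itertools.product(levels, repeat=n_chambers)))
# commands.shape == (P**n_chambers, n_chambers) == (16, 4)

def policy(indices):
    """Map H integer policy parameters to an open-loop pressure sequence."""
    return commands[np.asarray(indices, dtype=int)]

# BayesOpt would search over the H-dimensional space of indices;
# each candidate, e.g. [0, 15, 5], decodes to a (H, n_chambers) sequence.
seq = policy([0, 15, 5])
```

With this encoding, the search space seen by Bayesian optimization has only H dimensions, regardless of how many discrete commands exist at each step.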
Despite the aforementioned challenges of learning expressive policies for large-scale soft robots, our method can learn open-loop policies for tasks that are challenging to perform using soft robot approaches that rely on analytical models. Unlike these approaches, our method learns successfully even at the extremely high velocities these tasks demand.
A throwing task where the robot has to throw a cube as far as possible, as measured in terms of distance from the base of the robot. We use a cube (instead of a sphere/ball) to minimize rolling effects.
A simulated hammering task where the robot must use a "hammer" (green 2kg ball) to hit a force sensor with a force exceeding some threshold.
For safety reasons on the real robot, we learn a policy for a dynamic task related to both throwing and hammering, in which we maximize the robot's tip velocities. We measure the tip velocities using Vive tracking pucks (shown below).
Our BayesOpt-based approach learns a good policy for this task on physical robot hardware in only 80 trials (corresponding to about 6 minutes of real-world data)!
The absence of rich state information makes applying traditional feedback control approaches, including reinforcement learning (RL), particularly challenging. Nonetheless, we investigated the use of the recent REDQ agent – a data-efficient, model-free RL algorithm. For the REDQ agent, we provided end-effector velocities, positions of task-relevant objects, and step-wise rewards derived from the objective function. For fair comparison with our approach (and other baselines), we provided 2500 transitions (approximately 250 trials) as initial pre-training data for REDQ. Our findings (Table III) indicate that REDQ struggles to learn effective policies for both dynamic tasks when given a limited data budget and restricted state information. Our proposed method doesn't need state information, and learns policies that outperform the RL agent within 500 trials in simulation.
@inproceedings{Zwane2024,
author = {Sicelukwanda Zwane and Daniel G. Cheney and Curtis C. Johnson and Yicheng Luo and Yasemin Bekiroglu and Marc D. Killpack and Marc Peter Deisenroth},
booktitle = {Proceedings of the International Conference on Intelligent Robots and Systems (IROS)},
date = {2024-10-14},
title = {Learning Dynamic Tasks on a Large-scale Soft Robot in a Handful of Trials},
year = {2024}
}