
Multimodal and Force-Matched Imitation Learning with a See-Through Visuotactile Sensor

arXiv Github Supplementary Video

Trevor Ablett1, Oliver Limoyo1, Adam Sigal2, Affan Jilani3, Jonathan Kelly1, Kaleem Siddiqi3, Francois Hogan2, Gregory Dudek2,3

1University of Toronto, 2Samsung AI Centre, Montreal, QC, Canada, 3McGill University

*work completed while Trevor Ablett, Oliver Limoyo, and Affan Jilani were on internship at Samsung AI Centre, Montreal

Submitted to IEEE Transactions on Robotics (T-RO): Special Section on Tactile Robotics


Summary

A summary of our system for multimodal and force-matched imitation learning.
A video showing: (1) kinesthetic teaching with force recording (parts 1 and 2 in diagram above), (2) a force-matched replay with mode switch label (parts 3 and 4 above), and (3) a learned multimodal policy with mode-switch action (part 5 above).

Motivation

When we open doors, our fingers roll, slip, use normal and shear forces, respond to dense tactile feedback, move quickly, and can manipulate small, challenging knobs. In this paper, we combine tactile sensing with imitation learning to give robots some of these same desirable qualities.

Kinesthetic teaching is a popular approach to collecting expert robotic demonstrations of contact-rich tasks for imitation learning (IL), but it typically only measures motion, ignoring the force placed on the environment by the robot. Furthermore, contact-rich tasks require accurate sensing of both reaching and touching, which can be difficult to provide with conventional sensing modalities. We address these challenges with a see-through-your-skin (STS) visuotactile sensor, using the sensor both (i) as a measurement tool to improve kinesthetic teaching, and (ii) as a policy input in contact-rich door manipulation tasks.

Why Kinesthetic Teaching?

Teleoperation (video in real time). Requires a proxy for contact feedback, and is slow.

Teleoperation is a viable strategy for collecting demonstrations, but it requires a proxy for contact feedback. In the teleoperation example above, we used the vibration motor in the VR controller and a force-torque sensor to provide coarse feedback, but demonstrations were still difficult to collect. Even with considerable practice, we would often trigger the robot's protective stop by applying excessive force to the environment.


Left: Kinesthetic teaching.   Right: Replays of kinesthetic teaching without force matching.

Kinesthetic teaching, as shown above, provides contact feedback, alleviating this issue by allowing the operator to naturally react to environmental contact forces.

Kinesthetic teaching has a significant downside, however: the observation and action spaces during demonstrations do not match those of the robot during autonomous execution. This can be partially resolved by using replays, where the robot is commanded to reach the same poses achieved by the human during the demonstration. However, if the human imparts a force on the environment during their demonstration, this replay trajectory can fail.

Approach

See-through-Your-Skin (STS) Tactile Sensor

Left: 2022 version of STS sensor1.   Right: Version from this work, in visual mode (LEDs off) and tactile mode (LEDs on).

An STS sensor1 is a visuotactile sensor, comparable to other gel-based tactile sensors2, that can be switched between visual and tactile modes by leveraging a semi-transparent surface and controllable lighting, allowing for both pre-contact visual sensing and during-contact tactile sensing with a single sensor. In this work, we use the sensor in tactile mode to record a signal linearly related to force during demonstrations, and show its value in both visual and tactile mode as an input to learned policies through extensive experiments.

Tactile Force Matching

Left: Our force-matched trajectories are generated using a spring relationship between desired poses and recorded poses, matching the functionality of our Cartesian impedance controller. Right: Kinesthetic teaching while reading the signal linearly related to force with an STS sensor.

Our force-matched poses were generated by measuring a signal linearly related to force with an STS sensor, which was then used to modify the desired poses before they were input back into our Cartesian impedance controller. For more details, see our paper.
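As a rough illustration of this idea (the paper has the full formulation), the sketch below assumes the Cartesian impedance controller behaves approximately like a spring with a diagonal translational stiffness, and that the STS signal has been calibrated into a force estimate; the function and variable names here are ours, not from the released code.

```python
import numpy as np

def force_matched_poses(recorded_poses, recorded_forces, stiffness):
    """Offset recorded poses so an impedance controller reproduces the
    demonstrated forces during a replay.

    A Cartesian impedance controller roughly acts as a spring,
        f = K (x_desired - x_actual),
    so commanding x_desired = x_recorded + K^{-1} f_recorded makes the
    controller push on the environment with approximately the force the
    human applied during kinesthetic teaching.

    recorded_poses:  (T, 3) end-effector positions from the demonstration
    recorded_forces: (T, 3) force estimates derived from the STS signal
                     (assumed linearly related to true contact force)
    stiffness:       (3,) diagonal translational stiffness of the controller
    """
    offsets = recorded_forces / stiffness   # element-wise K^{-1} f
    return recorded_poses + offsets         # force-matched desired poses
```

During a force-matched replay, these offset poses would be streamed to the impedance controller in place of the raw recorded poses.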

Learned STS Mode Switching

Left: Our learned policy includes sensor mode as an output, in addition to robot motion. Right: Mode switch labels are provided during the force-matched replay by the demonstrator, who pushes a key when the robot makes contact with the environment.

To further leverage an STS sensor for imitation learning, we add mode switching as a policy output, allowing the policy to learn the appropriate moment to switch an STS from its visual to its tactile mode.
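Concretely, the demonstrator's keypress during a replay can be converted into per-timestep supervision for this output. The snippet below is a minimal sketch of one way to do this (the function name is hypothetical): every timestep at or after the keypress is labeled tactile, and every earlier timestep is labeled visual.

```python
def mode_switch_labels(num_steps, switch_step):
    """Per-timestep STS mode labels (0 = visual, 1 = tactile).

    num_steps:   length of the replayed trajectory
    switch_step: index at which the demonstrator pressed the key,
                 i.e., the moment of contact with the environment
    """
    return [0 if t < switch_step else 1 for t in range(num_steps)]
```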

Experiments

To verify the efficacy of force matching, learned mode switching, and tactile sensing in general, we study multiple observation configurations to compare and contrast the value of visual and tactile data from an STS with visual data from a wrist-mounted eye-in-hand camera.

Experimental Tasks

Demonstrations and force-matched replays of our four experimental tasks, GlassKnobOpen, GlassKnobClose, FlatHandleOpen, and FlatHandleClose.

In this work, we study four challenging robotic tasks: door opening and closing with both a flat handle and a glass knob, with a randomized initial position requiring a reach before first making contact. This setup is meant to simulate a true mobile manipulator, where the initial pose between the arm and the handle/knob would not be consistent between each open/close. The closing tasks require shutting the door "quietly," without allowing it to slam.

Training details

We collected 20 raw demonstrations for each task, and then completed replays of each demonstration under different conditions, such as including/excluding force matching and/or mode switching.

As shown in the Learned STS Mode Switching section, we trained policies with STS images, wrist camera images, and relative robot pose (i.e., relative to the beginning of an episode) as inputs, with delta end-effector poses and mode switch control as outputs.
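The sketch below illustrates this input/output structure only; it is not the architecture from the paper, and the encoder sizes, layer widths, and module names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultimodalPolicy(nn.Module):
    """Sketch of a policy mapping (STS image, wrist image, relative pose)
    to (delta end-effector pose, STS mode-switch logit)."""

    def __init__(self, img_feat_dim=64, pose_dim=7, hidden_dim=256):
        super().__init__()
        def make_encoder():
            return nn.Sequential(
                nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, img_feat_dim), nn.ReLU(),
            )
        self.sts_encoder = make_encoder()     # STS image (visual or tactile mode)
        self.wrist_encoder = make_encoder()   # eye-in-hand wrist camera image
        self.trunk = nn.Sequential(
            nn.Linear(2 * img_feat_dim + pose_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.delta_pose_head = nn.Linear(hidden_dim, 6)   # delta translation + rotation
        self.mode_switch_head = nn.Linear(hidden_dim, 1)  # logit: visual vs. tactile mode

    def forward(self, sts_img, wrist_img, rel_pose):
        feat = torch.cat([
            self.sts_encoder(sts_img),
            self.wrist_encoder(wrist_img),
            rel_pose,                          # pose relative to episode start
        ], dim=-1)
        h = self.trunk(feat)
        return self.delta_pose_head(h), self.mode_switch_head(h)
```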

For all experiments, we trained 3 seeds of each policy, and then ran each policy for 10 episodes in each configuration.

Results – Overall

The videos below show learned policies with force matching, mode switching, and STS data included.

Examples of our learned policies for the four tasks we tested in this work.   Left: GlassKnobOpen, GlassKnobClose.   Right: FlatHandleOpen, FlatHandleClose.
Left: Performance results with/without force matching, mode switching, and STS data. Right: Example runs with no force matching, no mode switching, and no STS data.

Results – Force Matching

Check back soon!

Results – Mode Switching

Check back soon!

What if we only use the Wrist Camera?

Check back soon!

Conclusion

Through over 3,000 test episodes from real-world manipulation experiments, we find that the inclusion of force matching raises average policy success rates by 62.5%, STS mode switching by 30.3%, and STS data as a policy input by 42.5%. Our results highlight the utility of see-through tactile sensing for IL, both for data collection to allow force matching, and for policy execution to allow accurate task feedback.

Code

Available on GitHub

Citation

@misc{ablettMultimodalForceMatchedImitation2023,
  title = {Multimodal and Force-Matched Imitation Learning with a See-Through Visuotactile Sensor},
  author = {Ablett, Trevor and Limoyo, Oliver and Sigal, Adam and Jilani, Affan and Kelly, Jonathan and Siddiqi, Kaleem and Hogan, Francois and Dudek, Gregory},
  year = {2023},
  month = dec,
  number = {arXiv:2311.01248},
  eprint = {2311.01248},
  primaryclass = {cs},
  publisher = {arXiv},
  doi = {10.48550/arXiv.2311.01248},
  urldate = {2024-02-15},
  archiveprefix = {arxiv},
  keywords = {Computer Science - Artificial Intelligence,Computer Science - Machine Learning,Computer Science - Robotics}
}

Bibliography

  1. F. R. Hogan, J.-F. Tremblay, B. H. Baghi, M. Jenkin, K. Siddiqi, and G. Dudek, “Finger-STS: Combined Proximity and Tactile Sensing for Robotic Manipulation,” IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 10865–10872, Oct. 2022, doi: 10.1109/LRA.2022.3191812. 

  2. C. Chi, X. Sun, N. Xue, T. Li, and C. Liu, “Recent Progress in Technologies for Tactile Sensors,” Sensors, vol. 18, no. 4, Art. no. 4, Apr. 2018, doi: 10.3390/s18040948. 


Space and Terrestrial Autonomous Systems Lab - University of Toronto Institute for Aerospace Studies