INF: Implicit Neural Fusion for LiDAR and Camera

Shuyi Zhou1, 2, Shuxiang Xie1, 2, Ryoichi Ishikawa1, Ken Sakurada2, Masaki Onishi2, Takeshi Oishi1
1The University of Tokyo, 2National Institute of Advanced Industrial Science and Technology

Abstract

Sensor fusion has become a popular topic in robotics. However, conventional fusion methods encounter many difficulties, such as data representation differences, sensor variations, and extrinsic calibration. For example, the calibration methods used for LiDAR-camera fusion often require manual operation and auxiliary calibration targets. Implicit neural representations (INRs) have been developed for 3D scenes, and the volume density distribution involved in an INR unifies the scene information obtained by different types of sensors. Therefore, we propose implicit neural fusion (INF) for LiDAR and camera. INF first trains a neural density field of the target scene using LiDAR frames. Then, a separate neural color field is trained using camera images and the trained neural density field. Along with the training process, INF both estimates LiDAR poses and optimizes extrinsic parameters. Our experiments demonstrate the high accuracy and stable performance of the proposed method.

Method Overview

Method Overview Diagram

Dashed lines indicate back-propagation, parameter updates, and optimization.

Neural Density Field

We use LiDAR measurements to train a neural density field that represents the scene geometry. The neural density field outputs the densities of sampled points along each ray, which are integrated via volume rendering to compute the depth values of LiDAR rays. Note that the LiDAR poses can also be estimated sequentially using the neural density field.
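As a rough illustration of this step, the sketch below (not the authors' code) shows how an expected depth can be rendered from a density field along a single LiDAR ray using standard volume-rendering weights; the `density_mlp` callable, the sample count, and the ray bounds are illustrative assumptions.

import torch

def render_depth(density_mlp, ray_o, ray_d, near=0.5, far=30.0, n_samples=128):
    """Integrate densities along one ray to obtain an expected depth value."""
    # Sample distances t_i along the ray and the corresponding 3D points.
    t = torch.linspace(near, far, n_samples)                # (N,)
    pts = ray_o[None, :] + t[:, None] * ray_d[None, :]      # (N, 3)

    sigma = density_mlp(pts).squeeze(-1)                    # (N,) densities
    delta = t[1:] - t[:-1]                                  # (N-1,) interval lengths
    delta = torch.cat([delta, delta[-1:]])                  # pad the last interval

    # Volume-rendering weights: w_i = alpha_i * prod_{j<i} (1 - alpha_j).
    alpha = 1.0 - torch.exp(-sigma * delta)
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0)
    weights = alpha * trans

    # Expected termination distance, compared against the measured LiDAR depth.
    return (weights * t).sum()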

Neural Color Field

We also use camera images and the trained density field to generate and refine a neural color field for color representation. Instead of defining camera poses directly, we derive them from the LiDAR poses and the LiDAR-camera extrinsic parameters. In this way, the extrinsic parameters can be optimized, which is the key to sensor fusion.
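The sketch below is a minimal illustration (again, not the authors' implementation) of deriving a camera pose by composing a LiDAR pose with a shared LiDAR-camera extrinsic transform, so that gradients from the color loss can flow back into the extrinsic parameters; the variable names and the raw 4x4 parameterization are assumptions made for brevity.

import torch

def compose_camera_pose(T_world_lidar, T_lidar_camera):
    """Camera-to-world pose = (world <- lidar) @ (lidar <- camera), as 4x4 matrices."""
    return T_world_lidar @ T_lidar_camera

# The extrinsic transform is a single learnable parameter shared by all frames;
# each LiDAR pose comes from the density-field training stage. In practice one
# would use a minimal SE(3) parameterization (e.g., axis-angle + translation)
# rather than optimizing a raw 4x4 matrix.
T_lidar_camera = torch.nn.Parameter(torch.eye(4))   # optimized jointly with the color field
T_world_lidar = torch.eye(4)                         # one estimated LiDAR pose (held fixed here)
T_world_camera = compose_camera_pose(T_world_lidar, T_lidar_camera)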

Video

BibTeX

@inproceedings{zhou2023inf,
    author    = {Zhou, Shuyi and Xie, Shuxiang and Ishikawa, Ryoichi and Sakurada, Ken and Onishi, Masaki and Oishi, Takeshi},
    title     = {INF: Implicit Neural Fusion for LiDAR and Camera},
    booktitle = {IROS},
    year      = {2023}
}