Monocular Dynamic Object SLAM in Autonomous Driving

A review of monoDOS as of 2020

Jul 4 · 13 min read

Conventional SLAM (simultaneous localization and mapping) algorithms commonly assume a static world. Even practical SLAM systems that can run in dynamic environments usually treat dynamic objects as outliers and filter them out to recover a static environment before applying the conventional SLAM pipeline. This severely limits their online application in autonomous driving, where explicit handling of dynamic objects is critical.

Monocular dynamic object SLAM (MonoDOS) extends conventional SLAM methods in two ways. It is object-aware in that it detects and tracks not only keypoints but also objects with higher-level semantic meaning. It is also dynamic in that it can handle scenes with dynamic objects and track the motion of these objects.

Keep in mind that not all object SLAM systems are dynamic, and not all dynamic SLAM systems are object-aware. The seminal work on object SLAM is SLAM++ (CVPR 2013), but it still requires a static scene with static objects. Some dynamic SLAM systems improve pose estimation using rigid-body and constant-velocity constraints, but without an explicit notion of objects.

This post reviews several state-of-the-art papers in the field of dynamic object SLAM. It mainly focuses on monocular methods, plus some stereo methods that can be adapted to a monocular setup. This is by no means an exhaustive review, so let me know if you recommend other relevant studies.

The Elements of Dynamic Object SLAM

A dynamic object SLAM system introduces the notion of an object, and this has several implications. First, it needs an object proposal stage that works from a single frame, just like the keypoint proposal stage in conventional SLAM systems (such as ORB in ORB-SLAM). This stage produces 2D or 3D object detection results. Recent advances in monocular 3D object detection will shine here. Second, data association becomes more complicated. Static SLAM only cares about keypoints, so data association just means matching keypoints across frames by their feature vectors. Now that we have introduced the notion of objects, we also have to associate keypoints with objects in each frame, and objects with each other across frames. Third, as a natural extension of the bundle adjustment in conventional SLAM, we now have to add tracked objects (tracklets) and the dynamic keypoints on these objects, optionally with a velocity constraint from an assumed motion model.
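To make the two extra association steps concrete, here is a minimal sketch in Python. It is not taken from any of the reviewed papers: it assumes objects are given as axis-aligned 2D boxes, assigns each keypoint to the smallest box that contains it, and matches objects across frames greedily by IoU. The function names and the `min_iou` threshold are illustrative assumptions, and real systems typically use richer cues (appearance features, 3D boxes, motion prediction).

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels
Point = Tuple[float, float]              # (u, v) in pixels


def assign_keypoints_to_objects(
    keypoints: List[Point], boxes: List[Box]
) -> List[Optional[int]]:
    """Keypoint-to-object association within one frame (hypothetical helper).

    Returns, for each keypoint, the index of the smallest box containing it,
    or None if no box contains it (treated as static background).
    """
    labels: List[Optional[int]] = []
    for u, v in keypoints:
        best, best_area = None, float("inf")
        for idx, (x0, y0, x1, y1) in enumerate(boxes):
            if x0 <= u <= x1 and y0 <= v <= y1:
                area = (x1 - x0) * (y1 - y0)
                if area < best_area:
                    best, best_area = idx, area
        labels.append(best)
    return labels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_objects(
    prev_boxes: List[Box], curr_boxes: List[Box], min_iou: float = 0.3
) -> Dict[int, int]:
    """Object-to-object association across frames (greedy, by descending IoU).

    Returns a mapping from current-frame box index to previous-frame box index.
    Unmatched current boxes would start new tracklets.
    """
    pairs = sorted(
        (
            (iou(p, c), pi, ci)
            for pi, p in enumerate(prev_boxes)
            for ci, c in enumerate(curr_boxes)
        ),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used_prev = set()
    for score, pi, ci in pairs:
        if score < min_iou:
            break  # remaining pairs overlap even less
        if pi in used_prev or ci in matches:
            continue
        matches[ci] = pi
        used_prev.add(pi)
    return matches
```

Keypoints labeled `None` would feed the usual static SLAM pipeline, while the labeled ones become dynamic keypoints attached to an object tracklet.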

I made the following chart to capture the three fundamental elements of dynamic object SLAM. The green block captures the Data Association process, the blue block captures the Bundle Adjustment process, and red squares are the factors to be optimized in the factor graph representation of Bundle Adjustment.
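As a rough illustration of what a velocity factor in such a graph might penalize, the sketch below computes a constant-velocity residual over an object tracklet. It is a simplification under stated assumptions: object poses are reduced to 3D positions, the motion model is constant velocity, and the function name `constant_velocity_residuals` is hypothetical. An actual bundle adjustment would minimize these residuals jointly with reprojection errors in a nonlinear least-squares solver.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def constant_velocity_residuals(
    positions: Sequence[Vec3], timestamps: Sequence[float]
) -> List[Vec3]:
    """Residuals of a constant-velocity motion model along one tracklet.

    For every three consecutive observations, the residual is the difference
    between the two finite-difference velocities. It is zero when the object
    moves with constant velocity over that window.
    """
    if len(positions) != len(timestamps):
        raise ValueError("positions and timestamps must have the same length")

    residuals: List[Vec3] = []
    for i in range(1, len(positions) - 1):
        dt_prev = timestamps[i] - timestamps[i - 1]
        dt_next = timestamps[i + 1] - timestamps[i]
        v_prev = [(b - a) / dt_prev for a, b in zip(positions[i - 1], positions[i])]
        v_next = [(b - a) / dt_next for a, b in zip(positions[i], positions[i + 1])]
        residuals.append(tuple(vn - vp for vp, vn in zip(v_prev, v_next)))
    return residuals


# A car moving at a steady 10 m/s along x, then braking in the last interval.
track = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
times = [0.0, 0.5, 1.0, 1.5]
print(constant_velocity_residuals(track, times))
```

The first residual is zero because the first three positions are consistent with constant velocity; the second is nonzero along x because the object slows down, which is the kind of deviation the optimizer would trade off against the image measurements.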
