
Monocular Dynamic Object SLAM in Autonomous Driving

 3 years ago
source link: https://towardsdatascience.com/monocular-dynamic-object-slam-in-autonomous-driving-f12249052bf1?gi=851df26c562d

A review of monoDOS as of 2020

Conventional SLAM (simultaneous localization and mapping) algorithms commonly rely on a static world assumption. Even practical SLAM systems that are able to run in dynamic environments usually treat dynamic objects as outliers and try to filter them out to recover a static environment before applying the conventional SLAM pipeline. This severely limits their online application in autonomous driving, where explicit handling of dynamic objects is critical.

Monocular dynamic object SLAM (MonoDOS) extends conventional SLAM methods in two ways. It is object-aware in that it detects and tracks not only keypoints but also objects with higher-level semantic meaning. It is also dynamic in that it can handle scenes with dynamic objects and track the motion of these objects.

Keep in mind that not all object SLAM systems are dynamic, and not all dynamic SLAM systems are object-aware. The seminal work on object SLAM is SLAM++ (CVPR 2013), but it still requires a static scene with static objects. Some dynamic SLAM systems improve pose estimation using rigid-body and constant-velocity constraints, but without an explicit notion of objects.

This post reviews several state-of-the-art papers in the field of dynamic object SLAM. It mainly focuses on monocular methods, along with some stereo ones that can be adapted to a monocular setup. This is by no means an exhaustive review, so let me know if you recommend other relevant studies.

The Elements of Dynamic Object SLAM

A dynamic object SLAM system introduces the notion of an object, and this has several implications. First, it needs an object proposal stage that works on a single frame, analogous to the keypoint proposal stage in conventional SLAM systems (such as ORB in ORB-SLAM). This stage produces 2D or 3D object detection results, and recent advances in monocular 3D object detection will shine here. Second, data association becomes more complicated. Static SLAM only cares about keypoints, so data association just means matching keypoints across frames by their feature vectors. Now that we have introduced the notion of objects, we also have to associate keypoints with objects in each frame, and objects with each other across frames. Third, as a natural extension of the bundle adjustment in conventional SLAM, we now have to add tracked objects (tracklets) and the dynamic keypoints on these objects to the optimization, optionally with a velocity constraint from an assumed motion model.
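To make the two extra data association steps concrete, here is a minimal sketch in plain Python. It is not taken from any of the reviewed papers. It assumes that object proposals are axis-aligned 2D bounding boxes, that a keypoint belongs to the smallest box containing it, and that objects are matched across frames greedily by box overlap (IoU). The function names and the `min_iou` threshold are hypothetical choices for illustration only; real systems often use 3D boxes, appearance features, or motion prediction instead.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels
Point = Tuple[float, float]              # (u, v) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned 2D boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def assign_keypoints(keypoints: List[Point], boxes: List[Box]) -> List[int]:
    """Keypoint-to-object association within one frame.

    Returns, for each keypoint, the index of the box that contains it,
    or -1 if it lies in no box (treated as static background).
    When boxes overlap, the smallest containing box wins (an assumed heuristic).
    """
    labels = []
    for u, v in keypoints:
        best, best_area = -1, float("inf")
        for i, (x0, y0, x1, y1) in enumerate(boxes):
            if x0 <= u <= x1 and y0 <= v <= y1:
                area = (x1 - x0) * (y1 - y0)
                if area < best_area:
                    best, best_area = i, area
        labels.append(best)
    return labels


def match_objects(prev: List[Box], curr: List[Box],
                  min_iou: float = 0.3) -> Dict[int, int]:
    """Object-to-object association across two frames.

    Greedily pairs boxes in descending IoU order and returns a mapping
    {index in curr: index in prev}. Unmatched current boxes start new tracklets.
    """
    candidates = sorted(
        ((iou(p, c), i, j) for i, p in enumerate(prev) for j, c in enumerate(curr)),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used_prev = set()
    for score, i, j in candidates:
        if score < min_iou:
            break  # all remaining candidates overlap even less
        if i in used_prev or j in matches:
            continue
        matches[j] = i
        used_prev.add(i)
    return matches
```

Keypoints labeled -1 would feed the usual static SLAM pipeline, while keypoints with an object label become the dynamic keypoints attached to that object's tracklet.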

I made the following chart to capture the three fundamental elements of dynamic object SLAM. The green block captures the Data Association process, the blue block captures the Bundle Adjustment process, and red squares are the factors to be optimized in the factor graph representation of Bundle Adjustment.
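As a rough illustration of what a velocity factor in such a factor graph could look like, the sketch below computes a constant-velocity residual over three consecutive object positions and sums the squared residuals along a tracklet. This is an assumption-laden toy, not the formulation of any specific paper: it uses only 3D object centers (no orientation), a scalar `weight` in place of a proper covariance, and hypothetical function names. In a full system this term would be added to the reprojection errors of static and dynamic keypoints and minimized with a nonlinear least-squares solver.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def constant_velocity_residual(p_prev: Vec3, p_curr: Vec3, p_next: Vec3,
                               dt_prev: float, dt_next: float) -> Vec3:
    """Velocity over [curr, next] minus velocity over [prev, curr].

    The residual is zero when the object moves with constant velocity.
    """
    return tuple(
        (n - c) / dt_next - (c - p) / dt_prev
        for p, c, n in zip(p_prev, p_curr, p_next)
    )


def motion_cost(positions: Sequence[Vec3], timestamps: Sequence[float],
                weight: float = 1.0) -> float:
    """Sum of weighted squared constant-velocity residuals along one tracklet."""
    cost = 0.0
    for k in range(1, len(positions) - 1):
        r = constant_velocity_residual(
            positions[k - 1], positions[k], positions[k + 1],
            timestamps[k] - timestamps[k - 1],
            timestamps[k + 1] - timestamps[k],
        )
        cost += weight * sum(x * x for x in r)
    return cost


# A tracklet moving at a steady 1 m/s along x incurs no motion cost.
steady = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(motion_cost(steady, [0.0, 1.0, 2.0]))
```

The optimizer is then free to trade off this motion prior against the image evidence: a larger `weight` pulls the tracklet toward smooth motion, a smaller one lets the detections dominate.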

