CBNet: A Novel Composite Backbone Network Architecture for Object Detection Revi...

At the time of writing, CBNet is the best-performing object detection network on the COCO dataset, with an average precision (AP) of 53.3 on the COCO test set.

The authors argue that a more powerful backbone improves the performance of an object detector. To build one, they propose a novel strategy: assemble multiple identical backbones and link each pair of adjacent backbones with composite connections. The result is a more powerful backbone called the Composite Backbone Network (CBNet).

[Figure: CBNet architecture]

As shown in the figure above, CBNet is composed of multiple identical backbone networks and composite connections between neighboring backbones. From left to right, the output of each stage in an Assistant Backbone, which can be regarded as higher-level features, flows to the parallel stage of the succeeding backbone as part of its input through composite connections. In this way, multiple high-level and low-level features are fused to generate a richer feature representation.

The paper introduces two architectures: Dual-Backbone (DB) and Triple-Backbone (TB). As the names suggest, DB consists of two identical backbones and TB of three. Their performance difference is discussed later in this post.

To combine the outputs of the backbones, the paper introduces a Composite Connection block. The block consists of a 1x1 convolution and a batch normalization layer, which reduce the number of channels, followed by an upsample operation.
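The block described above can be sketched in a few lines of dependency-free Python. This is only an illustration under stated assumptions, not the authors' implementation: feature maps are nested lists shaped `[channels][height][width]`, the helper names (`conv1x1`, `channel_norm`, `upsample_nearest`, `composite_connection`) are hypothetical, batch normalization is approximated by normalizing each channel with the statistics of a single map, and nearest-neighbour upsampling by a factor of 2 is assumed.

```python
def conv1x1(x, weight):
    """1x1 convolution: mix channels independently at every pixel.
    x is [C_in][H][W]; weight is [C_out][C_in]."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(row_w[c] * x[c][i][j] for c in range(len(x)))
              for j in range(w)]
             for i in range(h)]
            for row_w in weight]


def channel_norm(x, eps=1e-5):
    """Stand-in for batch normalization (assumption): normalize each
    channel to zero mean and unit variance using this map's own statistics."""
    out = []
    for ch in x:
        vals = [v for row in ch for v in row]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        out.append([[(v - mean) / (var + eps) ** 0.5 for v in row] for row in ch])
    return out


def upsample_nearest(x, factor=2):
    """Nearest-neighbour upsampling: repeat every row and column `factor` times."""
    return [[[v for v in row for _ in range(factor)]
             for row in ch for _ in range(factor)]
            for ch in x]


def composite_connection(x, weight, factor=2):
    """Reduce channels (1x1 conv), normalize, then enlarge the feature map."""
    return upsample_nearest(channel_norm(conv1x1(x, weight)), factor)
```

For example, a 4-channel 2x2 map with a 2x4 weight matrix comes out as a 2-channel 4x4 map, which could then be added to a shallower feature of the succeeding backbone.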

The final backbone (rightmost in the figure), called the Lead Backbone, is used for object detection. Its output features are fed into the RPN/detection head, while the output of each Assistant Backbone is fed into its adjacent backbone.
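The data flow between Assistant and Lead Backbones can be illustrated with a toy forward pass. This is a sketch under assumptions rather than the paper's code: features are plain numbers so that only the wiring is visible, `cbnet_forward` and its arguments are hypothetical names, `g` stands in for the Composite Connection block, and the first stage of every backbone is assumed to receive only the raw input.

```python
def cbnet_forward(x, backbones, g):
    """Run the backbones in sequence and return the stage outputs of the
    last one (the Lead Backbone).

    backbones: list of backbones, each a list of stage functions
    g: composite connection applied to features of the previous backbone
    """
    prev_outs = None  # stage outputs of the preceding backbone
    for stages in backbones:
        outs, h = [], x
        for l, stage in enumerate(stages):
            # Assumption: the first stage sees only the raw input; later
            # stages also receive the same-index output of the previous backbone.
            if prev_outs is not None and l > 0:
                h = h + g(prev_outs[l])
            h = stage(h)
            outs.append(h)
        prev_outs = outs
    return prev_outs


# Toy "identical backbones": three stages acting on a single number.
stages = [lambda h: h + 1, lambda h: h * 2, lambda h: h + 3]
identity = lambda f: f

single = cbnet_forward(1, [stages], identity)        # one backbone
dual = cbnet_forward(1, [stages, stages], identity)  # Dual-Backbone wiring
```

Only the outputs returned for the last backbone would go to the detection head; the earlier backbones contribute solely through `g`.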

Composite Styles

[Figure: the four composite styles, with a table comparing them]

There are also four kinds of composite styles.

Adjacent Higher-Level Composition (AHLC) is the style explained in the earlier section. Each output feature from the Assistant Backbone is fed into the adjacent backbone through the Composite Connection block.
Same Level Composition (SLC) is another simple style, which fuses the output features from the same stage of the previous backbone into the succeeding backbone. As shown in the figure, this style does not use the Composite Connection block: the feature from the previous backbone is added directly to the adjacent backbone.
Adjacent Lower-Level Composition (ALLC) is very similar to AHLC. The only difference is that the feature passed on to the succeeding backbone comes from the adjacent lower-level stage of the previous backbone.
Dense Higher-Level Composition (DHLC) is inspired by the DenseNet paper, where each layer is connected to all the subsequent layers to build dense connections within a stage.
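One way to read the four styles is as different choices of which features from the previous backbone are added to the input of stage `l` of the succeeding backbone. The sketch below encodes that reading with plain numbers as features. The function name `fused_input`, the 1-based stage indexing, and the exact index offsets are assumptions made for illustration and are not taken verbatim from the paper; in a real network `g` would also handle channel reduction and resizing.

```python
def fused_input(style, own_prev, prev_outs, l, g=lambda f: f):
    """Input to stage l (1-based) of the succeeding backbone.

    own_prev:  output of stage l-1 of the succeeding backbone itself
    prev_outs: prev_outs[i] is the output of stage i of the previous
               backbone (index 0 is an unused placeholder)
    g:         composite connection (identity here, for illustration)
    """
    last = len(prev_outs) - 1
    if style == "AHLC":   # adjacent higher-level feature, through g
        return own_prev + g(prev_outs[l])
    if style == "SLC":    # same-level feature, added directly without g
        return own_prev + prev_outs[l - 1]
    if style == "ALLC":   # adjacent lower-level feature, through g
        return own_prev + g(prev_outs[l - 2])
    if style == "DHLC":   # all higher-level features, each through g
        return own_prev + sum(g(prev_outs[i]) for i in range(l, last + 1))
    raise ValueError(f"unknown composite style: {style}")


# Previous backbone with four stages; we feed stage 3 of the next backbone.
prev_outs = [None, 10, 20, 30, 40]
inputs = {s: fused_input(s, 1, prev_outs, l=3)
          for s in ("AHLC", "SLC", "ALLC", "DHLC")}
```

With these toy values, AHLC picks up the stage-3 feature, SLC the stage-2 feature, ALLC the stage-1 feature, and DHLC the sum of the stage-3 and stage-4 features.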

The table above compares the different composite styles. AHLC outperforms the other three, and the paper explains why. The authors claim that directly adding the lower-level features of the previous backbone to the higher-level features of the succeeding backbone harms the semantic information of the latter. Adding deeper features of the previous backbone to the shallower features of the succeeding backbone, on the other hand, enhances the semantic information of the latter.

Results

[Table: detection results on the MS-COCO test dataset]

The table above shows the detection results on the MS-COCO test dataset. Columns 5–7 show the object detection results, while columns 8–10 show the instance segmentation results. The results clearly show that using more backbones improves the performance of the network.

Conclusion

The paper presents a novel architecture called CBNet. By composing multiple backbones, the proposed network increases detection accuracy by about 1.5 to 3 percent.

The increase in parameter count and training time would be worth examining further.
