Abstract
Precise and rapid delineation of sharp boundaries and robust semantics is essential for numerous downstream robotic tasks, such as robot grasping and manipulation,
real-time semantic mapping, and online sensor calibration performed on edge computing units. Although boundary detection and semantic segmentation are
complementary tasks, most studies focus on lightweight models for semantic segmentation but overlook the critical role of boundary detection. In this work,
we introduce Mobile-Seed, a lightweight, dual-task framework tailored for simultaneous semantic segmentation and boundary detection. Our framework features a
two-stream encoder, an active fusion decoder (AFD), and a dual-task regularization approach. The encoder is divided into two pathways: one captures category-aware
semantic information, while the other discerns boundaries from multi-scale features. The AFD module dynamically adapts the fusion of semantic and boundary
information by learning channel-wise relationships, enabling precise weight assignment to each channel. Furthermore, we introduce a regularization loss to
mitigate the conflict between dual-task learning and deep diversity supervision (illustrative sketches of both components appear below). Compared to existing methods, the proposed Mobile-Seed offers a lightweight
framework to simultaneously improve semantic segmentation performance and accurately locate object boundaries. Experiments on the Cityscapes val dataset
show that Mobile-Seed achieves a notable improvement over the state-of-the-art (SOTA) baseline of 2.2 percentage points (pp) in mIoU and 4.2 pp in mF-score,
while maintaining an online inference speed of 23.9 frames per second (FPS) with 1024×2048-resolution input on an RTX 2080Ti GPU.
Additional experiments on the CamVid and PASCAL Context datasets confirm our method's generalizability.
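To make the channel-wise fusion idea concrete, below is a minimal PyTorch-style sketch of a fusion module in the spirit of the AFD. This is not the authors' implementation: the module name, the squeeze-and-gate design, and all tensor shapes are our own illustrative assumptions.

```python
import torch
import torch.nn as nn

class ChannelWiseFusion(nn.Module):
    """Illustrative channel-wise fusion of semantic and boundary features.

    A hedged sketch of the AFD idea, not the paper's implementation:
    learn one weight per channel and take a convex combination of the
    two feature streams.
    """
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),               # global context per channel
            nn.Conv2d(2 * channels, channels, 1),  # one weight per output channel
            nn.Sigmoid(),                          # weights in (0, 1)
        )

    def forward(self, semantic: torch.Tensor, boundary: torch.Tensor) -> torch.Tensor:
        # Per-channel gate computed from both streams jointly.
        w = self.gate(torch.cat([semantic, boundary], dim=1))
        # Convex per-channel combination of the two streams.
        return w * semantic + (1.0 - w) * boundary

# Example: fuse two 64-channel feature maps at 1/8 of a 1024x2048 input.
fusion = ChannelWiseFusion(channels=64)
sem = torch.randn(1, 64, 128, 256)
bnd = torch.randn(1, 64, 128, 256)
fused = fusion(sem, bnd)  # shape: (1, 64, 128, 256)
```

A sigmoid gate keeps the combination convex, so the fused features stay between the two streams; the paper's actual weighting scheme may differ.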
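The dual-task regularization is likewise described only at a high level. Purely as a sketch, assuming a cross-entropy segmentation loss, a binary boundary loss, and a consistency-style regularizer that ties edges derived from the segmentation prediction to the boundary stream, a combined objective could be wired up as below; `lambda_bnd`, `lambda_reg`, and the edge-consistency term are hypothetical and are not the paper's actual loss.

```python
import torch
import torch.nn.functional as F

def dual_task_loss(seg_logits, boundary_logits, seg_gt, boundary_gt,
                   lambda_bnd: float = 1.0, lambda_reg: float = 0.1):
    """Hypothetical combined objective for joint segmentation and boundary
    detection; the exact Mobile-Seed regularization loss is not specified
    in the abstract.

    seg_logits:      (B, C, H, W) class logits
    boundary_logits: (B, 1, H, W) boundary logits
    seg_gt:          (B, H, W)    integer class labels
    boundary_gt:     (B, 1, H, W) binary boundary map (float)
    """
    loss_seg = F.cross_entropy(seg_logits, seg_gt)
    loss_bnd = F.binary_cross_entropy_with_logits(boundary_logits, boundary_gt)

    # One plausible regularizer: edges implied by the segmentation
    # (spatial gradients of the class probabilities) should agree with
    # the boundary stream's prediction.
    prob = seg_logits.softmax(dim=1)
    dx = (prob[..., :, 1:] - prob[..., :, :-1]).abs().sum(1, keepdim=True)
    dy = (prob[..., 1:, :] - prob[..., :-1, :]).abs().sum(1, keepdim=True)
    seg_edges = F.pad(dx, (0, 1, 0, 0)) + F.pad(dy, (0, 0, 0, 1))
    loss_reg = F.l1_loss(seg_edges.clamp(0, 1), boundary_logits.sigmoid())

    return loss_seg + lambda_bnd * loss_bnd + lambda_reg * loss_reg
```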
Introduction Video
If you are not interested in our in-depth analysis of Mobile-Seed, please skip ahead to 3:35 for the qualitative results.
Acknowledgements:
We borrowed this template from FreeReg.