[Day 3] Mask R-CNN
Key Idea
Mask R-CNN extends Faster R-CNN, a popular object detection framework, to also predict a pixel-wise segmentation mask for every detected object. It does this by adding a branch that predicts a mask on each Region of Interest (RoI), in parallel with the existing classification and bounding-box branches, so detection and instance segmentation happen in a single pass.
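To make the per-RoI mask branch concrete, here is a minimal sketch of how one RoI's mask output could be turned into a binary instance mask. It assumes the branch emits one small grid of logits per class and that the mask for the predicted class is kept and thresholded at 0.5. The function name `select_instance_mask`, the nested-list layout, and the toy values are illustrative assumptions, not code from the paper.

```python
import math


def select_instance_mask(mask_logits, class_id, threshold=0.5):
    """Return a binary mask for one RoI.

    mask_logits[k][i][j] is the (hypothetical) logit for class k at
    cell (i, j) of the RoI's mask grid. Only the grid belonging to the
    predicted class is used; the other classes are ignored.
    """
    logits = mask_logits[class_id]
    binary = []
    for row in logits:
        # Per-pixel sigmoid, then threshold to get a 0/1 mask.
        binary.append([1 if 1.0 / (1.0 + math.exp(-v)) >= threshold else 0
                       for v in row])
    return binary


# Toy example: 2 classes, 2x2 mask grid for a single RoI.
toy_logits = [
    [[5.0, -5.0], [-5.0, 5.0]],   # class 0
    [[-3.0, 2.0], [4.0, -1.0]],   # class 1
]
mask = select_instance_mask(toy_logits, class_id=1)
```

In a full pipeline the small mask would then be resized to the RoI's box and pasted into the image; that step is omitted here.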
Why It Matters
Object detection (bounding boxes) and semantic segmentation (per-pixel class labels) have traditionally been handled as separate tasks. Mask R-CNN combines them in one framework: it detects each object and predicts a pixel-level mask for that specific object, which is the task known as instance segmentation. The result is object boundaries that are much more precise than a bounding box alone can provide.
Technical Bite
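The core innovation is the RoIAlign layer. The earlier RoIPool operation rounds RoI boundaries and bin edges to the feature-map grid, which misaligns the extracted features with the input. RoIAlign avoids any such quantization and instead uses bilinear interpolation to read feature values at exact, real-valued sample points inside each bin. Preserving these spatial locations is what makes pixel-accurate mask prediction possible.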
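The idea can be sketched in a few lines of plain Python. This is a simplified, single-channel illustration and not the reference implementation: it assumes RoI coordinates are already expressed in feature-map units, that pixel centers sit at integer indices, and that each output bin averages a 2x2 grid of bilinearly interpolated samples. The names `bilinear` and `roi_align` are chosen here for illustration.

```python
import math


def bilinear(feature, y, x):
    """Bilinearly interpolate a 2D feature map (list of rows) at real-valued (y, x)."""
    h, w = len(feature), len(feature[0])
    # Clamp the sample point so it stays inside the map.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature, roi, out_size=2, samples=2):
    """Pool an RoI into an out_size x out_size grid without rounding coordinates.

    roi = (y1, x1, y2, x2), floats in feature-map coordinates (assumed convention).
    """
    y1, x1, y2, x2 = roi
    bin_h = (y2 - y1) / out_size   # bin sizes stay fractional: no quantization
    bin_w = (x2 - x1) / out_size
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0.0
            # Regularly spaced sample points inside bin (i, j).
            for sy in range(samples):
                for sx in range(samples):
                    y = y1 + (i + (sy + 0.5) / samples) * bin_h
                    x = x1 + (j + (sx + 0.5) / samples) * bin_w
                    total += bilinear(feature, y, x)
            row.append(total / (samples * samples))  # average the samples
        out.append(row)
    return out


# Toy feature map whose value equals its column index (a linear ramp in x).
ramp = [[float(x) for x in range(6)] for _ in range(6)]
pooled = roi_align(ramp, roi=(0.5, 1.25, 3.5, 4.25))
```

On the ramp, each pooled value equals the x-coordinate of its bin's center (2.0 and 3.5), even though the RoI edges fall between pixels. A RoIPool-style version would first round 1.25 and 4.25 to integers and shift the result.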
Impact
Mask R-CNN has become a foundational work in instance segmentation, influencing many subsequent models and seeing wide adoption across computer vision applications.
Paper
Authors - Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross B. Girshick
Paper - [Link]