[Day 3] Mask R-CNN

Key Idea

Mask R-CNN extends Faster R-CNN, a popular object detection framework, to also predict a pixel-wise segmentation mask for each detected object. It does this by adding a branch that predicts a mask on each Region of Interest (RoI), in parallel with the existing branch for classification and bounding-box regression, so a single network performs object detection and instance segmentation together.

Why It Matters

Object detection (bounding boxes) and semantic segmentation (pixel-wise classification) have traditionally been handled separately. Mask R-CNN combines the two ideas in one network: it detects each object and predicts a pixel-level mask for it. The result is instance segmentation, where objects of the same class receive separate masks with precise boundaries rather than only a box or a shared class label.

Technical Bite

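The core innovation is the RoIAlign layer, which replaces the RoIPool layer of Faster R-CNN. RoIPool rounds the RoI boundaries and its pooling bins to the feature-map grid, which misaligns the extracted features with the input pixels. RoIAlign avoids any quantization: it keeps the fractional coordinates and computes feature values at sampled points with bilinear interpolation, preserving exact spatial locations and enabling masks at a pixel level of detail.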
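
To make the idea concrete, here is a minimal pure-Python sketch of RoIAlign on a single-channel feature map. It is an illustration under stated assumptions, not the paper's implementation: the function names `bilinear_sample` and `roi_align`, the 2x2 output size, the 2x2 sampling points per bin, and average pooling are all choices made for this example.

```python
import math

def bilinear_sample(feature, y, x):
    """Bilinearly interpolate a 2D feature map (list of rows) at fractional (y, x)."""
    h, w = len(feature), len(feature[0])
    # Clamp so the four neighbouring cells always exist.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def roi_align(feature, roi, out_size=2, samples=2):
    """Pool a fractional RoI (y1, x1, y2, x2), given in feature-map
    coordinates, into an out_size x out_size grid without any rounding."""
    y1, x1, y2, x2 = roi
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0.0
            # Regularly spaced sample points inside the bin (assumed: samples x samples).
            for sy in range(samples):
                for sx in range(samples):
                    yy = y1 + (i + (sy + 0.5) / samples) * bin_h
                    xx = x1 + (j + (sx + 0.5) / samples) * bin_w
                    total += bilinear_sample(feature, yy, xx)
            # Assumed aggregation: average over the sample points.
            row.append(total / (samples * samples))
        out.append(row)
    return out

# Toy feature map whose value equals its column index.
feature = [[float(x) for x in range(4)] for _ in range(4)]
aligned = roi_align(feature, (0.5, 0.5, 2.5, 2.5))
shifted = roi_align(feature, (0.5, 0.8, 2.5, 2.8))
```

On this toy ramp, moving the RoI 0.3 cells to the right moves every pooled value by 0.3 as well. A pooling scheme that first rounds the RoI to whole cells cannot track a sub-cell shift like this, which is the misalignment RoIAlign is designed to remove.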

Impact

Mask R-CNN has become a foundational work in instance segmentation. It has influenced many subsequent models and is widely adopted in computer vision applications.

Paper

Authors - Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross B. Girshick

Paper - [Link]