Segmenting and Counting Grape Bunches
using Mask R-CNN + DeepSORT Tracker
Divego (Omia AI Group)

Description

overview

This project extends Mask R-CNN to detect grape masks, count grape bunches, and generate heat maps. It builds on the GitHub implementations matterport/Mask_RCNN and johncuicui/grapeMRCNN for grape detection. Our main contributions are the addition of a DeepSORT tracker (implemented in PyTorch) and heat map generation. The tracker lets us count the bunches detected in a video without counting any bunch more than once. We then extrapolate the counts to satellite images to generate heat maps that show the grape bunch counts per parcel in a vineyard. These heat maps give a visual interpretation of how the grape bunches are distributed over an area.

Skills:
Python, TensorFlow, PyTorch, OpenCV, Pandas and Matplotlib.

Mask R-CNN

Mask R-CNN, introduced by Kaiming He et al., is a two-step approach that extends Faster R-CNN.

  1. The first step (Region Proposal Network) scans the image and generates proposals (areas likely to contain an object).
  2. The second step (RoI Classification & Bounding Box Regressor) classifies the proposals and generates bounding boxes and masks.

Mask R-CNN (2017) by Kaiming He et al.

Region Proposal Network
During the first step, a sliding window is used to extract the regions of interest (RoIs). The sliding window is implemented with convolutions, so all predictions are obtained in a single forward pass. The convolutional neural network uses a backbone based on a Feature Pyramid Network (FPN). The sliding window is evaluated with several window sizes (anchors). Finally, overlapping regions are filtered with non-maximum suppression (NMS).
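The filtering of overlapping regions can be illustrated with a minimal, pure-Python sketch of greedy non-maximum suppression. It is not the code used inside Mask R-CNN; the function names, the box format `(x1, y1, x2, y2)`, and the 0.5 threshold are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes that are kept, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical proposals: the first two overlap heavily, the third is isolated.
proposals = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
objectness = [0.9, 0.8, 0.7]
kept = non_max_suppression(proposals, objectness)
```

Here the second proposal is dropped because it overlaps the higher-scoring first one, while the isolated third proposal survives.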

Overlapping Regions

RoI Classification & Mask Generation
During the second stage, another convolutional network is applied to each region of interest (RoI). The network generates two outputs: the object class and the respective bounding box.

Fast R-CNN (2015) by Ross Girshick

Up to this point, the behavior is similar to Faster R-CNN. Mask R-CNN then adds a convolutional branch that generates a mask inside each detected bounding box.

DeepSORT Tracker

DeepSORT by Nicolai Wojke et al. builds on SORT and integrates appearance information to improve performance. This extension makes it possible to track objects through longer periods of occlusion, which reduces identity switches. We did little original work at this stage: we took the existing implementation from nwojke/deep_sort and adapted it to our use case.
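Because the tracker assigns a persistent ID to each bunch, counting without repetitions reduces to counting distinct IDs. The sketch below shows one way to do this. The per-frame format (a list of `(track_id, bbox)` pairs), the function name, and the `min_hits` filter are hypothetical and do not come from nwojke/deep_sort.

```python
from collections import Counter


def count_unique_tracks(frames, min_hits=3):
    """Count track IDs that appear in at least `min_hits` frames.

    `frames` is an iterable of per-frame lists of (track_id, bbox) pairs,
    a hypothetical format standing in for the tracker output.
    """
    hits = Counter()
    for tracks in frames:
        # Count each ID at most once per frame.
        for track_id in {tid for tid, _ in tracks}:
            hits[track_id] += 1
    # Short-lived IDs are treated as spurious detections and ignored.
    return sum(1 for n in hits.values() if n >= min_hits)


# Hypothetical tracker output for four consecutive frames.
frames = [
    [(1, (10, 10, 50, 50))],
    [(1, (12, 10, 52, 50)), (2, (200, 40, 240, 90))],
    [(1, (14, 10, 54, 50)), (2, (202, 40, 242, 90))],
    [(2, (204, 40, 244, 90)), (3, (400, 5, 430, 40))],
]
total = count_unique_tracks(frames, min_hits=3)
```

Requiring a minimum number of frames per ID is a design choice that trades a few missed bunches for fewer false counts from flickering detections; setting `min_hits=1` counts every ID ever seen.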

Heat Map Generation

We generate heat maps using a practical approach. We take a satellite image of the target parcel and record detection metadata for each row in the parcel. For each parcel, we perform the following steps:

1. Collect the polygon coordinates of the parcel (including its diagonal and the row angle) from the satellite image. These coordinates are called COORDINADAS_POLY.

2. Then, run a procedure that automatically finds the start and end points of each row.

  • Use the row angle to mark the rows (as points) along the diagonal.
  • Intersect the line through each point (the dotted lines) with the polygon boundary to get the start and end points of each row.

3. Automatically divide each row into rectangular cells based on the distribution of grapes along the row.
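The geometry of steps 2 and 3 can be sketched in pure Python as follows. This is an illustration under stated assumptions, not our production code: all function names are hypothetical, the rows are assumed to be evenly spaced along the diagonal, and each row is split into equal-length cells, which simplifies the distribution-based division of step 3.

```python
import math


def line_polygon_intersections(point, angle_deg, polygon):
    """Intersect the infinite line through `point` at `angle_deg` with the polygon edges."""
    px, py = point
    dx, dy = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    hits = []
    for i in range(len(polygon)):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % len(polygon)]
        ex, ey = x2 - x1, y2 - y1
        denom = dx * ey - dy * ex
        if abs(denom) < 1e-12:
            continue  # Edge is parallel to the row direction.
        # Solve point + t * d = p1 + u * e for t (along the line) and u (along the edge).
        t = ((x1 - px) * ey - (y1 - py) * ex) / denom
        u = ((x1 - px) * dy - (y1 - py) * dx) / denom
        if 0.0 <= u <= 1.0:
            hits.append((t, (px + t * dx, py + t * dy)))
    hits.sort()
    return [p for _, p in hits]


def row_endpoints(polygon, diagonal, angle_deg, n_rows):
    """Place `n_rows` points evenly along the diagonal and return (start, end) per row."""
    (ax, ay), (bx, by) = diagonal
    rows = []
    for k in range(n_rows):
        f = (k + 0.5) / n_rows  # Interior points avoid the diagonal's end vertices.
        point = (ax + f * (bx - ax), ay + f * (by - ay))
        hits = line_polygon_intersections(point, angle_deg, polygon)
        if len(hits) >= 2:
            rows.append((hits[0], hits[-1]))
    return rows


def split_row(start, end, n_cells):
    """Split a row into `n_cells` equal segments (a simplification of step 3)."""
    (sx, sy), (ex, ey) = start, end
    pts = [(sx + (ex - sx) * i / n_cells, sy + (ey - sy) * i / n_cells)
           for i in range(n_cells + 1)]
    return list(zip(pts[:-1], pts[1:]))


# Hypothetical rectangular parcel with vertical rows (row angle of 90 degrees).
parcel = [(0.0, 0.0), (10.0, 0.0), (10.0, 4.0), (0.0, 4.0)]
rows = row_endpoints(parcel, diagonal=((0.0, 0.0), (10.0, 4.0)), angle_deg=90.0, n_rows=5)
cells = split_row(rows[0][0], rows[0][1], n_cells=4)
```

Each cell can then be assigned the number of bunches counted in that part of the row and colored accordingly on the satellite image.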

Additional Results

Counting Grape Bunches

Related links
