Segmenting and Counting Grape Bunches
using Mask R-CNN + DeepSORT Tracker
Digevo (Omia AI Group)
- Oscar Guarnizo Digevo Yachay Tech University
- Diego Suntaxi Digevo Yachay Tech University
- Fabricio Crespo Digevo Yachay Tech University
Description
This project builds on Mask R-CNN to segment grape bunches, count them, and generate heat maps. It is based on the
GitHub implementations matterport/Mask_RCNN and johncuicui/grapeMRCNN for grape detection. Our main contributions
are the addition of a DeepSORT tracker (implemented in PyTorch) and heat map generation. The tracker lets us count the
bunches detected in a video without counting any bunch more than once. We then extrapolate the counts to satellite
images to generate heat maps that show the number of grapes per parcel in a field. These images give a visual summary
of how the grape bunches are distributed over an area.
Skills:
Python, TensorFlow, PyTorch, OpenCV, Pandas and Matplotlib.
Mask R-CNN
Mask R-CNN, introduced by Kaiming He et al., is a two-step approach that extends Faster R-CNN.
- The first step (Region Proposal Network) scans the image and generates proposals (areas likely to contain an object).
- The second step (RoI Classification & Bounding Box Regressor) classifies the proposals and generates bounding boxes and masks.
Region Proposal Network
During the first step, a sliding-window approach extracts the regions of interest (RoIs). The sliding window is
implemented with convolutions, so all predictions are computed in a single forward pass. The convolutional neural
network uses a backbone based on a Feature Pyramid Network (FPN).
The sliding window is repeated with different window sizes. Finally, overlapping regions are filtered with
non-maximum suppression (NMS).
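The non-maximum suppression step can be illustrated with a minimal sketch. This is not the code used by matterport/Mask_RCNN; the function names `iou` and `non_max_suppression`, the `(x1, y1, x2, y2)` box format, and the 0.5 threshold are assumptions chosen for illustration. It shows the greedy idea: keep the highest-scoring box, then drop any remaining box that overlaps a kept box too much.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap any already kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two heavily overlapping proposals and one separate proposal (made-up values).
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)
```

In this example the second box overlaps the first with an IoU of about 0.82, so it is suppressed and only the first and third proposals survive.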
RoI Classification & Mask Generation
During the second stage, another convolutional network is applied to each region of interest (RoI). The network
generates two outputs: the object class and the respective bounding box.
Up to this point, the behavior is similar to Faster R-CNN. Mask R-CNN then adds further convolutions that generate a mask inside each detected bounding box.
DeepSort Tracker
DeepSORT, by Nicolai Wojke et al., builds on SORT and integrates appearance information to improve performance. This extension makes it possible to track objects through longer periods of occlusion, which reduces identity switches. We did not develop this stage ourselves. Instead, we used the existing implementation from nwojke/deep_sort and adapted it to our use case.
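Counting bunches without repetitions comes down to counting distinct track IDs rather than per-frame detections. The sketch below is a simplified illustration under stated assumptions, not the project's code: it assumes the tracker output has already been reduced to a list of track IDs per frame, and the function name `count_unique_tracks` and the `min_frames` filter (used here to ignore short-lived, probably spurious tracks) are hypothetical.

```python
from collections import Counter


def count_unique_tracks(frames, min_frames=3):
    """Count track IDs that appear in at least `min_frames` frames.

    `frames` is a list with one entry per video frame; each entry is the
    list of track IDs reported by the tracker for that frame.
    """
    seen = Counter()
    for track_ids in frames:
        # set() guards against the same ID being listed twice in one frame.
        for track_id in set(track_ids):
            seen[track_id] += 1
    return sum(1 for n in seen.values() if n >= min_frames)


# Made-up tracker output: ID 3 appears in a single frame only.
frames = [[1, 2], [1, 2], [1, 2, 3], [2, 4], [2, 4], [4]]
total = count_unique_tracks(frames)
print(total)
```

Here there are eleven per-frame detections but only three bunches are counted, because IDs 1, 2, and 4 persist across frames while ID 3 is discarded as too short-lived.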
Heat Map Generation
We generate heat maps with a pragmatic approach. We take a satellite image of the target parcel and
record detection metadata for each row in the parcel. For each parcel, we perform the following:
1. Collect polygonal coordinates (including the diagonal and the row angle) from the parcel's satellite image. These
coordinates are called COORDINADAS_POLY.
2. Then, run a procedure that automatically finds each row's start and end points.
- Use the row angle to mark the rows with points along the diagonal.
- Intersect the lines through those points with the polygon boundary to get the start and end points of each row.
3. Automatically divide each row into rectangular cells based on the distribution of grapes along the row.
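Step 2 can be sketched with plain 2D geometry. This is a minimal illustration under assumptions, not the project's implementation: the function names `line_polygon_hits` and `row_endpoints`, the evenly spaced seed points along the diagonal, and the `n_rows` parameter are all hypothetical. Each seed point defines a line at the row angle, and the first and last intersections of that line with the polygon edges are taken as the row's start and end points.

```python
import math


def line_polygon_hits(point, direction, polygon):
    """Intersect an infinite line with the polygon edges.

    Returns the hit points sorted by their position along the line.
    """
    px, py = point
    dx, dy = direction
    hits = []
    n = len(polygon)
    for i in range(n):
        (ax, ay), (bx, by) = polygon[i], polygon[(i + 1) % n]
        ex, ey = bx - ax, by - ay
        denom = dx * ey - dy * ex
        if abs(denom) < 1e-12:
            continue  # line is parallel to this edge
        # Solve point + t * direction == a + s * edge for t and s.
        t = ((ax - px) * ey - (ay - py) * ex) / denom
        s = ((ax - px) * dy - (ay - py) * dx) / denom
        if 0.0 <= s <= 1.0:
            hits.append((t, (px + t * dx, py + t * dy)))
    hits.sort()
    return [p for _, p in hits]


def row_endpoints(polygon, diag_start, diag_end, angle_deg, n_rows):
    """Place n_rows seed points on the diagonal and intersect each row line."""
    theta = math.radians(angle_deg)
    direction = (math.cos(theta), math.sin(theta))
    rows = []
    for k in range(1, n_rows + 1):
        f = k / (n_rows + 1)
        seed = (diag_start[0] + f * (diag_end[0] - diag_start[0]),
                diag_start[1] + f * (diag_end[1] - diag_start[1]))
        hits = line_polygon_hits(seed, direction, polygon)
        if len(hits) >= 2:
            rows.append((hits[0], hits[-1]))  # (start point, end point)
    return rows


# Made-up square parcel with horizontal rows (row angle of 0 degrees).
parcel = [(0, 0), (10, 0), (10, 10), (0, 10)]
rows = row_endpoints(parcel, (0, 0), (10, 10), angle_deg=0, n_rows=4)
for start, end in rows:
    print(start, end)
```

For a real parcel, the polygon and the diagonal would come from COORDINADAS_POLY in image coordinates. A geometry library could replace the hand-written intersection, but the plain version keeps the sketch dependency-free.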
Additional Results