Object Detection with RCNN

Jan 17, 2015

Identify tennis players from an image frame obtained from a video of a tennis match

Our project started to progress once we had done all the background work, installed the required toolboxes, and collected relevant data.

RCNN is a state-of-the-art visual object detection system that combines region proposals obtained via selective search with convolutional neural networks. It is built on the Caffe deep learning framework. The toolbox is well documented and offers demo code to demonstrate its usage. We used a pre-trained model, trained on the ImageNet dataset, for the task of extracting tennis players from the video frames.

The RCNN toolbox offers a demo script named rcnn_demo.m under the examples folder. It calls the function defined in showboxes.m under the vis folder. The ImageNet detection data has around 200 labelled object classes, such as person, racket, ball, etc. The demo code returns the scores of all the classes that may be found in the image, together with the locations of the bounding boxes around those objects. Since we were only interested in the person class, we set a threshold on the score below which regions are not reported, and filtered the person class from the other classes.
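The filtering step described above can be sketched in a few lines. This is only an illustration in Python, not the toolbox's MATLAB code: the Detection record, the filter_class helper, the threshold of 0.0 and the sample scores and boxes are all hypothetical, chosen to show the idea of keeping one class above a score cut-off.

```python
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    """Hypothetical record for one detector output (not the toolbox's own format)."""
    label: str                             # class name, e.g. "person"
    score: float                           # detector score; higher means more confident
    box: Tuple[float, float, float, float] # (x1, y1, x2, y2) in pixels


def filter_class(detections: List[Detection],
                 label: str = "person",
                 min_score: float = 0.0) -> List[Detection]:
    """Keep detections of one class whose score reaches the threshold,
    sorted from most to least confident."""
    kept = [d for d in detections if d.label == label and d.score >= min_score]
    return sorted(kept, key=lambda d: d.score, reverse=True)


# Made-up detections for a single frame (illustrative values only).
detections = [
    Detection("person", 1.8, (120, 200, 180, 380)),
    Detection("racket", 0.6, (170, 250, 200, 290)),
    Detection("person", -0.4, (10, 40, 50, 140)),
    Detection("person", 0.9, (400, 90, 440, 190)),
]

players = filter_class(detections, label="person", min_score=0.0)
for d in players:
    print(d.label, d.score, d.box)
```

The threshold trades missed players against false positives, so in practice it would be tuned on a handful of frames.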

The results from RCNN were fairly accurate and are illustrated below.

RCNN Results

However, RCNN also detected many extra regions for the person class, such as the referee and the ball boys, so we had to filter them out and extract precise information about the players only. We therefore moved on to our next objective: detecting the court, which lets us separate the players from other persons present in the scene. Court detection also gives the players' locations with respect to the court, helps distinguish frames showing match play from other frames, and supports other interesting inferences.
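One way the court could be used to separate players from other persons is sketched below. It rests on assumptions that are not part of the work described here: the court is already available as a polygon in image coordinates, and a person counts as on court when the bottom centre of the bounding box (roughly the feet) falls inside that polygon. The helper names and the coordinates are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count how many polygon edges a horizontal ray
    from (x, y) towards +x crosses; an odd count means the point is inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's height can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def foot_point(box):
    """Bottom-centre of an (x1, y1, x2, y2) box, used as the person's ground position."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, y2)


def on_court(boxes, court_polygon):
    """Keep only the boxes whose foot point lies inside the court polygon."""
    return [b for b in boxes if point_in_polygon(*foot_point(b), court_polygon)]


# Hypothetical court outline as seen by a broadcast camera (a trapezoid).
court = [(200, 100), (440, 100), (560, 400), (80, 400)]

# Hypothetical person boxes: two players, a ball boy and the referee.
boxes = [
    (290, 60, 330, 150),   # far player
    (300, 250, 360, 390),  # near player
    (20, 180, 50, 260),    # ball boy, left of the court
    (570, 120, 600, 210),  # referee, right of the court
]

players_on_court = on_court(boxes, court)
print(players_on_court)
```

Players often stand behind the baseline, so a real filter would probably enlarge the polygon by some margin before testing.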