Start of a Wonderful Experience

Jan 17, 2015

My Experience at CMU Winter School

Last year I attended the CMU-NITK Winter School, organised by Carnegie Mellon University at the National Institute of Technology Karnataka. It was a wonderful opportunity and exposed me to a large number of interesting topics in the field of machine learning.

The Winter School began on 10 December. After some introductory lectures, we were asked to brainstorm and come up with ideas that we would like to execute in a short span of just 15 days. The whole process was quite engaging, and we came across a lot of interesting ideas. Our team had four members, each from a different background. This was very helpful, as it exposed us to a lot of new ideas and approaches.

Despite having only 15 days for the project, we came up with a rather ambitious idea: an application that can generate commentary for any match given just its video. This was an interesting problem and, more importantly, a great learning experience.

After some excellent guidance from the mentors, we were able to break the problem down into discrete modules. Owing to the non-availability of data, we restricted ourselves to generating commentary for tennis matches. Given a video of a tennis match, we tried to find the key elements that characterise the important information in the match. We divided the problem into the following parts:

  • Court Detection
  • Object Recognition
  • Action Recognition
  • Audio Analysis
  • Ball Tracking
  • Commentary Generation

We divided the work among the four team members and set about our tasks individually. However, we faced a lot of challenges in the early phase before we could formalise our approach. The major obstacle was the non-availability of action-labelled data for any sport. We were left with no choice but to annotate the data ourselves. We created an HTML5 application, built on the HTML5 canvas, in which the user moves rectangles around to mark the bounding box of the action region and then labels it. We will try to improve the code and release it as an annotation tool, as well as share the annotated action dataset for tennis matches.
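To give a rough idea of what such a tool has to record for each labelled region, here is a minimal Python sketch. It is not our actual HTML5 code: the `ActionAnnotation` fields, the `clamp_to_frame` helper and the JSON layout are all assumptions made up for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ActionAnnotation:
    """One labelled action region in a single video frame (hypothetical schema)."""
    frame: int      # frame index within the video
    x: float        # left edge of the box, in pixels
    y: float        # top edge of the box, in pixels
    width: float    # box width, in pixels
    height: float   # box height, in pixels
    label: str      # action name chosen by the annotator, e.g. "serve"


def clamp_to_frame(ann, frame_w, frame_h):
    """Return a copy of the annotation with its box clipped to the frame bounds."""
    x0 = min(max(ann.x, 0.0), frame_w)
    y0 = min(max(ann.y, 0.0), frame_h)
    x1 = min(max(ann.x + ann.width, 0.0), frame_w)
    y1 = min(max(ann.y + ann.height, 0.0), frame_h)
    return ActionAnnotation(ann.frame, x0, y0, x1 - x0, y1 - y0, ann.label)


def to_json(annotations):
    """Serialise a list of annotations to a JSON string."""
    return json.dumps([asdict(a) for a in annotations], indent=2)


if __name__ == "__main__":
    # A rectangle dragged partly outside a 640x360 frame gets clipped before saving.
    raw = ActionAnnotation(frame=12, x=-10.0, y=20.0, width=100.0, height=50.0, label="serve")
    print(to_json([clamp_to_frame(raw, 640.0, 360.0)]))
```

Clipping the box before saving is one simple way to guard against rectangles that get dragged past the edge of the video frame.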

Next, we plunged into the job of object recognition. We found that R-CNN (Regions with Convolutional Neural Network features) gave noticeably better results for object recognition than other techniques, and it came with a model pre-trained on the ImageNet dataset that could be used directly. Since running the R-CNN code required a GPU, IIT Bombay provided us with GPU access. Then began the next challenging part: installing Caffe and the R-CNN toolbox on the GPU machine…