AI BLOG | December 09, 2020

AI Meets Sports: Analyzing Tennis Shots Using Advanced Computer Vision

Written by: Nghia Truong

Computer vision is a rapidly growing subfield of computer science and artificial intelligence: it is fairly new, and it has a wide range of applications. Computers can now take in images from videos or photos and sort the objects in them into categories with a speed, level of detail, and accuracy that humans cannot match. From those outputs, we can analyze data more specifically and efficiently and arrive at compelling results.

One can easily imagine how computer vision and AI can be applied in sports. Give the computer a camera feed, and you get back insights that can potentially enhance athletes’ performance, assist in making fair decisions, and more. One such application is using computer vision to analyze tennis shots, which could give amateur players the level of specificity and data they need to actually improve. This is the theme of our latest research with Lisa Baily (The American School in Japan), Yoshi Truong (Tokyo Techies), Johnathan Lai (Tokyo Coding Club) and Phong Nguyen (Tokyo Techies), which was presented at the International Conference on Sport Sciences Research and Technology Support (icSPORTS) 2020.

While there is no shortage of research on improving tennis performance, actual applications are few and far between. One reason is that most approaches require data to be gathered in a very specific way (through EMG sensors, dual-camera setups, super-high-speed cameras, etc.), which is not easily available to ordinary tennis players. Our research aims to provide a method for analyzing tennis data using just a single video. The obvious benefit is that data gathering is greatly simplified, to the point where any tennis player, pro or amateur, can do it with just a smartphone and a tripod. There is also an obvious downside: accurate 3D coordinates cannot be recovered from a single video. However, our research showed that even with 2-dimensional data, statistical analysis of the video is both possible and valuable.

The approach is fairly simple and is illustrated by the following figure:

To keep things simple, we focused only on the serve, which is arguably the most important stroke in tennis. We collected data from YouTube and also recorded some serves ourselves.

From each video, we extracted the relevant frames, then ran them through a pose estimation and tracking framework to get the athletes’ poses in numerical form. After labeling and normalizing the data, we had a nice database of poses ready to be analyzed. We chose AlphaPose as the pose estimation framework, mostly because of my familiarity with it from other ongoing research.
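AlphaPose can save its detections as a JSON list, one entry per detected person per frame. The sketch below shows one way to turn that output into an array for a single tracked player. It assumes COCO-style entries with an `image_id` (a frame filename such as `"12.jpg"`), a tracking id `idx`, and a flat `keypoints` list of (x, y, confidence) triples for 17 joints. These field names and the helpers `frame_number` and `load_player_poses` are illustrative assumptions, so check them against the output of your AlphaPose version.

```python
import os
import numpy as np

NUM_JOINTS = 17  # COCO keypoint layout (assumed)


def frame_number(image_id):
    """Turn an image id such as "12.jpg" (assumed naming) into the integer 12."""
    return int(os.path.splitext(str(image_id))[0])


def load_player_poses(detections, track_id):
    """Collect one tracked person's (x, y) keypoints, ordered by frame.

    `detections` is a list of dicts shaped like AlphaPose's COCO-style JSON
    output (assumed fields: "image_id", "idx", "keypoints").
    Returns an array of shape (num_frames, NUM_JOINTS, 2).
    """
    mine = [d for d in detections if d.get("idx") == track_id]
    # Sort numerically so that "10.jpg" comes after "2.jpg".
    mine.sort(key=lambda d: frame_number(d["image_id"]))
    poses = [
        # Each keypoint is an (x, y, confidence) triple; keep only x and y.
        np.asarray(d["keypoints"], dtype=float).reshape(NUM_JOINTS, 3)[:, :2]
        for d in mine
    ]
    if not poses:
        return np.empty((0, NUM_JOINTS, 2))
    return np.stack(poses)
```

For example, calling `load_player_poses(json.load(f), track_id=1)` on an opened results file would return an array of shape `(num_frames, 17, 2)` for that player.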

It’s crucial to normalize the pose data to account for differences in camera angles and distances. Without scaling and centering, the poses cannot be compared in a meaningful way.
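One common way to do this, shown here as a minimal sketch rather than the exact scheme from the paper, is to translate each pose so that the midpoint of the hips sits at the origin and divide by the torso length (the distance from the hip midpoint to the shoulder midpoint). The joint indices assume the COCO 17-keypoint layout, and the function name `normalize_pose` is hypothetical.

```python
import numpy as np

# Joint indices in the COCO 17-keypoint layout (assumed)
L_SHOULDER, R_SHOULDER, L_HIP, R_HIP = 5, 6, 11, 12


def normalize_pose(poses):
    """Center poses on the hip midpoint and scale them by torso length.

    Works on a single pose of shape (17, 2) or a sequence of shape
    (num_frames, 17, 2); each frame is normalized independently.
    """
    poses = np.asarray(poses, dtype=float)
    hip_mid = (poses[..., L_HIP, :] + poses[..., R_HIP, :]) / 2.0
    shoulder_mid = (poses[..., L_SHOULDER, :] + poses[..., R_SHOULDER, :]) / 2.0
    torso = np.asarray(np.linalg.norm(shoulder_mid - hip_mid, axis=-1))
    if np.any(torso == 0):
        raise ValueError("degenerate pose: shoulder and hip midpoints coincide")
    # Broadcast the per-frame offset and scale over all joints.
    return (poses - hip_mid[..., None, :]) / torso[..., None, None]
```

Because both the offset and the scale are computed from the pose itself, the result does not change if the player stands farther from the camera or off to one side of the frame. Scaling each frame separately is a design choice; a single scale per serve would work just as well.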

And this is how the data looks after normalization.
We also mirrored the poses of left-handed players so that they can be compared directly with right-handed ones.
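Mirroring has two parts: reflecting the x-coordinates about the body's vertical midline, and swapping the left/right joint labels so that, for instance, a left-hander's hitting wrist ends up in the `right_wrist` slot. A minimal sketch, assuming poses that are already centered on the body's midline and the COCO 17-keypoint layout; `mirror_pose` and `LR_PAIRS` are hypothetical names.

```python
import numpy as np

# Left/right keypoint index pairs in the COCO 17-keypoint layout (assumed):
# eyes, ears, shoulders, elbows, wrists, hips, knees, ankles.
LR_PAIRS = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10), (11, 12), (13, 14), (15, 16)]


def mirror_pose(poses):
    """Mirror centered poses horizontally so a left-hander looks right-handed.

    Expects x = 0 on the body's vertical midline and joints on the
    second-to-last axis, e.g. shape (17, 2) or (num_frames, 17, 2).
    """
    poses = np.asarray(poses, dtype=float)
    perm = np.arange(poses.shape[-2])
    for left, right in LR_PAIRS:
        perm[left], perm[right] = right, left
    mirrored = poses[..., perm, :]  # swap left/right labels (returns a copy)
    mirrored[..., 0] = -mirrored[..., 0]  # reflect x about the midline
    return mirrored
```

Applying the function twice returns the original pose, which is a quick sanity check that the label swap and the reflection are consistent.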

After that, it is a matter of statistically comparing the poses and drawing conclusions. In this research, we were interested in the spatial difference between the poses in corresponding frames. We chose Euclidean distance as our comparison metric and looked at the distribution of distances between all pairs of serves we collected. The results are not particularly surprising:
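The comparison can be sketched as follows. This assumes each serve has already been normalized and resampled to the same number of frames, so that frame t of one serve corresponds to frame t of another; how frames are aligned is not described in this post, so treat that step as an assumption. The function names `serve_distance` and `pairwise_distances` are hypothetical.

```python
import itertools
import numpy as np


def serve_distance(a, b):
    """Sum of Euclidean distances between corresponding joints in corresponding frames.

    a, b: normalized, frame-aligned poses of shape (num_frames, num_joints, 2).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError(f"serves must be aligned to the same shape: {a.shape} vs {b.shape}")
    per_joint = np.linalg.norm(a - b, axis=-1)  # shape (num_frames, num_joints)
    return float(per_joint.sum())


def pairwise_distances(serves):
    """Distance for every unordered pair of serves, as a 1-D array."""
    return np.array([serve_distance(a, b) for a, b in itertools.combinations(serves, 2)])
```

Passing the resulting array to a histogram function such as `numpy.histogram` gives a distribution of the kind discussed here.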

  • Comparing data of the same player reveals how consistent they are when performing a serve. Professional players show a much higher level of consistency than amateur players.

Sum of body-part differences over all pairs of serves from the same player, professionals vs. amateurs

  • Comparing data of different players shows the relationship between groups of players. The pro group again shows more consistency among themselves.

Histogram of total sum of distances between amateurs

Histogram of total sum of distances between professionals
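The two kinds of comparison in the list above differ only in which pairs of serves are pooled. A sketch of the grouping, assuming each serve is labeled with a player name and a group (`"pro"` or `"amateur"`); the labels and the function `distance_distributions` are hypothetical, and pairs that mix the two groups are simply skipped here.

```python
import itertools
from collections import defaultdict
import numpy as np


def serve_distance(a, b):
    """Sum of per-joint Euclidean distances between two frame-aligned serves."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.linalg.norm(diff, axis=-1).sum())


def distance_distributions(serves):
    """Split pairwise serve distances into same-player and same-group pools.

    serves: list of (player, group, poses) tuples, where group is a label
    such as "pro" or "amateur" (hypothetical labels).
    Returns two dicts mapping group -> list of distances:
      - same_player: pairs of serves by the same player (consistency)
      - same_group: pairs by different players in the same group
    Pairs whose players belong to different groups are skipped.
    """
    same_player, same_group = defaultdict(list), defaultdict(list)
    for (p1, g1, s1), (p2, g2, s2) in itertools.combinations(serves, 2):
        if g1 != g2:
            continue
        d = serve_distance(s1, s2)
        if p1 == p2:
            same_player[g1].append(d)
        else:
            same_group[g1].append(d)
    return dict(same_player), dict(same_group)
```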

Beyond consistency, we also identified which body parts move most differently between amateurs and professionals: the left hip and both wrists. We did this by performing t-tests on the distributions of distances for each body part. I won't go into the details here; if you are interested, please refer to the paper linked at the end of this post.
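A per-body-part t-test might look like the sketch below. It is one plausible reading, not the paper's exact procedure: it assumes per-joint distances have already been computed for amateur pairs and for professional pairs, runs Welch's t-test (`scipy.stats.ttest_ind` with `equal_var=False`) for each joint, and applies a Bonferroni correction across the 17 joints as an illustrative choice. The helper names are hypothetical.

```python
import numpy as np
from scipy import stats

# COCO 17-keypoint names (assumed layout)
JOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def per_joint_distance(a, b):
    """Per-joint distance between two frame-aligned serves, summed over frames."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return np.linalg.norm(diff, axis=-1).sum(axis=0)  # shape (num_joints,)


def differing_joints(amateur_dists, pro_dists, alpha=0.05):
    """Welch's t-test per joint, with a Bonferroni correction across joints.

    amateur_dists, pro_dists: arrays of shape (num_pairs, num_joints) holding
    per_joint_distance values for amateur pairs and professional pairs.
    Returns (joint_name, t_statistic, p_value) for significant joints,
    sorted from smallest to largest p-value.
    """
    t, p = stats.ttest_ind(amateur_dists, pro_dists, axis=0, equal_var=False)
    threshold = alpha / len(JOINT_NAMES)
    return [
        (JOINT_NAMES[j], float(t[j]), float(p[j]))
        for j in np.argsort(p)
        if p[j] < threshold
    ]
```

A positive t-statistic here means the amateur pairs differ more at that joint than the professional pairs do.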

Our research showed that it is possible to get meaningful insights from a single 2-dimensional video. I hope it opens the door for computer vision and AI to come closer to amateur and professional athletes alike. For future work, we are thinking of automating more parts of the pipeline, such as serve extraction and labeling, and of analyzing the dynamics of the serve rather than just the distributions of distances.

Here is the research paper explaining how advanced computer vision can help us differentiate between a professional tennis player and an amateur player by analyzing their technique:
https://www.scitepress.org/Papers/2020/101458/101458.pdf
