If you follow the current data science buzz at all, you have surely heard about Kaggle. What is it, and why is everyone talking about it? In this post, we will talk a little bit about Kaggle and how to get started with it.
Kaggle is a platform founded by Anthony Goldbloom in 2010 for data scientists to compete with and learn from each other. Since its inception, it has attracted millions of people, and over two million models have been submitted to the platform. Just from that, you can tell that people are excited about it! The reason, of course, is that it’s the biggest congregation of data scientists in the world, so there’s much wisdom to be had from it. Competitions offering handsome prize money are also hosted regularly, so if you can prove yourself best-in-class, you can make a killing there too!
Isn’t it for, like, really clever people?
The very top tier in Kaggle is admittedly occupied by really smart people — research scientists and top practitioners. But the bar to get started is surprisingly low. There are people at all levels of expertise in Kaggle, learning from each other. Although there’s quite a long way to go to become an expert data scientist, the journey might as well start now.
Okay, I’m sold! How do I start?
Great! If you are ready to get started, just head over to the Kaggle website. They have great documentation. In particular, you can look at the competitions docs to learn about the different kinds of competitions hosted on Kaggle and how to join them. The page also lists some great resources for getting started with and gaining insights into Kaggle, particularly Kaggle Learn. Once you have whetted your appetite, you can choose a competition and join it.
Choosing a competition
It’s important to choose a competition that’s appropriate for your level and your interests. If you look at the competitions page, you’ll be greeted with an invitation to join the Titanic competition, which asks you to predict the survival of the passengers on that fateful voyage. There will also be a list of active, completed, and “in class” competitions, which you can filter by categories and sort by properties like prize money, the number of teams joining, and others. A good place to start for beginners is the “getting started” category. If you’re a bit more comfortable with machine learning, you can also try the “playground” category. From there, you can move on to the more advanced levels. For our discussion today, we will choose the delightfully named competition “What’s Cooking?”.
Making a notebook and submitting
Assuming you have made a Kaggle account, the first thing you need to do is join the competition, “What’s Cooking?”, which you can do by clicking the button in the top right corner and accepting the rules of the competition. The overview tab explains what the competition is about, and the evaluation page within it describes how you’ll be scored. Next, you can see what data you’ll be working with in the data tab.
The fun begins in the notebooks tab, where you can go and make a new notebook for yourself. This is where you will do all your work, and make the submission. When you open a new notebook, it’ll contain a single cell that imports some useful libraries, and prints the contents of the “input” directory. You can type (or copy) the following code into a cell to prepare the submission file:
!cp ../input/whats-cooking-kernels-only/sample_submission.csv.zip .
!unzip -oq sample_submission.csv.zip
!mv sample_submission.csv submission.csv
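If you would rather stay in Python than use shell commands, roughly the same preparation can be sketched with the standard library. This is only a minimal sketch under a few assumptions: the function name `prepare_submission` and its `work_dir` parameter are made up for illustration, the archive is assumed to contain a file named `sample_submission.csv` (as the commands above imply), and the archive is read in place instead of being copied first.

```python
import zipfile
from pathlib import Path


def prepare_submission(zip_path, work_dir="."):
    """Extract sample_submission.csv from the archive and save it as submission.csv.

    Hypothetical helper for illustration; it is not part of Kaggle's tooling.
    """
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(zip_path) as archive:
        # Like `unzip -oq`: extract quietly, overwriting an existing file.
        extracted = archive.extract("sample_submission.csv", path=work_dir)

    target = work_dir / "submission.csv"
    # Like `mv`: rename the extracted file, replacing any earlier submission.csv.
    Path(extracted).replace(target)
    return target
```

In a notebook cell, you could then call `prepare_submission("../input/whats-cooking-kernels-only/sample_submission.csv.zip")`, which should leave a `submission.csv` in the working directory.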
Let’s go ahead and make a submission with this file to get into the game. These steps will walk you through the submission.
- In the top right corner of the notebook, you’ll see a button named “Save Version”. Click it and select “Save & Run All” from the dialog that appears. When you then click “Save”, Kaggle runs all cells in your notebook and saves them with their output as a new version.
- Now click the “1” that appears beside the “Save Version” button, and you’ll see the version you just created.
- Click the ellipsis button (…) at the right of “Version 1”, and click “Open in Viewer” to open it in a new tab.
- Now scroll down to the output section, and choose “submission.csv”.
- Finally, click the submit button on the right.
Congratulations, you just made your first submission!
The default submission file predicts all dishes to be Italian, which gives us an accuracy of about 20%. To get a more reasonable score, we have to do actual machine learning. In a separate post, we’ll talk about some ways to improve this score. In the meantime, go ahead and build some models to get better predictions, and see how you fare against the other people on Kaggle. Take a look at the notebooks they have created, too: it’s a great way of learning. Now that you’ve done the necessary work to set up an environment, you’re well on your way to learning and applying your machine learning knowledge. I hope this is the beginning of a great journey!
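To see why an all-Italian submission can still score around 20%, it helps to think of it as a majority-class baseline: if you always predict the most common label, your accuracy equals that label's share of the data. The snippet below is a small sketch of that idea. The function name `majority_baseline` is made up for illustration, and the labels are toy values, not the competition data.

```python
from collections import Counter


def majority_baseline(labels):
    """Return the most common label and the accuracy of always predicting it."""
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return label, count / len(labels)


# Toy, made-up labels purely for illustration (not the competition data).
toy_cuisines = ["italian", "mexican", "italian", "indian", "chinese"]
label, accuracy = majority_baseline(toy_cuisines)
print(label, accuracy)  # italian 0.4
```

Any model you build should beat this kind of baseline; if it does not, it has probably learned nothing useful from the ingredients.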