Top Tips For Competing in a Kaggle Competition

Written by prashantkikani | Published 2020/12/20


Hi, my name is Prashant Kikani, and in this blog post I share some tips and tricks for competing in Kaggle competitions, along with code snippets that help you get results with limited resources. Here is my Kaggle profile.

For Deep Learning competitions

1. TTA (test-time augmentation): apply the training-time augmentations at inference too.
TTA means making predictions on the same data sample multiple times, with the sample augmented differently each time. The underlying sample stays the same; we simply apply augmentations similar to the ones used during training and combine the resulting predictions.
TTA is a common way to make model predictions more robust and to reduce their variance, which ultimately improves the score.
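Here is a minimal sketch of TTA with a horizontal flip in PyTorch; model and test_loader are placeholders for your own trained model and test DataLoader.

import torch

def predict_with_tta(model, test_loader, device="cuda"):
    # Average predictions over the original image and its horizontally flipped view
    model.eval()
    all_preds = []
    with torch.no_grad():
        for images in test_loader:                    # images: (batch, channels, H, W)
            images = images.to(device)
            preds = model(images).softmax(dim=1)      # prediction on the original view
            flipped = torch.flip(images, dims=[3])    # flip along the width axis
            preds += model(flipped).softmax(dim=1)    # prediction on the augmented view
            all_preds.append(preds / 2)               # average the two views
    return torch.cat(all_preds)

More flips, crops or rotations can be added the same way; just remember to average over all the views.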
2. Change image size at inference time
While training a computer vision model, we often resize our images to 512x512 or 256x256 so that the data and model fit in GPU memory. In most cases, we can't train on the original high-resolution images.
At inference time, however, we can keep the images at their original high resolution, because we don't do back-propagation at inference and therefore need much less memory. This helps because the extra pixels give the model more relevant information.
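For example, with torchvision transforms (a sketch; the 256x256 training size and 512x512 inference size are arbitrary choices, and the model must be able to handle variable input sizes, e.g. via adaptive pooling):

from torchvision import transforms

# Smaller images during training so the batch fits in GPU memory
train_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Larger images at inference, since no gradients need to be stored
test_transform = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.ToTensor(),
])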
3. Ensemble of multiple diverse models
Ensembling is a technique in which we combine multiple diverse models (usually trained on the same data) by using all of them at inference time, for example by averaging their predictions on each test sample. The goal of ensembling is to reduce the bias and/or variance of our predictions. Here's a great notebook to learn more about ensembling with Python code, and here is a great blog about ensembling.
On Kaggle, people share their code along with its performance on the public leaderboard. We can train our own model and ensemble it with the best public one.
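A minimal sketch of such a blend, assuming both submissions are CSV files with id and target columns (hypothetical names) in the same row order:

import pandas as pd

our_sub = pd.read_csv("our_submission.csv")        # our own model's predictions
public_sub = pd.read_csv("public_submission.csv")  # best public notebook's predictions

blend = our_sub.copy()
# Weighted average; the 0.5/0.5 weights are just a starting point to tune with CV
blend["target"] = 0.5 * our_sub["target"] + 0.5 * public_sub["target"]
blend.to_csv("blend_submission.csv", index=False)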
4. Gradient Accumulation or effective batch size.
Generally, GPU RAM is the hurdle that stops us from training bigger models in a robust manner. Kaggle GPUs provide 16 GB of memory, but in some cases we still can't fit a higher batch_size in that RAM. A higher batch_size helps to train robust models, so we can use gradient accumulation to make our batch_size effectively higher. Here's a sample code in PyTorch from this gist.
model.zero_grad()                                   # Reset gradients tensors
for i, (inputs, labels) in enumerate(training_set):
    predictions = model(inputs)                     # Forward pass
    loss = loss_function(predictions, labels)       # Compute loss function
    loss = loss / accumulation_steps                # Normalize our loss (if averaged)
    loss.backward()                                 # Backward pass
    if (i+1) % accumulation_steps == 0:             # Wait for several backward steps
        optimizer.step()                            # Now we can do an optimizer step
        model.zero_grad()                           # Reset gradients tensors
        if (i+1) % evaluation_steps == 0:           # Evaluate the model when we...
            evaluate_model()                        # ...have no gradients accumulated
We call optimizer.step() not after every batch but after every accumulation_steps batches, so the gradients accumulate across those batches and the effective batch_size becomes batch_size * accumulation_steps.
5. Post-processing on predictions
Once we get our predictions on the test data, we can post-process them based on the competition metric and the nature of the data. For example,
  1. If AUC is the metric used to measure performance, then rank averaging the ensembled models' predictions will perform better than a simple average (see the sketch after this list). You can find a sample Python code for this here.
  2. If LogLoss is the metric of the competition, then predicting the simple average of the training labels gives the best naive baseline. Also, multiplying all predictions by 0.99 or 1.01 can sometimes help with the LogLoss metric.
  3. Sometimes, rather than a simple average of the ensemble models' predictions, a geometric mean can work better (also shown in the sketch below). Here is a sample Python code for this.
  4. Most post-processing techniques depend on the nature of the competition data. The training data may contain signals that let us tweak the model predictions to improve the score.
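As referenced above, here is a rough sketch of rank averaging and geometric-mean blending with NumPy and SciPy; preds_a and preds_b are placeholder probability arrays from two models:

import numpy as np
from scipy.stats import rankdata

preds_a = np.array([0.10, 0.40, 0.35, 0.90])    # model A's probabilities (placeholder)
preds_b = np.array([0.20, 0.55, 0.30, 0.80])    # model B's probabilities (placeholder)

# Rank average: useful for AUC, which only depends on the ordering of predictions
rank_avg = (rankdata(preds_a) + rankdata(preds_b)) / 2
rank_avg /= rank_avg.max()                       # rescale to (0, 1]

# Geometric mean: can beat the arithmetic mean for probability-like outputs
geo_mean = np.sqrt(preds_a * preds_b)            # exp of the mean of the logs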
6. Feature Engineering and Data Augmentation
This is a very common and well-known thing to do on Kaggle. Using all the data given in a competition, can we create more relevant data to make our model more robust and better?
In tabular competitions, we combine multiple columns of our data to create more relevant ones. For example, if the height and width columns of a house are given, we can create its total_area by multiplying those two columns, as in the sketch below. Lots of things can be done in feature engineering; we just need to think about how to create more relevant data out of the existing data.
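A minimal pandas sketch of that idea; the column names and values are hypothetical:

import pandas as pd

df = pd.DataFrame({"height": [10, 12, 8], "width": [20, 15, 25]})   # toy house data

# Combine existing columns into more informative features
df["total_area"] = df["height"] * df["width"]
df["aspect_ratio"] = df["height"] / df["width"]

Ratios, differences, and group-by statistics over existing columns are other common engineered features.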
Data augmentation is mostly done on image and text data. With image data, we can apply all sorts of transformations to create more data with the same label as the original image (a code sketch follows the list), like:
  1. Rotate the image by any degree between 0-360.
  2. Crop the irrelevant part.
  3. Change the opacity of the image.
  4. Flip the image horizontally/vertically.
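A sketch of such a training-time pipeline with albumentations, a library widely used on Kaggle (the exact transforms and probabilities are just examples):

import albumentations as A

train_augmentations = A.Compose([
    A.Rotate(limit=180, p=0.5),                    # rotate by a random angle
    A.RandomCrop(height=224, width=224),           # crop a patch of the image
    A.RandomBrightnessContrast(p=0.3),             # change brightness/contrast
    A.HorizontalFlip(p=0.5),                       # flip horizontally
    A.VerticalFlip(p=0.5),                         # flip vertically
])

# Usage on a NumPy image array:
# augmented_image = train_augmentations(image=image)["image"]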
With text data, we can augment using back-translation: given an English sentence, we translate it to, say, German and then translate it back from German to English. The new English sentence may not be exactly the same as the original, but its meaning will be more or less the same.
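A rough sketch of back-translation using the Hugging Face transformers library with MarianMT checkpoints (assuming the Helsinki-NLP English-German models; any translation model pair would do):

from transformers import MarianMTModel, MarianTokenizer

def translate(texts, model_name):
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)

sentences = ["Kaggle competitions are a great way to learn machine learning."]
german = translate(sentences, "Helsinki-NLP/opus-mt-en-de")    # English -> German
paraphrased = translate(german, "Helsinki-NLP/opus-mt-de-en")  # German -> back to English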
Feature engineering is a skill that requires creativity and logical thinking, and that's what differentiates a good Kaggler from a novice!
If you have enjoyed this blog, you may find some of my other blogs interesting!
Happy Kaggling. Cheers!
