Introduction
Have you ever spent hours debugging a machine learning model without finding out why the accuracy does not improve? Have you ever felt that everything should work perfectly, yet for some mysterious reason the results are underwhelming?
Exploring PyTorch as a beginner can be daunting. In this article, you will explore tried and tested workflows that can help you catch common mistakes and improve your model's performance.
1. Overfit a Single Batch
Ever trained a model for hours on a large dataset just to find the loss isn’t decreasing and the accuracy just flattens? Well, do a sanity check first.
It can be time-consuming to train and evaluate on a large dataset, and it is easier to first debug models on a small subset of the data. Once we are sure the model is working, we can then easily scale training to the complete dataset.
Instead of starting with the whole dataset, first train on a single batch as a sanity check.
    batch = next(iter(train_dataloader))  # Get a single batch

    # For all epochs, keep training on the same single batch.
    for epoch in range(num_epochs):
        inputs, targets = batch
        predictions = model(inputs)  # Forward pass; model.train() only sets the mode
Consider the above code snippet. Assume we already have a training data loader and a model. Instead of iterating over the complete dataset, we can easily fetch the first batch of the dataset. We can then train on the single batch to check if the model can learn the patterns and variance within this small portion of the data.
If the loss decreases to a very small value, we know the model can overfit this data and can be sure it is learning in a short time. We can then train this on the complete dataset by simply changing a single line as follows:
    # For all epochs, iterate over all batches of data.
    for epoch in range(num_epochs):
        for batch in train_dataloader:
            inputs, targets = batch
            predictions = model(inputs)
If the model can overfit a single batch, we know the forward pass, loss, and optimizer are wired together correctly and that the model has enough capacity to fit at least some of the data. This makes debugging much easier. If the model cannot even overfit a single batch, the problem most likely lies in the model implementation or the training code rather than in the size or difficulty of the dataset.
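The check described above can be sketched end to end as follows. This is a minimal illustration, not a definitive recipe: the random data, the small network, the learning rate, and the step count are all hypothetical stand-ins chosen so the example runs on its own.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical stand-in data: 256 samples, 20 features, 3 classes.
features = torch.randn(256, 20)
labels = torch.randint(0, 3, (256,))
train_dataloader = DataLoader(TensorDataset(features, labels), batch_size=16)

# Hypothetical small classifier; substitute the model being debugged.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

inputs, targets = next(iter(train_dataloader))  # one fixed batch

model.train()
losses = []
for step in range(300):
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

If the last loss is not far below the first one on a batch this small, it is worth inspecting the model and the training step before spending time on a full run.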
2. Normalize and Shuffle Data
For datasets where the order of the samples is not important, it is helpful to shuffle the data. For example, in image classification tasks, the model will fit the data better if each batch contains images of different classes. If we always pass the data in the same order, we risk the model learning patterns from that order instead of the intrinsic variance within the data. Therefore, it is better to pass shuffled data. For this, we can simply use the DataLoader object provided by PyTorch and set shuffle to True.
    from torch.utils.data import DataLoader

    dataset = ...  # Load your dataset here
    dataloader = DataLoader(dataset, shuffle=True)
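To see what shuffle=True does, the sketch below iterates twice over a tiny dataset and records the order in which samples arrive. The toy dataset of ten integers and the seeded generator are assumptions made for illustration only.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy dataset: the integers 0..9, stored in sorted order.
dataset = TensorDataset(torch.arange(10))

# A seeded generator makes the shuffled order reproducible across runs.
generator = torch.Generator().manual_seed(0)
dataloader = DataLoader(dataset, batch_size=5, shuffle=True, generator=generator)

epoch_orders = []
for epoch in range(2):
    # Each batch is a one-element list because the dataset holds one tensor.
    order = [int(value) for (batch,) in dataloader for value in batch]
    epoch_orders.append(order)
    print(f"epoch {epoch}: {order}")
```

Every epoch still visits each sample exactly once; only the order changes, because the DataLoader draws a new permutation each time it is iterated.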
Moreover, it is important to normalize data when using machine learning models. This matters most when the input features have very different scales, for example when one feature takes much larger values than all the others. Such a feature can dominate the rest, resulting in lower accuracy. We want all input features to fall within a similar range, ideally with 0 mean and a variance of 1.0. For this, we have to transform our dataset. Knowing the mean and standard deviation of the dataset, we can simply use the torchvision.transforms.Normalize transform.
    import torchvision.transforms as transforms

    image_transforms = transforms.Compose([
        transforms.ToTensor(),
        # Normalize the values in our data
        transforms.Normalize(mean=(0.5,), std=(0.5,)),
    ])
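We pass a per-channel mean and standard deviation to transforms.Normalize, and it subtracts the mean and divides by the standard deviation for each channel. If these are the actual statistics of the dataset, the result has 0 mean and a standard deviation of 1. The value of 0.5 used above instead maps inputs from the [0, 1] range produced by ToTensor to the [-1, 1] range.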
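When the dataset statistics are not known in advance, they can be estimated from the data. The sketch below assumes the images fit in memory as a single (N, C, H, W) tensor; the random tensor is a hypothetical stand-in for a real dataset, and the normalization is written out by hand to show the per-channel arithmetic.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for an image dataset: 100 RGB images of 8x8 pixels
# with values in [0, 1], laid out as (N, C, H, W).
images = torch.rand(100, 3, 8, 8)

# Per-channel statistics over every image and pixel position.
mean = images.mean(dim=(0, 2, 3))
std = images.std(dim=(0, 2, 3))

# Per-channel normalization written out by hand: (x - mean) / std.
normalized = (images - mean[None, :, None, None]) / std[None, :, None, None]

print("mean after:", normalized.mean(dim=(0, 2, 3)))
print("std after: ", normalized.std(dim=(0, 2, 3)))
```

The computed values could then be passed on as transforms.Normalize(mean=mean.tolist(), std=std.tolist()). For datasets too large to hold in memory, the statistics would need to be accumulated batch by batch instead.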
3. Gradient Clipping
Exploding gradients are a known problem in RNNs and LSTMs, but they are not limited to these architectures. Any deep model can suffer from them. Weight updates computed from very large gradients can make the loss diverge instead of decreasing gradually.
Consider the below code snippet.
    for epoch in range(num_epochs):
        for batch in train_dataloader:
            inputs, targets = batch
            predictions = model(inputs)

            optimizer.zero_grad()  # Remove all previous gradients
            loss = criterion(predictions, targets)  # PyTorch losses take (input, target)
            loss.backward()  # Compute gradients for the model weights

            # Clip the gradients of the model weights to a specified max_norm value.
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1)

            # Optimize the model weights AFTER CLIPPING
            optimizer.step()
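To address the exploding gradient problem, we use gradient clipping, which limits the size of the gradients before the optimizer step. Note that clip_grad_norm_ does not clamp each gradient value individually. It computes the combined norm of all gradients and, if that norm exceeds max_norm, scales every gradient by the same factor so that the combined norm equals max_norm. With max_norm=1 as above, gradients whose total norm is 50 are scaled down by a factor of about 50, which preserves the direction of the update. To clamp each element to a range such as [-1, 1] instead, use torch.nn.utils.clip_grad_value_. Either way, clipping bounds the size of each update and allows a stable optimization of the model toward convergence.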
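The difference between norm clipping and value clipping can be checked directly. In the sketch below, the tiny linear model, the inflated loss, and the helper total_grad_norm are hypothetical and exist only to produce large gradients to clip.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def total_grad_norm(parameters):
    """L2 norm of all gradients, treated as one long vector."""
    return torch.sqrt(sum(p.grad.pow(2).sum() for p in parameters))

# Hypothetical tiny model with a deliberately large loss to inflate gradients.
model = nn.Linear(10, 1)
loss = (model(torch.randn(4, 10)) * 1000.0).pow(2).mean()
loss.backward()

norm_before = total_grad_norm(model.parameters()).item()

# clip_grad_norm_ rescales all gradients in place and returns the norm
# measured before clipping.
returned_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
norm_after = total_grad_norm(model.parameters()).item()

print(f"before: {norm_before:.2f}, after: {norm_after:.4f}")

# By contrast, clip_grad_value_ clamps each element independently.
torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=0.1)
largest_element = max(p.grad.abs().max().item() for p in model.parameters())
print(f"largest element after value clipping: {largest_element:.4f}")
```

Logging the norm returned by clip_grad_norm_ during training is a cheap way to see how often clipping actually takes effect.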
4. Toggle Train / Eval Mode
Forgetting this single line of code is a common cause of poor or inconsistent test accuracy. Many deep learning models use dropout and normalization layers, and these layers behave differently during training and inference. During training, Dropout randomly zeroes activations to regularize the model, while BatchNorm normalizes each batch with that batch's own statistics and keeps running estimates of the mean and variance. Switching the model to evaluation mode turns Dropout into a pass-through, so all activations are used for prediction, and makes BatchNorm use its running statistics instead of per-batch statistics.
For a better understanding, consider this code snippet.
    for epoch in range(num_epochs):
        # Use training mode when iterating over the training dataset
        model.train()
        for batch in train_dataloader:
            ...  # Training code and loss optimization

        # Use evaluation mode when checking accuracy on the validation dataset
        model.eval()
        with torch.no_grad():  # No gradients are needed for validation
            for batch in val_dataloader:
                ...  # Only predictions and loss calculations; no backpropagation or optimizer step
When evaluating, we do not optimize the model parameters, so no gradients are needed. Note that model.eval() by itself does not disable gradient computation; wrap the validation loop in torch.no_grad() for that. What model.eval() changes is layer behavior: Dropout stops zeroing a random subset of activations, so the complete model is used, and BatchNorm uses its running statistics. This makes validation results deterministic and typically more accurate than evaluating in training mode.
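The effect of the two modes is easy to observe on a model that contains Dropout. The two-layer model and the random inputs below are hypothetical and serve only to make the behavior visible.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model containing a Dropout layer.
model = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5))
inputs = torch.randn(4, 8)

model.train()
train_a = model(inputs)
train_b = model(inputs)  # a different random dropout mask is drawn

model.eval()
with torch.no_grad():  # skip gradient tracking during evaluation
    eval_a = model(inputs)
    eval_b = model(inputs)

print("train outputs identical:", torch.equal(train_a, train_b))
print("eval outputs identical: ", torch.equal(eval_a, eval_b))
print("eval output tracks gradients:", eval_a.requires_grad)
```

In training mode, repeated forward passes on the same input give different outputs because a new dropout mask is sampled each time. In evaluation mode the outputs are repeatable, which is what a validation metric needs.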
5. Use Module and ModuleList
A PyTorch model usually inherits from the torch.nn.Module base class. As per the documentation:
Submodules assigned in this way will be registered and will have their parameters converted too when you call to(), etc.
The Module base class registers each layer assigned as an attribute of the model. We can then call model.to() and similar functions such as model.train() and model.eval(), and they will be applied to every registered layer. If a layer is not registered, these calls will not change its device or training mode, and you will have to do it manually. With registration in place, a single call on the model object makes the conversion for all of its layers.
Moreover, some models contain similar sequential layers that can be easily initialized using a for loop and contained within a list. This simplifies the code. However, it causes the same problem as above: modules held in a plain Python list are not registered with the model, so their parameters are also missing from model.parameters() and will not be updated by the optimizer. We should use a ModuleList to contain similar sequential layers within a model.
    import torch
    import torch.nn as nn

    # Inherit from the Module base class
    class Model(nn.Module):
        def __init__(self, input_size, output_size):
            # Initialize the Module parent class
            super().__init__()

            self.dense_layers = nn.ModuleList()

            # Add 5 linear layers and contain them within a ModuleList.
            # Only the first layer takes input_size features; the rest take 512.
            for i in range(5):
                in_features = input_size if i == 0 else 512
                self.dense_layers.append(nn.Linear(in_features, 512))

            self.output_layer = nn.Linear(512, output_size)

        def forward(self, x):
            # Simplifies forward propagation.
            # Instead of repeating a single line for each layer, use a loop
            for layer in self.dense_layers:
                x = layer(x)

            return self.output_layer(x)
The above code snippet shows the proper way of creating the model and the sublayers within the model. The use of Module and ModuleList helps avoid unexpected errors when training and evaluating the model.
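The registration issue can be made concrete by comparing a plain Python list with a ModuleList. The two classes and the count_parameters helper below are hypothetical and written only for this comparison; neither defines a forward method because only registration is being inspected.

```python
import torch.nn as nn

class PlainListModel(nn.Module):
    def __init__(self):
        super().__init__()
        # A plain Python list: these layers are NOT registered as submodules.
        self.layers = [nn.Linear(4, 4) for _ in range(3)]

class ModuleListModel(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList registers every layer with the parent module.
        self.layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])

def count_parameters(model):
    return sum(p.numel() for p in model.parameters())

plain = PlainListModel()
registered = ModuleListModel()

print("plain list: ", count_parameters(plain))
print("ModuleList: ", count_parameters(registered))

# eval() only reaches registered submodules.
plain.eval()
registered.eval()
print("plain list layer still in training mode:", plain.layers[0].training)
print("ModuleList layer still in training mode:", registered.layers[0].training)
```

An optimizer built from plain.parameters() would have nothing to update, and calling eval() or to() on that model would leave its layers untouched, which is the kind of silent failure that ModuleList prevents.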
Conclusion
The methods mentioned above are widely used practices for the PyTorch machine learning framework. Making them a standard part of your machine learning workflow helps catch common bugs early and is likely to improve your results.
Muhammad Arham is a Deep Learning Engineer working in Computer Vision and Natural Language Processing. He has worked on the deployment and optimizations of several generative AI applications that reached the global top charts at Vyro.AI. He is interested in building and optimizing machine learning models for intelligent systems and believes in continual improvement.