Hamid Serry, WMG Graduate Trainee, University of Warwick
Early stopping is a technique applied during the training of a neural network that halts training once it detects that the model is no longer learning. This is identified by checking whether the model's loss has stopped improving. Alternatively, the model may begin overfitting to the provided dataset, which would also trigger early stopping.
Overfitting occurs when a model fits the training dataset too closely. The decrease in overall error then no longer reflects genuine learning. Instead, it comes from the model memorising the specifics of the training data rather than identifying general patterns in it. Overfitting can be detected by measuring the validation error alongside the training error: when the validation error starts rising again while the training error does not, overfitting is present (Figure 1).
Figure 1 – Error vs Time Trained (number of epochs) showing the effect of overfitting 
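The check described above can be sketched in Python. This is a minimal illustration that assumes per-epoch training and validation losses are already recorded in two lists. The function name `overfitting_onset` and the loss values are hypothetical, chosen only to mimic the shape of Figure 1.

```python
def overfitting_onset(train_loss, val_loss):
    """Return the first epoch at which validation loss rises while training
    loss does not (the overfitting signature), or None if it never happens."""
    for epoch in range(1, len(val_loss)):
        val_rising = val_loss[epoch] > val_loss[epoch - 1]
        train_not_rising = train_loss[epoch] <= train_loss[epoch - 1]
        if val_rising and train_not_rising:
            return epoch
    return None


# Made-up loss curves: training loss keeps falling, validation loss turns upward.
train = [1.00, 0.70, 0.50, 0.40, 0.33, 0.28, 0.24]
val = [1.05, 0.78, 0.60, 0.55, 0.57, 0.62, 0.70]
print(overfitting_onset(train, val))  # → 4
```

Real validation curves are noisy, so a single-epoch check like this fires too easily. Practical early stopping therefore usually waits for several epochs without improvement before acting.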
A model may also stop learning naturally, when the loss no longer decreases however many more epochs are run. In this case, both the training error and the validation error plateau as training progresses, and this can also trigger an early stopping event.
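A common way to handle both stopping conditions, a plateau and a rising validation loss, is a "patience" counter on the validation loss. The class below is a minimal sketch rather than any particular library's API. `EarlyStopping`, `patience` and `min_delta` are illustrative names, and the loss values are made up.

```python
class EarlyStopping:
    """Hypothetical patience-based early stopping helper.

    Training should stop once the validation loss has failed to improve by at
    least `min_delta` for `patience` consecutive epochs. A plateau and an
    overfitting-driven rise both count as "no improvement".
    """

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.epochs_without_improvement = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # Meaningful improvement: remember it and reset the counter.
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# A made-up plateauing validation curve: after epoch 3 it barely moves.
val_losses = [0.90, 0.60, 0.45, 0.40, 0.399, 0.401, 0.400, 0.398, 0.397]
stopper = EarlyStopping(patience=3, min_delta=0.01)
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss):
        print(f"stopped at epoch {epoch}, best epoch {stopper.best_epoch}")
        break
# → stopped at epoch 6, best epoch 3
```

Deep learning frameworks offer built-in equivalents, for example the Keras `EarlyStopping` callback, which also exposes `patience` and `min_delta` parameters.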
Because hyperparameter tuning involves many training runs of the same model, early stopping brings a useful side benefit: it saves the time and resources consumed by training, a cost that is multiplied by every retraining under a new set of hyperparameters. Its main goal, however, is to prevent overfitting, so a model trained with early stopping should also perform better on the validation and testing datasets.
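To see why the saving multiplies during tuning, the sketch below simulates several training runs with patience-based early stopping and counts the epochs actually used. Everything here is hypothetical. `toy_val_loss` is a made-up U-shaped validation curve, and each hyperparameter configuration is simply assumed to reach its best validation loss at a different epoch.

```python
MAX_EPOCHS = 100  # epochs each trial would run without early stopping
PATIENCE = 5


def toy_val_loss(epoch, best_epoch):
    """Made-up U-shaped validation curve with its minimum at `best_epoch`."""
    return 0.2 + ((epoch - best_epoch) ** 2) / 1000


def train_trial(best_epoch, patience=PATIENCE, max_epochs=MAX_EPOCHS):
    """Simulate one training run with early stopping; return epochs actually run."""
    best, waited = float("inf"), 0
    for epoch in range(max_epochs):
        loss = toy_val_loss(epoch, best_epoch)
        if loss < best:
            best, waited = loss, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch + 1  # epochs 0..epoch were run
    return max_epochs


# One entry per hypothetical hyperparameter configuration.
trial_optima = [12, 20, 35, 18, 27]
used = sum(train_trial(b) for b in trial_optima)
budget = MAX_EPOCHS * len(trial_optima)
print(f"epochs run: {used} of {budget}")  # → epochs run: 142 of 500
```

Each trial stops `PATIENCE` epochs after its own best point, so the saving per trial depends on the configuration, and the total saving grows with the number of trials.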
If we were analysing three hyperparameters while tuning, this would create a three-dimensional space in which a sampler would attempt to find the optimal hyperparameter values. As our tuning process contains more than three hyperparameters, it creates an n-dimensional space, where n is the number of hyperparameters involved. As n increases, so do the size of the search space and the time it takes to complete the tuning. A pruning algorithm decides when to stop trials that are heading towards poorer performance (mean average precision, mAP) and lets those that raise the performance continue, thus pruning unpromising trials. This allows the search to be carried out efficiently: compute saved on unpromising trials can be spent exploring more of the space, which should result in a more accurate tuning process.
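One simple pruning rule is median stopping: a trial is stopped at a given epoch if its intermediate score is worse than the median score that earlier trials reached at the same epoch. Optuna provides a `MedianPruner` built on this idea. The code below is a simplified, library-free sketch, not Optuna's implementation. `SimpleMedianPruner`, `warmup_trials` and `run_trial` are hypothetical names, the per-epoch mAP curves are invented, and only completed trials count towards the median.

```python
import statistics


class SimpleMedianPruner:
    """Hypothetical median-stopping pruner (higher score, e.g. mAP, is better)."""

    def __init__(self, warmup_trials=2):
        self.warmup_trials = warmup_trials  # first trials always run to completion
        self.history = []  # per-epoch score lists of completed trials

    def should_prune(self, step, score):
        previous = [h[step] for h in self.history if len(h) > step]
        if len(self.history) < self.warmup_trials or not previous:
            return False
        return score < statistics.median(previous)

    def run_trial(self, scores):
        """Feed a trial's per-epoch scores, stopping early if it is pruned.
        Returns (scores actually observed, was_pruned)."""
        observed = []
        for step, score in enumerate(scores):
            observed.append(score)
            if self.should_prune(step, score):
                return observed, True
        self.history.append(observed)
        return observed, False


# Made-up per-epoch mAP curves for four hyperparameter configurations.
trials = [
    [0.20, 0.35, 0.45, 0.50],
    [0.25, 0.40, 0.50, 0.55],
    [0.10, 0.15, 0.18, 0.20],  # unpromising
    [0.30, 0.45, 0.55, 0.60],
]
pruner = SimpleMedianPruner(warmup_trials=2)
for i, curve in enumerate(trials):
    observed, pruned = pruner.run_trial(curve)
    status = "pruned" if pruned else "complete"
    print(f"trial {i}: {status} after {len(observed)} epoch(s)")
```

The third trial falls below the median of the first two after its first epoch and is stopped there, so its remaining epochs are never spent.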
We have investigated some of the processes that take place during hyperparameter tuning, and how they save us time and add to the robustness of the resulting model. As some of them also apply to the general training process, these techniques will allow further training to run efficiently, saving further expense down the line. The next step is the creation of the Anyverse dataset, which can then be used to retrain the model and begin analysing results. Before then there will be a one-week break in reports, so we will be back with more machine learning fun in week 21!
[1] X. Ying, “An Overview of Overfitting and its Solutions,” Journal of Physics: Conference Series, vol. 1168, p. 022022, 2019, doi: 10.1088/1742-6596/1168/2/022022.
[2] P. Sánchez, “Different methods for mitigating overfitting on Neural Networks,” Quantdare, 2021. [Online]. [Accessed: 12 May 2022].
[3] “Efficient Optimization Algorithms — Optuna 2.10.0 documentation,” optuna.readthedocs.io, 2022. [Online]. [Accessed: 13 May 2022].
[4] D. Banik, A. Ekbal and P. Bhattacharyya, “Machine Learning Based Optimized Pruning Approach for Decoding in Statistical Machine Translation,” IEEE Access, vol. 7, pp. 1736-1751, 2019, doi: 10.1109/ACCESS.2018.2883738.