Hamid Serry, WMG Graduate Trainee, University of Warwick
Last week we looked at early stopping and pruning, and their essential role in both machine learning model training and hyperparameter tuning. This week we look at some of the specifics of our hyperparameter tuning process and some results showcasing the tuned network.
The 6th post of this series described many different types of hyperparameters. Of these, the following were chosen for tuning in this project: Batch Size, Learning Rate, Learning Rate Step Size and Weight Decay. Two more hyperparameters were tuned; these were as follows:
There are two types of distribution that the hyperparameters can follow. A Categorical distribution has a fixed list of values, such as numbers with no regular pattern or a Boolean, which is the case for Batch Size and Amsgrad: Batch Size had a fixed set of possible values (8, 16, 32 and 64), while Amsgrad is a Boolean, taking either True or False. The other is a Discrete Uniform distribution, where the values lie between a minimum and a maximum and are discretised with a step, as in Table 1.
Table 1 – Discrete Uniform Distributed Hyperparameters
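To make the difference between the two distribution types concrete, here is a minimal sketch in plain Python. The categorical values are the ones listed above; the bounds and step for `lr_step_size` are placeholders chosen for illustration and are not the values from Table 1. The helper names `discrete_uniform_grid` and `sample_config` are our own.

```python
import random

# Categorical hyperparameters: a fixed list of allowed values (from the post).
CATEGORICAL = {
    "batch_size": [8, 16, 32, 64],
    "amsgrad": [True, False],
}

# Discrete uniform hyperparameters, given as (low, high, step).
# Placeholder bounds for illustration only, NOT the values from Table 1.
DISCRETE_UNIFORM = {
    "lr_step_size": (2, 10, 2),
}


def discrete_uniform_grid(low, high, step):
    """Return low, low + step, low + 2 * step, ... without exceeding high."""
    n_steps = (high - low) // step
    return [low + k * step for k in range(n_steps + 1)]


def sample_config(rng):
    """Draw one configuration, choosing uniformly within each distribution."""
    config = {name: rng.choice(values) for name, values in CATEGORICAL.items()}
    for name, (low, high, step) in DISCRETE_UNIFORM.items():
        config[name] = rng.choice(discrete_uniform_grid(low, high, step))
    return config


print(discrete_uniform_grid(2, 10, 2))
print(sample_config(random.Random(0)))
```

The sketch uses integer bounds to keep the grid exact; with floating-point steps (as for a learning rate) a tuning library would normally take care of rounding.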
The hyperparameter tuning program for the selected Faster R-CNN was written in Python, using PyTorch, with Optuna as the optimisation framework. The Tree-structured Parzen Estimator (TPE) algorithm was used as the sampler, with the Successive Halving Pruner as the pruner.
The optimisation study consisted of 200 trials distributed across 4 GPUs (50 trials each), with mean Average Precision (mAP) over the validation set as the metric to maximise. For the purposes of our testing, 4 classes from the KITTI dataset were used in training: Car, Person, Cyclist and Van.
Figure 1 shows the validation mAP for each trial. As you can see, some trials proved unpromising and were pruned before completion. The final result of the tuning can be seen in Table 2; these hyperparameters were then used to train the baseline network on the KITTI dataset. Across these 4 classes, the model obtained ~89% mAP on the validation set, and some detection examples are shown in Figures 2 – 4, with different detected classes.
Figure 1 – Intermediate values graph from hyperparameter optimisation
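The truncated curves in a plot like Figure 1 come from the pruner stopping weak trials early. The sketch below illustrates the idea with a simplified, synchronous version of successive halving. It is not Optuna's asynchronous implementation, and the learning curves are made-up numbers, not results from our study.

```python
def successive_halving(learning_curves, min_resource=1, reduction_factor=2):
    """Return {trial_id: epochs_run} for a synchronous successive-halving run.

    learning_curves maps each trial id to its per-epoch validation scores
    (higher is better). At each rung only the top 1 / reduction_factor of
    the surviving trials continue; the rest are pruned at that epoch.
    """
    max_epochs = min(len(curve) for curve in learning_curves.values())
    alive = list(learning_curves)
    epochs_run = {}
    budget = min_resource
    while True:
        budget = min(budget, max_epochs)
        # Rank surviving trials by their score at the current budget.
        ranked = sorted(
            alive, key=lambda t: learning_curves[t][budget - 1], reverse=True
        )
        if budget == max_epochs:
            for trial_id in ranked:
                epochs_run[trial_id] = max_epochs
            return epochs_run
        keep = max(1, len(ranked) // reduction_factor)
        for trial_id in ranked[keep:]:
            epochs_run[trial_id] = budget  # pruned at this rung
        alive = ranked[:keep]
        budget *= reduction_factor


# Made-up validation curves for four hypothetical trials.
curves = {
    "a": [0.10, 0.15, 0.20, 0.22],
    "b": [0.20, 0.25, 0.30, 0.35],
    "c": [0.05, 0.10, 0.12, 0.15],
    "d": [0.15, 0.40, 0.70, 0.89],
}
result = successive_halving(curves)
print(result)
```

Here trials "a" and "c" stop after one epoch, "b" after two, and only "d" runs to completion, which is the same kind of pattern the intermediate values graph shows.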
Table 2 – Final hyperparameters from results of hyperparameter optimisation
Figure 2 – KITTI dataset image analysed using the trained model, showing Pedestrian and Cyclist classes
Figure 3 – KITTI dataset image analysed using the trained model, showing Car and Cyclist classes
Figure 4 – KITTI dataset image analysed using the trained model, showing Car and Van classes
We’ve walked through the results of the hyperparameter tuning, with some details on the configuration of the setup and some visual aids to help in understanding earlier concepts such as early stopping and pruning, as well as the detection bounding boxes. We are now setting up the scenes and configuration to generate the Anyverse dataset, so the next blog post will walk through some of the generation process. See you then!
I. Loshchilov and F. Hutter, “Decoupled weight decay regularization”, arXiv preprint arXiv:1711.05101, 2017.
Optuna, “optuna.samplers.TPESampler — Optuna 2.10.0 documentation”, Optuna.readthedocs.io, 2022. [Online]. [Accessed: 26-May-2022].
Optuna, “optuna.pruners.SuccessiveHalvingPruner — Optuna 2.10.0 documentation”, Optuna.readthedocs.io, 2022. [Online]. [Accessed: 26-May-2022].