CHAPTER 8
Author
Hamid Serry, WMG Graduate Trainee, University of Warwick
Hyperparameters
In the 6th post of this series, many different types of hyperparameters were described. Of these, the following were chosen for tuning in this project: Batch Size, Learning Rate, Learning Rate Step Size and Weight Decay. Two further hyperparameters were also tuned:
- Learning Rate Gamma: A multiplicative factor applied to the learning rate at each step of the learning rate schedule.
- Amsgrad: A Boolean flag that enables the AMSGrad variant of the AdamW optimiser [1], which uses the maximum of past squared-gradient estimates rather than their exponential moving average.
There are two types of distribution that the hyperparameters can have. A categorical distribution has a fixed list of possible values, such as a set of numbers with no particular pattern or a Boolean; this is the case for Batch Size and Amsgrad. Batch Size takes one of a fixed set of values (8, 16, 32 or 64), while Amsgrad is a Boolean taking either a True or False state. The other is a discrete uniform distribution, where the values lie between a minimum and a maximum and are discretised with a step, as in Table 1.

Table 1 – Discrete Uniform Distributed Hyperparameters
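
To make the two distribution types concrete, the snippet below is a minimal sketch (not the project's actual code) of how such a search space could be expressed with Optuna's suggest_* API. The categorical choices for Batch Size and Amsgrad are taken from the text above, while the bounds and steps for the discrete uniform hyperparameters stand in for the values of Table 1 and are purely illustrative.

```python
import optuna

def suggest_hyperparameters(trial: optuna.Trial) -> dict:
    """Illustrative search space; the discrete-uniform bounds are placeholders."""
    return {
        # Categorical distributions: a fixed list of possible values
        "batch_size": trial.suggest_categorical("batch_size", [8, 16, 32, 64]),
        "amsgrad": trial.suggest_categorical("amsgrad", [True, False]),
        # Discrete uniform distributions: minimum, maximum and a step
        # (the ranges below are placeholders, not the values from Table 1)
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-2, step=1e-5),
        "weight_decay": trial.suggest_float("weight_decay", 1e-6, 1e-3, step=1e-6),
        "lr_step_size": trial.suggest_int("lr_step_size", 1, 10, step=1),
        "lr_gamma": trial.suggest_float("lr_gamma", 0.1, 0.9, step=0.1),
    }
```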
Tuning Setup
The hyperparameter tuning program for the selected Faster R-CNN was written in Python using PyTorch, with Optuna as the optimisation framework. The Tree-structured Parzen Estimator (TPE) algorithm [2] was used as the sampler, and the Successive Halving pruner [3] was used to terminate unpromising trials early.
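As a rough sketch of this setup (assumed, not the project's exact code), an Optuna study with this sampler and pruner could be created as follows; the study name and the SQLite storage URL are assumptions, chosen so that several workers can attach to the same study.

```python
import optuna
from optuna.samplers import TPESampler
from optuna.pruners import SuccessiveHalvingPruner

# Create (or re-open) a study shared by all GPU workers; names and paths are illustrative.
study = optuna.create_study(
    study_name="faster_rcnn_kitti_tuning",   # assumed study name
    storage="sqlite:///optuna_study.db",      # assumed shared storage backend
    load_if_exists=True,
    direction="maximize",                     # maximise validation mAP
    sampler=TPESampler(),
    pruner=SuccessiveHalvingPruner(),
)
```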
The optimisation study consisted of 200 trials distributed across 4 GPUs (50 trials each), with mean Average Precision (mAP) over the validation set chosen as the metric to maximise. For the purposes of our testing, 4 of the classes from the KITTI dataset were used in training: Car, Person, Cyclist and Van.
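A per-worker training loop for this study might look roughly like the sketch below. `build_model`, `train_one_epoch`, `evaluate_map` and `num_epochs` are hypothetical placeholders for the project's Faster R-CNN training code, and `suggest_hyperparameters` and `study` refer to the earlier sketches; reporting the validation mAP after each epoch is what lets the Successive Halving pruner stop unpromising trials early.

```python
import optuna

def objective(trial: optuna.Trial) -> float:
    params = suggest_hyperparameters(trial)   # search space from the earlier sketch
    model = build_model(params)               # hypothetical: Faster R-CNN built with these hyperparameters

    best_map = 0.0
    for epoch in range(num_epochs):
        train_one_epoch(model, params)        # hypothetical training step
        val_map = evaluate_map(model)         # hypothetical: mAP over the validation set
        best_map = max(best_map, val_map)

        trial.report(val_map, step=epoch)     # intermediate values like those plotted in Figure 1
        if trial.should_prune():              # pruner decides the trial is not promising
            raise optuna.TrialPruned()

    return best_map

# Each of the 4 GPU workers runs 50 trials against the shared study (200 trials in total).
study.optimize(objective, n_trials=50)
```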
Figure 1 shows the validation mAP for each trial. As can be seen, some of the trials did not look promising and were pruned before completion. The final result of the tuning can be seen in Table 2; these hyperparameters were then used to train the baseline network on the KITTI dataset. Across these 4 classes, the model obtained a validation mAP of approximately 89%; detection examples for the different classes are shown in Figures 2 – 4.

Figure 1 – Intermediate values graph from hyperparameter optimisation

Table 2 – Final hyperparameters from results of hyperparameter optimisation

Figure 2 – KITTI dataset image analysed using the trained model: showing Pedestrian and Cyclist classes

Figure 3 – KITTI dataset image analysed using the trained model: showing Car and Cyclist classes

Figure 4 – KITTI dataset image analysed using the trained model: showing Car and Van classes
Conclusion
We’ve walked through the results of the hyperparameter tuning, with some details on the configuration of the setup and some visual aids to illustrate concepts covered previously, such as early stopping and pruning, as well as the detection bounding boxes. We are now setting up the scenes and configuration to generate the Anyverse dataset, so the next blog post will walk through some of that generation process. See you then!
References
[1] I. Loshchilov and F. Hutter, “Decoupled weight decay regularization”, arXiv preprint arXiv:1711.05101, 2017.
[2] Optuna, “optuna.samplers.TPESampler — Optuna 2.10.0 documentation”, optuna.readthedocs.io, 2022. [Online]. [Accessed: 26 May 2022].
[3] Optuna, “optuna.pruners.SuccessiveHalvingPruner — Optuna 2.10.0 documentation”, optuna.readthedocs.io, 2022. [Online]. [Accessed: 26 May 2022].