CHAPTER 6
Author
Hamid Serry, WMG Graduate Trainee, University of Warwick
Hyperparameters
Hyperparameters are fixed variables used in machine learning to control the training process, and they largely determine the quality and efficiency of the resulting network. They are set before training begins, so scheduled values such as changes to the learning rate over the epochs can be defined in advance, but they cannot be changed manually once training is underway, which makes selecting effective values an important task.
Our main focus is on the parameters relating to the training algorithm, a sample of which follows (a configuration sketch is given after the list):
- Number of Epochs: Defines the maximum number of times the dataset is shown to the model during its training cycle.
- Batch Size: The number of samples from the dataset supplied to the network at a time; the model parameters are updated after each batch.
- Loss Function: Quantitatively defines the performance of the model over the entire training dataset [5].
- Solver: The optimisation algorithm that builds the network for training, manages the parameter updates and evaluates the model. There are many different solvers, and the choice depends on the specific model setup and the strengths of each solver.
- Learning Rate: Controls how far the model weights are adjusted in response to the estimated error at each update, and therefore the speed of learning, with the aim of minimising the overall loss of the model [1].
- Learning Rate Step Size: The number of epochs after which the learning rate is adjusted.
- Weight Decay: A multiplier used to “improve the generalization performance of neural networks by encouraging the weights to be small in magnitude” [2]. It is a number between 0 and 1 that scales the sum of the squared weights, which is added to the loss as a penalty term.
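As an illustration of how these hyperparameters come together in practice, the sketch below configures a small training loop in PyTorch (the framework is an assumption; the chapter does not name one), with the epoch count, batch size, loss function, solver, learning rate, step size and weight decay all fixed up front. The model and dataset are placeholders so the sketch runs end to end.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hyperparameters fixed before training begins.
num_epochs = 30          # maximum passes over the dataset
batch_size = 16          # samples per parameter update
learning_rate = 0.01     # initial step size for weight updates
lr_step_size = 10        # epochs between learning-rate adjustments
lr_gamma = 0.1           # factor applied to the learning rate at each adjustment
weight_decay = 5e-4      # multiplier on the sum of squared weights

# Placeholder model and dataset; in practice these would be the
# detection network and the training data.
model = nn.Linear(10, 2)
dataset = TensorDataset(torch.randn(128, 10), torch.randint(0, 2, (128,)))
loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

loss_fn = nn.CrossEntropyLoss()                        # loss function
optimizer = torch.optim.SGD(model.parameters(),        # solver
                            lr=learning_rate,
                            weight_decay=weight_decay)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                            step_size=lr_step_size,
                                            gamma=lr_gamma)

for epoch in range(num_epochs):
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()                               # update after each batch
    scheduler.step()                                   # adjust learning rate on schedule
```

SGD is used here only as one example of a solver; other optimisers would be configured in the same way, with the learning rate and weight decay passed as arguments.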
Hyperparameter Tuning
Hyperparameter tuning fully trains a model under a specific set of hyperparameters and then repeats this process with different values, in order to optimise the performance of the network against a chosen metric (mAP in this case). The performance of each hyperparameter combination is tracked and used to select the final configuration with the best mAP.
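A minimal sketch of this idea is an exhaustive grid search over a few candidate values, shown below. The `train_and_evaluate` helper is hypothetical and stands in for one full training cycle that returns the validation mAP; here it returns a random placeholder so the loop runs as written.

```python
import random
from itertools import product

def train_and_evaluate(learning_rate, batch_size, weight_decay):
    # Hypothetical helper: would train the network to completion with these
    # hyperparameters and return its validation mAP. A random placeholder
    # keeps the sketch self-contained.
    return random.random()

learning_rates = [1e-2, 1e-3, 1e-4]
batch_sizes = [16, 32]
weight_decays = [0.0, 5e-4]

best_map, best_config = -1.0, None
for lr, bs, wd in product(learning_rates, batch_sizes, weight_decays):
    score = train_and_evaluate(lr, bs, wd)        # one full training cycle
    print(f"lr={lr}, batch_size={bs}, weight_decay={wd} -> mAP={score:.3f}")
    if score > best_map:
        best_map, best_config = score, (lr, bs, wd)

print("Best configuration:", best_config, "with mAP", round(best_map, 3))
```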
Effective tuning processes will drop parameters which have little to no effect on the overall performance of the model, and will end a training cycle early when it is clearly less successful than previously trained models. At the end of the tuning, the best parameters can be selected for deployment. Methods range from manual tuning [3] to more sophisticated approaches such as Bayesian Optimisation [4]; a sketch of the latter follows.
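As one way of combining a guided search with early stopping, the sketch below uses the Optuna library (my choice of example, not one named in the chapter): its default sampler is a form of model-based, Bayesian-style optimisation, and its pruning API ends unpromising trials early. The `train_one_epoch_and_evaluate` helper is hypothetical and would return the model's mAP after each epoch; a random placeholder keeps the sketch runnable.

```python
import random
import optuna

def train_one_epoch_and_evaluate(lr, batch_size, weight_decay, epoch):
    # Hypothetical helper: would train for one epoch with these hyperparameters
    # and return the current validation mAP.
    return random.random()

def objective(trial):
    # Search space for the hyperparameters being tuned.
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-1, log=True)
    batch_size = trial.suggest_categorical("batch_size", [16, 32, 64])
    weight_decay = trial.suggest_float("weight_decay", 1e-6, 1e-2, log=True)

    map_score = 0.0
    for epoch in range(30):
        map_score = train_one_epoch_and_evaluate(lr, batch_size, weight_decay, epoch)
        trial.report(map_score, epoch)
        if trial.should_prune():          # end this cycle early if it lags behind
            raise optuna.TrialPruned()
    return map_score

study = optuna.create_study(direction="maximize")   # maximise mAP
study.optimize(objective, n_trials=50)
print("Best hyperparameters:", study.best_params, "mAP:", study.best_value)
```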
As this process is equivalent to repeating the training procedure many times in order to find the best solution, it requires far more processing time than a single training cycle. Although tuning can be an expensive endeavour, the results can differ significantly between distinct hyperparameter setups, leading some organisations to keep their parameters secret and opening the door to hyperparameter theft [5].
Conclusion
We have discussed some of the hyperparameters used within machine learning and how they can be altered to improve the performance of a model. Applying this tuning to the currently selected neural network will allow us to optimise its performance and achieve a more precise model on which to test the generated Anyverse dataset.
References
[1] D. Yimin, “The Impact of Learning Rate Decay and Periodical Learning Rate Restart on Artificial Neural Network,” in Proceedings of the 2021 2nd International Conference on Artificial Intelligence in Electronics Engineering (AIEE 2021), Association for Computing Machinery, New York, NY, USA, 2021, pp. 6–14. https://doi.org/10.1145/3460268.3460270
[2] G. Zhang, C. Wang, B. Xu and R. Grosse, “Three Mechanisms of Weight Decay Regularization,” arXiv preprint arXiv:1810.12281, 2018.
[5] B. Wang and N. Z. Gong, “Stealing Hyperparameters in Machine Learning,” in 2018 IEEE Symposium on Security and Privacy (SP), 2018, pp. 36–52.