Stochastic Schemata Exploiter-Based Optimization of Hyper-parameters for XGBoost
DOI:
https://doi.org/10.24423/cames.2024.1296
Keywords:
evolutionary computation, Stochastic Schemata Exploiter, hyper-parameter optimization, XGBoost
Abstract
XGBoost is a well-known open-source software library that provides a regularizing gradient boosting framework. Although it is widely used in machine learning, its performance depends on the choice of hyper-parameters. This study applies the Stochastic Schemata Exploiter (SSE) to the optimization of XGBoost hyper-parameters. SSE is an evolutionary algorithm that has been successfully applied to combinatorial optimization problems, and the original SSE algorithm is modified here for hyper-parameter optimization. Compared with a simple Genetic Algorithm, SSE has two attractive features: quick convergence and a small number of control parameters. To confirm its validity, the proposed algorithm is compared with other hyper-parameter optimization algorithms such as Gradient Boosted Regression Trees (GBRT), Tree-structured Parzen Estimator (TPE), Covariance Matrix Adaptation Evolution Strategy (CMA-ES), and Random Search. The numerical results show that SSE converges well even though it has fewer control parameters than the other methods.
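As a rough illustration of the schema-based search idea behind SSE, the sketch below ranks a population of bit strings, extracts the schema common to each group of top-ranked individuals, and samples new individuals from those schemata. It is a simplified sketch and not the modified algorithm proposed in the paper: the top-k prefix sub-populations, the bit-string encoding, the function names, and the OneMax objective (a stand-in for a model validation score) are all assumptions made for illustration.

```python
import random


def common_schema(individuals):
    """Return the schema shared by all bit strings: the common bit where
    every individual agrees, '*' where they differ."""
    return "".join(
        bits[0] if len(set(bits)) == 1 else "*"
        for bits in zip(*individuals)
    )


def sample_from_schema(schema, rng):
    """Create a new individual by filling each '*' with a random bit."""
    return "".join(rng.choice("01") if s == "*" else s for s in schema)


def sse_step(population, fitness, rng):
    """One simplified generation (assumption: sub-populations are the
    top-k prefixes of the ranked population, for k = 1..N)."""
    ranked = sorted(population, key=fitness, reverse=True)
    offspring = []
    for k in range(1, len(ranked) + 1):
        schema = common_schema(ranked[:k])
        offspring.append(sample_from_schema(schema, rng))
    return offspring


def one_max(bits):
    """Toy objective (hypothetical stand-in for a validation score)."""
    return bits.count("1")


rng = random.Random(0)
population = ["".join(rng.choice("01") for _ in range(16)) for _ in range(20)]
initial_best = max(one_max(ind) for ind in population)

for _ in range(30):
    population = sse_step(population, one_max, rng)

best = max(population, key=one_max)
print(best, one_max(best))
```

In this sketch the only control parameters are the population size and the number of generations; no crossover or mutation rates are needed, which mirrors the small number of control parameters mentioned in the abstract. Because the first sub-population contains only the best individual, that individual is copied unchanged into the next generation, so the best score never decreases. For real hyper-parameter tuning, each bit string would have to be decoded into hyper-parameter values and scored by training a model, which this toy example does not do.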