Scientific and technical journal

«Automation and Informatization of the fuel and energy complex»

ISSN 0132-2222

Research of algorithms for constructing decision trees in the inductive learning system

UDC: 681.5
DOI: 10.33285/2782-604X-2023-5(598)-34-44

Authors:

FOMICHEVA OLGA E. (1)

(1) National University of Oil and Gas "Gubkin University", Moscow, Russia

Keywords: data mining, machine learning, predictive analytics, decision trees, inductive learning, decision-making support system, Python sklearn package for Data Science and Machine Learning

Abstract:

This article reviews traditional and modern approaches to constructing decision trees, one of the most effective tools in data mining and predictive analytics. Decision trees can be used to solve the main machine learning problems, classification and regression, in a variety of subject areas. A decision support system is proposed whose rules are decision trees obtained by inductive learning. Inductive learning algorithms are investigated, an example implementation of the decision support system is given, and the effectiveness of the proposed algorithm is compared with the decision tree solver from sklearn, one of the most widely used Python packages for data science and machine learning.
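As an illustration of how inductive tree construction typically chooses a split, the sketch below computes Shannon entropy and information gain for categorical attributes. It is a minimal example and not the algorithm proposed in the article: the abstract does not state which splitting criterion is used, so the entropy-based criterion is an assumption, and the function names (`entropy`, `information_gain`, `best_attribute`) and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on one categorical attribute."""
    # Group the class labels by the value each row takes for the attribute.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Attribute with the highest information gain (assumed split criterion)."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data, invented purely for illustration.
rows = [
    {"pressure": "high", "valve": "open"},
    {"pressure": "high", "valve": "closed"},
    {"pressure": "low", "valve": "open"},
    {"pressure": "low", "valve": "closed"},
]
labels = ["alarm", "alarm", "ok", "ok"]

print(best_attribute(rows, labels, ["pressure", "valve"]))
```

In this toy data set, "pressure" separates the two classes completely while "valve" carries no information about the class, so a tree builder using this criterion would place "pressure" at the root. Applying the same selection recursively to each subset yields a tree whose root-to-leaf paths can be read as if-then rules.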

Bibliography:

1. Shakhidi A. Derev'ya resheniy: obshchie printsipy [Decision trees: general principles]. – URL: http://www.basegroup.ru/library/analysis/tree/description/ (accessed 30.01.2023).
2. Zhou V. A Simple Explanation of Information Gain and Entropy. – URL: https://victorzhou.com/blog/information-gain/ (accessed 30.01.2023).
3. Painsky A., Rosset S. Cross-Validated Variable Selection in Tree-Based Methods Improves Predictive Performance // IEEE Transactions on Pattern Analysis and Machine Intelligence. – 2017. – Vol. 39, Issue 11. – P. 2142–2153. – DOI: 10.1109/TPAMI.2016.2636831
4. Sujan N.I. What is Entropy and why Information gain matter in Decision Trees? – URL: https://medium.com/coinmonks/what-is-entropy-and-why-information-gain-is-matter-4e85d46d2f01 (accessed 30.01.2023).
5. An Introduction to Statistical Learning with Applications in R / G. James, D. Witten, T. Hastie, R. Tibshirani. – New York: Springer, 2017. – P. 303–336. – URL: https://www.springer.com/gp/book/9781461471370 (accessed 30.01.2023).
6. Hastie T., Tibshirani R., Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. – 2nd Edition. – Springer, 2009. – XXII, 745 p. – URL: http://www-stat.stanford.edu/~tibs/ElemStatLearn (accessed 30.01.2023).
7. DOT Language. – URL: https://www.graphviz.org/doc/info/lang.html (accessed 30.01.2023).
8. Scikit-learn. Machine Learning in Python. – URL: https://scikit-learn.org (accessed 30.01.2023).
9. Vizualizatsiya dereva resheniy [Decision tree visualization]. – URL: https://russianblogs.com/article/89531287808/ (accessed 30.01.2023).