Machine-learning-based optical character recognition applied to engineering design
UDC: 004.896
DOI: -
Authors:
ANDREEVA NATALIA N. 1, GONCHARENKO NIKITA A. 1, DIAS MIKHAIL A. 1
1 National University of Oil and Gas "Gubkin University", Moscow, Russia
Keywords: Optical Character Recognition (OCR), Intelligent Document Processing (IDP), engineering design, capital construction objects, CAD, neural networks, machine learning
Abstract:
Optical Character Recognition (OCR) technologies are long established and widely used in industries whose documents are predominantly text-based. Broader adoption, however, is hampered by the technology's limitations: OCR cannot identify illustrations or mixed text, which rules it out for a wide range of tasks. With the development of neural networks and machine learning, OCR is entering a new stage of development that eliminates these shortcomings and opens qualitatively new opportunities. This stage is represented by Intelligent Document Processing (IDP) technology. IDP can be applied in engineering design to make work with printed documentation on existing capital construction objects more efficient, in particular during re-engineering. IDP also makes it possible to integrate drawings with specialized software. Although IDP technology is only beginning to develop, a potential algorithm for such software is presented at this stage. In addition, the main players in the IDP market, which is expected to develop dynamically in the coming years, are taking shape now.
Bibliography:
1. Andreeva N.N., Kononov V.V. Sovmestnaya rabota IT-kompanii i tekhnicheskogo vuza po podgotovke spetsialistov [Joint work of an IT company and a technical university on training specialists] // Avtomatizatsiya i informatizatsiya TEK. – 2023. – No. 11(604). – P. 41–50. – DOI: 10.33285/2782-604X-2023-11(604)-41-50
2. Cutting G.A., Cutting-Decelle A.-F. Intelligent Document Processing – Methods and Tools in the Real World. – 2021. – URL: https://arxiv.org/ftp/arxiv/papers/2112/2112.14070.pdf (accessed 28.03.2024).
3. Rossiya bez FineReader: rynok OCR za god upal na chetvert' [Russia without FineReader: the OCR market fell by a quarter in a year]. – 2023. – URL: https://www.cnews.ru/news/top/2023-07-18_rossiya_bez_finereader_rynok_ocr (accessed 08.11.2023).