The subject of the research is the development of a service that searches user files for given sets of keywords with parameters. The available approaches to this problem were studied and the most suitable one was selected. The service searches inside files with text content in order to automate the selection of the desired files from the entire set. It is based on Porter's algorithm and uses text stemming to obtain more accurate results: it searches for the stem of a word, taking morphology into account. Through morphological parsing, suffixes and endings are cut off to find a base common to all grammatical forms of the word. As a result, the service searches not only for the given keywords but also for their word forms, and it can search for several sets of keywords at once, analyzing each set separately. In addition, ranges of numeric values can be specified as search criteria. A distinctive feature of the service is that sets of keywords are searched for together in nearby paragraphs, within a range of -20 to +20 words of each other, so that the context of their appearance in the text is taken into account. The service ranks the found documents by how well they match the search criteria. Files in common formats are processed: doc, xls, pdf, txt. The service runs on a Linux platform under the Apache web server. Free software tools were used for development.
Keywords: search engine, document analysis, stemming, Porter's algorithm, word forms, morphology, arithmetic mean of percentages, web service
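The proximity search on word stems described in the abstract can be illustrated with a minimal sketch. It makes several assumptions: the crude suffix stripper `crude_stem` is only a stand-in and is not Porter's algorithm, the names `SUFFIXES`, `tokenize` and `keyword_set_matches` are hypothetical, and the -20 to +20 word range is read here as "every keyword of a set occurs within 20 words of some occurrence of the first keyword". The service itself may define these details differently.

```python
import re

# Hypothetical, deliberately crude suffix list; this is NOT Porter's algorithm.
SUFFIXES = ("ations", "ation", "ings", "ing", "ed", "es", "s")


def crude_stem(word):
    """Strip one common English suffix, keeping a stem of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Split text into alphabetic words and reduce each one to its stem."""
    return [crude_stem(w) for w in re.findall(r"[A-Za-z]+", text)]


def keyword_set_matches(text, keywords, window=20):
    """Check whether all keywords of one set occur close together.

    Returns True if some occurrence of the first keyword has every other
    keyword of the set within `window` words before or after it
    (compared by stem, so word forms also match).
    """
    tokens = tokenize(text)
    stems = [crude_stem(k) for k in keywords]
    positions = {s: [i for i, t in enumerate(tokens) if t == s] for s in stems}
    if any(not found for found in positions.values()):
        return False  # at least one keyword never appears
    anchor, *others = stems
    return any(
        all(any(abs(j - i) <= window for j in positions[o]) for o in others)
        for i in positions[anchor]
    )
```

For example, `keyword_set_matches("The contract was signed. Payments are due monthly.", ["signing", "payment"])` matches, because "signed" and "signing" share the stem "sign" and the two stems are only one word apart. A production version would replace `crude_stem` with a full Porter stemmer.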
The emergence of digital X-ray machines and the development of cloud storage technology have led to the accumulation of a huge number of medical X-ray images, in particular chest X-rays (fluorography). After high-quality preprocessing, the accumulated image databases can be used to train deep convolutional neural networks, which have developed rapidly in recent years. The trained network performs a preliminary binary classification of the incoming flow of images and can be used as a radiologist's assistant. For this purpose, the neural network must be trained adequately to minimize errors of the first and second kind. A possible approach to improving the efficiency of neural networks, reducing computational complexity and improving the quality of image classification by the given criteria is the use of auxiliary image preprocessing and preliminary entropy calculation. The article presents an algorithm for X-ray image preprocessing, division of the image into fragments and calculation of the entropy of individual fragments. During preprocessing, the region of interest containing the lungs and the spine, which constitutes about 30-40% of the entire image, is selected. The image is then divided into a matrix of fragments, and the entropy of each fragment is calculated using the Shannon formula by analyzing individual pixels. The total entropy is calculated by determining the frequency of each of the 255 colors. The use of entropy for detecting pathologies is based on the assumption that its values for individual fragments, and the overall picture of its distribution, differ between normal images and images with pathologies. Statistical indicators are analyzed: standard deviation of error, variance.
Keywords: image entropy, fragments, deep convolutional neural network, machine learning, X-ray images, computational experiment, matrix of elements, image preprocessing, statistical analysis, binary classification
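The fragment-wise entropy calculation described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the image is a grayscale 2-D list of integer gray levels, the Shannon entropy is taken in bits as H = sum of p_i * log2(1 / p_i) over the gray levels present in a fragment, and the function names `shannon_entropy` and `fragment_entropies` are hypothetical. Region-of-interest selection and the article's actual fragment size are not modeled.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy in bits of a sequence of gray-level values."""
    counts = Counter(pixels)
    total = sum(counts.values())
    # p * log2(1 / p) summed over every gray level that actually occurs
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def fragment_entropies(image, rows, cols):
    """Split a 2-D image into a rows x cols grid and return its entropy matrix.

    Assumption: pixels that do not fit into an even grid (the trailing
    rows and columns) are ignored.
    """
    height, width = len(image), len(image[0])
    frag_h, frag_w = height // rows, width // cols
    matrix = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [
                image[y][x]
                for y in range(r * frag_h, (r + 1) * frag_h)
                for x in range(c * frag_w, (c + 1) * frag_w)
            ]
            row.append(shannon_entropy(block))
        matrix.append(row)
    return matrix
```

A uniform fragment yields an entropy of 0, while a fragment in which two gray levels are equally frequent yields exactly 1 bit. The resulting matrix can then be summarized with statistics such as the variance, as the abstract suggests.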