Engineering Journal of Don

Search for patent analogues based on a comparison of key phrases
This study describes approaches to automating full-text keyword search in the field of patent information. Searching by key phrases (n-grams) is significantly more difficult than searching by individual words, because it additionally requires morphological and syntactic analysis of the text. To achieve this goal, the following tasks were solved: (a) the full-text search systems Apache Solr, Elasticsearch, and ClickHouse were analyzed; (b) the architectures and basic capabilities of each system were compared; (c) search results were obtained from Apache Solr, Elasticsearch, and ClickHouse on the same dataset. The following conclusions were drawn: (a) all the systems considered perform full-text keyword search; (b) Apache Solr showed the highest performance and also offers convenient functions; (c) Elasticsearch has a fast and powerful architecture; (d) ClickHouse has a high data processing speed.
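
The difference between matching individual words and matching key phrases can be illustrated with a minimal sketch. It is not the Solr, Elasticsearch, or ClickHouse search evaluated in the study: the helper names `word_ngrams` and `phrase_overlap`, the Jaccard measure, and the two sample texts are assumptions chosen for illustration, and no morphological or syntactic analysis is performed.

```python
import re


def word_ngrams(text, n):
    """Return the set of lowercase word n-grams found in `text`."""
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def phrase_overlap(text_a, text_b, n=2):
    """Jaccard similarity between the word n-gram sets of two texts."""
    a, b = word_ngrams(text_a, n), word_ngrams(text_b, n)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical fragments: almost the same words, different phrase structure.
doc_a = "hydraulic pump with a pressure relief valve"
doc_b = "pressure relief valve for a hydraulic pump"

unigram_score = phrase_overlap(doc_a, doc_b, n=1)  # single-word overlap
bigram_score = phrase_overlap(doc_a, doc_b, n=2)   # two-word phrase overlap
print(unigram_score, bigram_score)
```

The two fragments share most of their words but far fewer two-word phrases, so the bigram score is lower than the unigram score. Phrase-level comparison is stricter and depends on word order.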

Keywords: search, keyphrases, patent, Apache Solr, Elasticsearch, ClickHouse
Visualization and comparison of semantic trees reflecting the component structure of the patented device
This paper describes approaches to visualizing and comparing semantic trees that reflect the component structure of a patented device and the connections between its components, using graph databases. Such DBMSs use graph structures to store, process, and represent data. The main elements of a graph database are nodes and edges, which, within the framework of the task, model entities of three types (SYSTEM, COMPONENT, ATTRIBUTE) and five types of connections (PART-OF, LOCATED-AT, CONNECTED-WITH, ATTRIBUTE-FOR, IN-MANNER-OF). According to the results of the study, Neo4j demonstrates the best capabilities for graph visualization; ArangoDB, despite correctly entered queries, produces an incomplete visualization; AllegroGraph proved cumbersome to work with in code and difficult to configure for graph tree visualization. Three algorithms for comparing graph representations of information were tested: Graph Edit Distance, Topological Comparison, and Subgraph Isomorphism. The algorithms are implemented in Python; the implementation compares two graph trees and displays a visualization and an analysis of their common structures and differences.
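
As a rough illustration of what finding common structures and differences between two semantic trees can look like, the sketch below stores each tree as a set of typed edges and compares the sets. This is a simplified stand-in, not the Graph Edit Distance, Topological Comparison, or Subgraph Isomorphism implementations tested in the paper. The function `compare_trees`, the Jaccard-style similarity score, and the two sample devices are assumptions; only the connection type names come from the abstract.

```python
def compare_trees(edges_a, edges_b):
    """Compare two graphs given as sets of (source, relation, target) triples."""
    common = edges_a & edges_b
    union = edges_a | edges_b
    return {
        "common": common,
        "only_a": edges_a - edges_b,
        "only_b": edges_b - edges_a,
        # Share of edges present in both graphs (1.0 for two empty graphs).
        "similarity": len(common) / len(union) if union else 1.0,
    }


# Hypothetical component structures of two devices.
device_a = {
    ("pump", "PART-OF", "engine"),
    ("valve", "PART-OF", "pump"),
    ("valve", "CONNECTED-WITH", "pipe"),
}
device_b = {
    ("pump", "PART-OF", "engine"),
    ("valve", "PART-OF", "pump"),
    ("sensor", "LOCATED-AT", "pump"),
}

report = compare_trees(device_a, device_b)
print(report["similarity"])
print(sorted(report["only_a"]), sorted(report["only_b"]))
```

Comparing edge sets only works when node names are already normalized across the two patents. The algorithms named in the abstract are designed to handle structural matching that this sketch does not attempt.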

Keywords: semantic tree, component structure, patent, graph databases, Neo4j, AllegroGraph, ArangoDB
Automation of recognition of radio listeners' requests
The article describes automating the recognition of audio recordings in order to identify the song requested on a radio station. The Golos Russian speech recognition model from SberDevices was used. An algorithm based on the Levenshtein distance was developed to correct the text that the Golos model produces from an audio recording. For recognized requests from radio listeners, interaction with the DIGISPOT II database is organized: queries are formed and executed to search for artists and their songs.
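
A minimal sketch of Levenshtein-based correction is given below, assuming the recognized text is matched against a list of known artist names. The function `closest_entry`, the `max_distance` threshold, and the sample names are hypothetical; the article's actual correction algorithm and its DIGISPOT II queries are not reproduced here.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def closest_entry(recognized, candidates, max_distance=3):
    """Return the candidate nearest to the recognized text, or None if too far."""
    best, best_distance = None, max_distance + 1
    for candidate in candidates:
        distance = levenshtein(recognized.lower(), candidate.lower())
        if distance < best_distance:
            best, best_distance = candidate, distance
    return best


# Hypothetical catalogue entries and a misrecognized request.
artists = ["Queen", "Kino", "Muse"]
print(closest_entry("qeen", artists))
print(closest_entry("zzzzzzzz", artists))
```

The threshold keeps the function from forcing a match when the recognized text is too far from every catalogue entry. In that case it returns `None`.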

Keywords: speech recognition, Golos, Digispot II
Analysis of images of mathematical and chemical formulas from patent documents
Patent documents contain graphic images such as device drawings, graphs, and chemical and mathematical formulas, and the formulas often need to be recognized and brought to a unified standard. This work analyzes graphic images extracted from the patent descriptions of FIPS (Rospatent). It provides thematic filtering of the mathematical and chemical formulas contained in patent documents, together with their recognition. The theoretical value lies in the developed algorithms for parsing patents in the Yandex.Patents system; recognizing chemical and mathematical formulas among graphic patent images; translating graphic images of chemical formulas into SMILES format; and converting graphic images of mathematical formulas into LaTeX format. The practical significance of the work lies in the developed software module for analyzing graphic images from patent documents. The developed system is intended for studying patents and reducing graphic images to a unified standard in order to solve patent search problems.

Keywords: patent, image, mathematical formula, chemical formula, LaTeX, SMILES
The technique of analyzing video files for detecting the presence of persons and attractions, using recognition by key, non-repeating frames
- Tozik A.S.
- Korobkin D.M.
This paper considers a technique for the automatic analysis of video files that detects the presence of persons and attractions (landmarks) by running recognition only on key, non-repeating frames, which are selected by keyframe extraction algorithms. Recognizing landmarks and faces only on keyframes can significantly reduce computational costs and avoids an overflow of repetitive information. The effectiveness of the proposed technique is evaluated in terms of accuracy and speed on a set of test videos.
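
The abstract does not specify which keyframe extraction algorithms are used, so the following is only a sketch of one common idea: keep a frame when it differs enough from the last kept frame. The functions `mean_abs_diff` and `select_keyframes`, the threshold value, and the tiny four-pixel "frames" are assumptions. A real pipeline would read frames from a video decoder and pass the selected ones to face and landmark recognizers.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(frame_a, frame_b)) / len(frame_a)


def select_keyframes(frames, threshold=30.0):
    """Return indices of frames that differ enough from the last kept frame."""
    keyframes = []
    last_kept = None
    for index, frame in enumerate(frames):
        if last_kept is None or mean_abs_diff(frame, last_kept) > threshold:
            keyframes.append(index)
            last_kept = frame
    return keyframes


# Hypothetical grayscale frames flattened to lists of pixel intensities.
frames = [
    [10, 10, 10, 10],
    [12, 11, 10, 9],        # nearly identical to frame 0
    [200, 190, 210, 205],   # scene change
    [201, 191, 209, 204],   # nearly identical to frame 2
    [15, 12, 11, 10],       # scene change back
]

keyframe_indices = select_keyframes(frames)
print(keyframe_indices)
```

Only the selected frames would then be sent to the more expensive recognition step, which is where the saving in computation comes from.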

Keywords: keyframe, recognition, computer vision, algorithm, video
Formation of a visualized representation of the patent landscape
Methods and technologies for visualizing the patent landscape based on cluster analysis of a patent array are considered and applied. Algorithms for downloading patent archives, parsing patent documents, clustering patents, and visualizing the patent landscape have been developed. Software has been implemented that clusters patent documents using the Latent Dirichlet Allocation model and visualizes the patent landscape from the clustering data, using the gensim, PySpark, and sklearn libraries. The software has been tested on patents issued by the US Patent and Trademark Office. An accuracy of 84% was achieved in classifying patents by category.

Keywords: patents, information extraction, clustering, patent landscape, innovation potential
Development of a software module for searching for patent analogues
As industry and science develop, both the size of the patent base and the number of patent applications received by patent-granting agencies are growing. Each patent application must be checked for the uniqueness of the patented technology; to do this, patent office experts need to search the patent database for analog patents. If no analog patents are found, the technology can be considered unique and accepted for patenting. Since the patent databases of various offices can contain tens of millions of patents, such a search and the evaluation of a technology's uniqueness can take a very long time. Existing systems do not meet all the requirements and lack some of the necessary functionality. This article describes the development of an automated system for searching for analog patents in a patent array.

Keywords: patent, database, search, patent-analog, Hadoop, Solr, Django, Python, Haystack, HDFS

