Engineering Journal of Don

Moving from a university data warehouse to a lake: models and methods of big data processing
The article examines the transition of universities from data warehouses to data lakes and the potential of lakes for processing big data. The introduction outlines the main differences between warehouses and lakes, focusing on their differing data management philosophies. Data warehouses typically hold structured data in a relational architecture, while data lakes store data in its raw form, which supports flexibility and scalability. The section "Data Sources used by the University" describes how universities manage data collected from various departments, including ERP systems and cloud databases. The comparison of data lakes and data warehouses covers their key differences in data processing and management methods, along with their advantages and disadvantages. The article examines in detail the challenges of the transition to data lakes, including security, scale and implementation costs. Architectural models of data lakes such as "Raw Data Lake" and "Data Lakehouse" are presented, describing different approaches to managing the data lifecycle and meeting business goals. The big data processing methods covered include the Apache Hadoop platform and current storage formats. Processing technologies are described, including Apache Spark and machine learning tools. Practical examples of data processing and of applying machine learning, with work coordinated through Spark, are proposed. The conclusion emphasizes the relevance of the transition to data lakes for universities, highlights security and management challenges, and recommends cloud technologies to reduce costs and increase productivity in data management.

Keywords: data warehouse, data lake, big data, cloud storage, unstructured data, semi-structured data
Improving the efficiency of working with databases in PHP based on the use of PDO
PHP Data Objects (PDO) represents a significant advancement in PHP application development, providing a uniform approach to interacting with database management systems (DBMSs). The article opens with an introduction describing the need for PDO, available since PHP 5.1, which lets PHP developers work with different databases through a single interface, minimising the effort involved in portability and code maintenance. It discusses how PDO improves security by supporting prepared statements (prepared queries), which defend against SQL injection. The main part of the paper analyses the key advantages of PDO: its ability to connect to multiple databases (e.g. MySQL, PostgreSQL, SQLite), prepared statements for enhanced security, improved error handling through exceptions, transactional support for data integrity, and an API that is easy to learn even for beginners. Practical examples are provided, including preparing and executing SQL queries, setting attributes via the setAttribute method, and performing operations in transactions, emphasising the flexibility and robustness of PDO. In addition, the paper discusses best practices for using PDO in complex and high-volume projects, such as prepared statements for bulk data insertion, query optimisation, and stream processing for efficient handling of large amounts of data. The conclusion characterises PDO as the preferred tool for modern web applications, offering a combination of security, performance and improved code quality. The authors also suggest directions for future research on security test automation and the impact of different data models on application performance.

Keywords: PHP, PDO, databases, DBMS, security, prepared queries, transactions, programming
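
The operations named in the abstract above (setting attributes via setAttribute, prepared queries, and transactions for bulk insertion) can be illustrated with a minimal sketch. It is not taken from the article: the in-memory SQLite DSN, the `students` table and the sample names are assumptions chosen only so that the example is self-contained, and it presumes the `pdo_sqlite` extension is enabled.

```php
<?php
// Assumption: an in-memory SQLite database; swap the DSN for MySQL or PostgreSQL.
$pdo = new PDO('sqlite::memory:');

// Report database errors as exceptions rather than silent failures.
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical table used only for this illustration.
$pdo->exec('CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL)');

// Prepare once, execute many times; values are bound, never concatenated into SQL.
$insert = $pdo->prepare('INSERT INTO students (name) VALUES (:name)');

// The last value is a classic injection attempt; binding stores it as plain text.
$names = ['Anna', 'Boris', "Robert'); DROP TABLE students;--"];

try {
    // Group the bulk insert in one transaction: all rows are stored, or none.
    $pdo->beginTransaction();
    foreach ($names as $name) {
        $insert->execute([':name' => $name]);
    }
    $pdo->commit();
} catch (PDOException $e) {
    // Undo partial work if any insert failed, then re-raise the error.
    if ($pdo->inTransaction()) {
        $pdo->rollBack();
    }
    throw $e;
}

// Read the names back with a parameterised query.
$select = $pdo->prepare('SELECT name FROM students WHERE id >= :min_id ORDER BY id');
$select->execute([':min_id' => 1]);
$rows = $select->fetchAll(PDO::FETCH_COLUMN);
```

Because the third name is passed as a bound parameter, it is stored as an ordinary string and the table survives; building the same query by string concatenation would not give that guarantee.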
