Why Do You Need to Separate Your Data?
Separating online and offline data protects the performance and capacity of the operational database. It also reduces the overall cost of data storage and keeps offline data usage from affecting online data access.
You can perform the separation by transferring data generated during the early days of the online business system to an offline database for storage. The data stored offline can still be critical for business analysis and future planning, so you will want to access it from time to time.
This article will introduce a practical method through which you can separate online and offline data using HybridDB for MySQL.
Core Tasks
The task center aggregates the group's entire workflow and provides a unified interface to manage task assignment and tracking.
Figure 1.
Challenges for Data Separation
• The data size of a single table is often quite large, with more than 40 million records.
• You will need to run extensive queries against multiple tables to meet business needs. As the frequency and complexity of these queries increase, SQL execution slows down.
How Can You Transform the System Architecture?
Separating online and offline data keeps the two sets of data from affecting each other at the data access layer. However, you must synchronize the data in real time to keep the system stable. You must also regularly clear offline data out of the online tables so that online storage is used properly, and secure the closed-loop process from data generation to data elimination.
It is also essential that the offline data has enough storage space, because you cannot delete the old data. Therefore, you must use a proper storage mechanism and ensure that the system has plenty of capacity.
Additionally, it is important to maintain query efficiency when dealing with massive amounts of offline data. You cannot afford slow queries.
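The clean-up requirement above can be sketched as a small batch job. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for both the online MySQL database and the offline store, and the table name tasks, the columns id and created_at, and the function purge_archived_rows are hypothetical names that do not come from the original system. The idea is to delete an old row from the online table only after confirming that the same row already exists offline.

```python
import sqlite3


def purge_archived_rows(online, offline, cutoff, batch_size=500):
    """Delete online rows older than `cutoff` that already exist offline.

    `online` and `offline` are DB-API connections (sqlite3 here, as a
    stand-in). `cutoff` is an ISO-8601 date string. Returns the number of
    rows removed from the online table.
    """
    removed = 0
    last_id = 0
    while True:
        # Walk the old rows in id order, one batch at a time, so that rows
        # that are not yet synchronized cannot stall the loop.
        ids = [row[0] for row in online.execute(
            "SELECT id FROM tasks "
            "WHERE created_at < ? AND id > ? ORDER BY id LIMIT ?",
            (cutoff, last_id, batch_size),
        )]
        if not ids:
            return removed
        last_id = ids[-1]

        # Keep only the ids that are confirmed to exist in the offline store.
        marks = ",".join("?" * len(ids))
        synced = [row[0] for row in offline.execute(
            f"SELECT id FROM tasks WHERE id IN ({marks})", ids
        )]
        if not synced:
            continue

        # Delete the confirmed rows in one short transaction per batch.
        marks = ",".join("?" * len(synced))
        with online:
            cursor = online.execute(
                f"DELETE FROM tasks WHERE id IN ({marks})", synced
            )
        removed += cursor.rowcount
```

Working in small batches keeps each delete transaction short, which matters on a table with tens of millions of rows. Rows that have not been synchronized yet are simply skipped and picked up by a later run.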
After experimenting with various technologies, I would recommend HybridDB for MySQL to separate online and offline data.
Separating Online and Offline Data with HybridDB for MySQL
HybridDB for MySQL is a relational database that supports both OLTP and OLAP workloads, an approach known as HTAP (Hybrid Transaction/Analytical Processing). It integrates Transaction Processing (TP) and Analytical Processing (AP) to enable real-time data transactions and analysis.
Three Features of HybridDB that You Should Know
1. HybridDB for MySQL uses only one copy of data for OLTP and OLAP processing. This removes the need to replicate data multiple times for separate business transactions and data analysis. Additionally, it significantly reduces the cost of data storage.
2. Since HybridDB for MySQL uses only one copy of the data, it saves you the trouble of loading data between the operational database and the data warehouse. As a result, it significantly shortens data analysis latency and makes real-time analytical decisions possible.
3. It is compatible with MySQL syntax and functions and supports common Oracle analytic functions. Moreover, HybridDB is 100% compatible with TPC-H and TPC-DS testing standards. These factors reduce development, data migration, and maintenance costs.
The advantages listed below explain why I prefer HybridDB for MySQL.
Advantages of HybridDB
• Supports storage of logs, tracks, and historical data
• Solves the problem of crowded online data
• Addresses the problem of offline data affecting online data when both share the same database
• Is a mature solution that is widely used in Alibaba Cloud
• Is almost fully compatible with MySQL
• Supports synchronization through DTS and ensures the reliability and timeliness of data
• Expands storage through linear scaling
Performance Comparison of HybridDB
The table below compares query performance against reference data.
Table 1. Performance Comparison of HybridDB.
You will notice that MySQL has a faster response time than HybridDB for SQL queries running on small volumes of data. However, as the data size increases, MySQL lags far behind HybridDB. Therefore, I recommend MySQL for SQL queries on smaller tables, and HybridDB for MySQL for queries on tables with more than ten thousand records.
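The recommendation above amounts to a simple routing rule, which can be sketched as follows. The function name choose_backend and the backend labels "mysql" and "hybriddb" are hypothetical and used only for illustration. The ten-thousand-record threshold is the one stated above, and you may need to tune it for your own workload.

```python
# Threshold taken from the recommendation above; adjust it to your own data.
SMALL_TABLE_MAX_ROWS = 10_000


def choose_backend(row_count: int) -> str:
    """Pick a query backend from the number of records in the table.

    Returns "mysql" for tables with up to SMALL_TABLE_MAX_ROWS records and
    "hybriddb" for larger tables. Both labels are illustrative names.
    """
    if row_count < 0:
        raise ValueError("row_count must not be negative")
    return "hybriddb" if row_count > SMALL_TABLE_MAX_ROWS else "mysql"
```

For example, a lookup table with a few hundred records would stay on MySQL, while a table with more than 40 million records, like the ones described earlier, would be routed to HybridDB for MySQL.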
Main Application Scenarios for HybridDB
• Storing and Analyzing Big Data: HybridDB for MySQL supports hybrid processing of OLTP and OLAP based on the same copy of data. It removes the hassle of replicating, transferring, loading, and storing massive amounts of data between the operational database and the offline data warehouse.
• Supporting IoT Development: The IoT industry features an enormous number of data collection points, large data sizes, and high concurrency demands. Therefore, IoT applications need sharding and related data distribution technologies to fully support application traffic. The distributed architecture of HybridDB for MySQL hides the sharding details and exposes only a database connection address and the corresponding logical database. HybridDB for MySQL thus minimizes the development and O&M costs of IoT applications.
• Storing Historical Data: HybridDB for MySQL can store massive amounts of historical data (up to petabytes) and saves further storage space by compressing the data. It also allows you to use ordinary hard drives, which significantly reduces the cost of storing data.
Conclusion
I hope that this blog helped you understand why you should use HybridDB for MySQL to separate online and offline data.
Note: At the time of writing, Alibaba Cloud has not made HybridDB for MySQL available for the international market. Instead, you can have a look at its counterpart, HybridDB for PostgreSQL. If you are interested in discovering the product in advance, visit its product page on the China Mainland portal at https://www.aliyun.com/product/petadata.