Get Your HeatWave MySQL Data On-Premises with Replication
MySQL Replication exist for centuries… OK, I’m exaggerating a little bit 🙂 but this feature is as much appreciated as it is useful, because is helping DBAs to easily build scalable and redundant architectures:
MySQL 3.23.15 and up features support for one-way, asynchronous replication, in which one
MySQL 3.23/4.0/4.1 Reference Manual
server acts as the master, while one or more other servers act as slaves
At least from MySQL 3.23… I told you it’s been there for a long time: MySQL first Public Releases
But what does replication do?
Quoting the documentation again (the most recent this time)
Replication enables data from one MySQL database server (the source) to be copied to one or more MySQL database servers (the replicas).
MySQL 8.4 Reference Manual
Replication is asynchronous by default; replicas do not need to be connected permanently to receive updates from a source.
In this article we will see how to replicate data from HeatWave MySQL to a on-premise standalone MySQL Instance.
- Source is HeatWave MySQL 8.4.1
- IP: 137.235.181.51
- Port: 3306
- Replica is MySQL 8.4.1
- IP: 88.65.229.255
- Port: 3306
- Replication user
- ‘rplUser’@’88.65.229.%’
We will use 3 main components:
- HeatWave MySQL as a source
- The asynchronous replication capability of MySQL
- A MySQL instance as a read replica
HeatWave MySQL
HeatWave is a fully-managed database service, powered by the integrated in-memory query accelerator. It is the only cloud-native database service that combines transactions, analytics, machine learning and generative AI services into HeatWave, delivering real-time, secure analytics without the complexity, latency, and cost of ETL duplication. It also includes HeatWave Lakehouse which allows users to query data stored in object storage in a variety of file formats.
HeatWave is developed, managed, and supported by the MySQL team in Oracle.
HeatWave MySQL accelerate MySQL query performance by orders of magnitude and get real-time analytics on your transactional data. HeatWave MySQL is built on MySQL Enterprise Edition.
The HeatWave MySQL endpoints are not directly accessible from the internet (i.e. private IP). That said, there are different possibilities to connect such as a compute instance, VPN connection, Bastion session, or network load balancer.
In this article I’ll use the latter, which is present on a public subnet, enables you to connect to HeatWave MySQL over the internet.
However please note that it is not recommended, to make a database accessible over the internet, because it is a security risk.
You must restrict the authorized public IP addresses to a single IP address or a small range of IP addresses, and use in-transit encryption.
It is recommended to use a VPN connection.
You are warned!
I assuming your HeatWave MySQL instance is already created. If not, you can use the Console or the command-line interface to create an instance.
Using the console is pretty straightforward, you can find the details in the OCI documentation.
I also assuming that you have already setup your Network Load Balancer. If not I recommend you this great tutorial from my colleague Scott : Connecting to a MySQL HeatWave Database Instance Using an OCI Network Load Balancer.
MySQL Replication setup
Since MySQL replication is a familiar concept, widely covered over the years, let’s get right to the practical details that will help you set up your replication efficiently.
You’ll find all the juicy details in the MySQL 8.4 Reference Manual and some extra here and here.
Because the data stream will be over the Internet, we must, for obvious security reason, encrypted the connection.
More information on setting up Replication to Use Encrypted Connections.
Source configuration
On the source, create the replication user then grant that user the REPLICATION SLAVE
privilege.
If you want to ensure that the source only accepts replicas that connect using encrypted connections (and btw you really want that), use the REQUIRE SSL
option
1 2 3 4 |
-- On the Source -- Create the replication user CREATE USER 'rplUser'@'88.65.229.%' IDENTIFIED BY 'Q{P6@EH$L!YFje^9' REQUIRE SSL; GRANT REPLICATION SLAVE ON *.* TO 'rplUser'@'88.65.229.%'; |
If the HeatWave MySQL instance (the source) contains existing data it is necessary to copy this data to the replica. There are different ways to dump the data from the source database, the recommended way in that context is to use MySQL Shell dump and load utilities.
You will most likely need dumpInstance and the relevant options are dependent of your context, the most important is usually compatibility.
MySQL Shell allows you to store your backup into an object store bucket and load from there, this is very convenient. For this article, the size of my dataset allows me to store it locally, and the only relevant option is dryRun.
More details in the documentation: Instance Dump Utility, Schema Dump Utility, and Table Dump Utility.
1 2 3 4 5 |
$ mysqlsh --js admin@137.235.181.51 JS> util.dumpInstance("/backup/20240714", { dryRun: true }) dryRun enabled, no locks will be acquired and no files will be created. ... |
If the result is satisfactory, then disable dry run:
1 2 |
JS> util.dumpInstance("/backup/20240714") ... |
For your record, below an example of a dump that is stored in an object Storage bucket named MyBck_07 on Oracle Cloud Infrastructure:
1 |
util.dumpInstance("20240714", {osBucketName: "MyBck_07", osNamespace: "abcdefghijkl", ocimds: true }) |
Restore the dump on replica
The dump must be restored on the replica:
1 2 3 4 5 6 |
-- On the Replica \sql SET GLOBAL local_infile=1; util.loadDump("/backup/20240714", {updateGtidSet:"replace", skipBinlog:true}) \sql SET GLOBAL local_infile=0; |
All the option details are available in the documentation. Please note that you can test you command with the dryRun option.
Also for your record, if you want to restore a dump for a bucket, your command will look like:
1 |
util.loadDump("20240714", {osBucketName: "MyBck_07", osNamespace: "abcdefghijkl"}) |
Setting the Source Configuration on the Replica
To set up the replica to communicate with the source for replication, configure the replica with the necessary connection information:
1 2 3 4 |
-- On the Replica CHANGE REPLICATION SOURCE TO SOURCE_HOST = '137.235.181.51', SOURCE_AUTO_POSITION = 1, SOURCE_SSL=1; START REPLICA USER = 'rplUser' PASSWORD = 'Q{P6@EH$L!YFje^9'; |
For a full list of options, see CHANGE REPLICATION SOURCE TO options.
Administration
On the replica, you can manage your replication as you normally do:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
-- Checking Replication Status SQL > SHOW REPLICA STATUS\G *************************** 1. row *************************** Replica_IO_State: Waiting for source to send event Source_Host: 137.235.181.51 Source_User: rplUser Source_Port: 3306 Connect_Retry: 60 Source_Log_File: binary-log.000088 Read_Source_Log_Pos: 378 Relay_Log_File: daz-Latitude-7400-relay-bin.000004 Relay_Log_Pos: 557 Relay_Source_Log_File: binary-log.000088 Replica_IO_Running: Yes Replica_SQL_Running: Yes ... SQL_Remaining_Delay: NULL Replica_SQL_Running_State: Replica has read all relay log; waiting for more updates ... |
Other useful commands on the replica, using performance_schema are:
1 2 3 4 5 6 |
SQL > SELECT * FROM performance_schema.replication_applier_status\G *************************** 1. row *************************** CHANNEL_NAME: SERVICE_STATE: ON REMAINING_DELAY: NULL COUNT_TRANSACTIONS_RETRIES: 0 |
1 2 3 4 5 6 7 8 9 10 |
SQL > SELECT * FROM performance_schema.replication_connection_configuration\G *************************** 1. row *************************** CHANNEL_NAME: HOST: 137.235.181.51 PORT: 3306 USER: rplUser NETWORK_INTERFACE: AUTO_POSITION: 1 SSL_ALLOWED: YES ... |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
SQL > SELECT * FROM performance_schema.replication_connection_status\G *************************** 1. row *************************** CHANNEL_NAME: GROUP_NAME: SOURCE_UUID: 22f79ae9-991d-4b3c-82b9-b81bb2366635 THREAD_ID: 73 SERVICE_STATE: ON COUNT_RECEIVED_HEARTBEATS: 17 LAST_HEARTBEAT_TIMESTAMP: 2024-06-26 10:42:57.615232 RECEIVED_TRANSACTION_SET: 22f79ae9-991d-4b3c-82b9-b81bb2366635:56 LAST_ERROR_NUMBER: 0 LAST_ERROR_MESSAGE: LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000 LAST_QUEUED_TRANSACTION: 22f79ae9-991d-4b3c-82b9-b81bb2366635:56 ... |
On the source, you can check the status of connected replicas:
1 2 3 4 5 6 7 8 9 10 11 |
-- Check the status of connected replicas SQL > SHOW PROCESSLIST \G *************************** 6. row *************************** Id: 547 User: rplUser Host: nlb-mhs-paris-dba.sub03221692542.vcnparisdba.oraclevcn.com:1234 db: NULL Command: Binlog Dump GTID Time: 428 State: Source has sent all binlog to replica; waiting for more updates Info: NULL |
Basic information about the replicas that were started with the --report-host
option and are connected to the source:
1 2 3 4 5 6 7 |
SQL > SHOW REPLICAS \G *************************** 1. row *************************** Server_Id: 1 Host: Replica-8865229255 Port: 3306 Source_Id: 3640816068 Replica_UUID: 46c6de0c-2ff7-4b14-a80a-5dffea08864a |
Other useful commands on the source, using performance_schema are:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
SQL > SELECT SERVICE_STATE, HOST, USER, PORT FROM performance_schema.replication_connection_configuration INNER JOIN performance_schema.replication_applier_status USING (CHANNEL_NAME) \G *************************** 1. row *************************** SERVICE_STATE: ON HOST: 137.235.181.51 USER: rplUser PORT: 3306 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 |
SELECT concat(conn_status.channel_name, ' (', worker_id,')') AS channel, conn_status.service_state AS io_state, applier_status.service_state AS sql_state, format_pico_time(if(GTID_SUBTRACT(LAST_QUEUED_TRANSACTION, LAST_APPLIED_TRANSACTION) = "","0", abs(time_to_sec(if(time_to_sec(APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP)=0,0, timediff(APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP,now()))))) * 1000000000000) latency, format_pico_time((LAST_QUEUED_TRANSACTION_START_QUEUE_TIMESTAMP - LAST_QUEUED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP) * 100000000000) transport_time, format_pico_time((LAST_QUEUED_TRANSACTION_END_QUEUE_TIMESTAMP - LAST_QUEUED_TRANSACTION_START_QUEUE_TIMESTAMP) * 1000000000000) time_to_relay_log, format_pico_time((LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP - LAST_APPLIED_TRANSACTION_START_APPLY_TIMESTAMP) * 1000000000000) apply_time, conn_status.LAST_QUEUED_TRANSACTION AS last_queued_transaction, applier_status.LAST_APPLIED_TRANSACTION AS last_applied_transaction FROM performance_schema.replication_connection_status AS conn_status JOIN performance_schema.replication_applier_status_by_worker AS applier_status ON applier_status.channel_name = conn_status.channel_name \G *************************** 1. row *************************** channel: (1) io_state: ON sql_state: ON latency: 0 ps transport_time: 0 ps time_to_relay_log: 0 ps apply_time: 0 ps last_queued_transaction: last_applied_transaction: ... |
Peroration
This article guides you through setting up replication between a HeatWave MySQL instance (source) and an on-premise standalone MySQL instance (replica).
It highlights key steps like creating a replication user, securing the connection, and using MySQL Shell utilities for data transfer.
While a Network Load Balancer can be used for the connection, a VPN is recommended for better security.
Resources
- HeatWave
- HeatWave MySQL – Outbound Replication
- MySQL Replication
- CHANGE REPLICATION SOURCE TO Statement
- Setting Up Replication to Use Encrypted Connections
- Checking Replication Status
- HeatWave – A MySQL cloud feature to speed up your queries
- Replicate from MySQL 5.7 to HeatWave MySQL
- MySQL Shell – Instance Dump Utility, Schema Dump Utility, and Table Dump Utility
- MySQL Shell – Dump Loading Utility
Watch my videos on my YouTube channel and subscribe.
Thanks for using MySQL!