Get Your HeatWave MySQL Data On-Premises with Replication
MySQL Replication exist for centuries… OK, I’m exaggerating a little bit 🙂 but this feature is as much appreciated as it is useful, because is helping DBAs to easily build scalable and redundant architectures:
MySQL 3.23.15 and up features support for one-way, asynchronous replication, in which one
MySQL 3.23/4.0/4.1 Reference Manual
server acts as the master, while one or more other servers act as slaves
At least from MySQL 3.23… I told you it’s been there for a long time: MySQL first Public Releases
But what does replication do?
Quoting the documentation again (the most recent this time)
Replication enables data from one MySQL database server (the source) to be copied to one or more MySQL database servers (the replicas).
MySQL 8.4 Reference Manual
Replication is asynchronous by default; replicas do not need to be connected permanently to receive updates from a source.
In this article we will see how to replicate data from HeatWave MySQL to a on-premise standalone MySQL Instance.
- Source is HeatWave MySQL 8.4.1
- IP: 137.235.181.51
- Port: 3306
- Replica is MySQL 8.4.1
- IP: 88.65.229.255
- Port: 3306
- Replication user
- ‘rplUser’@’88.65.229.%’
We will use 3 main components:
- HeatWave MySQL as a source
- The asynchronous replication capability of MySQL
- A MySQL instance as a read replica
HeatWave MySQL
HeatWave is a fully-managed database service, powered by the integrated in-memory query accelerator. It is the only cloud-native database service that combines transactions, analytics, machine learning and generative AI services into HeatWave, delivering real-time, secure analytics without the complexity, latency, and cost of ETL duplication. It also includes HeatWave Lakehouse which allows users to query data stored in object storage in a variety of file formats.
HeatWave is developed, managed, and supported by the MySQL team in Oracle.
HeatWave MySQL accelerate MySQL query performance by orders of magnitude and get real-time analytics on your transactional data. HeatWave MySQL is built on MySQL Enterprise Edition.
The HeatWave MySQL endpoints are not directly accessible from the internet (i.e. private IP). That said, there are different possibilities to connect such as a compute instance, VPN connection, Bastion session, or network load balancer.
In this article I’ll use the latter, which is present on a public subnet, enables you to connect to HeatWave MySQL over the internet.
However please note that it is not recommended, to make a database accessible over the internet, because it is a security risk.
You must restrict the authorized public IP addresses to a single IP address or a small range of IP addresses, and use in-transit encryption.
It is recommended to use a VPN connection.
You are warned!
I assuming your HeatWave MySQL instance is already created. If not, you can use the Console or the command-line interface to create an instance.
Using the console is pretty straightforward, you can find the details in the OCI documentation.
I also assuming that you have already setup your Network Load Balancer. If not I recommend you this great tutorial from my colleague Scott : Connecting to a MySQL HeatWave Database Instance Using an OCI Network Load Balancer.
MySQL Replication setup
Since MySQL replication is a familiar concept, widely covered over the years, let’s get right to the practical details that will help you set up your replication efficiently.
You’ll find all the juicy details in the MySQL 8.4 Reference Manual and some extra here and here.
Because the data stream will be over the Internet, we must, for obvious security reason, encrypted the connection.
More information on setting up Replication to Use Encrypted Connections.
Source configuration
On the source, create the replication user then grant that user the REPLICATION SLAVE
privilege.
If you want to ensure that the source only accepts replicas that connect using encrypted connections (and btw you really want that), use the REQUIRE SSL
option
-- On the Source
-- Create the replication user
CREATE USER 'rplUser'@'88.65.229.%' IDENTIFIED BY 'Q{P6@EH$L!YFje^9' REQUIRE SSL;
GRANT REPLICATION SLAVE ON *.* TO 'rplUser'@'88.65.229.%';
If the HeatWave MySQL instance (the source) contains existing data it is necessary to copy this data to the replica. There are different ways to dump the data from the source database, the recommended way in that context is to use MySQL Shell dump and load utilities.
You will most likely need dumpInstance and the relevant options are dependent of your context, the most important is usually compatibility.
MySQL Shell allows you to store your backup into an object store bucket and load from there, this is very convenient. For this article, the size of my dataset allows me to store it locally, and the only relevant option is dryRun.
More details in the documentation: Instance Dump Utility, Schema Dump Utility, and Table Dump Utility.
$ mysqlsh --js admin@137.235.181.51
JS> util.dumpInstance("/backup/20240714", { dryRun: true })
dryRun enabled, no locks will be acquired and no files will be created.
...
If the result is satisfactory, then disable dry run:
JS> util.dumpInstance("/backup/20240714")
...
For your record, below an example of a dump that is stored in an object Storage bucket named MyBck_07 on Oracle Cloud Infrastructure:
util.dumpInstance("20240714", {osBucketName: "MyBck_07", osNamespace: "abcdefghijkl", ocimds: true })
Restore the dump on replica
The dump must be restored on the replica:
-- On the Replica
\sql SET GLOBAL local_infile=1;
util.loadDump("/backup/20240714", {updateGtidSet:"replace", skipBinlog:true})
\sql SET GLOBAL local_infile=0;
All the option details are available in the documentation. Please note that you can test you command with the dryRun option.
Also for your record, if you want to restore a dump for a bucket, your command will look like:
util.loadDump("20240714", {osBucketName: "MyBck_07", osNamespace: "abcdefghijkl"})
Setting the Source Configuration on the Replica
To set up the replica to communicate with the source for replication, configure the replica with the necessary connection information:
-- On the Replica
CHANGE REPLICATION SOURCE TO SOURCE_HOST = '137.235.181.51', SOURCE_AUTO_POSITION = 1, SOURCE_SSL=1;
START REPLICA USER = 'rplUser' PASSWORD = 'Q{P6@EH$L!YFje^9';
For a full list of options, see CHANGE REPLICATION SOURCE TO options.
Administration
On the replica, you can manage your replication as you normally do:
-- Checking Replication Status
SQL > SHOW REPLICA STATUS\G
*************************** 1. row ***************************
Replica_IO_State: Waiting for source to send event
Source_Host: 137.235.181.51
Source_User: rplUser
Source_Port: 3306
Connect_Retry: 60
Source_Log_File: binary-log.000088
Read_Source_Log_Pos: 378
Relay_Log_File: daz-Latitude-7400-relay-bin.000004
Relay_Log_Pos: 557
Relay_Source_Log_File: binary-log.000088
Replica_IO_Running: Yes
Replica_SQL_Running: Yes
...
SQL_Remaining_Delay: NULL
Replica_SQL_Running_State: Replica has read all relay log; waiting for more updates
...
Other useful commands on the replica, using performance_schema are:
SQL > SELECT * FROM performance_schema.replication_applier_status\G
*************************** 1. row ***************************
CHANNEL_NAME:
SERVICE_STATE: ON
REMAINING_DELAY: NULL
COUNT_TRANSACTIONS_RETRIES: 0
SQL > SELECT * FROM performance_schema.replication_connection_configuration\G
*************************** 1. row ***************************
CHANNEL_NAME:
HOST: 137.235.181.51
PORT: 3306
USER: rplUser
NETWORK_INTERFACE:
AUTO_POSITION: 1
SSL_ALLOWED: YES
...
SQL > SELECT * FROM performance_schema.replication_connection_status\G
*************************** 1. row ***************************
CHANNEL_NAME:
GROUP_NAME:
SOURCE_UUID: 22f79ae9-991d-4b3c-82b9-b81bb2366635
THREAD_ID: 73
SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 17
LAST_HEARTBEAT_TIMESTAMP: 2024-06-26 10:42:57.615232
RECEIVED_TRANSACTION_SET: 22f79ae9-991d-4b3c-82b9-b81bb2366635:56
LAST_ERROR_NUMBER: 0
LAST_ERROR_MESSAGE:
LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_QUEUED_TRANSACTION: 22f79ae9-991d-4b3c-82b9-b81bb2366635:56
...
On the source, you can check the status of connected replicas:
-- Check the status of connected replicas
SQL > SHOW PROCESSLIST \G
*************************** 6. row ***************************
Id: 547
User: rplUser
Host: nlb-mhs-paris-dba.sub03221692542.vcnparisdba.oraclevcn.com:1234
db: NULL
Command: Binlog Dump GTID
Time: 428
State: Source has sent all binlog to replica; waiting for more updates
Info: NULL
Basic information about the replicas that were started with the --report-host
option and are connected to the source:
SQL > SHOW REPLICAS \G
*************************** 1. row ***************************
Server_Id: 1
Host: Replica-8865229255
Port: 3306
Source_Id: 3640816068
Replica_UUID: 46c6de0c-2ff7-4b14-a80a-5dffea08864a
Other useful commands on the source, using performance_schema are:
SQL > SELECT
SERVICE_STATE,
HOST,
USER,
PORT
FROM performance_schema.replication_connection_configuration
INNER JOIN performance_schema.replication_applier_status
USING (CHANNEL_NAME) \G
*************************** 1. row ***************************
SERVICE_STATE: ON
HOST: 137.235.181.51
USER: rplUser
PORT: 3306
SELECT
concat(conn_status.channel_name, ' (', worker_id,')') AS channel,
conn_status.service_state AS io_state,
applier_status.service_state AS sql_state,
format_pico_time(if(GTID_SUBTRACT(LAST_QUEUED_TRANSACTION, LAST_APPLIED_TRANSACTION) = "","0",
abs(time_to_sec(if(time_to_sec(APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP)=0,0,
timediff(APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP,now()))))) * 1000000000000) latency,
format_pico_time((LAST_QUEUED_TRANSACTION_START_QUEUE_TIMESTAMP -
LAST_QUEUED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP) * 100000000000) transport_time,
format_pico_time((LAST_QUEUED_TRANSACTION_END_QUEUE_TIMESTAMP -
LAST_QUEUED_TRANSACTION_START_QUEUE_TIMESTAMP) * 1000000000000) time_to_relay_log,
format_pico_time((LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP -
LAST_APPLIED_TRANSACTION_START_APPLY_TIMESTAMP) * 1000000000000) apply_time,
conn_status.LAST_QUEUED_TRANSACTION AS last_queued_transaction,
applier_status.LAST_APPLIED_TRANSACTION AS last_applied_transaction
FROM
performance_schema.replication_connection_status AS conn_status JOIN performance_schema.replication_applier_status_by_worker AS applier_status
ON applier_status.channel_name = conn_status.channel_name
\G
*************************** 1. row ***************************
channel: (1)
io_state: ON
sql_state: ON
latency: 0 ps
transport_time: 0 ps
time_to_relay_log: 0 ps
apply_time: 0 ps
last_queued_transaction:
last_applied_transaction:
...
Peroration
This article guides you through setting up replication between a HeatWave MySQL instance (source) and an on-premise standalone MySQL instance (replica).
It highlights key steps like creating a replication user, securing the connection, and using MySQL Shell utilities for data transfer.
While a Network Load Balancer can be used for the connection, a VPN is recommended for better security.
Resources
- HeatWave
- HeatWave MySQL – Outbound Replication
- MySQL Replication
- CHANGE REPLICATION SOURCE TO Statement
- Setting Up Replication to Use Encrypted Connections
- Checking Replication Status
- HeatWave – A MySQL cloud feature to speed up your queries
- Replicate from MySQL 5.7 to HeatWave MySQL
- MySQL Shell – Instance Dump Utility, Schema Dump Utility, and Table Dump Utility
- MySQL Shell – Dump Loading Utility
Watch my videos on my YouTube channel and subscribe.
Thanks for using HeatWave & MySQL!
Cloud Solutions Architect at Oracle
MySQL Geek, author, blogger and speaker
I’m an insatiable hunger of learning.
—–
Blog: www.dasini.net/blog/en/
Twitter: https://twitter.com/freshdaz
SlideShare: www.slideshare.net/freshdaz
Youtube: https://www.youtube.com/channel/UC12TulyJsJZHoCmby3Nm3WQ
—–