Get Your HeatWave MySQL Data On-Premises with Replication

July 16, 2024

Tags: Heatwave

MySQL Replication exist for centuries… OK, I’m exaggerating a little bit 🙂 but this feature is as much appreciated as it is useful, because is helping DBAs to easily build scalable and redundant architectures:

MySQL 3.23.15 and up features support for one-way, asynchronous replication, in which one
server acts as the master, while one or more other servers act as slaves
MySQL 3.23/4.0/4.1 Reference Manual

At least from MySQL 3.23… I told you it’s been there for a long time: MySQL first Public Releases

But what does replication do?
Quoting the documentation again (the most recent this time)

Replication enables data from one MySQL database server (the source) to be copied to one or more MySQL database servers (the replicas).
Replication is asynchronous by default; replicas do not need to be connected permanently to receive updates from a source.
MySQL 8.4 Reference Manual

In this article we will see how to replicate data from HeatWave MySQL to a on-premise standalone MySQL Instance.

Get Your HeatWave MySQL Data On-Premises with Replication

Source is HeatWave MySQL 8.4.1
- IP: 137.235.181.51
- Port: 3306
Replica is MySQL 8.4.1
- IP: 88.65.229.255
- Port: 3306
Replication user
- ‘rplUser’@’88.65.229.%’

We will use 3 main components:

HeatWave MySQL as a source
The asynchronous replication capability of MySQL
A MySQL instance as a read replica

HeatWave MySQL

HeatWave is a fully-managed database service, powered by the integrated in-memory query accelerator. It is the only cloud-native database service that combines transactions, analytics, machine learning and generative AI services into HeatWave, delivering real-time, secure analytics without the complexity, latency, and cost of ETL duplication. It also includes HeatWave Lakehouse which allows users to query data stored in object storage in a variety of file formats.
HeatWave is developed, managed, and supported by the MySQL team in Oracle.

HeatWave MySQL accelerate MySQL query performance by orders of magnitude and get real-time analytics on your transactional data. HeatWave MySQL is built on MySQL Enterprise Edition.

The HeatWave MySQL endpoints are not directly accessible from the internet (i.e. private IP). That said, there are different possibilities to connect such as a compute instance, VPN connection, Bastion session, or network load balancer.
In this article I’ll use the latter, which is present on a public subnet, enables you to connect to HeatWave MySQL over the internet.
However please note that it is not recommended, to make a database accessible over the internet, because it is a security risk.
You must restrict the authorized public IP addresses to a single IP address or a small range of IP addresses, and use in-transit encryption.
It is recommended to use a VPN connection.
You are warned!

I assuming your HeatWave MySQL instance is already created. If not, you can use the Console or the command-line interface to create an instance.
Using the console is pretty straightforward, you can find the details in the OCI documentation.

I also assuming that you have already setup your Network Load Balancer. If not I recommend you this great tutorial from my colleague Scott : Connecting to a MySQL HeatWave Database Instance Using an OCI Network Load Balancer.

MySQL Replication setup

Since MySQL replication is a familiar concept, widely covered over the years, let’s get right to the practical details that will help you set up your replication efficiently.
You’ll find all the juicy details in the MySQL 8.4 Reference Manual and some extra here and here.

Because the data stream will be over the Internet, we must, for obvious security reason, encrypted the connection.
More information on setting up Replication to Use Encrypted Connections.

Source configuration

On the source, create the replication user then grant that user the REPLICATION SLAVE privilege.
If you want to ensure that the source only accepts replicas that connect using encrypted connections (and btw you really want that), use the REQUIRE SSL option

-- On the Source
-- Create the replication user
CREATE USER 'rplUser'@'88.65.229.%' IDENTIFIED BY 'Q{P6@EH$L!YFje^9' REQUIRE SSL;
GRANT REPLICATION SLAVE ON *.* TO 'rplUser'@'88.65.229.%';

If the HeatWave MySQL instance (the source) contains existing data it is necessary to copy this data to the replica. There are different ways to dump the data from the source database, the recommended way in that context is to use MySQL Shell dump and load utilities.

You will most likely need dumpInstance and the relevant options are dependent of your context, the most important is usually compatibility.
MySQL Shell allows you to store your backup into an object store bucket and load from there, this is very convenient. For this article, the size of my dataset allows me to store it locally, and the only relevant option is dryRun.
More details in the documentation: Instance Dump Utility, Schema Dump Utility, and Table Dump Utility.

$ mysqlsh --js admin@137.235.181.51 

JS> util.dumpInstance("/backup/20240714", { dryRun: true })
dryRun enabled, no locks will be acquired and no files will be created.
...

If the result is satisfactory, then disable dry run:

JS> util.dumpInstance("/backup/20240714")
...

For your record, below an example of a dump that is stored in an object Storage bucket named MyBck_07 on Oracle Cloud Infrastructure:

util.dumpInstance("20240714", {osBucketName: "MyBck_07", osNamespace: "abcdefghijkl", ocimds: true })

Restore the dump on replica

The dump must be restored on the replica:

-- On the Replica
\sql SET GLOBAL  local_infile=1;

util.loadDump("/backup/20240714", {updateGtidSet:"replace", skipBinlog:true})

\sql SET GLOBAL  local_infile=0;

All the option details are available in the documentation. Please note that you can test you command with the dryRun option.

Also for your record, if you want to restore a dump for a bucket, your command will look like:

util.loadDump("20240714", {osBucketName: "MyBck_07", osNamespace: "abcdefghijkl"})

Setting the Source Configuration on the Replica

To set up the replica to communicate with the source for replication, configure the replica with the necessary connection information:

-- On the Replica
CHANGE REPLICATION SOURCE TO SOURCE_HOST = '137.235.181.51', SOURCE_AUTO_POSITION = 1, SOURCE_SSL=1;

START REPLICA USER = 'rplUser' PASSWORD = 'Q{P6@EH$L!YFje^9';

For a full list of options, see CHANGE REPLICATION SOURCE TO options.

Administration

On the replica, you can manage your replication as you normally do:

-- Checking Replication Status
SQL > SHOW REPLICA STATUS\G
*************************** 1. row ***************************
             Replica_IO_State: Waiting for source to send event
                  Source_Host: 137.235.181.51
                  Source_User: rplUser
                  Source_Port: 3306
                Connect_Retry: 60
              Source_Log_File: binary-log.000088
          Read_Source_Log_Pos: 378
               Relay_Log_File: daz-Latitude-7400-relay-bin.000004
                Relay_Log_Pos: 557
        Relay_Source_Log_File: binary-log.000088
           Replica_IO_Running: Yes
          Replica_SQL_Running: Yes
...
          SQL_Remaining_Delay: NULL
    Replica_SQL_Running_State: Replica has read all relay log; waiting for more updates
...

Other useful commands on the replica, using performance_schema are:

SQL > SELECT * FROM performance_schema.replication_applier_status\G
*************************** 1. row ***************************
              CHANNEL_NAME: 
             SERVICE_STATE: ON
           REMAINING_DELAY: NULL
COUNT_TRANSACTIONS_RETRIES: 0

SQL > SELECT * FROM performance_schema.replication_connection_configuration\G
*************************** 1. row ***************************
                   CHANNEL_NAME: 
                           HOST: 137.235.181.51
                           PORT: 3306
                           USER: rplUser
              NETWORK_INTERFACE: 
                  AUTO_POSITION: 1
                    SSL_ALLOWED: YES
...

SQL > SELECT * FROM performance_schema.replication_connection_status\G
*************************** 1. row ***************************
               CHANNEL_NAME: 
                 GROUP_NAME: 
                SOURCE_UUID: 22f79ae9-991d-4b3c-82b9-b81bb2366635
                  THREAD_ID: 73
              SERVICE_STATE: ON
  COUNT_RECEIVED_HEARTBEATS: 17
   LAST_HEARTBEAT_TIMESTAMP: 2024-06-26 10:42:57.615232
   RECEIVED_TRANSACTION_SET: 22f79ae9-991d-4b3c-82b9-b81bb2366635:56
          LAST_ERROR_NUMBER: 0
         LAST_ERROR_MESSAGE: 
       LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
    LAST_QUEUED_TRANSACTION: 22f79ae9-991d-4b3c-82b9-b81bb2366635:56
...

On the source, you can check the status of connected replicas:

-- Check the status of connected replicas
SQL > SHOW PROCESSLIST \G
*************************** 6. row ***************************
     Id: 547
   User: rplUser
   Host: nlb-mhs-paris-dba.sub03221692542.vcnparisdba.oraclevcn.com:1234
     db: NULL
Command: Binlog Dump GTID
   Time: 428
  State: Source has sent all binlog to replica; waiting for more updates
   Info: NULL

Basic information about the replicas that were started with the --report-host option and are connected to the source:

SQL > SHOW REPLICAS \G
*************************** 1. row ***************************
   Server_Id: 1
        Host: Replica-8865229255
        Port: 3306
   Source_Id: 3640816068
Replica_UUID: 46c6de0c-2ff7-4b14-a80a-5dffea08864a

Other useful commands on the source, using performance_schema are:

SQL > SELECT 
   SERVICE_STATE, 
   HOST, 
   USER, 
   PORT
FROM performance_schema.replication_connection_configuration 
   INNER JOIN performance_schema.replication_applier_status 
      USING (CHANNEL_NAME) \G
*************************** 1. row ***************************
SERVICE_STATE: ON
         HOST: 137.235.181.51
         USER: rplUser
         PORT: 3306

SELECT
    concat(conn_status.channel_name, ' (', worker_id,')') AS channel,
    conn_status.service_state AS io_state,
    applier_status.service_state AS sql_state,
    format_pico_time(if(GTID_SUBTRACT(LAST_QUEUED_TRANSACTION, LAST_APPLIED_TRANSACTION) = "","0",
    abs(time_to_sec(if(time_to_sec(APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP)=0,0,
    timediff(APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP,now()))))) * 1000000000000) latency,
    format_pico_time((LAST_QUEUED_TRANSACTION_START_QUEUE_TIMESTAMP -
    LAST_QUEUED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP) * 100000000000) transport_time,
    format_pico_time((LAST_QUEUED_TRANSACTION_END_QUEUE_TIMESTAMP -
    LAST_QUEUED_TRANSACTION_START_QUEUE_TIMESTAMP) * 1000000000000) time_to_relay_log,
    format_pico_time((LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP -
    LAST_APPLIED_TRANSACTION_START_APPLY_TIMESTAMP) * 1000000000000) apply_time,
    conn_status.LAST_QUEUED_TRANSACTION AS last_queued_transaction,
    applier_status.LAST_APPLIED_TRANSACTION AS last_applied_transaction
FROM
    performance_schema.replication_connection_status AS conn_status JOIN performance_schema.replication_applier_status_by_worker AS applier_status 
        ON applier_status.channel_name = conn_status.channel_name
\G
*************************** 1. row ***************************
                 channel:  (1)
                io_state: ON
               sql_state: ON
                 latency:   0 ps
          transport_time:   0 ps
       time_to_relay_log:   0 ps
              apply_time:   0 ps
 last_queued_transaction: 
last_applied_transaction: 
...

Peroration

This article guides you through setting up replication between a HeatWave MySQL instance (source) and an on-premise standalone MySQL instance (replica).
It highlights key steps like creating a replication user, securing the connection, and using MySQL Shell utilities for data transfer.
While a Network Load Balancer can be used for the connection, a VPN is recommended for better security.

Resources

Follow me on Linkedin

Watch my videos on my YouTube channel and subscribe.

My Slideshare account.

My Speaker Deck account.

Thanks for using HeatWave & MySQL!

Olivier DASINI

Cloud Solutions Architect at Oracle
MySQL Geek, strong interest in data and AI, author, blogger and speaker
I’m an insatiable hunger of learning.
—–

Blog: www.dasini.net/blog/en/
Twitter: https://twitter.com/freshdaz
SlideShare: www.slideshare.net/freshdaz
Youtube: https://www.youtube.com/channel/UC12TulyJsJZHoCmby3Nm3WQ
—–

dasini.net – Diary of a MySQL expert