Automatic connection failover for Asynchronous Replication

November 5, 2020
Plage par Olivier DASINI

TL;DR

Since MySQL 8.0.22 there is a mechanism in asynchronous replication that makes the receiver automatically try to re-establish an asynchronous replication connection to another sender, in case the current connection gets interrupted due to the failure of the current sender.


Asynchronous automatic connection failover automates the process of re-establishment of an asynchronous replication connection to another sender of sender list.
That means if a the source of a replica crashed, this replica will be able to automatically connect to another source.
One of the biggest interest is to improve Disaster Recovery (DR) architecture.

With this feature, a typical architecture is to have a 3 nodes asynchronous replication cluster.
2 primary nodes in active/passive mode (if you need active/active architecture use MySQL InnoDB Cluster) and the third one is connected to one of the primary, either for DR or for some specialized task like analytics for example.
So in case of unavailability of its primary – if the replication I/O thread stops due to the source stopping or due to a network failure – this replica will automatically connect to the other primary.

Another architecture is to use this asynchronous automatic connection failover feature with MySQL InnoDB Cluster.

Hey guess what? this is the topic of this tutorial and by the way, we will use some of the fanciest MySQL 8.0 features 🙂

Context

4 MySQL 8.0.22 instances :


– mysql_node1 : 172.20.0.11
– mysql_node2 : 172.20.0.12
– mysql_node3 : 172.20.0.13
These 3 are members of a MySQL Group Replication cluster.
I created a MySQL InnoDB Cluster (resources here and here).
I will not use MySQL Router in this tutorial.


– mysql_node4 : 172.20.0.14
It’s my replica.
It will be asynchronously connected to the Group Replication.

Because a picture is worth a thousand words, below is the architecture I want to achieve :

Asynchronous Replica of a MySQL Group Replication by Olivier DASINI

Graphs in this article are from MySQL Enterprise Monitor.

Current status

I’m using MySQL 8.0.22 :

I have a 3 nodes Group Replication cluster up and running :

Setup the replication user

Like you know, there are some preparation steps to be able to use MySQL Asynchronous Replication. If you’re not familiar with replication using GTID, please read this.

On the Group Replication primary – currently mysql_node1 – I setting up the asynchronous replication user:

Setup the clone user

The clone plugin is one of my favorite feature!

It permits cloning data locally or from a remote MySQL server instance.
Cloned data is a physical snapshot of data stored in InnoDB that includes schemas, tables, tablespaces, and data dictionary metadata.
The cloned data comprises a fully functional data directory, which permits using the clone plugin for MySQL server provisioning.

It’s a very convenient way to copy data from the source (the Group Replication) to the replica (mysql_node4).

Thanks to InnoDB Cluster the Clone plugin is already installed on the 3 members.
So on the primary member – currently mysql_node1 – I’ll create a dedicated clone user with the donor privileges for using and monitoring the clone plugin:

Note that I could have used the cluster administrator dedicated user instead of create a specialized clone user.

On mysql_node4, the future replica, I’ll create the same user but with the recipient privileges.
But before I’ll install the clone plugin and set the clone donor list:

Clone a MySQL instance

Now we have everything all set to create the replica from a member of the group.

On the future replica – mysql_node4 – we can now run the clone instance command:

If you want to monitor the cloning progress run the following query:

When the cloning is over, the MySQL instance must be restarted (that will normally happen automatically).
After the restart, you can verify that the clone completed successfully with the queries below:

and:

Add the replica

First I will setup the configuration information for a replication source server to the source list for a replication channel.
To do that we use the function: asynchronous_connection_failover_add_source
Information needed are the replication channel name, the source server address, port and network namespace, and also the weight.

More information are available here.

For this tutorial I chose the following values:

  • Channel name: autof
  • Source server addresses: mysql_node1, mysql_node2, mysql_node3
  • Source server port: 3306
  • Source server network namespace: ” (empty)
  • Weight: respectively: 50, 90, 90

The replica’s source lists for each replication channel for the asynchronous connection failover mechanism can be viewed in the Performance Schema table replication_asynchronous_connection_failover.

To set the parameters that the replica server uses for connect to the source, we use the well known CHANGE MASTER TO statement.
You already know most of its clauses, so let’s only focus on some of them:

  • SOURCE_CONNECTION_AUTO_FAILOVER : activates the asynchronous connection failover mechanism.
  • MASTER_RETRY_COUNT & MASTER_CONNECT_RETRY : define the failover time. The default setting is… 60 days, probably not what you want :). So, you (most likely) should reduced the settings. e.g. 1 minute is respectively 20 and 3. (20 x 3 = 60)
  • FOR CHANNEL : enables you to name which replication channel the statement applies to. The CHANGE MASTER TO statement applies to this specific replication channel.

Now let’s configure the replication:

Please note that my failover time in this tutorial is 15 seconds (3 x 5). Obviously the relevant setting depends on your needs. A longer duration will probably makes more sense in real life.

Then start the replication, on channel autof using START REPLICA:

 Status information of the replication can be seen with SHOW REPLICA:

Automatic connection failover

We have now configured our replication, with an InnoDB Cluster/Group Replication as a source and a standalone MySQL server as a replica.
Let’s see how the automatic connection failover works.

Restart the replica

I want to see the behavior after a restart of the replication.

State before the stop:

Stop the replication:

MySQL Enterprise Monitor

After a while, start the replication:

Replica picks up where it left off… as you would have expected.

Short unavailability of the source

I want to see the behavior after a short unavailability of the source. I mean a duration lower than the failover threshold – connection_retry_interval x connection_retry_count – 15 seconds (5×3) in this example.

State before the stop of the source, mysql_node2:

… Stop mysql_node2 for 10 seconds …

MySQL Enterprise Monitor

State after the start of the source mysql_node2:

MySQL Enterprise Monitor

Well… nothing changed!
The unavailability of the source was not long enough to trigger the failover.
That is awesome to prevent non necessary failover.

Long unavailability of the source

I want to see the behavior after a longer unavailability of the source. I mean a duration greater than the failover threshold – connection_retry_interval x connection_retry_count – 15 seconds (5×3) in this example.

State before the stop of the source, mysql_node2:

… Stop mysql_node2 for 20 seconds …

MySQL Enterprise Monitor

As expected, the asynchronous automatic connection failover took place. \o/
The new source is now mysql_node3, because it has a bigger weight than mysql_node1 (90 vs 50) and because it was available 🙂

Limitations

Please be aware that in 8.0.22 this feature lacks of some of the needed functionality to replace MySQL Router as means to replicate from an InnoDB Cluster/Group Replication cluster.

Things such as:

  • does not automatically learn about new members or members that are removed
  • does not follow the primary role, it stays connected to whatever host it was connected to
  • does not follow the majority network partition
  • does not care if a host is not part of the group any longer, as long as it can connect, it will

These limitations will be lifted in future versions.

This is a very nice feature starting with MySQL 8.0.22, useful for both MySQL Replication and MySQL Group Replication architectures.

And you know what? There is more to come 😉

Stay tuned!

References

Follow me on twitter

Watch my videos on my YouTube channel and subscribe.

My Slideshare account.

My Speaker Deck account.

Thanks for using MySQL!

4 Responses to “Automatic connection failover for Asynchronous Replication”

  1. […] Automatic Asynchronous Replication Connection Failover […]

  2. […] Automatic Asynchronous Replication Connection Failover […]

  3. hi, what about the features in MySQL 8.0.23 version? Are the limitations also the same as version 8.0.22? Thank You.

  4. Hello,
    They are some improvements in 8.0.23+

    Please have a look to:
    https://lefred.be/content/setup-dr-for-your-mysql-innodb-cluster/
    and
    https://mysqlhighavailability.com/asynchronous-replication-connection-failover-automatic-source-list/

    And there is still more to come