MySQL InnoDB Cluster – Recovering and provisioning with mysqldump

July 9, 2019
by Olivier DASINI

As the administrator of a cluster, among other tasks, you should be able to restore failed nodes and grow (or shrink) your cluster by adding (or removing) new nodes.

As a backup tool in MySQL (and if your amount of data is not too big), you can use mysqldump, a client utility that performs logical backups.
The result is a set of SQL statements that reproduce the original schema objects and data.
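
For example, a minimal call (hypothetical schema name mydb, connection options omitted) produces a plain SQL file that you can replay later with the mysql client or MySQL Shell:

# Dump a single schema; the resulting file contains the CREATE TABLE / INSERT statements
mysqldump --single-transaction mydb > /tmp/mydb.sql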

For substantial amounts of data, however, a physical backup solution such as MySQL Enterprise Backup is faster, particularly for the restore operation.
Hey! Guess what? You can read: MySQL InnoDB Cluster – Recovering and provisioning with MySQL Enterprise Backup

Context

Let’s make it as simple as possible 🙂
I’m using MySQL 8.0.16.
I have an InnoDB Cluster setup – up and running.
So my main assumption is that you already know what MySQL Group Replication & MySQL InnoDB Cluster are.
Additionally, you can read this tutorial and this article from my colleague lefred, or this one on Windows platform from my colleague Ivan.

Note:

Depending on how you configured your MySQL InnoDB Cluster, some steps could be slightly different.

Scenario 1 – Node Recovering

  • A 3-node MySQL InnoDB Cluster – M1 / M2 / M3, in single-primary mode
  • MySQL Router is configured to enable R/W connections on port 3306 and R/O connections on port 3307 (see the configuration sketch right after this list)
  • M1 is currently the primary (so in Read/Write mode)
  • M2 & M3 are currently the secondaries (i.e. Read Only mode)
  • M1 failed! Some tables are irreconcilably corrupted 🙁
  • M2 & M3 are now the (new temporary) cluster
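
For reference, with those custom ports the relevant routing sections of the mysqlrouter.conf would look roughly like this (an illustrative sketch, not my exact file; the section names are arbitrary):

[routing:pocCluster_rw]
bind_port = 3306
destinations = metadata-cache://pocCluster/default?role=PRIMARY
routing_strategy = first-available
protocol = classic

[routing:pocCluster_ro]
bind_port = 3307
destinations = metadata-cache://pocCluster/default?role=SECONDARY
routing_strategy = round-robin-with-fallback
protocol = classic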

The goal then is to rebuild M1 and put it back into the cluster.

So as I stated, we have a 3-node MySQL 8.0.16 InnoDB Cluster up and running:

$ mysqlsh clusterAdmin@{mysqlRouterIP}:3306 --cluster
...

MySQL JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M1:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M3:3306": {
                "address": "M3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M1:3306"
}

Then M1 failed (its status is “(MISSING)”):

MySQL JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M2:3306", 
        "ssl": "REQUIRED", 
        "status": "OK_NO_TOLERANCE", 
        "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "n/a", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "(MISSING)"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M3:3306": {
                "address": "M3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M2:3306"
}

M1 was the primary.
The cluster initiated an automatic database failover to elect a new primary… blablabla
Anyway you already know the story 🙂

After a while, M1 is finally fixed and ready to be part of the cluster again.
To minimize the recovery time, instead of using the last dump, we prefer to take a fresh one.

Speaking of backup, I recommend reading the excellent blog post from my colleague Jesper: MySQL Backup Best Practices.

Let’s take a fresh dump on a secondary node using MySQL Router, on port 3307 with my custom configuration (we could also have used the primary).
The dump command looks roughly like this:

mysqldump --defaults-file=/etc/my.cnf -u mdump -p -h {mysqlRouterIP} -P 3307 --all-databases --routines --events --single-transaction --flush-privileges --hex-blob --log-error=/var/log/mysqldump.log --result-file=/data/backups/dump.sql
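
Since the cluster relies on GTIDs, the dump must carry the GTID information: with GTID mode enabled, mysqldump writes a SET @@GLOBAL.GTID_PURGED statement near the top of the file by default. A quick sanity check (a sketch):

# The restored node needs this statement to position itself correctly in the cluster
grep -m 1 "GTID_PURGED" /data/backups/dump.sql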

Minimum privileges for the MySQL user with which mysqldump connects to the server

Actually, the minimum privileges depend on which objects you want to dump and what the dump will be used for.

However, the following privileges should be fine for most classical usages:

GRANT SELECT, SHOW VIEW, EVENT, TRIGGER, LOCK TABLES, CREATE, ALTER, RELOAD, REPLICATION CLIENT, REPLICATION_SLAVE_ADMIN ON *.* TO <dumpUser>;
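
As a sketch, a hypothetical dump user matching these privileges could be created like this (the account name reuses the mdump user from the command above; host and password are placeholders):

CREATE USER 'mdump'@'%' IDENTIFIED BY '<strongPassword>';
GRANT SELECT, SHOW VIEW, EVENT, TRIGGER, LOCK TABLES, CREATE, ALTER, RELOAD, REPLICATION CLIENT, REPLICATION_SLAVE_ADMIN ON *.* TO 'mdump'@'%';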

Please note that it is highly recommended to include in your backup process, in addition to my.cnf, a copy of the auto.cnf and mysqld-auto.cnf configuration files of all nodes.
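
For example (a sketch, assuming the default datadir /var/lib/mysql and the backup destination used above):

# Keep a copy of the configuration files next to the dump, for each node
cp /etc/my.cnf /var/lib/mysql/auto.cnf /var/lib/mysql/mysqld-auto.cnf /data/backups/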

If you “lose” your auto.cnf file, don’t worry: the server will generate a new one for you.
However, the recovery process will be slightly different… (more on that below).

Now it’s time to restore this dump on node M1.

Because this server is part of an InnoDB Cluster, there are clearly some extra steps compared to a standalone server restoration.

Restore the data

First, restore the data on M1:

  • It’s a logical restoration, so the server to be restored must be up 😀
  • Group Replication plugin must be stopped
    • STOP GROUP_REPLICATION;
  • Disable logging to the binary log
    • SET SQL_LOG_BIN=0;
  • Delete binary log files
    • RESET MASTER;
  • Clear the master info and relay log info repositories and delete all the relay log files
    • RESET SLAVE;
  • Enable updates
    • SET GLOBAL super_read_only=0;
  • Load the dump
    • source /data/backups/dump.sql

This gives us:

mysqlsh root@M1:3306 --sql
...

M1 SQL> STOP GROUP_REPLICATION;
Query OK, 0 rows affected (12.04 sec)

M1 SQL> SET SQL_LOG_BIN=0;
Query OK, 0 rows affected (0.00 sec)

M1 SQL> RESET MASTER; 
Query OK, 0 rows affected (0.06 sec)

M1 SQL> RESET SLAVE;
Query OK, 0 rows affected (0.13 sec)

M1 SQL> SET GLOBAL super_read_only=0;
Query OK, 0 rows affected (0.00 sec)

M1 SQL> source /data/backups/dump.sql
Query OK, 0 rows affected (0.00 sec)
...

Put the node back to the cluster

Second, put the node back into the cluster.
Connect through MySQL Router to the primary (port 3306 in my case):

$ mysqlsh clusterAdmin@{mysqlRouterIP}:3306 --cluster
...

MySQL JS> cluster.rejoinInstance("clusterAdmin@M1:3306")
Rejoining the instance to the InnoDB cluster. Depending on the original problem that made the instance unavailable, the rejoin operation might not be successful and further manual steps will be needed to fix the underlying problem.

Please monitor the output of the rejoin operation and take necessary action if the instance cannot rejoin.

Rejoining instance to the cluster ...

The instance 'M1:3306' was successfully rejoined on the cluster.

Now you should check the new cluster status:

MySQL JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M2:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M3:3306": {
                "address": "M3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M2:3306"
}

Note:

The restored node will be in the “RECOVERING” state before becoming “ONLINE”.
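
If you prefer to follow this from SQL, the same information is visible in Performance Schema; a quick sketch, to be run on any online member:

-- Watch M1 go from RECOVERING to ONLINE
SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE, MEMBER_ROLE
FROM performance_schema.replication_group_members;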

Lost the auto.cnf file

As promised, here is the case where the auto.cnf configuration file is not restored.
In that case, the cluster sees this node as a new node (because the server will have a new UUID).
So the process for putting it back is different.
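
You can see it for yourself: the UUID reported by the freshly restarted M1 no longer matches the member_id stored in the cluster metadata (shown below in the cluster.rescan() output). A sketch:

-- On M1: the new UUID generated along with the new auto.cnf
SELECT @@server_uuid;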

Also note that if you lose the mysqld-auto.cnf file, you will probably need to configure the server (again) to be Group Replication aware.

So basically the process is to do some cleanup and then add the old node as if it were a new node.


Assuming Group Replication plugin is stopped on M1:

// Check the configuration even more important if you lost mysqld-auto.cnf :)
MySQL JS> dba.checkInstanceConfiguration('clusterAdmin@M1:3306')
Validating MySQL instance at M1:3306 for use in an InnoDB cluster...

This instance reports its own address as M1

Checking whether existing tables comply with Group Replication requirements...
No incompatible tables detected

Checking instance configuration...
Instance configuration is compatible with InnoDB cluster

The instance 'M1:3306' is valid for InnoDB cluster usage.

{
    "status": "ok"
}


// If needed configure your instance
MySQL JS> dba.configureInstance('clusterAdmin@M1:3306')


// Remove the old node from the cluster metadata
MySQL JS> cluster.rescan()
Rescanning the cluster...

Result of the rescanning operation for the 'default' ReplicaSet:
{
    "name": "default", 
    "newTopologyMode": null, 
    "newlyDiscoveredInstances": [], 
    "unavailableInstances": [
        {
            "host": "M1:3306", 
            "label": "M1:3306", 
            "member_id": "a3f1ee50-9be3-11e9-a3fe-0242ac13000b"
        }
    ]
}

The instance 'M1:3306' is no longer part of the ReplicaSet.
The instance is either offline or left the HA group. You can try to add it to the cluster again with the cluster.rejoinInstance('M1:3306') command or you can remove it from the cluster configuration.
Would you like to remove it from the cluster metadata? [Y/n]: Y
Removing instance from the cluster metadata...
The instance 'M1:3306' was successfully removed from the cluster metadata.


// Add the new instance
MySQL JS> cluster.addInstance("clusterAdmin@M1:3306")
A new instance will be added to the InnoDB cluster. Depending on the amount of data on the cluster this might take from a few seconds to several hours.

Adding instance to the cluster ...

Validating instance at M1:3306...

This instance reports its own address as M1

Instance configuration is suitable.
The instance 'clusterAdmin@M1:3306' was successfully added to the cluster.


// Check
MySQL JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M2:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M3:3306": {
                "address": "M3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M2:3306"
}

Et voilà !

Scenario 2 – Node Provisioning

  • A 3-node MySQL InnoDB Cluster – M1 / M2 / M3, in single-primary mode
  • MySQL Router is configured to enable R/W connections on port 3306 and RO connections on port 3307
  • M2 is currently the primary in Read/Write mode
  • M1 & M3 are currently the secondaries in Read Only mode

The goal then is to add 2 new nodes: M4 & M5.

So we have the 3-node MySQL 8.0.16 InnoDB Cluster that we used in the first part of this article.
And it is up and running.

Actually adding new nodes is very close to what we have done previously.

The process is :

  • Deploy the new MySQL instance, preferably already configured for Group Replication
  • Restore the data in the way we have seen previously

Checking the configuration and configuring the instance can be done respectively with the dba.checkInstanceConfiguration() and dba.configureInstance() functions (and it could also be useful to use checkInstanceState(); see this article).
e.g. on node M4:

$ mysqlsh clusterAdmin@M4:3306 -- dba checkInstanceConfiguration
Validating MySQL instance at M4:3306 for use in an InnoDB cluster...

This instance reports its own address as M4

Checking whether existing tables comply with Group Replication requirements...
No incompatible tables detected

Checking instance configuration...
Instance configuration is compatible with InnoDB cluster

The instance 'M4:3306' is valid for InnoDB cluster usage.

{
    "status": "ok"
}
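
As mentioned above, it can also be useful to ask the cluster whether the instance’s GTID set is compatible, typically once the dump has been restored (next step) and before addInstance(). A sketch, from a session where the cluster object is available:

// Verify M4's GTID state against the cluster
MySQL JS> cluster.checkInstanceState('clusterAdmin@M4:3306')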

The first part of the restore process is the same as the one we have already seen:

mysqlsh root@M4:3306 --sql
...

M4 SQL> STOP GROUP_REPLICATION; -- if necessary
Query OK, 0 rows affected (12.04 sec)

M4 SQL> SET SQL_LOG_BIN=0;
Query OK, 0 rows affected (0.00 sec)

M4 SQL> RESET MASTER; 
Query OK, 0 rows affected (0.06 sec)

M4 SQL> RESET SLAVE;
Query OK, 0 rows affected (0.13 sec)

M4 SQL> SET GLOBAL super_read_only=0;
Query OK, 0 rows affected (0.00 sec)

M4 SQL> source /data/backups/dump.sql
Query OK, 0 rows affected (0.00 sec)
...

For the second part, we will now add the new node (M4):

// Add the new instance
MySQL JS> cluster.addInstance("clusterAdmin@M4:3306")


// Check
MySQL JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M2:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M3:3306": {
                "address": "M3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M4:3306": {
                "address": "M4:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M2:3306"
}

Note:

If necessary, just before the addInstance(), you can run checkInstanceConfiguration() and configureInstance().

Same process for the last node, M5.
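
For M5 the whole sequence would look roughly like this (a sketch; adapt the account and host names to your environment):

// Optional sanity checks
MySQL JS> dba.checkInstanceConfiguration('clusterAdmin@M5:3306')
MySQL JS> dba.configureInstance('clusterAdmin@M5:3306')

// Restore the dump on M5 (same SQL steps as for M4), then:
MySQL JS> cluster.addInstance('clusterAdmin@M5:3306')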

And finally you get a 5-node MySQL InnoDB Cluster \o/:

MySQL JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M2:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to 2 failures.", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M3:3306": {
                "address": "M3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M4:3306": {
                "address": "M4:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M5:3306": {
                "address": "M5:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M2:3306"
}

This is one way to do node recovery and provisioning using mysqldump, when your amount of data is not too big.

When a logical backup is not efficient enough, it is time to use a hot, online and non-blocking physical backup tool like MySQL Enterprise Backup.
Actually, it’s even easier!
Please read: MySQL InnoDB Cluster – Recovering and provisioning with MySQL Enterprise Backup.

A final word to say that some new features are coming in this area… 🙂
Stay tuned!

Misc
Node 1 – Group Replication configuration

mysql> SHOW VARIABLES LIKE 'group_replication%';
+-----------------------------------------------------+-------------------------------------+
| Variable_name                                       | Value                               |
+-----------------------------------------------------+-------------------------------------+
| group_replication_allow_local_lower_version_join    | OFF                                 |
| group_replication_auto_increment_increment          | 7                                   |
| group_replication_autorejoin_tries                  | 1                                   |
| group_replication_bootstrap_group                   | OFF                                 |
| group_replication_communication_debug_options       | GCS_DEBUG_NONE                      |
| group_replication_communication_max_message_size    | 10485760                            |
| group_replication_components_stop_timeout           | 31536000                            |
| group_replication_compression_threshold             | 1000000                             |
| group_replication_consistency                       | EVENTUAL                            |
| group_replication_enforce_update_everywhere_checks  | OFF                                 |
| group_replication_exit_state_action                 | READ_ONLY                           |
| group_replication_flow_control_applier_threshold    | 25000                               |
| group_replication_flow_control_certifier_threshold  | 25000                               |
| group_replication_flow_control_hold_percent         | 10                                  |
| group_replication_flow_control_max_quota            | 0                                   |
| group_replication_flow_control_member_quota_percent | 0                                   |
| group_replication_flow_control_min_quota            | 0                                   |
| group_replication_flow_control_min_recovery_quota   | 0                                   |
| group_replication_flow_control_mode                 | QUOTA                               |
| group_replication_flow_control_period               | 1                                   |
| group_replication_flow_control_release_percent      | 50                                  |
| group_replication_force_members                     |                                     |
| group_replication_group_name                        | d1b109bf-9be3-11e9-9ea2-0242ac13000b|
| group_replication_group_seeds                       | M2:33061,M3:33061,M4:33061,M5:33061 |
| group_replication_gtid_assignment_block_size        | 1000000                             |
| group_replication_ip_whitelist                      | AUTOMATIC                           |
| group_replication_local_address                     | M1:33061                            |
| group_replication_member_expel_timeout              | 0                                   |
| group_replication_member_weight                     | 50                                  |
| group_replication_message_cache_size                | 1073741824                          |
| group_replication_poll_spin_loops                   | 0                                   |
| group_replication_recovery_complete_at              | TRANSACTIONS_APPLIED                |
| group_replication_recovery_get_public_key           | OFF                                 |
| group_replication_recovery_public_key_path          |                                     |
| group_replication_recovery_reconnect_interval       | 60                                  |
| group_replication_recovery_retry_count              | 10                                  |
| group_replication_recovery_ssl_ca                   |                                     |
| group_replication_recovery_ssl_capath               |                                     |
| group_replication_recovery_ssl_cert                 |                                     |
| group_replication_recovery_ssl_cipher               |                                     |
| group_replication_recovery_ssl_crl                  |                                     |
| group_replication_recovery_ssl_crlpath              |                                     |
| group_replication_recovery_ssl_key                  |                                     |
| group_replication_recovery_ssl_verify_server_cert   | OFF                                 |
| group_replication_recovery_use_ssl                  | ON                                  |
| group_replication_single_primary_mode               | ON                                  |
| group_replication_ssl_mode                          | REQUIRED                            |
| group_replication_start_on_boot                     | ON                                  |
| group_replication_transaction_size_limit            | 150000000                           |
| group_replication_unreachable_majority_timeout      | 0                                   |
+-----------------------------------------------------+-------------------------------------+

Thanks for using MySQL!

