MySQL Security – Password Reuse Policy

April 28, 2020

When thinking about security within a MySQL installation, you can consider a wide range of possible procedures / best practices and how they affect the security of your MySQL server and related applications.

MySQL provides many tools, features, plugins and components to protect your data, including some advanced features like Transparent Data Encryption (TDE), Audit, Data Masking & De-Identification, Firewall, Random Password Generation, Password Expiration Policy, Password Verification-Required Policy, Failed-Login Tracking and Temporary Account Locking, Dual Password Support, Connection-Control Plugins, Password Validation Component, etc.

MySQL Security

Basic password policy practices teach us:

  • Each user must have a password
  • A user’s password should be changed periodically

However, this is often not enough. Some regulations may require that users cannot reuse a previous password.

You can enforce this by controlling how many password changes and/or how much time must pass before an old password can be reused. In this article, from my new MySQL Security series, we will see how to establish such a policy with the MySQL 8.0 Password Reuse Policy.


TL;DR

MySQL provides password-reuse restrictions, which allow database administrators to control how many password changes and/or how much time must elapse before a user can reuse an old password.

Enable restrictions on reuse of previous passwords with MySQL

The main goal of the Password Reuse Policy is to place restrictions on the reuse of previous passwords.
It can be established globally, and individual accounts can be set to either defer to the global policy or override the global policy with specific per-account behavior.

There are different clauses a DBA can use with CREATE USER or ALTER USER to establish a per-account password reuse policy.

Let’s dig into it using MySQL 8.0.

$ mysqlsh daz@localhost --sql

MySQL SQL> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 8.0.19    |
+-----------+

PASSWORD HISTORY

Prohibit reusing any of the last 10 (and then 24) passwords:

MySQL SQL> 
CREATE USER 'aUser'@'localhost' PASSWORD HISTORY 10;


SELECT user, host, password_reuse_history, password_last_changed FROM mysql.user WHERE user = 'aUser'\G
*************************** 1. row ***************************
                  user: aUser
                  host: localhost
password_reuse_history: 10
 password_last_changed: 2020-04-03 09:45:45


ALTER USER 'aUser'@'localhost' PASSWORD HISTORY 24;


SELECT user, host, password_reuse_history, password_last_changed FROM mysql.user WHERE user = 'aUser'\G
*************************** 1. row ***************************
                  user: aUser
                  host: localhost
password_reuse_history: 24
 password_last_changed: 2020-04-03 09:45:45
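
Once the policy is in place, an attempt to switch the account back to a password still present in its history is rejected. A minimal sketch of the behavior (the password values are placeholders; the exact error message depends on the server version):

MySQL SQL> 
ALTER USER 'aUser'@'localhost' IDENTIFIED BY 'first_pAssw0rd';

ALTER USER 'aUser'@'localhost' IDENTIFIED BY 'second_pAssw0rd';

-- Going back to a password kept in the history fails,
-- with an error stating that the credentials contradict the password history policy
ALTER USER 'aUser'@'localhost' IDENTIFIED BY 'first_pAssw0rd';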

PASSWORD REUSE INTERVAL n DAY

Require that a minimum of 180 (and then 365) days elapse before permitting reuse:

MySQL SQL> 
CREATE USER 'bUser'@'localhost' PASSWORD REUSE INTERVAL 180 DAY;


SELECT user, host, password_reuse_time, password_last_changed FROM mysql.user WHERE user = 'bUser'\G
*************************** 1. row ***************************
                 user: bUser
                 host: localhost
  password_reuse_time: 180
password_last_changed: 2020-04-03 10:03:20


ALTER USER 'bUser'@'localhost' PASSWORD REUSE INTERVAL 365 DAY;


SELECT user, host, password_reuse_time, password_last_changed FROM mysql.user WHERE user = 'bUser'\G
*************************** 1. row ***************************
                 user: bUser
                 host: localhost
  password_reuse_time: 365
password_last_changed: 2020-04-03 10:03:20

Combine types of reuse restrictions

It is also possible to combine both types of reuse restrictions.
Simply use PASSWORD HISTORY and PASSWORD REUSE INTERVAL n DAY together:

MySQL SQL> 
CREATE USER 'cUser'@'localhost' 
  PASSWORD HISTORY 5 
  PASSWORD REUSE INTERVAL 180 DAY;


SELECT user, host, password_reuse_history, password_reuse_time, password_last_changed FROM mysql.user WHERE user = 'cUser'\G
*************************** 1. row ***************************
                  user: cUser
                  host: localhost
password_reuse_history: 5
   password_reuse_time: 180
 password_last_changed: 2020-04-03 10:11:31


ALTER USER 'cUser'@'localhost' 
  PASSWORD HISTORY 10 
  PASSWORD REUSE INTERVAL 365 DAY;


SELECT user, host, password_reuse_history, password_reuse_time, password_last_changed FROM mysql.user WHERE user = 'cUser'\G
*************************** 1. row ***************************
                  user: cUser
                  host: localhost
password_reuse_history: 10
   password_reuse_time: 365
 password_last_changed: 2020-04-03 10:11:31

Global Policy

Reuse policy can be established globally, as specified by the password_history and password_reuse_interval system variables.

The default password_history value is 0, which means there is no reuse restriction based on the number of password changes.
The default password_reuse_interval value is also 0, which means there is no reuse restriction based on elapsed time.

The password_history and password_reuse_interval variables can be set in the MySQL configuration file (usually my.cnf or my.ini), but they can also be set and persisted at runtime using SET PERSIST:

SET PERSIST password_history = 10;

SET PERSIST password_reuse_interval = 365;


SHOW VARIABLES WHERE Variable_name IN ('password_history','password_reuse_interval');
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| password_history        | 10    |
| password_reuse_interval | 365   |
+-------------------------+-------+

The same behavior can be achieved using the my.cnf (or my.ini) file :

[mysqld]
password_history = 10
password_reuse_interval = 365

However, this requires a server restart.

To defer to the global policy for an account, for both types of reuse restrictions, use the DEFAULT clause:

MySQL SQL> 
CREATE USER olivier@localhost
  PASSWORD HISTORY DEFAULT
  PASSWORD REUSE INTERVAL DEFAULT;


ALTER USER ethan@localhost
  PASSWORD HISTORY DEFAULT
  PASSWORD REUSE INTERVAL DEFAULT;

To establish a global policy such that none of these restrictions exist, set password_history and password_reuse_interval to 0:

MySQL SQL> 
SET PERSIST password_history = 0;

SET PERSIST password_reuse_interval = 0;


SHOW VARIABLES WHERE Variable_name IN ('password_history','password_reuse_interval');
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| password_history        | 0     |
| password_reuse_interval | 0     |
+-------------------------+-------+

Please note that the empty password does not count in the password history and is subject to reuse at any time.

To Go Further

Reference Manual

MySQL Security Series (1st edition)

Thanks for using MySQL!


MySQL Security – Password Expiration Policy

April 21, 2020

When thinking about security within a MySQL installation, you can consider a wide range of possible procedures / best practices and how they affect the security of your MySQL server and related applications.

MySQL provides many tools, features, plugins and components to protect your data, including some advanced features like Transparent Data Encryption (TDE), Audit, Data Masking & De-Identification, Firewall, Random Password Generation, Password Reuse Policy, Password Verification-Required Policy, Failed-Login Tracking and Temporary Account Locking, Dual Password Support, Connection-Control Plugins, Password Validation Component, etc.

MySQL Security

Basic password policy practices teach us:

  • Each user must have a password
  • A user’s password should be changed periodically

However, this is often not enough. Some regulations require that passwords be renewed in a timely and appropriate manner (e.g. every 90 days).

In this article, we will see how to establish a policy for password expiration with MySQL 8.0 Password Expiration Policy.


TL;DR

MySQL provides password-expiration capability, which enables database administrators to require that users reset their password.

Establish a policy for password expiration with MySQL

The main goal of Password Expiration Policy is to require passwords to be changed periodically.
It can be established globally, and individual accounts can be set to either defer to the global policy or override the global policy with specific per-account behavior.

There are different clauses a DBA can use with CREATE USER or ALTER USER to establish a per-account password expiration policy.

Let’s take a closer look using MySQL 8.0.

$ mysqlsh daz@localhost --sql

MySQL SQL> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 8.0.19    |
+-----------+

PASSWORD EXPIRE

Force the user to change their password at the first connection.

Create a user with a random password and mark that password expired:

-- Mark the password expired so that the user must choose a new one at the first connection to the server
MySQL SQL> 
CREATE USER 'aUser'@'localhost' IDENTIFIED BY RANDOM PASSWORD PASSWORD EXPIRE;
+-------+-----------+----------------------+
| user  | host      | generated password   |
+-------+-----------+----------------------+
| aUser | localhost | (wvx3n7jH)bVNi3tOiQV |
+-------+-----------+----------------------+

We can check whether the password is expired in the mysql.user table:

MySQL SQL> 
SELECT user, host, password_lifetime, password_expired, password_last_changed FROM mysql.user WHERE user = 'aUser' \G
*************************** 1. row ***************************
                 user: aUser
                 host: localhost
    password_lifetime: NULL
     password_expired: Y
password_last_changed: 2020-04-01 12:31:57

Note that the password_expired column is Y.

In other words, this new MySQL user will be able to connect to the server, but must reset their password before being able to execute statements:

$ mysql -u aUser -p
Enter password: 
...


mysql> SELECT USER();
ERROR 1820 (HY000): You must reset your password using ALTER USER statement before executing this statement.


mysql> ALTER USER user() IDENTIFIED BY 'n3w_pAssw0rd';
Query OK, 0 rows affected (0.01 sec)


mysql> SELECT USER();
+-----------------+
| USER()          |
+-----------------+
| aUser@localhost |
+-----------------+

The password_expired column is now N.

MySQL SQL> 
SELECT user, host, password_lifetime, password_expired, password_last_changed FROM mysql.user WHERE user = 'aUser' \G
*************************** 1. row ***************************
                 user: aUser
                 host: localhost
    password_lifetime: NULL
     password_expired: N
password_last_changed: 2020-04-01 12:41:25

PASSWORD EXPIRE INTERVAL n DAY

Force the user to change their password every N days.

Create a user whose password will expire in 90 days:

MySQL SQL> 
-- Require that a new password be chosen every 90 days
CREATE USER 'aNewUser'@'localhost' IDENTIFIED BY 'aN3w_pAssw0rd' PASSWORD EXPIRE INTERVAL 90 DAY;

We can see the password options in the mysql.user table :

MySQL SQL> 
SELECT user, host, password_lifetime, password_expired, password_last_changed FROM mysql.user WHERE user = 'aNewUser'\G
*************************** 1. row ***************************
                 user: aNewUser
                 host: localhost
    password_lifetime: 90
     password_expired: N
password_last_changed: 2020-04-01 15:40:14

Note that the password_lifetime column is 90.

After 90 days, any statement will generate error 1820:

$ mysql -u aNewUser -p
...


mysql {aNewUser}> SELECT USER();
ERROR 1820 (HY000): You must reset your password using ALTER USER statement before executing this statement.

The password can be reset with the ALTER USER statement:

mysql {aNewUser}> 
ALTER USER user() IDENTIFIED BY '4noth3r_pa5sw0rd';
Query OK, 0 rows affected (0.01 sec)


mysql {aNewUser}> SELECT USER();
+--------------------+
| USER()             |
+--------------------+
| aNewUser@localhost |
+--------------------+

PASSWORD EXPIRE DEFAULT

This clause sets the account so that the global password expiration policy applies, as specified by the default_password_lifetime system variable.

The default default_password_lifetime value is 0, which disables automatic password expiration.
If the value of default_password_lifetime is a positive integer N, it indicates the permitted password lifetime; passwords must be changed every N days.

default_password_lifetime can be set in the MySQL configuration file, but it can also be set and persisted at runtime using SET PERSIST:

MySQL SQL> 
SET PERSIST default_password_lifetime = 30;

The same behavior can be achieved using the configuration file (usually my.cnf or my.ini) :

[mysqld]
default_password_lifetime = 30

but it will require a server restart.

To defer to the global expiration policy for an account, use the PASSWORD EXPIRE DEFAULT clause:

MySQL SQL> 
CREATE USER olivier@localhost  PASSWORD EXPIRE DEFAULT;


ALTER USER aNewUser@localhost PASSWORD EXPIRE DEFAULT;

Lastly, to establish a global policy such that passwords never expire, set default_password_lifetime to 0 :

MySQL SQL> 
SET PERSIST default_password_lifetime = 0;


SHOW VARIABLES LIKE 'default_password_lifetime';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| default_password_lifetime | 0     |
+---------------------------+-------+

PASSWORD EXPIRE NEVER

It’s also possible to disable password expiration for an account.

MySQL SQL> 
-- Disables password expiration for the account so that its password never expires
CREATE USER 'pingDB'@'localhost' IDENTIFIED BY 'A-p45swOrd' PASSWORD EXPIRE NEVER;


SELECT user, host, password_lifetime, password_expired, password_last_changed FROM mysql.user WHERE user = 'pingDB'\G
*************************** 1. row ***************************
                 user: pingDB
                 host: localhost
    password_lifetime: 0
     password_expired: N
password_last_changed: 2020-04-02 12:42:03

Note that the password_lifetime column is 0.

This expiration option overrides the global policy for all accounts named by the statement.
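
The same clause can also be applied to an existing account with ALTER USER; a quick sketch reusing an account from the earlier examples:

MySQL SQL> 
-- Override the global expiration policy: this account's password never expires
ALTER USER 'aNewUser'@'localhost' PASSWORD EXPIRE NEVER;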

To Go Further

Reference Manual

MySQL Security Series (1st edition)

Thanks for using MySQL!


MySQL Security – Random Password Generation

April 15, 2020

When thinking about security within a MySQL installation, you can consider a wide range of possible procedures / best practices and how they affect the security of your MySQL server and related applications.

MySQL provides many tools, features, plugins and components to protect your data, including some advanced features like Transparent Data Encryption (TDE), Audit, Data Masking & De-Identification, Firewall, Password Expiration Policy, Password Reuse Policy, Password Verification-Required Policy, Failed-Login Tracking and Temporary Account Locking, Dual Password Support, Connection-Control Plugins, Password Validation Component, etc.

MySQL Security

Basic password policy practices teach us:

  • Each user must have a password
  • A user’s password should be changed periodically

Indeed, this is a good start!

What if MySQL made your life easier by helping you create users with strong, secure passwords?
Well, it is now possible in MySQL 8.0 🙂


TL;DR

MySQL has the capability of generating random passwords for user accounts, as an alternative to requiring explicit administrator-specified literal passwords.


A DBA can use CREATE USER, ALTER USER or SET PASSWORD to generate random passwords for user accounts.

Let’s have a quick look using MySQL 8.0

MySQL SQL> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 8.0.19    |
+-----------+

Create a user account

To create a new MySQL user account with a random password use the statement CREATE USER with the clause IDENTIFIED BY RANDOM PASSWORD :

MySQL SQL> 
CREATE USER aUser@localhost IDENTIFIED BY RANDOM PASSWORD;
+-------+-----------+----------------------+
| user  | host      | generated password   |
+-------+-----------+----------------------+
| aUser | localhost | XtByqo%asj81CJmM[dfC |
+-------+-----------+----------------------+

Modify a user account

To assign a new random password to a MySQL user account use the statement ALTER USER with the clause IDENTIFIED BY RANDOM PASSWORD :

MySQL SQL> 
ALTER USER aUser@localhost IDENTIFIED BY RANDOM PASSWORD;
+-------+-----------+----------------------+
| user  | host      | generated password   |
+-------+-----------+----------------------+
| aUser | localhost | Sn!b%6(Iu>{34GIOj>vS |
+-------+-----------+----------------------+

Assign a password

Another way to assign a new random password to a MySQL user account is to use the statement SET PASSWORD with the clause TO RANDOM :

MySQL SQL> 
SET PASSWORD FOR aUser@localhost TO RANDOM;
+-------+-----------+----------------------+
| user  | host      | generated password   |
+-------+-----------+----------------------+
| aUser | localhost | 7,ln_3HXG<b8Kvw!o&u, |
+-------+-----------+----------------------+

Please note that by default, generated random passwords have a length of 20 characters.
This length is controlled by the generated_random_password_length system variable, which has a range from 5 to 255.
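
If you need longer (or shorter) generated passwords, adjust this variable before generating one. A small sketch, shown at session scope and with an arbitrary value of 30:

MySQL SQL> 
SET SESSION generated_random_password_length = 30;

-- The generated password returned below now contains 30 characters
SET PASSWORD FOR aUser@localhost TO RANDOM;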

To Go Further

Reference Manual

MySQL Security Series (1st edition)

Thanks for using MySQL!


MySQL 8.0.19 New Features Summary

February 17, 2020
Sakila mozaik by Olivier DASINI

Presentation of some of the new features of MySQL 8.0.19 released on January 13, 2020.

Agenda

  • InnoDB ReplicaSet
  • SQL Improvements
  • Table Value Constructors (see the examples after this agenda)
    • LIMIT in recursive CTE
    • ALTER TABLE… DROP/ALTER CONSTRAINT
    • More information to Duplicate Key Error
  • Account Management Enhancements
  • Time zone offset for Timestamp & Datetime
  • Information Schema views for SQL Roles
  • MySQL Document Store Enhancements
  • MySQL Shell Enhancements
  • MySQL Router Enhancements
  • MySQL InnoDB Cluster Enhancements
  • MySQL Replication Enhancements
  • MySQL NDB Cluster Enhancements
  • MySQL Enterprise New Features
  • Thanks to the Contributors
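
To give a flavour of two of the SQL improvements listed above, here is a short sketch that should run on any 8.0.19 server (no tables required):

MySQL SQL> 
-- Table Value Constructors: VALUES as a standalone statement
VALUES ROW(1, 'one'), ROW(2, 'two');

-- LIMIT in a recursive CTE, as a convenient way to cap the number of generated rows
WITH RECURSIVE seq (n) AS (
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM seq LIMIT 10
)
SELECT n FROM seq;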

Download this presentation and others on my SlideShare account.


Webinar – Migrating from MariaDB to MySQL

December 16, 2019

This webinar will cover the advantages and process for migrating from MariaDB/Galera cluster to MySQL InnoDB Cluster.

Over these last 2 years and especially with MySQL 8.0, MySQL InnoDB Cluster has matured a lot.
In this webinar, our guest speaker Matthias Crauwels from Pythian will go over the key differences between the two solutions.
Matthias will use his experience to show how to migrate your application from MariaDB/Galera cluster over to MySQL InnoDB Cluster with the least possible amount of downtime.

WHEN:

Thu, Dec 19: 09:00 Pacific time (America)
Thu, Dec 19: 10:00 Mountain time (America)
Thu, Dec 19: 11:00 Central time (America)
Thu, Dec 19: 12:00 Eastern time (America)
Thu, Dec 19: 14:00 São Paulo time
Thu, Dec 19: 17:00 UTC
Thu, Dec 19: 17:00 Western European time
Thu, Dec 19: 18:00 Central European time
Thu, Dec 19: 19:00 Eastern European time
Thu, Dec 19: 22:30 India, Sri Lanka
Fri, Dec 20: 00:00 Indonesia Western Time
Fri, Dec 20: 01:00 Singapore/Malaysia/Philippines time
Fri, Dec 20: 01:00 China time
Fri, Dec 20: 02:00 ??
Fri, Dec 20: 04:00 NSW, ACT, Victoria, Tasmania (Australia)

The presentation will be approximately 60 minutes long followed by Q&A.

Register for this web presentation


MySQL 8.0.18 New Features Summary

November 26, 2019
Sakila mozaik by Olivier DASINI

Presentation of some of the new features of MySQL 8.0.18 released on October 14, 2019.

Agenda

  • Hash Join
  • EXPLAIN ANALYZE (see the example after this agenda)
  • Only OpenSSL
  • Random Password
  • MySQL Shell Enhancements
  • MySQL Router Enhancements
  • InnoDB Cluster Enhancements
  • Group Replication Enhancements
  • Replication Enhancements
  • Enterprise New Features
  • Thanks to the Contributors
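
As a quick illustration of EXPLAIN ANALYZE from the agenda above (the table and column names are hypothetical):

MySQL SQL> 
-- Executes the query and reports actual timings and row counts for each step of the plan
EXPLAIN ANALYZE
SELECT c.name, COUNT(*)
FROM customers c
JOIN orders o ON o.customer_id = c.id
GROUP BY c.name;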

Download this presentation and others on my SlideShare account.


MySQL 8.0.17 – New Features Summary

September 19, 2019
Sakila mozaik by Olivier DASINI

This presentation is a summary of the new features of MySQL 8.0.17, released on July 22, 2019.

Agenda

  • CLONE Plugin – Native automatic provisioning in the server
  • Multi-valued indexes (see the example after this agenda)
  • JSON functions using multi-valued indexes
  • JSON schema validation
  • New binary collation for utf8mb4
  • MySQL Shell Enhancements
  • MySQL Router Enhancements
  • InnoDB Cluster Enhancements
  • Group Replication Enhancements
  • Replication Enhancements
  • Thanks to the Contributors
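
As a small illustration of multi-valued indexes from the agenda above (the table and JSON structure are hypothetical):

MySQL SQL> 
-- A multi-valued index over a JSON array, usable by MEMBER OF()
CREATE TABLE customers (
  id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  custinfo JSON,
  INDEX zips ( (CAST(custinfo->'$.zipcode' AS UNSIGNED ARRAY)) )
);

SELECT * FROM customers
WHERE 94507 MEMBER OF(custinfo->'$.zipcode');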


Download this presentation and others on my SlideShare account.


MySQL InnoDB Cluster – Easy Recovering and provisioning

September 10, 2019
Cloudy Sky by Olivier DASINI

As the administrator of a cluster, among other tasks, you should be able to restore failed nodes and grow (or shrink) your cluster by adding (or removing) new nodes

Olivier DASINI

Up to MySQL 8.0.16, to perform these tasks you could use a logical backup (with mysqldump) or a physical backup (with MySQL Enterprise Backup), as described in my previous articles.

Starting with MySQL 8.0.17, the easiest and most convenient method is to use the CLONE feature.
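
Under the hood, the CLONE feature relies on the clone plugin. For reference, a minimal sketch of using it directly, outside of the AdminAPI (the host name, account and password are placeholders; the donor account needs BACKUP_ADMIN and the recipient account CLONE_ADMIN):

-- On the recipient instance (the node being rebuilt)
INSTALL PLUGIN clone SONAME 'mysql_clone.so';
SET GLOBAL clone_valid_donor_list = 'M1:3306';
CLONE INSTANCE FROM 'clone_user'@'M1':3306 IDENTIFIED BY 'clone_pwd';

In the InnoDB Cluster scenarios below, MySQL Shell drives these steps for you through the distributed recovery process.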

Note:
Regardless of the recovering and provisioning method, it is important to have a proven backup/restore procedure.

I recommend reading the excellent blog post from my colleague Jesper – MySQL Backup Best Practices.

Context

Let’s make it as simple as possible 🙂
I’m using MySQL 8.0.17.

I have a MySQL InnoDB Cluster already set up and running.
My main assumption is that you already know what MySQL Group Replication and MySQL InnoDB Cluster are.
Additionally, you can read this tutorial and this article from my colleague lefred, or this one on the Windows platform from my colleague Ivan.

Scenario 1 – Node Recovering

  • A 3 nodes MySQL InnoDB Cluster – M1 / M2 / M3, in single primary mode
  • MySQL Router is configured to enable R/W connections on 3306 and RO connections on 3307
  • M1 is currently the primary (that is in Read/Write mode)
  • M2 & M3 are currently the secondaries (that is Read Only mode)
  • M1 crashed! :'(
  • M2 & M3 are now the (new temporary) cluster
  • Then…

The goal then is to rebuild M1 and put it back to the cluster.

As I said before, we have a 3-node MySQL 8.0.17 InnoDB Cluster up and running:

$ mysqlsh clusterAdmin@{mysqlRouterIP}:3306 --cluster

MySQL localhost JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M1:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.17"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.17"
            }, 
            "M3:3306": {
                "address": "M3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.17"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M1:3306"
}

Then node M1 crashed… (its status is “MISSING” and we have a relevant shellConnectError message):

MySQL localhost JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M2:3306", 
        "ssl": "REQUIRED", 
        "status": "OK_NO_TOLERANCE", 
        "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "n/a", 
                "readReplicas": {}, 
                "role": "HA", 
                "shellConnectError": "MySQL Error 2003 (HY000): Can't connect to MySQL server on 'M1' (113)", 
                "status": "(MISSING)"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.17"
            }, 
            "M3:3306": {
                "address": "M3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.17"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M2:3306"
}

M1 was the primary.
The cluster initiated an automatic database failover to elect a new primary… blablabla.
Anyway, you already know the story 🙂

After a while M1 is finally fixed and ready to be part of the cluster again.

Node Recovery: Auto distributed recovery

Well, I have great news for you: most of the time, your only task will be to start the fixed MySQL instance.

Before starting the node (M1), its status is MISSING:

MySQL localhost JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M2:3306", 
        "ssl": "REQUIRED", 
        "status": "OK_NO_TOLERANCE", 
        "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "(MISSING)"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.17"
            }, 
            "M3:3306": {
                "address": "M3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.17"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M2:3306"
}

MySQL InnoDB Cluster will then automatically choose the relevant distributed recovery method.
For a certain period of time the status is RECOVERING.
Details are available in the recoveryStatusText field:

MySQL localhost JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M2:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "recovery": {
                    "state": "ON"
                }, 
                "recoveryStatusText": "Distributed recovery in progress", 
                "role": "HA", 
                "status": "RECOVERING", 
                "version": "8.0.17"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.17"
            }, 
            "M3:3306": {
                "address": "M3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.17"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M2:3306"
}

The node eventually becomes online.
Status: Online.

MySQL localhost JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M2:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.17"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.17"
            }, 
            "M3:3306": {
                "address": "M3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.17"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M2:3306"
}
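
You can also double-check the membership directly in SQL from any member of the group, independently of MySQL Shell (a quick sanity check):

-- Lists each group member with its current state and role
SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE, MEMBER_ROLE
FROM performance_schema.replication_group_members;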

So you get it now: the node recovery process is handled by MySQL InnoDB Cluster using automated distributed recovery.

That means, if you’re lucky, your only task (in addition to fixing the issue) is to (re)start the MySQL instance.

If you’re not lucky, you’ll probably need to remove and add the node again.

Node Recovery: Remove and add the node

The strategy here is to remove the node from the cluster metadata and then add it back – as a new node – to the cluster.

For example, for node M3, remove the node with removeInstance():

cluster.removeInstance("clusterAdmin@M3:3306", {force:true}) 

force is a boolean, set to false by default.
It indicates whether the instance must be removed (even if only from the metadata) in case it cannot be reached.
This is useful when the instance is no longer reachable (RTFM).

Then add the node with addInstance():

cluster.addInstance("clusterAdmin@M3:3306")

An alternative, which I prefer, is to use rescan() and then addInstance().

Let’s see how to use it, assuming M3 crashed.
The current status is:

MySQL localhost JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M1:3306", 
        "ssl": "REQUIRED", 
        "status": "OK_PARTIAL", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.17"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.17"
            }, 
            "M3:3306": {
                "address": "M3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "(MISSING)"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M1:3306"
}

Let’s (re)scan the cluster and remove the missing node (M3):

MySQL localhost JS> cluster.rescan()
Rescanning the cluster...

Result of the rescanning operation for the 'default' ReplicaSet:
{
    "name": "default", 
    "newTopologyMode": null, 
    "newlyDiscoveredInstances": [], 
    "unavailableInstances": [
        {
            "host": "M3:3306", 
            "label": "M3:3306", 
            "member_id": "93303635-c8c0-11e9-9665-0242ac13000d"
        }
    ]
}

The instance 'M3:3306' is no longer part of the ReplicaSet.
The instance is either offline or left the HA group. You can try to add it to the cluster again with the cluster.rejoinInstance('M3:3306') command or you can remove it from the cluster configuration.
Would you like to remove it from the cluster metadata? [Y/n]: Y
Removing instance from the cluster metadata...
The instance 'M3:3306' was successfully removed from the cluster metadata.

The new cluster status is:

MySQL localhost JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M1:3306", 
        "ssl": "REQUIRED", 
        "status": "OK_NO_TOLERANCE", 
        "statusText": "Cluster is NOT tolerant to any failures.", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.17"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.17"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M1:3306"
}

Now it’s time to bring back M3, using the Clone distributed process:

MySQL localhost JS> cluster.addInstance("clusterAdmin@M3:3306")

WARNING: A GTID set check of the MySQL instance at 'M3:3306' determined
that it contains transactions that do not originate from the cluster, which
must be discarded before it can join the cluster.

M3:3306 has the following errant GTIDs that do not exist in the cluster:
93303635-c8c0-11e9-9665-0242ac13000d:1-2

WARNING: Discarding these extra GTID events can either be done manually or by completely
overwriting the state of M3:3306 with a physical snapshot from an
existing cluster member. To use this method by default, set the
'recoveryMethod' option to 'clone'.

Having extra GTID events is not expected, and it is recommended to investigate
this further and ensure that the data can be removed prior to choosing the
clone recovery method.

Please select a recovery method [C]lone/[A]bort (default Abort): C
Validating instance at M3:3306...

This instance reports its own address as M3:3306

Instance configuration is suitable.
A new instance will be added to the InnoDB cluster. Depending on the amount of
data on the cluster this might take from a few seconds to several hours.

Adding instance to the cluster...

Monitoring recovery process of the new cluster member. Press ^C to stop monitoring and let it continue in background.
Clone based state recovery is now in progress.

NOTE: A server restart is expected to happen as part of the clone process. If the
server does not support the RESTART command or does not come back after a
while, you may need to manually start it back.

* Waiting for clone to finish...
NOTE: M3:3306 is being cloned from M1:3306
** Stage DROP DATA: Completed
** Clone Transfer  
    FILE COPY  ############################################################  100%  Completed
    PAGE COPY  ############################################################  100%  Completed
    REDO COPY  ############################################################  100%  Completed
** Stage RECOVERY: |
NOTE: M3:3306 is shutting down...

* Waiting for server restart... ready
* M3:3306 has restarted, waiting for clone to finish...
* Clone process has finished: 68.99 MB transferred in about 1 second (~inf TB/s)

Incremental distributed state recovery is now in progress.

* Waiting for distributed recovery to finish...
NOTE: 'M3:3306' is being recovered from 'M2:3306'
* Distributed recovery has finished

The instance 'M3:3306' was successfully added to the cluster.

Here we go!
Our 3-node MySQL InnoDB Cluster can tolerate up to one failure again:

MySQL localhost JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M1:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.17"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.17"
            }, 
            "M3:3306": {
                "address": "M3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.17"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M1:3306"
}

Scenario 2 – Node Provisioning

  • A 3 nodes MySQL InnoDB Cluster – M1 / M2 / M3 in single primary mode
  • MySQL Router is configured to enable R/W connections on port 3306 and RO connections on port 3307
  • M1 is currently the primary (that is Read/Write mode)
  • M2 & M3 are currently the secondaries (that is Read Only mode)

The goal then is to add 2 new nodes: M4 & M5

So we have the 3-node MySQL 8.0.17 InnoDB Cluster that we used in the first part of this article, up and running.

Adding new nodes is actually very similar to what we did previously.

The process is:

  • Deploy the new MySQL instance, preferably already configured for Group Replication
  • Use the automatic distributed recovery process

Add node M4:

MySQL localhost JS> cluster.addInstance("clusterAdmin@M4:3306")
NOTE: A GTID set check of the MySQL instance at 'M4:3306' determined
that it is missing transactions that were purged from all cluster members.

Please select a recovery method [C]lone/[A]bort (default Abort): C
Validating instance at M4:3306...

This instance reports its own address as M4:3306

Instance configuration is suitable.
A new instance will be added to the InnoDB cluster. Depending on the amount of
data on the cluster this might take from a few seconds to several hours.

Adding instance to the cluster...

Monitoring recovery process of the new cluster member. Press ^C to stop monitoring and let it continue in background.
Clone based state recovery is now in progress.

NOTE: A server restart is expected to happen as part of the clone process. If the
server does not support the RESTART command or does not come back after a
while, you may need to manually start it back.

* Waiting for clone to finish...
NOTE: M4:3306 is being cloned from M3:3306
** Stage DROP DATA: Completed
** Clone Transfer  
    FILE COPY  ############################################################  100%  Completed
    PAGE COPY  ############################################################  100%  Completed
    REDO COPY  ############################################################  100%  Completed
** Stage RECOVERY: \
NOTE: M4:3306 is shutting down...

* Waiting for server restart... ready
* M4:3306 has restarted, waiting for clone to finish...
* Clone process has finished: 116.24 MB transferred in 3 sec (38.75 MB/s)

Incremental distributed state recovery is now in progress.

* Waiting for distributed recovery to finish...
NOTE: 'M4:3306' is being recovered from 'M3:3306'
* Distributed recovery has finished

The instance 'M4:3306' was successfully added to the cluster.

As you can see, we used the Clone process.
The new cluster status is:

MySQL localhost JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M1:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.17"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.17"
            }, 
            "M3:3306": {
                "address": "M3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.17"
            }, 
            "M4:3306": {
                "address": "M4:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.17"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M1:3306"
}

Same process for node M5:

MySQL localhost JS> cluster.addInstance("clusterAdmin@M5:3306")
...

Our final status is then:

MySQL localhost JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M1:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to 2 failures.", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.17"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.17"
            }, 
            "M3:3306": {
                "address": "M3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.17"
            }, 
            "M4:3306": {
                "address": "M4:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.17"
            }, 
            "M5:3306": {
                "address": "M5:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.17"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M1:3306"
}

Misc
Node 1 – Group Replication configuration variables

+-----------------------------------------------------+---------------------------------------------+
| Variable_name                                       | Value                                       |
+-----------------------------------------------------+---------------------------------------------+
| group_replication_allow_local_lower_version_join    | OFF                                         |
| group_replication_auto_increment_increment          | 7                                           |
| group_replication_autorejoin_tries                  | 0                                           |
| group_replication_bootstrap_group                   | OFF                                         |
| group_replication_clone_threshold                   | 9223372036854775807                         |
| group_replication_communication_debug_options       | GCS_DEBUG_NONE                              |
| group_replication_communication_max_message_size    | 10485760                                    |
| group_replication_components_stop_timeout           | 31536000                                    |
| group_replication_compression_threshold             | 1000000                                     |
| group_replication_consistency                       | EVENTUAL                                    |
| group_replication_enforce_update_everywhere_checks  | OFF                                         |
| group_replication_exit_state_action                 | READ_ONLY                                   |
| group_replication_flow_control_applier_threshold    | 25000                                       |
| group_replication_flow_control_certifier_threshold  | 25000                                       |
| group_replication_flow_control_hold_percent         | 10                                          |
| group_replication_flow_control_max_quota            | 0                                           |
| group_replication_flow_control_member_quota_percent | 0                                           |
| group_replication_flow_control_min_quota            | 0                                           |
| group_replication_flow_control_min_recovery_quota   | 0                                           |
| group_replication_flow_control_mode                 | QUOTA                                       |
| group_replication_flow_control_period               | 1                                           |
| group_replication_flow_control_release_percent      | 50                                          |
| group_replication_force_members                     |                                             |
| group_replication_group_name                        | bc39053e-c9e0-11e9-9797-0242ac13000b        |
| group_replication_group_seeds                       | M2:33061,M3:33061 |
| group_replication_gtid_assignment_block_size        | 1000000                                     |
| group_replication_ip_whitelist                      | AUTOMATIC                                   |
| group_replication_local_address                     | M1:33061                       |
| group_replication_member_expel_timeout              | 0                                           |
| group_replication_member_weight                     | 50                                          |
| group_replication_message_cache_size                | 1073741824                                  |
| group_replication_poll_spin_loops                   | 0                                           |
| group_replication_recovery_complete_at              | TRANSACTIONS_APPLIED                        |
| group_replication_recovery_get_public_key           | OFF                                         |
| group_replication_recovery_public_key_path          |                                             |
| group_replication_recovery_reconnect_interval       | 60                                          |
| group_replication_recovery_retry_count              | 10                                          |
| group_replication_recovery_ssl_ca                   |                                             |
| group_replication_recovery_ssl_capath               |                                             |
| group_replication_recovery_ssl_cert                 |                                             |
| group_replication_recovery_ssl_cipher               |                                             |
| group_replication_recovery_ssl_crl                  |                                             |
| group_replication_recovery_ssl_crlpath              |                                             |
| group_replication_recovery_ssl_key                  |                                             |
| group_replication_recovery_ssl_verify_server_cert   | OFF                                         |
| group_replication_recovery_use_ssl                  | ON                                          |
| group_replication_single_primary_mode               | ON                                          |
| group_replication_ssl_mode                          | REQUIRED                                    |
| group_replication_start_on_boot                     | ON                                          |
| group_replication_transaction_size_limit            | 150000000                                   |
| group_replication_unreachable_majority_timeout      | 0                                           |
+-----------------------------------------------------+---------------------------------------------+

Some optional but useful parameters from my node1’s my.cnf:

[mysqld]
# Hostname and port this node reports to the other group members (shown in the cluster metadata)
report_port        = 3306
report_host        = M1

# Load the Group Replication and clone plugins at server startup
plugin_load        = group_replication.so
plugin-load-add    = mysql_clone.so

Thanks for using MySQL!


MySQL InnoDB Cluster – Recovering and provisioning with MySQL Enterprise Backup

July 11, 2019
Full moon by Olivier DASINI

Like I stated in my previous article – MySQL InnoDB Cluster – Recovering and provisioning with mysqldump :
“As the administrator of a cluster, among other tasks, you should be able to restore failed nodes and grow (or shrink) your cluster by adding (or removing) new nodes”.
Well, I still agree with myself 🙂

MySQL customers using a Commercial Edition have access to MySQL Enterprise Backup (MEB), which provides enterprise-grade physical backup and recovery for MySQL.

MEB delivers hot, online, non-blocking backups on multiple platforms including Linux, Windows, Mac & Solaris.
More details here.

Note:
If you want to know how to recover a node and/or how to provision nodes with mysqldump please read this blog post.

Context

Let’s make it as simple as possible 🙂
I’m using MySQL Enterprise 8.0.16, available for MySQL customers on My Oracle Support or on Oracle Software Delivery Cloud.

I have an InnoDB Cluster already set up and running.
My main assumption is that you already know what MySQL Group Replication and MySQL InnoDB Cluster are.
Additionally, you can read this tutorial and this article from my colleague lefred, or this one on the Windows platform from my colleague Ivan.

All nodes must have the right MySQL Enterprise Backup privileges.
Details here.

All nodes must use the same values for log-bin and relay-log respectively.
For example: log-bin=binlog and relay-log=relaylog (on all nodes).
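
In other words, each node’s configuration file would contain something like this (a sketch; adapt the names to your setup):

[mysqld]
# Identical binary log and relay log base names on every node
log-bin   = binlog
relay-log = relaylog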

Note:
Depending on how you configured your MySQL InnoDB Cluster, some steps could be slightly different.

Scenario 1 – Node Recovering

  • A 3 nodes MySQL InnoDB Cluster – M1 / M2 / M3, in single primary mode
  • MySQL Router is configured to enable R/W connections on 3306 and RO connections on 3307
  • M1 is currently the primary (that is in Read/Write mode)
  • M2 & M3 are currently the secondaries (that is Read Only mode)
  • M1 failed! Assuming it is irreconcilably corrupted :'(
  • M2 & M3 are now the (new temporary) cluster

The goal then is to rebuild M1 and put it back to the cluster.

As I said before, we have a 3-node MySQL Enterprise 8.0.16 InnoDB Cluster up and running:

$ mysqlsh clusterAdmin@{mysqlRouterIP}:3306 --cluster
...

MySQL JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M1:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M3:3306": {
                "address": "M3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M1:3306"
}

Then node M1 crashed… (its status is “MISSING”):

MySQL JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M2:3306", 
        "ssl": "REQUIRED", 
        "status": "OK_NO_TOLERANCE", 
        "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "n/a", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "(MISSING)"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M3:3306": {
                "address": "M3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M2:3306"
}

M1 was the primary.
The cluster initiated an automatic database failover to elect a new primary… blablabla.
Well, you already know the story 🙂

After a while, M1 is fixed and ready to be part of the cluster again.
To minimize the recovery time, instead of using the last backup we prefer to take a fresh one.

Speaking of backups, I recommend reading the excellent blog post from my colleague Jesper – MySQL Backup Best Practices.

Let’s take a fresh backup on a secondary node (we could also use the primary).

MySQL Enterprise Backup is a very versatile tool with many configuration options.
For clarity I’ll use a simplistic command; please read the MEB documentation for a more “production-style” command.
The backup roughly looks like this:

$ mysqlbackup --defaults-file=/etc/my.cnf  --with-timestamp --messages-logdir=/data/backups/ --backup-image=/data/backups/db.mbi backup-to-image
...
-------------------------------------------------------------
   Parameters Summary         
-------------------------------------------------------------
   Start LSN                  : 44603904
   End LSN                    : 44607630
-------------------------------------------------------------

mysqlbackup completed OK! with 1 warnings

Please note that it is highly recommended to include in your backup process, in addition to my.cnf, a copy of the auto.cnf and mysqld-auto.cnf configuration files for all nodes.

If you “lose” your auto.cnf file, don’t worry: the server will generate a new one for you.
However, the recovery process will be slightly different… (more on that below).

Now it’s time to restore this backup on node M1.

Because this server is part of a MySQL InnoDB Cluster, there are obviously some extra steps compared to a standalone server restoration.

Node Recovering

The node recovering process is simple:

  • Delete all contents of the MySQL Server data directory
  • Restore the backup
  • Restore the auto.cnf file
  • Restore the mysqld-auto.cnf file (if there is one)
  • Start the MySQL instance

This gives us, on M1, something like this (simplified version, please adapt to your context):

# Delete all contents of the MySQL Server data directory
$ rm -rf /var/lib/mysql/*


# Restore the backup
$ mysqlbackup --backup-dir=/exp/bck --datadir=/var/lib/mysql --backup-image=/data/backups/db.mbi copy-back-and-apply-log


# Restore the auto.cnf file
$ cp -p /data/backups/auto.cnf  /var/lib/mysql


# Restore the mysqld-auto.cnf file 
$ cp -p /data/backups/mysqld-auto.cnf  /var/lib/mysql


# Start the MySQL instance
service mysql start 

Then you can connect to the cluster and… see that node M1 is recovering (“status: RECOVERING”) or, if you’re not fast enough, that the node is again part of the cluster (“status: ONLINE”):

$ mysqlsh clusterAdmin@{mysqlRouterIP}:3306 --cluster
...

MySQL JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M2:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "n/a", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "RECOVERING", 
                "version": "8.0.16"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M3:3306": {
                "address": "M3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M2:3306"
}


// After a while 


MySQL JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M2:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M3:3306": {
                "address": "M3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M2:3306"
}

Lost the auto.cnf file

As promised, the case when the auto.cnf configuration file is not restored.
In fact, in this case the cluster would see this node as a new node (because the server will have a new UUID).
So the process for putting it back is different.
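
You can easily verify that: the server_uuid comes from auto.cnf, so a regenerated auto.cnf means a new UUID, different from the member_id stored in the cluster metadata. For example:

# Compare this UUID with the member_id reported by cluster.rescan() below
$ mysqlsh root@M1 --sql -e "SELECT @@server_uuid;"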

Also note that if you lose the mysqld-auto.cnf file you’ll probably need to configure the server (again) to be Group Replication aware.

To begin, you must stop the Group Replication plugin on the node that needs to be restored (M1):

$ mysqlsh root@M1 --sql -e"STOP GROUP_REPLICATION;"

Then, if necessary, you can check the configuration and/or configure the node:

MySQL JS> dba.checkInstanceConfiguration("root@M1:3306")
...


MySQL JS> dba.configureInstance("root@M1:3306")
...

You need to remove the old node from the InnoDB Cluster metadata:

MySQL JS> cluster.rescan()
Rescanning the cluster...

Result of the rescanning operation for the 'default' ReplicaSet:
{
    "name": "default", 
    "newTopologyMode": null, 
    "newlyDiscoveredInstances": [], 
    "unavailableInstances": [
        {
            "host": "M1:3306", 
            "label": "M1:3306", 
            "member_id": "6ad8caed-9d90-11e9-96e5-0242ac13000b"
        }
    ]
}

The instance 'M1:3306' is no longer part of the ReplicaSet.
The instance is either offline or left the HA group. You can try to add it to the cluster again with the cluster.rejoinInstance('M1:3306') command or you can remove it from the cluster configuration.
Would you like to remove it from the cluster metadata? [Y/n]: Y
Removing instance from the cluster metadata...
The instance 'M1:3306' was successfully removed from the cluster metadata.

Add the “new” node:

MySQL JS> cluster.addInstance("clusterAdmin@M1:3306")
A new instance will be added to the InnoDB cluster. Depending on the amount of
data on the cluster this might take from a few seconds to several hours.

Adding instance to the cluster ...

Validating instance at M1:3306...

This instance reports its own address as M1

Instance configuration is suitable.
The instance 'clusterAdmin@M1:3306' was successfully added to the cluster.

Check – and after the recovery stage, the “new” node is online:

MySQL JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M2:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "n/a", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "RECOVERING", 
                "version": "8.0.16"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M3:3306": {
                "address": "M3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M2:3306"
}

// After a while...

MySQL JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M2:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M3:3306": {
                "address": "M3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M2:3306"
}

Et voilà!

A simple alternative to deal with this “unpleasantness“, if you don’t need to configure the node, is basically to remove the node and add it again.
Below is an example with M3:

MySQL JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M2:3306", 
        "ssl": "REQUIRED", 
        "status": "OK_PARTIAL", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M3:3306": {
                "address": "M3:3306", 
                "mode": "n/a", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "(MISSING)"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M2:3306"
}


MySQL JS> cluster.removeInstance("root@M3:3306")
The instance will be removed from the InnoDB cluster. Depending on the instance
being the Seed or not, the Metadata session might become invalid. If so, please
start a new session to the Metadata Storage R/W instance.

Instance 'M3:3306' is attempting to leave the cluster...

The instance 'M3:3306' was successfully removed from the cluster.


MySQL JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M2:3306", 
        "ssl": "REQUIRED", 
        "status": "OK_NO_TOLERANCE", 
        "statusText": "Cluster is NOT tolerant to any failures.", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M2:3306"
}


MySQL JS> cluster.addInstance("root@M3:3306")
A new instance will be added to the InnoDB cluster. Depending on the amount of
data on the cluster this might take from a few seconds to several hours.

Adding instance to the cluster ...

Validating instance at M3:3306...

This instance reports its own address as M3

Instance configuration is suitable.
The instance 'root@M3:3306' was successfully added to the cluster.


MySQL JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M2:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M3:3306": {
                "address": "M3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M2:3306"
}

Scenario 2 – Node Provisioning

  • A 3-node MySQL InnoDB Cluster – M1 / M2 / M3 in single primary mode
  • MySQL Router is configured to enable R/W connections on port 3306 and RO connections on port 3307
  • M2 is currently the primary (that is Read/Write mode)
  • M1 & M3 are currently the secondaries (that is Read Only mode)

The goal then is to add 2 new nodes: M4 & M5

So we have the 3-node MySQL 8.0.16 InnoDB Cluster that we used in the first part of this article, and it is up and running.

Actually adding new nodes is very close to what we have done previously.

The process is :

  • Deploy the new MySQL instance preferably already configured for Group Replication
  • Restore the data in the way that we have seen previously

Checking the configuration and configuring the instance can be done respectively with the dba.checkInstanceConfiguration() and dba.configureInstance() functions (it could also be useful to use checkInstanceState(), see this article).
e.g. on node M4:

$ mysqlsh clusterAdmin@M4:3306 -- dba checkInstanceConfiguration
Validating MySQL instance at M4:3306 for use in an InnoDB cluster...

This instance reports its own address as M4

Checking whether existing tables comply with Group Replication requirements...
No incompatible tables detected

Checking instance configuration...
Instance configuration is compatible with InnoDB cluster

The instance 'M4:3306' is valid for InnoDB cluster usage.

{
    "status": "ok"
}

Restore the backup on M4, the new node:

# Restore the backup on M4
$ mysqlbackup --backup-dir=/exp/bck --datadir=/var/lib/mysql --backup-image=/data/backups/db.mbi copy-back-and-apply-log

And finally, add the new node (M4):

// Add the new instance
MySQL JS> cluster.addInstance("clusterAdmin@M4:3306")


// Check
MySQL JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M2:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M3:3306": {
                "address": "M3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M4:3306": {
                "address": "M4:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M2:3306"
}

Same process for the last node, M5.
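
A condensed sketch, mirroring what we just did for M4 (paths, service command and account names are the same illustrative ones used earlier):

# Restore the fresh backup on M5
$ mysqlbackup --backup-dir=/exp/bck --datadir=/var/lib/mysql --backup-image=/data/backups/db.mbi copy-back-and-apply-log

# Start the MySQL instance on M5
$ service mysql start

# Then, from the MySQL Shell session connected through MySQL Router:
#   cluster.addInstance("clusterAdmin@M5:3306")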

You end up with a 5-node MySQL InnoDB Cluster \o/:

MySQL JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M2:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to 2 failures.", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M3:3306": {
                "address": "M3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M4:3306": {
                "address": "M4:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M5:3306": {
                "address": "M5:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M2:3306"
}

This is one way to do node recovery and provisioning using MySQL Enterprise Backup.

You may also take a look at this article from my colleague Keith: InnoDB Cluster: Recovering an instance with MySQL Enterprise Backup.

If you are not yet a MySQL customer, and therefore not able to enjoy our advanced features/tools and technical support, then mysqldump could probably fit here. Please read: MySQL InnoDB Cluster – Recovering and provisioning with mysqldump.

Note that some new features are coming in this area… 🙂
Stay tuned!

Misc
Node 3 – Group Replication configuration

SQL> SHOW VARIABLES LIKE 'group_replication%';
+-----------------------------------------------------+--------------------------------------+
| Variable_name                                       | Value                                |
+-----------------------------------------------------+--------------------------------------+
| group_replication_allow_local_lower_version_join    | OFF                                  |
| group_replication_auto_increment_increment          | 7                                    |
| group_replication_autorejoin_tries                  | 0                                    |
| group_replication_bootstrap_group                   | OFF                                  |
| group_replication_communication_debug_options       | GCS_DEBUG_NONE                       |
| group_replication_communication_max_message_size    | 10485760                             |
| group_replication_components_stop_timeout           | 31536000                             |
| group_replication_compression_threshold             | 1000000                              |
| group_replication_consistency                       | EVENTUAL                             |
| group_replication_enforce_update_everywhere_checks  | OFF                                  |
| group_replication_exit_state_action                 | READ_ONLY                            |
| group_replication_flow_control_applier_threshold    | 25000                                |
| group_replication_flow_control_certifier_threshold  | 25000                                |
| group_replication_flow_control_hold_percent         | 10                                   |
| group_replication_flow_control_max_quota            | 0                                    |
| group_replication_flow_control_member_quota_percent | 0                                    |
| group_replication_flow_control_min_quota            | 0                                    |
| group_replication_flow_control_min_recovery_quota   | 0                                    |
| group_replication_flow_control_mode                 | QUOTA                                |
| group_replication_flow_control_period               | 1                                    |
| group_replication_flow_control_release_percent      | 50                                   |
| group_replication_force_members                     |                                      |
| group_replication_group_name                        | 28f66c86-9d66-11e9-876e-0242ac13000b |
| group_replication_group_seeds                       | M1:33061,M2:33061                    |
| group_replication_gtid_assignment_block_size        | 1000000                              |
| group_replication_ip_whitelist                      | AUTOMATIC                            |
| group_replication_local_address                     | M3:33061                             |
| group_replication_member_expel_timeout              | 0                                    |
| group_replication_member_weight                     | 50                                   |
| group_replication_message_cache_size                | 1073741824                           |
| group_replication_poll_spin_loops                   | 0                                    |
| group_replication_recovery_complete_at              | TRANSACTIONS_APPLIED                 |
| group_replication_recovery_get_public_key           | OFF                                  |
| group_replication_recovery_public_key_path          |                                      |
| group_replication_recovery_reconnect_interval       | 60                                   |
| group_replication_recovery_retry_count              | 10                                   |
| group_replication_recovery_ssl_ca                   |                                      |
| group_replication_recovery_ssl_capath               |                                      |
| group_replication_recovery_ssl_cert                 |                                      |
| group_replication_recovery_ssl_cipher               |                                      |
| group_replication_recovery_ssl_crl                  |                                      |
| group_replication_recovery_ssl_crlpath              |                                      |
| group_replication_recovery_ssl_key                  |                                      |
| group_replication_recovery_ssl_verify_server_cert   | OFF                                  |
| group_replication_recovery_use_ssl                  | ON                                   |
| group_replication_single_primary_mode               | ON                                   |
| group_replication_ssl_mode                          | REQUIRED                             |
| group_replication_start_on_boot                     | ON                                   |
| group_replication_transaction_size_limit            | 150000000                            |
| group_replication_unreachable_majority_timeout      | 0                                    |
+-----------------------------------------------------+--------------------------------------+

Thanks for using MySQL!


MySQL InnoDB Cluster – Recovering and provisioning with mysqldump

July 9, 2019

As the administrator of a cluster, among other tasks, you should be able to restore failed nodes and grow (or shrink) your cluster by adding (or removing) new nodes.

In MySQL, as a backup tool (and if your amount of data is not too big), you can use mysqldump, a client utility that performs logical backups.
The result is a set of SQL statements that reproduce the original schema objects and data.

For substantial amounts of data, however, a physical backup solution such as MySQL Enterprise Backup is faster, particularly for the restore operation.
Hey! Guess what? You can read: MySQL InnoDB Cluster – Recovering and provisioning with MySQL Enterprise Backup

Context

Let’s make it as simple as possible 🙂
I’m using MySQL 8.0.16.
I have an InnoDB Cluster setup – up and running.
So my main assumption is that you already know what MySQL Group Replication & MySQL InnoDB Cluster are.
Additionally, you can read this tutorial and this article from my colleague lefred, or this one on the Windows platform from my colleague Ivan.

Note:

Depending on how you configured your MySQL InnoDB Cluster, some steps could be slightly different.

Scenario 1 – Node Recovering

  • A 3-node MySQL InnoDB Cluster – M1 / M2 / M3, in single primary mode
  • MySQL Router is configured to enable R/W connections on port 3306 and RO connections on port 3307
  • M1 is currently the primary (so in Read/Write mode)
  • M2 & M3 are currently the secondaries (i.e. Read Only mode)
  • M1 failed! Some tables are irreconcilably corrupted 🙁
  • M2 & M3 are now the (new temporary) cluster

The goal then is to rebuild M1 and put it back to the cluster.

So, as I stated, we have a 3-node MySQL 8.0.16 InnoDB Cluster up and running:

$ mysqlsh clusterAdmin@{mysqlRouterIP}:3306 --cluster
...

MySQL JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M1:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M3:3306": {
                "address": "M3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M1:3306"
}

Then M1 failed (status is “MISSING“) :

MySQL JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M2:3306", 
        "ssl": "REQUIRED", 
        "status": "OK_NO_TOLERANCE", 
        "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "n/a", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "(MISSING)"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M3:3306": {
                "address": "M3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M2:3306"
}

M1 was the primary.
The cluster initiated an automatic database failover to elect a new primary… blablabla
Anyway you already know the story 🙂

After a while M1 is finally fixed and ready to be part of the cluster again.
To minimize the recovery time, instead of using the last dump we prefer to take a fresh one.

Speaking of backups, I recommend reading the excellent blog post from my colleague Jesper: MySQL Backup Best Practices.

Let’s take a fresh dump on a secondary node using MySQL Router, on port 3307 with my custom configuration (we could also have used the primary).
The dump command will roughly look like:

mysqldump --defaults-file=/etc/my.cnf -u mdump -p -h {mysqlRouterIP} -P 3307 --all-databases --routines --events --single-transaction --flush-privileges --hex-blob --log-error=/var/log/mysqldump.log --result-file=/data/backups/dump.sql

Minimum privileges for the MySQL user with which mysqldump connects to the server

Actually, the minimum privileges depend on which objects you want to dump and what the dump will be used for.

However, the following privileges should be fine for most classical usages:

GRANT SELECT, SHOW VIEW, EVENT, TRIGGER, LOCK TABLES, CREATE, ALTER, RELOAD, REPLICATION CLIENT, REPLICATION_SLAVE_ADMIN ON *.* TO <dumpUser>
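
For instance, to create such a dedicated dump account on the current primary (M2 here), a minimal sketch; the user name mdump matches the mysqldump command above, while the host pattern and password are illustrative placeholders:

# Create a dedicated logical backup account (host pattern and password are placeholders)
$ mysqlsh root@M2:3306 --sql -e "CREATE USER IF NOT EXISTS 'mdump'@'%' IDENTIFIED BY 'change_me';"

# Grant it the privileges listed above
$ mysqlsh root@M2:3306 --sql -e "GRANT SELECT, SHOW VIEW, EVENT, TRIGGER, LOCK TABLES, CREATE, ALTER, RELOAD, REPLICATION CLIENT, REPLICATION_SLAVE_ADMIN ON *.* TO 'mdump'@'%';"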

Please note that it is highly recommended, in addition to the my.cnf, to include in your backup process a copy of the auto.cnf and mysqld-auto.cnf configuration files for all nodes.

If you “lose” your auto.cnf file, don’t worry: the server will generate a new one for you.
However the recovery process will be slightly different… (more on that below).

Now it’s time to restore this dump on node M1.

Because this server is part of an InnoDB Cluster, clearly there are some extra steps compared to a standalone server restoration.

Restore the data

First, restore the data on M1:

  • It’s a logical restoration so the server to restore must be up 😀
  • Group Replication plugin must be stopped
    • STOP GROUP_REPLICATION;
  • Disable logging to the binary log
    • SET SQL_LOG_BIN=0;
  • Delete binary log files
    • RESET MASTER;
  • Clear the master info and relay log info repositories and delete all the relay log files
    • RESET SLAVE;
  • Enable updates
    • SET GLOBAL super_read_only=0;
  • Load the dump
    • source /data/backups/dump.sql

This gives us:

mysqlsh root@M1:3306 --sql
...

M1 SQL> STOP GROUP_REPLICATION;
Query OK, 0 rows affected (12.04 sec)

M1 SQL> SET SQL_LOG_BIN=0;
Query OK, 0 rows affected (0.00 sec)

M1 SQL> RESET MASTER; 
Query OK, 0 rows affected (0.06 sec)

M1 SQL> RESET SLAVE;
Query OK, 0 rows affected (0.13 sec)

M1 SQL> SET GLOBAL super_read_only=0;
Query OK, 0 rows affected (0.00 sec)

M1 SQL> source /data/backups/dump.sql
Query OK, 0 rows affected (0.00 sec)
...

Put the node back into the cluster

Second, put the node back into the cluster.
Connect to MySQL Router on the primary (port 3306 in my case):

$ mysqlsh clusterAdmin@{mysqlRouterIP}:3306 --cluster
...

MySQL JS> cluster.rejoinInstance("clusterAdmin@M1:3306")
Rejoining the instance to the InnoDB cluster. Depending on the original problem that made the instance unavailable, the rejoin operation might not be successful and further manual steps will be needed to fix the underlying problem.

Please monitor the output of the rejoin operation and take necessary action if the instance cannot rejoin.

Rejoining instance to the cluster ...

The instance 'M1:3306' was successfully rejoined on the cluster.

Now you should check the new cluster status:

MySQL JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M2:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M3:3306": {
                "address": "M3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M2:3306"
}

Note:

The restored node will be in the “RECOVERING” state before becoming “ONLINE”.
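
If you want to follow the recovery progress, besides cluster.status() you can also look at the Group Replication member states directly; for example (connecting through MySQL Router, as before):

# MEMBER_STATE moves from RECOVERING to ONLINE once the node has caught up
$ mysqlsh clusterAdmin@{mysqlRouterIP}:3306 --sql -e "SELECT MEMBER_HOST, MEMBER_STATE FROM performance_schema.replication_group_members;"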

Lost the auto.cnf file

As promised, the case when the auto.cnf configuration file is not restored.
In fact, in that case the cluster would see this node as a new node (because the server will have a new UUID).
So the process for putting it back is different.

Also note that if you lose the mysqld-auto.cnf file you’ll probably need to configure the server (again) to be Group Replication aware.

So basically the process consists of doing some cleaning and then adding the old node as if it were a new node.


Assuming Group Replication plugin is stopped on M1:

// Check the configuration, even more important if you lost mysqld-auto.cnf :)
MySQL JS> dba.checkInstanceConfiguration('clusterAdmin@M1:3306')
Validating MySQL instance at M1:3306 for use in an InnoDB cluster...

This instance reports its own address as M1

Checking whether existing tables comply with Group Replication requirements...
No incompatible tables detected

Checking instance configuration...
Instance configuration is compatible with InnoDB cluster

The instance 'M1:3306' is valid for InnoDB cluster usage.

{
    "status": "ok"
}


// If needed configure your instance
MySQL JS> dba.configureInstance('clusterAdmin@M1:3306')


// Remove the old node from the cluster metadata
MySQL JS> cluster.rescan()
Rescanning the cluster...

Result of the rescanning operation for the 'default' ReplicaSet:
{
    "name": "default", 
    "newTopologyMode": null, 
    "newlyDiscoveredInstances": [], 
    "unavailableInstances": [
        {
            "host": "M1:3306", 
            "label": "M1:3306", 
            "member_id": "a3f1ee50-9be3-11e9-a3fe-0242ac13000b"
        }
    ]
}

The instance 'M1:3306' is no longer part of the ReplicaSet.
The instance is either offline or left the HA group. You can try to add it to the cluster again with the cluster.rejoinInstance('M1:3306') command or you can remove it from the cluster configuration.
Would you like to remove it from the cluster metadata? [Y/n]: Y
Removing instance from the cluster metadata...
The instance 'M1:3306' was successfully removed from the cluster metadata.


// Add the new instance
MySQL JS> cluster.addInstance("clusterAdmin@M1:3306")
A new instance will be added to the InnoDB cluster. Depending on the amount of data on the cluster this might take from a few seconds to several hours.

Adding instance to the cluster ...

Validating instance at M1:3306...

This instance reports its own address as M1

Instance configuration is suitable.
The instance 'clusterAdmin@M1:3306' was successfully added to the cluster.


// Check
MySQL JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M2:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M3:3306": {
                "address": "M3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M2:3306"
}

Et voilà !

Scenario 2 – Node Provisioning

  • A 3-node MySQL InnoDB Cluster – M1 / M2 / M3 in single primary mode
  • MySQL Router is configured to enable R/W connections on port 3306 and RO connections on port 3307
  • M2 is currently the primary in Read/Write mode
  • M1 & M3 are currently the secondaries in Read Only mode

The goal then is to add 2 new nodes: M4 & M5

So we have the 3-node MySQL 8.0.16 InnoDB Cluster that we used in the first part of this article, and it is up and running.

Actually adding new nodes is very close to what we have done previously.

The process is :

  • Deploy the new MySQL instance preferably already configured for Group Replication
  • Restore the data in the way that we have seen previously

Checking the configuration and configuring the instance can be done respectively with the dba.checkInstanceConfiguration() and dba.configureInstance() functions (it could also be useful to use checkInstanceState(), see this article).
e.g. on node M4:

$ mysqlsh clusterAdmin@M4:3306 -- dba checkInstanceConfiguration
Validating MySQL instance at M4:3306 for use in an InnoDB cluster...

This instance reports its own address as M4

Checking whether existing tables comply with Group Replication requirements...
No incompatible tables detected

Checking instance configuration...
Instance configuration is compatible with InnoDB cluster

The instance 'M4:3306' is valid for InnoDB cluster usage.

{
    "status": "ok"
}

The first part of the restore process is the same as the one we have already seen:

mysqlsh root@M4:3306 --sql
...

M4 SQL> STOP GROUP_REPLICATION; -- if necessary
Query OK, 0 rows affected (12.04 sec)

M4 SQL> SET SQL_LOG_BIN=0;
Query OK, 0 rows affected (0.00 sec)

M4 SQL> RESET MASTER; 
Query OK, 0 rows affected (0.06 sec)

M4 SQL> RESET SLAVE;
Query OK, 0 rows affected (0.13 sec)

M4 SQL> SET GLOBAL super_read_only=0;
Query OK, 0 rows affected (0.00 sec)

M4 SQL> source /data/backups/dump.sql
Query OK, 0 rows affected (0.00 sec)
...

For the second part, we will now add the new node (M4) :

// Add the new instance
MySQL JS> cluster.addInstance("clusterAdmin@M4:3306")


// Check
MySQL JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M2:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M3:3306": {
                "address": "M3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M4:3306": {
                "address": "M4:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M2:3306"
}

Note:

If necessary, just before the addInstance() you can run checkInstanceConfiguration() and configureInstance().
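
For example, for the next node to add (M5), a sketch using the same command-line integration form as the checkInstanceConfiguration call shown earlier (you can equally run these functions from an interactive MySQL Shell session):

# Check the instance configuration...
$ mysqlsh clusterAdmin@M5:3306 -- dba checkInstanceConfiguration

# ...and, if something needs to be fixed, configure it
$ mysqlsh clusterAdmin@M5:3306 -- dba configureInstance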

Same process for the last node, M5.

And finally you get a 5-node MySQL InnoDB Cluster \o/:

MySQL JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "M2:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to 2 failures.", 
        "topology": {
            "M1:3306": {
                "address": "M1:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M2:3306": {
                "address": "M2:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M3:3306": {
                "address": "M3:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M4:3306": {
                "address": "M4:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }, 
            "M5:3306": {
                "address": "M5:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.16"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "M2:3306"
}

This is one way to do node recovery and provisioning using mysqldump, when your amount of data is not big.

When a logical backup is not efficient enough, it is time to use a hot, online and non-blocking physical backup tool like MySQL Enterprise Backup.
Actually, it’s even easier!
Please read: MySQL InnoDB Cluster – Recovering and provisioning with MySQL Enterprise Backup.

A final word to say that some new features are coming in this area… 🙂
Stay tuned!

Misc
Node 1 – Group Replication configuration

mysql> SHOW VARIABLES LIKE 'group_replication%';
+-----------------------------------------------------+-------------------------------------+
| Variable_name                                       | Value                               |
+-----------------------------------------------------+-------------------------------------+
| group_replication_allow_local_lower_version_join    | OFF                                 |
| group_replication_auto_increment_increment          | 7                                   |
| group_replication_autorejoin_tries                  | 1                                   |
| group_replication_bootstrap_group                   | OFF                                 |
| group_replication_communication_debug_options       | GCS_DEBUG_NONE                      |
| group_replication_communication_max_message_size    | 10485760                            |
| group_replication_components_stop_timeout           | 31536000                            |
| group_replication_compression_threshold             | 1000000                             |
| group_replication_consistency                       | EVENTUAL                            |
| group_replication_enforce_update_everywhere_checks  | OFF                                 |
| group_replication_exit_state_action                 | READ_ONLY                           |
| group_replication_flow_control_applier_threshold    | 25000                               |
| group_replication_flow_control_certifier_threshold  | 25000                               |
| group_replication_flow_control_hold_percent         | 10                                  |
| group_replication_flow_control_max_quota            | 0                                   |
| group_replication_flow_control_member_quota_percent | 0                                   |
| group_replication_flow_control_min_quota            | 0                                   |
| group_replication_flow_control_min_recovery_quota   | 0                                   |
| group_replication_flow_control_mode                 | QUOTA                               |
| group_replication_flow_control_period               | 1                                   |
| group_replication_flow_control_release_percent      | 50                                  |
| group_replication_force_members                     |                                     |
| group_replication_group_name                        | d1b109bf-9be3-11e9-9ea2-0242ac13000b|
| group_replication_group_seeds                       | M2:33061,M3:33061,M4:33061,M5:33061 |
| group_replication_gtid_assignment_block_size        | 1000000                             |
| group_replication_ip_whitelist                      | AUTOMATIC                           |
| group_replication_local_address                     | M1:33061                            |
| group_replication_member_expel_timeout              | 0                                   |
| group_replication_member_weight                     | 50                                  |
| group_replication_message_cache_size                | 1073741824                          |
| group_replication_poll_spin_loops                   | 0                                   |
| group_replication_recovery_complete_at              | TRANSACTIONS_APPLIED                |
| group_replication_recovery_get_public_key           | OFF                                 |
| group_replication_recovery_public_key_path          |                                     |
| group_replication_recovery_reconnect_interval       | 60                                  |
| group_replication_recovery_retry_count              | 10                                  |
| group_replication_recovery_ssl_ca                   |                                     |
| group_replication_recovery_ssl_capath               |                                     |
| group_replication_recovery_ssl_cert                 |                                     |
| group_replication_recovery_ssl_cipher               |                                     |
| group_replication_recovery_ssl_crl                  |                                     |
| group_replication_recovery_ssl_crlpath              |                                     |
| group_replication_recovery_ssl_key                  |                                     |
| group_replication_recovery_ssl_verify_server_cert   | OFF                                 |
| group_replication_recovery_use_ssl                  | ON                                  |
| group_replication_single_primary_mode               | ON                                  |
| group_replication_ssl_mode                          | REQUIRED                            |
| group_replication_start_on_boot                     | ON                                  |
| group_replication_transaction_size_limit            | 150000000                           |
| group_replication_unreachable_majority_timeout      | 0                                   |
+-----------------------------------------------------+-------------------------------------+

Thanks for using MySQL!
