Check the MySQL server startup configuration

June 11, 2019
Caribbean by Olivier DASINI

Since 8.0.16, MySQL Server supports a validate-config option that enables the startup configuration to be checked for problems without running the server in normal operational mode:

  • If no errors are found, the server terminates with an exit code of 0.
  • If an error is found, the server displays a diagnostic message and terminates with an exit code of 1.

validate-config can be used at any time, but it is particularly useful after an upgrade, to check whether any options previously used with the older server are considered deprecated or obsolete by the upgraded server.

First let’s get some information about my MySQL version and configuration.

$ mysqld --help --verbose | head -n13
mysqld  Ver 8.0.16 for Linux on x86_64 (MySQL Community Server - GPL)
Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Starts the MySQL database server.

Usage: mysqld [OPTIONS]

Default options are read from the following files in the given order:
/etc/my.cnf /etc/mysql/my.cnf /usr/etc/my.cnf ~/.my.cnf 

I’m using MySQL 8.0.16.
The default options are read from the following files, in the given order:

  • /etc/my.cnf
  • /etc/mysql/my.cnf
  • /usr/etc/my.cnf
  • ~/.my.cnf

Now let’s check my MySQL server startup configuration:

$ mysqld --validate-config
$

No error!
No output, everything looks good.
My server will start with this configuration.

If there is an error, the server terminates.
The output is obviously different:

$ mysqld --validate-config --fake-option
2019-06-05T15:10:08.653775Z 0 [ERROR] [MY-000068] [Server] unknown option '--fake-option'.
2019-06-05T15:10:08.653822Z 0 [ERROR] [MY-010119] [Server] Aborting

Usually your configuration options are written in your configuration file (generally named my.cnf).
Therefore you can also use validate-config in this context:

$ mysqld --defaults-file=/etc/my.cnf --validate-config 
$ 

Note:

defaults-file, if specified, must be the first option on the command line.

Furthermore, you can control the verbosity using log_error_verbosity (see the quick SQL sketch after the list):

  • A value of 1 gives you ERROR
  • A value of 2 gives you ERROR & WARNING
  • A value of 3 gives you ERROR, WARNING & INFORMATION (i.e. note)
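
These levels map to the log_error_verbosity system variable. As a purely optional aside (a minimal sketch, not part of validate-config itself), you can also read or persist it on a running server with SQL:

SELECT @@global.log_error_verbosity;   -- current level
SET PERSIST log_error_verbosity = 2;   -- keep ERROR & WARNING across restarts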

With a verbosity of 2, in addition to errors, we will be able to display warnings:

$ mysqld --defaults-file=/etc/my.cnf --validate-config  --log_error_verbosity=2
2019-06-05T15:53:42.785422Z 0 [Warning] [MY-011068] [Server] The syntax 'expire-logs-days' is deprecated and will be removed in a future release. Please use binlog_expire_logs_seconds instead.
2019-06-05T15:53:42.785660Z 0 [Warning] [MY-010101] [Server] Insecure configuration for --secure-file-priv: Location is accessible to all OS users. Consider choosing a different directory.

Nothing very serious; however, it is best practice to resolve warnings when possible.

So I fixed these warnings:

$ mysqld --defaults-file=/etc/my.cnf --validate-config  --log_error_verbosity=2
2019-06-05T16:04:32.363297Z 0 [ERROR] [MY-000067] [Server] unknown variable 'binlog_expire_logs_second=7200'.
2019-06-05T16:04:32.363369Z 0 [ERROR] [MY-010119] [Server] Aborting

Oops!!! There is a typo… :-0
I wrote binlog_expire_logs_second instead of binlog_expire_logs_seconds.
(I forgot the final “s”)

With that typo, my MySQL server would not have been able to start.
Thanks to validate-config!
I can now avoid an unpleasant surprise when starting the server 🙂
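
The fix itself is just a matter of editing my.cnf. As a minimal alternative sketch (assuming the same 7200-second retention shown in the error above), the corrected value could also be persisted from SQL on a running server:

SET PERSIST binlog_expire_logs_seconds = 7200;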

With the correct spelling, I now have no errors and no warnings:

$ mysqld --defaults-file=/etc/my.cnf --validate-config  --log_error_verbosity=2
$ 

Note that you could also use verbosity 3:

$ mysqld --defaults-file=/etc/my.cnf --validate-config  --log_error_verbosity=3
2019-06-05T16:02:03.589770Z 0 [Note] [MY-010747] [Server] Plugin 'FEDERATED' is disabled.
2019-06-05T16:02:03.590719Z 0 [Note] [MY-010733] [Server] Shutting down plugin 'MyISAM'
2019-06-05T16:02:03.590763Z 0 [Note] [MY-010733] [Server] Shutting down plugin 'CSV'

validate-config is convenient and can be very useful.
It may be worthwhile to include it in your upgrade process.

References

Thanks for using MySQL!


MySQL 8.0.16 New Features Summary

June 5, 2019
Sakila mozaik by Olivier DASINI

Presentation of some of the new features of MySQL 8.0.16 released on April 25, 2019.

Agenda

  • mysql_upgrade is no longer necessary
  • CHECK Constraints
  • Constant-Folding Optimization
  • SYSTEM_USER & partial_revokes
  • Chinese collation for utf8mb4
  • Performance Schema keyring_keys table
  • MySQL Shell Enhancements
  • MySQL Router Enhancements
  • InnoDB Cluster Enhancements
  • Group Replication Enhancements
  • Size of the binary tarball for Linux
  • Server quick settings validation

Download this presentation and others on my SlideShare account.

I’ve also made a video (in French) on my Youtube channel.

You can subscribe here.


MySQL InnoDB Cluster – HowTo #2 – Validate an instance

May 21, 2019
Sakila HA by Olivier DASINI

How do I… Validate an instance for MySQL InnoDB Cluster usage?

Short answer

Use:

checkInstanceConfiguration()

Long answer…

In this article I assume you already know what MySQL Group Replication and MySQL InnoDB Cluster are.
Additionally you can read this tutorial and this article from my colleague lefred or this one on Windows Platform from my colleague Ivan.

During the cluster creation process, or when you want to add a node to a running cluster, the chosen MySQL instance must be valid for InnoDB Cluster usage.
That is, it must be compliant with Group Replication requirements.

MySQL Shell provides a simple and easy way to check if your instance is valid: checkInstanceConfiguration()

I’m using MySQL Shell 8.0.16:

$ mysqlsh
MySQL Shell 8.0.16

Copyright (c) 2016, 2019, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.

Type '\help' or '\?' for help; '\quit' to exit.

MySQL JS> 

In this scenario my cluster is not created yet. However, the logic is the same when adding a node to a running cluster.

Ask for help

The built-in help is simply awesome!

MySQL JS> dba.help('checkInstanceConfiguration')
NAME
      checkInstanceConfiguration - Validates an instance for MySQL InnoDB
                                   Cluster usage.

SYNTAX
      dba.checkInstanceConfiguration(instance[, options])

WHERE
      instance: An instance definition.
      options: Data for the operation.

RETURNS
       A descriptive text of the operation result.

DESCRIPTION
      This function reviews the instance configuration to identify if it is
      valid for usage with group replication. Use this to check for possible
      configuration issues on MySQL instances before creating a cluster with
      them or adding them to an existing cluster.

      The instance definition is the connection data for the instance.

      For additional information on connection data use \? connection.

      Only TCP/IP connections are allowed for this function.

      The options dictionary may contain the following options:

      - mycnfPath: Optional path to the MySQL configuration file for the
        instance. Alias for verifyMyCnf
      - verifyMyCnf: Optional path to the MySQL configuration file for the
        instance. If this option is given, the configuration file will be
        verified for the expected option values, in addition to the global
        MySQL system variables.
      - password: The password to get connected to the instance.
      - interactive: boolean value used to disable the wizards in the command
        execution, i.e. prompts are not provided to the user and confirmation
        prompts are not shown.

      The connection password may be contained on the instance definition,
      however, it can be overwritten if it is specified on the options.

      The returned descriptive text of the operation result indicates whether
      the instance is valid for InnoDB Cluster usage or not. If not, a table
      containing the following information is presented:

      - Variable: the invalid configuration variable.
      - Current Value: the current value for the invalid configuration
        variable.
      - Required Value: the required value for the configuration variable.
      - Note: the action to be taken.

      The note can be one of the following:

      - Update the config file and update or restart the server variable.
      - Update the config file and restart the server.
      - Update the config file.
      - Update the server variable.
      - Restart the server.

EXCEPTIONS
      ArgumentError in the following scenarios:

      - If the instance parameter is empty.
      - If the instance definition is invalid.
      - If the instance definition is a connection dictionary but empty.

      RuntimeError in the following scenarios:

      - If the instance accounts are invalid.
      - If the instance is offline.
      - If the instance is already part of a Replication Group.
      - If the instance is already part of an InnoDB Cluster.
      - If the given the instance cannot be used for Group Replication.

Check Instance Configuration

In order to check a MySQL instance, I must either connect to that instance with MySQL Shell or provide the connection data to the function:

MySQL JS> dba.checkInstanceConfiguration('root@172.20.0.11')
Validating MySQL instance at 172.20.0.11:3306 for use in an InnoDB cluster...

This instance reports its own address as mysql_node1

Checking whether existing tables comply with Group Replication requirements...
WARNING: The following tables do not have a Primary Key or equivalent column: 
test.squares, test.people, test.animal

Group Replication requires tables to use InnoDB and have a PRIMARY KEY or PRIMARY KEY Equivalent (non-null unique key). Tables that do not follow these requirements will be readable but not updateable when used with Group Replication. If your applications make updates (INSERT, UPDATE or DELETE) to these tables, ensure they use the InnoDB storage engine and have a PRIMARY KEY or PRIMARY KEY Equivalent.

Checking instance configuration...

Some configuration options need to be fixed:
+--------------------------+---------------+----------------+--------------------------------------------------+
| Variable                 | Current Value | Required Value | Note                                             |
+--------------------------+---------------+----------------+--------------------------------------------------+
| binlog_checksum          | CRC32         | NONE           | Update the server variable                       |
| enforce_gtid_consistency | OFF           | ON             | Update read-only variable and restart the server |
| gtid_mode                | OFF           | ON             | Update read-only variable and restart the server |
| server_id                | 1             | <unique ID>    | Update read-only variable and restart the server |
+--------------------------+---------------+----------------+--------------------------------------------------+

Some variables need to be changed, but cannot be done dynamically on the server.
Please use the dba.configureInstance() command to repair these issues.

{
    "config_errors": [
        {
            "action": "server_update", 
            "current": "CRC32", 
            "option": "binlog_checksum", 
            "required": "NONE"
        }, 
        {
            "action": "restart", 
            "current": "OFF", 
            "option": "enforce_gtid_consistency", 
            "required": "ON"
        }, 
        {
            "action": "restart", 
            "current": "OFF", 
            "option": "gtid_mode", 
            "required": "ON"
        }, 
        {
            "action": "restart", 
            "current": "1", 
            "option": "server_id", 
            "required": "<unique ID>"
        }
    ], 
    "status": "error"
}

The output depends on the instance’s current status.
In my case, 3 tables do not meet the requirements because they lack a primary key (or a non-null unique key).
I also need to set 4 variables correctly, and 3 of them require a restart of the MySQL instance.
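
As a side note, you can also glance at the variables reported above directly with SQL. This is only a minimal sketch, not a replacement for checkInstanceConfiguration():

SELECT @@binlog_checksum,
       @@enforce_gtid_consistency,
       @@gtid_mode,
       @@server_id;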

Automation

It is not always convenient (or recommended) to do this kind of task manually.
MySQL Shell is built with DevOps usage in mind:

$ mysqlsh -e "dba.checkInstanceConfiguration('root@172.20.0.12')"
Validating MySQL instance at 172.20.0.12:3306 for use in an InnoDB cluster...

This instance reports its own address as mysql_node2

Checking whether existing tables comply with Group Replication requirements...
No incompatible tables detected

Checking instance configuration...

Some configuration options need to be fixed:
+--------------------------+---------------+----------------+--------------------------------------------------+
| Variable                 | Current Value | Required Value | Note                                             |
+--------------------------+---------------+----------------+--------------------------------------------------+
| binlog_checksum          | CRC32         | NONE           | Update the server variable                       |
| enforce_gtid_consistency | OFF           | ON             | Update read-only variable and restart the server |
| gtid_mode                | OFF           | ON             | Update read-only variable and restart the server |
| server_id                | 1             | <unique ID>    | Update read-only variable and restart the server |
+--------------------------+---------------+----------------+--------------------------------------------------+

Some variables need to be changed, but cannot be done dynamically on the server.
Please use the dba.configureInstance() command to repair these issues.

Or even more practical:

$ mysqlsh -- dba checkInstanceConfiguration --user=root --host=172.20.0.13
Validating MySQL instance at 172.20.0.13:3306 for use in an InnoDB cluster...

This instance reports its own address as mysql_node3

Checking whether existing tables comply with Group Replication requirements...
No incompatible tables detected

Checking instance configuration...

Some configuration options need to be fixed:
+--------------------------+---------------+----------------+--------------------------------------------------+
| Variable                 | Current Value | Required Value | Note                                             |
+--------------------------+---------------+----------------+--------------------------------------------------+
| binlog_checksum          | CRC32         | NONE           | Update the server variable                       |
| enforce_gtid_consistency | OFF           | ON             | Update read-only variable and restart the server |
| gtid_mode                | OFF           | ON             | Update read-only variable and restart the server |
| server_id                | 1             | <unique ID>    | Update read-only variable and restart the server |
+--------------------------+---------------+----------------+--------------------------------------------------+

Some variables need to be changed, but cannot be done dynamically on the server.
Please use the dba.configureInstance() command to repair these issues.

{
    "config_errors": [
        {
            "action": "server_update", 
            "current": "CRC32", 
            "option": "binlog_checksum", 
            "required": "NONE"
        }, 
        {
            "action": "restart", 
            "current": "OFF", 
            "option": "enforce_gtid_consistency", 
            "required": "ON"
        }, 
        {
            "action": "restart", 
            "current": "OFF", 
            "option": "gtid_mode", 
            "required": "ON"
        }, 
        {
            "action": "restart", 
            "current": "1", 
            "option": "server_id", 
            "required": "<unique ID>"
        }
    ], 
    "status": "error"
}

Another option is to create a script and pass it to MySQL Shell.
A very simple (and naive) example could be:

$ cat /tmp/servers.js
dba.checkInstanceConfiguration('root@172.20.0.11');
dba.checkInstanceConfiguration('root@172.20.0.12');
dba.checkInstanceConfiguration('root@172.20.0.13');

then process the file:

$ mysqlsh  -f /tmp/servers.js
Validating MySQL instance at 172.20.0.11:3306 for use in an InnoDB cluster...

This instance reports its own address as mysql_node1

Checking whether existing tables comply with Group Replication requirements...
No incompatible tables detected

Checking instance configuration...
Instance configuration is compatible with InnoDB cluster

The instance '172.20.0.11:3306' is valid for InnoDB cluster usage.

Validating MySQL instance at 172.20.0.12:3306 for use in an InnoDB cluster...

This instance reports its own address as mysql_node2

Checking whether existing tables comply with Group Replication requirements...
No incompatible tables detected

Checking instance configuration...
Instance configuration is compatible with InnoDB cluster

The instance '172.20.0.12:3306' is valid for InnoDB cluster usage.

Validating MySQL instance at 172.20.0.13:3306 for use in an InnoDB cluster...

This instance reports its own address as mysql_node3

Checking whether existing tables comply with Group Replication requirements...
No incompatible tables detected

Checking instance configuration...
Instance configuration is compatible with InnoDB cluster

The instance '172.20.0.13:3306' is valid for InnoDB cluster usage.

In the previous scenario, all the MySQL instances were configured properly before the check.

Note that everything done previously in JavaScript can also be done in Python:

MySQL 172.20.0.11:33060+ JS> \py
Switching to Python mode...

MySQL 172.20.0.11:33060+ Py> dba.check_instance_configuration()
Validating MySQL instance at 172.20.0.11:3306 for use in an InnoDB cluster...

This instance reports its own address as mysql_node1

Checking whether existing tables comply with Group Replication requirements...
No incompatible tables detected

Checking instance configuration...
Instance configuration is compatible with InnoDB cluster

The instance '172.20.0.11:3306' is valid for InnoDB cluster usage.

{
    "status": "ok"
}
$ mysqlsh root@172.20.0.11 --py -f check_servers.py
...

To summarize

Q: How do I validate an instance for MySQL InnoDB Cluster usage?

A: Use check_instance_configuration()

References

Thanks for using MySQL!


CHECK constraints in MySQL

May 14, 2019
Above the clouds by Olivier DASINI

MySQL (really) supports CHECK CONSTRAINT since version 8.0.16.
In this article I will show you 2 things:

  1. An elegant way to simulate check constraint in MySQL 5.7 & 8.0.
  2. How easy and convenient it is to use CHECK constraints starting from MySQL 8.0.16.

Please note that this article is strongly inspired by Mablomy‘s blog post: CHECK constraint for MySQL – NOT NULL on generated columns.

I’m using the optimized MySQL Server Docker images, created, maintained and supported by the MySQL team at Oracle.
For clarity I chose MySQL 8.0.15 for the check constraint hack and obviously 8.0.16 for the “real” check constraint implementation.


Deployment of MySQL 8.0.15 & MySQL 8.0.16:

$ docker run --name=mysql-8.0.15 -e MYSQL_ROOT_PASSWORD=unsafe -d mysql/mysql-server:8.0.15
 d4ce35e429e08bbf46a02729e6667458e2ed90ce94e7622f1342ecb6c0dfa009
$ docker run --name=mysql-8.0.16 -e MYSQL_ROOT_PASSWORD=unsafe -d mysql/mysql-server:8.0.16
 d3b22dff1492fe6cb488a7f747e4709459974e79ae00b60eb0aee20546b68a0f

Note:

Obviously using a password on the command line interface can be insecure.

Please read the best practices of deploying MySQL on Linux with Docker.

Example 1

Check constraints hack

$ docker exec -it mysql-8.0.15 mysql -uroot -p --prompt='mysql-8.0.15> '
Enter password: 

mysql-8.0.15> CREATE SCHEMA test;
Query OK, 1 row affected (0.03 sec)

mysql-8.0.15> USE test
Database changed

mysql-8.0.15> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 8.0.15    |
+-----------+


mysql-8.0.15> 
CREATE TABLE checker_hack ( 
    i tinyint, 
    i_must_be_between_7_and_12 BOOLEAN 
         GENERATED ALWAYS AS (IF(i BETWEEN 7 AND 12, true, NULL)) 
         VIRTUAL NOT NULL
);

As you can see, the trick is to use a generated column (available since MySQL 5.7) together with the flow control operator IF, which holds the check condition.

mysql-8.0.15> INSERT INTO checker_hack (i) VALUES (11);
Query OK, 1 row affected (0.03 sec)

mysql-8.0.15> INSERT INTO checker_hack (i) VALUES (12);
Query OK, 1 row affected (0.01 sec)


mysql-8.0.15> SELECT i FROM checker_hack;
+------+
| i    |
+------+
|   11 |
|   12 |
+------+
2 rows in set (0.00 sec)

As expected, values that respect the condition (between 7 and 12) can be inserted.

mysql-8.0.15> INSERT INTO checker_hack (i) VALUES (13);
ERROR 1048 (23000): Column 'i_must_be_between_7_and_12' cannot be null


mysql-8.0.15> SELECT i FROM checker_hack;
+------+
| i    |
+------+
|   11 |
|   12 |
+------+
2 rows in set (0.00 sec)

Outside the limits, an error is raised.
We have our “check constraint”-like feature 🙂

Check constraint since MySQL 8.0.16

$ docker exec -it mysql-8.0.16 mysql -uroot -p --prompt='mysql-8.0.16> '
Enter password: 

mysql-8.0.16> CREATE SCHEMA test;
Query OK, 1 row affected (0.08 sec)

mysql-8.0.16> USE test
Database changed

mysql-8.0.16> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 8.0.16    |
+-----------+


mysql-8.0.16> 
CREATE TABLE checker ( 
    i tinyint, 
    CONSTRAINT i_must_be_between_7_and_12 CHECK (i BETWEEN 7 AND 12 )
);

Since MySQL 8.0.16, the CHECK keyword does the job.
I would recommend naming your constraints wisely.
The syntax is:

[CONSTRAINT [symbol]] CHECK (expr) [[NOT] ENFORCED]

From there, the following is rather obvious:

mysql-8.0.16> INSERT INTO checker (i) VALUES (11);
Query OK, 1 row affected (0.02 sec)

mysql-8.0.16> INSERT INTO checker (i) VALUES (12);
Query OK, 1 row affected (0.03 sec)


mysql-8.0.16> SELECT i FROM checker;
+------+
| i    |
+------+
|   11 |
|   12 |
+------+
2 rows in set (0.00 sec)

mysql-8.0.16> INSERT INTO checker (i) VALUES (13);
ERROR 3819 (HY000): Check constraint 'i_must_be_between_7_and_12' is violated.


mysql-8.0.16> SELECT i FROM checker;
+------+
| i    |
+------+
|   11 |
|   12 |
+------+
2 rows in set (0.00 sec)

Easy! 🙂
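
A quick aside on the syntax shown earlier: check constraints can also be suspended, re-enabled or dropped later with ALTER TABLE. A minimal sketch, assuming the checker table above and MySQL 8.0.16’s ALTER TABLE support for check constraints:

-- temporarily stop enforcing the constraint (it stays in the table definition)
ALTER TABLE checker ALTER CHECK i_must_be_between_7_and_12 NOT ENFORCED;

-- re-enable it
ALTER TABLE checker ALTER CHECK i_must_be_between_7_and_12 ENFORCED;

-- or remove it completely
ALTER TABLE checker DROP CHECK i_must_be_between_7_and_12;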

Example 2

You can check a combination of columns.

Check constraints hack

mysql-8.0.15> 
CREATE TABLE squares_hack (
     dx DOUBLE, 
     dy DOUBLE, 
     area_must_be_larger_than_10 BOOLEAN 
           GENERATED ALWAYS AS (IF(dx*dy>10.0, true, NULL)) NOT NULL
);

mysql-8.0.15> INSERT INTO squares_hack (dx,dy) VALUES (7,4);
Query OK, 1 row affected (0.02 sec)


mysql-8.0.15> INSERT INTO squares_hack (dx,dy) VALUES (2,4);
ERROR 1048 (23000): Column 'area_must_be_larger_than_10' cannot be null


mysql-8.0.15> SELECT dx, dy FROM squares_hack;
+------+------+
| dx   | dy   |
+------+------+
|    7 |    4 |
+------+------+
1 row in set (0.00 sec)

Check constraint since MySQL 8.0.16

mysql-8.0.16> 
CREATE TABLE squares (
     dx DOUBLE, 
     dy DOUBLE, 
     CONSTRAINT area_must_be_larger_than_10 CHECK ( dx * dy > 10.0 )
);


mysql-8.0.16> INSERT INTO squares (dx,dy) VALUES (7,4);
Query OK, 1 row affected (0.01 sec)


mysql-8.0.16> INSERT INTO squares (dx,dy) VALUES (2,4);
ERROR 3819 (HY000): Check constraint 'area_must_be_larger_than_10' is violated.


mysql-8.0.16> SELECT dx, dy FROM squares;
+------+------+
| dx   | dy   |
+------+------+
|    7 |    4 |
+------+------+
1 row in set (0.00 sec)

Still easy!

Example 3

You can also check text columns.

Check constraints hack

mysql-8.0.15> 
CREATE TABLE animal_hack (  
     name varchar(30) NOT NULL,  
     class varchar(100) DEFAULT NULL,  
     class_allow_Mammal_Reptile_Amphibian BOOLEAN 
           GENERATED ALWAYS AS (IF(class IN ("Mammal", "Reptile", "Amphibian"), true, NULL)) NOT NULL
);  

mysql-8.0.15> INSERT INTO animal_hack (name, class) VALUES ("Agalychnis callidryas",'Amphibian');  
Query OK, 1 row affected (0.02 sec)

mysql-8.0.15> INSERT INTO animal_hack (name, class) VALUES ("Orycteropus afer", 'Mammal');  
Query OK, 1 row affected (0.02 sec)

mysql-8.0.15> INSERT INTO animal_hack (name, class) VALUES ("Lacerta agilis", 'Reptile');  
Query OK, 1 row affected (0.02 sec)


mysql-8.0.15> SELECT name, class FROM animal_hack;
+-----------------------+-----------+
| name                  | class     |
+-----------------------+-----------+
| Agalychnis callidryas | Amphibian |
| Orycteropus afer      | Mammal    |
| Lacerta agilis        | Reptile   |
+-----------------------+-----------+
3 rows in set (0.00 sec)
mysql-8.0.15> INSERT INTO animal_hack (name, class) VALUES ("Palystes castaneus", 'Arachnid'); 
ERROR 1048 (23000): Column 'class_allow_Mammal_Reptile_Amphibian' cannot be null


mysql-8.0.15> SELECT name, class FROM animal_hack;
+-----------------------+-----------+
| name                  | class     |
+-----------------------+-----------+
| Agalychnis callidryas | Amphibian |
| Orycteropus afer      | Mammal    |
| Lacerta agilis        | Reptile   |
+-----------------------+-----------+
3 rows in set (0.00 sec)

Check constraint since MySQL 8.0.16

mysql-8.0.16> 
CREATE TABLE animal (  
     name varchar(30) NOT NULL,  
     class varchar(100) DEFAULT NULL,  
     CONSTRAINT CHECK (class IN ("Mammal", "Reptile", "Amphibian"))
);  

mysql-8.0.16> INSERT INTO animal (name, class) VALUES ("Agalychnis callidryas",'Amphibian');  
Query OK, 1 row affected (0.04 sec)

mysql-8.0.16> INSERT INTO animal (name, class) VALUES ("Orycteropus afer", 'Mammal');  
Query OK, 1 row affected (0.04 sec)

mysql-8.0.16> INSERT INTO animal (name, class) VALUES ("Lacerta agilis", 'Reptile');  
Query OK, 1 row affected (0.04 sec)


mysql-8.0.16> SELECT name, class FROM animal;
+-----------------------+-----------+
| name                  | class     |
+-----------------------+-----------+
| Agalychnis callidryas | Amphibian |
| Orycteropus afer      | Mammal    |
| Lacerta agilis        | Reptile   |
+-----------------------+-----------+
3 rows in set (0.00 sec)
mysql-8.0.16> INSERT INTO animal (name, class) VALUES ("Palystes castaneus", 'Arachnid');  
ERROR 3819 (HY000): Check constraint 'animal_chk_1' is violated.


mysql-8.0.16> SELECT name, class FROM animal;
+-----------------------+-----------+
| name                  | class     |
+-----------------------+-----------+
| Agalychnis callidryas | Amphibian |
| Orycteropus afer      | Mammal    |
| Lacerta agilis        | Reptile   |
+-----------------------+-----------+
3 rows in set (0.00 sec)

Frankly easy!
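
Since I did not name the constraint in this last CREATE TABLE, MySQL generated the name animal_chk_1 (visible in the error message above). As a minimal sketch, you can list the check constraints of a schema through INFORMATION_SCHEMA (assuming the test schema used here):

SELECT tc.TABLE_NAME, cc.CONSTRAINT_NAME, cc.CHECK_CLAUSE
FROM information_schema.TABLE_CONSTRAINTS tc
JOIN information_schema.CHECK_CONSTRAINTS cc
  ON cc.CONSTRAINT_SCHEMA = tc.CONSTRAINT_SCHEMA
 AND cc.CONSTRAINT_NAME = tc.CONSTRAINT_NAME
WHERE tc.TABLE_SCHEMA = 'test'
  AND tc.CONSTRAINT_TYPE = 'CHECK';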

Note that the hack also works in 8.0.16, although it is no longer needed.

CHECK constraints are another useful feature implemented in MySQL (and not the last one, stay tuned!).
There are other interesting things to know about this feature, and about the other features available in MySQL 8.0.16.
Please have a look at the references below.

References

Thanks for using MySQL!


Constant-Folding Optimization in MySQL 8.0

May 7, 2019

TL;DR

In MySQL 8.0.16 the optimizer has improved again!
Comparisons of columns of numeric types with constant values are checked and folded or removed for invalid or out-of-range values.
The goal is to speed up query execution.


The title of this article, taken from the name of this optimization (constant folding), is quite cryptic. Nevertheless, the principle is simple and, more importantly, there is nothing to do from the user’s perspective.

What is “Constant-Folding Optimization”?

From the MySQL Documentation:
Comparisons between constants and column values in which the constant value is out of range or of the wrong type with respect to the column type are now handled once during query optimization rather than row-by-row during execution.

From the MySQL Server Team Blog:
The goal is to speed up execution at the cost of a little more analysis at optimize time.
Always true and false comparisons are detected and eliminated.
In other cases, the type of the constant is adjusted to match that of the field if they are not the same, avoiding type conversion at execution time.

Clear enough?

One example is worth a thousand words, so let’s have a deeper look comparing the old behavior in MySQL 8.0.15 to the new one beginning with MySQL 8.0.16.

I’m using the optimized MySQL Server Docker images, created, maintained and supported by the MySQL team at Oracle.

Deployment of MySQL 8.0.15 & MySQL 8.0.16:

$ docker run --name=mysql_8.0.15 -e MYSQL_ROOT_PASSWORD=unsafe -d mysql/mysql-server:8.0.15
$ docker run --name=mysql_8.0.16 -e MYSQL_ROOT_PASSWORD=unsafe -d mysql/mysql-server:8.0.16

Note:

Obviously using a password on the command line interface can be insecure.

Please read the best practices of deploying MySQL on Linux with Docker.


Copy the test table dump file on 8.0.15 & 8.0.16:

$ docker cp ./testtbl.sql mysql_8.0.15:/tmp/testtbl.sql
$ docker cp ./testtbl.sql mysql_8.0.16:/tmp/testtbl.sql


Load the test table into 8.0.15 instance:

$ docker exec -it mysql_8.0.15 mysql -u root -p --prompt='mysql_8.0.15> '

Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 31
Server version: 8.0.15 MySQL Community Server - GPL

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql_8.0.15> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 8.0.15    |
+-----------+

mysql_8.0.15> CREATE SCHEMA test;
Query OK, 1 row affected (0.04 sec)

mysql_8.0.15> USE test
Database changed

mysql_8.0.15> source /tmp/testtbl.sql
... <snip> ...


Load the test table into 8.0.16 instance:

$ docker exec -it mysql_8.0.16 mysql -u root -p --prompt='mysql_8.0.16> '

Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 12
Server version: 8.0.16 MySQL Community Server - GPL

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql_8.0.16> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 8.0.16    |
+-----------+

mysql_8.0.16> CREATE SCHEMA test;
Query OK, 1 row affected (0.04 sec)

mysql_8.0.16> USE test
Database changed

mysql_8.0.16> source /tmp/testtbl.sql
... <snip> ...



Let’s see what we have loaded:

mysql_8.0.16> SHOW CREATE TABLE testtbl\G
*************************** 1. row ***************************
       Table: testtbl
Create Table: CREATE TABLE `testtbl` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `val` varchar(36) NOT NULL,
  `val2` varchar(36) DEFAULT NULL,
  `val3` varchar(36) DEFAULT NULL,
  `val4` varchar(36) DEFAULT NULL,
  `num` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx2` (`val2`),
  KEY `idx3` (`val3`),
  KEY `idx4` (`val4`)
) ENGINE=InnoDB AUTO_INCREMENT=14220001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci


mysql_8.0.16> SELECT COUNT(*) FROM testtbl;
+----------+
| COUNT(*) |
+----------+
|  5000000 |
+----------+

What is important for us here is the non-indexed column, num:

num int(10) unsigned DEFAULT NULL

It contains only positive numbers:

mysql_8.0.16> SELECT min(num), max(num) FROM testtbl;
+----------+----------+
| min(num) | max(num) |
+----------+----------+
|  9130001 | 14130000 |
+----------+----------+

The old behavior

What happens if I look for a negative number, say -12345, in the column num?
Remember that it contains only positive numbers and there is no index.

mysql_8.0.15> EXPLAIN SELECT * FROM testtbl WHERE num=-12345\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: testtbl
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 4820634
     filtered: 10.00
        Extra: Using where

According to the EXPLAIN plan, we have a full table scan. In a way that makes sense, because there is no index on num.
However, we know that there is no negative value, so there is certainly some room for improvement 🙂

Running the query:

mysql_8.0.15> SELECT * FROM testtbl WHERE num=-12345;
Empty set (2.77 sec)

Indeed the full table scan could be costly.

The current behavior – 8.0.16+

Constant-folding optimization improves the execution of this type of query.

The EXPLAIN plan for MySQL 8.0.16 is completely different:

mysql_8.0.16> EXPLAIN SELECT * FROM testtbl WHERE num=-12345\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: NULL
   partitions: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
     filtered: NULL
        Extra: Impossible WHERE

Did you notice the:

Extra: Impossible WHERE

The search for a negative value in a strictly positive (unsigned) column was resolved at optimization time!
So there is obviously a positive impact on the query execution time:

mysql_8.0.16> SELECT * FROM testtbl WHERE num=-12345;
Empty set (0.00 sec)

Yay!
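
If you are curious about what the optimizer actually did with the predicate, a minimal sketch: run SHOW WARNINGS right after the EXPLAIN; the Note it returns usually contains the statement as rewritten by the optimizer.

EXPLAIN SELECT * FROM testtbl WHERE num = -12345;
SHOW WARNINGS;  -- the Note should show the query with the impossible WHERE folded away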



In addition to the = operator, this optimization currently applies to >, >=, <, <=, <>, != and <=> as well.
e.g.

mysql_8.0.16> EXPLAIN SELECT * FROM testtbl WHERE num > -42 AND num <= -1 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: NULL
   partitions: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
     filtered: NULL
        Extra: Impossible WHERE


mysql_8.0.16> SELECT * FROM testtbl WHERE num > -42 AND num <=  -1;
Empty set (0.00 sec)

Indexed column

As a side note, if your column is indexed, the optimizer already has the relevant information, so even before 8.0.16 no constant-folding optimization is needed to get a fast query :).

mysql_8.0.15> CREATE INDEX idx_num ON testtbl(num);
Query OK, 0 rows affected (24.84 sec)
Records: 0  Duplicates: 0  Warnings: 0


mysql_8.0.15> EXPLAIN SELECT * FROM testtbl WHERE num = -12345\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: NULL
   partitions: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
     filtered: NULL
        Extra: no matching row in const table
1 row in set, 1 warning (0.00 sec)


mysql_8.0.15> SELECT * FROM testtbl WHERE num = -12345;
Empty set (0.00 sec)

References

Thanks for using MySQL!


MySQL InnoDB Cluster – HowTo #1 – Monitor your cluster

April 11, 2019
Sakila HA by Olivier DASINI

How do I… Monitor the status & the configuration of my cluster?

Short answer

Use:

status()

Long answer…

Assuming you already have a MySQL InnoDB Cluster up and running. If not, please RTFM 🙂
Additionally you can read this tutorial and this article from my colleague lefred or this one on Windows Platform from my colleague Ivan.

I’m using MySQL 8.0.15

MySQL localhost:33060+ JS> session.sql('SELECT VERSION()')
+-----------+
| VERSION() |
+-----------+
| 8.0.15    |
+-----------+

Let’s connect to my cluster

$ mysqlsh root@localhost --cluster

Please provide the password for 'root@localhost': ****
MySQL Shell 8.0.15

Copyright (c) 2016, 2019, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.

Type '\help' or '\?' for help; '\quit' to exit.
Creating a session to 'root@localhost'
Fetching schema names for autocompletion... Press ^C to stop.
Your MySQL connection id is 1520 (X protocol)
Server version: 8.0.15 MySQL Community Server - GPL
No default schema selected; type \use <schema> to set one.
You are connected to a member of cluster 'pocCluster'.
Variable 'cluster' is set.
Use cluster.status() in scripting mode to get status of this cluster or cluster.help() for more commands.

The --cluster argument enables cluster management by setting the cluster global variable.
This variable is a reference to the MySQL InnoDB Cluster object for the session. It gives you access (among other things) to the status() method, which allows you to check and monitor the cluster.

Ask for help

The built-in help is simply awesome!

MySQL localhost:33060+ JS> cluster.help('status')
NAME
      status - Describe the status of the cluster.

SYNTAX
      <Cluster>.status([options])

WHERE
      options: Dictionary with options.

RETURNS
       A JSON object describing the status of the cluster.

DESCRIPTION
      This function describes the status of the cluster including its
      ReplicaSets and Instances. The following options may be given to control
      the amount of information gathered and returned.

      - extended: if true, includes information about transactions processed by
        connection and applier, as well as groupName and memberId values.
      - queryMembers: if true, connect to each Instance of the ReplicaSets to
        query for more detailed stats about the replication machinery.

EXCEPTIONS
      MetadataError in the following scenarios:

      - If the Metadata is inaccessible.
      - If the Metadata update operation failed.

Cluster status

So let’s discover the status of our cluster

MySQL localhost:33060+ JS> cluster.status()
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "172.19.0.11:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "172.19.0.11:3306": {
                "address": "172.19.0.11:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE"
            }, 
            "172.19.0.12:3306": {
                "address": "172.19.0.12:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE"
            }, 
            "172.19.0.13:3306": {
                "address": "172.19.0.13:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "172.19.0.11:3306"
}

Note:
The instance’s state in the cluster directly influences the information provided in the status report. Therefore ensure the instance you are connected to has a status of ONLINE.

As you can see, by default status() gives you a lot of relevant information.
It can thus be used to monitor your cluster, although the best tool available for monitoring your MySQL InnoDB Cluster (but also MySQL Replication, MySQL NDB Cluster and obviously your standalone MySQL servers) is MySQL Enterprise Monitor.

More details with “A Guide to MySQL Enterprise Monitor“.

Extended cluster status

MySQL Group Replication provides several metrics and detailed information about the underlying cluster of a MySQL InnoDB Cluster.
These metrics, which are used for monitoring, are based on Performance Schema tables.
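
As a minimal SQL sketch (not a replacement for status()), some of those Performance Schema tables can be queried directly on any member:

SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE, MEMBER_ROLE
FROM performance_schema.replication_group_members;

SELECT MEMBER_ID, COUNT_TRANSACTIONS_IN_QUEUE, COUNT_CONFLICTS_DETECTED
FROM performance_schema.replication_group_member_stats;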

Some of this information is available through MySQL Shell. You can control the amount of information gathered and returned with 2 options: extended & queryMembers.

extended

If enabled, includes the groupName and memberId values for each member, as well as general statistics about the number of transactions checked, proposed and rejected by members…

MySQL localhost:33060+ JS> cluster.status({extended:true})
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "groupName": "72568575-561c-11e9-914c-0242ac13000b", 
        "name": "default", 
        "primary": "172.19.0.11:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "172.19.0.11:3306": {
                "address": "172.19.0.11:3306", 
                "memberId": "4a85f6c4-561c-11e9-8401-0242ac13000b", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "transactions": {
                    "appliedCount": 2, 
                    "checkedCount": 53, 
                    "committedAllMembers": "4a85f6c4-561c-11e9-8401-0242ac13000b:1-12,
72568575-561c-11e9-914c-0242ac13000b:1-51", 
                    "conflictsDetectedCount": 0, 
                    "inApplierQueueCount": 0, 
                    "inQueueCount": 0, 
                    "lastConflictFree": "72568575-561c-11e9-914c-0242ac13000b:56", 
                    "proposedCount": 53, 
                    "rollbackCount": 0
                }
            }, 
            "172.19.0.12:3306": {
                "address": "172.19.0.12:3306", 
                "memberId": "4ad75450-561c-11e9-baa8-0242ac13000c", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "transactions": {
                    "appliedCount": 44, 
                    "checkedCount": 43, 
                    "committedAllMembers": "4a85f6c4-561c-11e9-8401-0242ac13000b:1-12,
72568575-561c-11e9-914c-0242ac13000b:1-41", 
                    "conflictsDetectedCount": 0, 
                    "inApplierQueueCount": 0, 
                    "inQueueCount": 0, 
                    "lastConflictFree": "72568575-561c-11e9-914c-0242ac13000b:52", 
                    "proposedCount": 0, 
                    "rollbackCount": 0
                }
            }, 
            "172.19.0.13:3306": {
                "address": "172.19.0.13:3306", 
                "memberId": "4b77c1ec-561c-11e9-9cc1-0242ac13000d", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "transactions": {
                    "appliedCount": 42, 
                    "checkedCount": 42, 
                    "committedAllMembers": "4a85f6c4-561c-11e9-8401-0242ac13000b:1-12,
72568575-561c-11e9-914c-0242ac13000b:1-41", 
                    "conflictsDetectedCount": 0, 
                    "inApplierQueueCount": 0, 
                    "inQueueCount": 0, 
                    "lastConflictFree": "72568575-561c-11e9-914c-0242ac13000b:53", 
                    "proposedCount": 0, 
                    "rollbackCount": 0
                }
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "172.19.0.11:3306"
}

queryMembers

If enabled, includes information about recovery and regular transaction I/O, applier worker thread statistics and any lags, and applier coordinator statistics…

MySQL localhost:33060+ JS> cluster.status({queryMembers:true})
{
    "clusterName": "pocCluster", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "172.19.0.11:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "172.19.0.11:3306": {
                "address": "172.19.0.11:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "transactions": {
                    "connection": {
                        "lastHeartbeatTimestamp": "", 
                        "lastQueued": {
                            "endTimestamp": "2019-04-03 14:26:33.394755", 
                            "immediateCommitTimestamp": "", 
                            "immediateCommitToEndTime": null, 
                            "originalCommitTimestamp": "", 
                            "originalCommitToEndTime": null, 
                            "queueTime": 0.000077, 
                            "startTimestamp": "2019-04-03 14:26:33.394678", 
                            "transaction": "72568575-561c-11e9-914c-0242ac13000b:13"
                        }, 
                        "receivedHeartbeats": 0, 
                        "receivedTransactionSet": "4a85f6c4-561c-11e9-8401-0242ac13000b:1-12,
72568575-561c-11e9-914c-0242ac13000b:1-65", 
                        "threadId": null
                    }, 
                    "workers": [
                        {
                            "lastApplied": {
                                "applyTime": 0.022927, 
                                "endTimestamp": "2019-04-03 14:26:33.417643", 
                                "immediateCommitTimestamp": "", 
                                "immediateCommitToEndTime": null, 
                                "originalCommitTimestamp": "", 
                                "originalCommitToEndTime": null, 
                                "retries": 0, 
                                "startTimestamp": "2019-04-03 14:26:33.394716", 
                                "transaction": "72568575-561c-11e9-914c-0242ac13000b:13"
                            }, 
                            "threadId": 58
                        }
                    ]
                }
            }, 
            "172.19.0.12:3306": {
                "address": "172.19.0.12:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "transactions": {
                    "connection": {
                        "lastHeartbeatTimestamp": "", 
                        "lastQueued": {
                            "endTimestamp": "2019-04-03 15:42:30.855989", 
                            "immediateCommitTimestamp": "", 
                            "immediateCommitToEndTime": null, 
                            "originalCommitTimestamp": "2019-04-03 15:42:30.854594", 
                            "originalCommitToEndTime": 0.001395, 
                            "queueTime": 0.000476, 
                            "startTimestamp": "2019-04-03 15:42:30.855513", 
                            "transaction": "72568575-561c-11e9-914c-0242ac13000b:65"
                        }, 
                        "receivedHeartbeats": 0, 
                        "receivedTransactionSet": "4a85f6c4-561c-11e9-8401-0242ac13000b:1-12,
72568575-561c-11e9-914c-0242ac13000b:1-65", 
                        "threadId": null
                    }, 
                    "workers": [
                        {
                            "lastApplied": {
                                "applyTime": 0.024685, 
                                "endTimestamp": "2019-04-03 15:42:30.880361", 
                                "immediateCommitTimestamp": "", 
                                "immediateCommitToEndTime": null, 
                                "originalCommitTimestamp": "2019-04-03 15:42:30.854594", 
                                "originalCommitToEndTime": 0.025767, 
                                "retries": 0, 
                                "startTimestamp": "2019-04-03 15:42:30.855676", 
                                "transaction": "72568575-561c-11e9-914c-0242ac13000b:65"
                            }, 
                            "threadId": 54
                        }
                    ]
                }
            }, 
            "172.19.0.13:3306": {
                "address": "172.19.0.13:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE", 
                "transactions": {
                    "connection": {
                        "lastHeartbeatTimestamp": "", 
                        "lastQueued": {
                            "endTimestamp": "2019-04-03 15:42:30.855678", 
                            "immediateCommitTimestamp": "", 
                            "immediateCommitToEndTime": null, 
                            "originalCommitTimestamp": "2019-04-03 15:42:30.854594", 
                            "originalCommitToEndTime": 0.001084, 
                            "queueTime": 0.000171, 
                            "startTimestamp": "2019-04-03 15:42:30.855507", 
                            "transaction": "72568575-561c-11e9-914c-0242ac13000b:65"
                        }, 
                        "receivedHeartbeats": 0, 
                        "receivedTransactionSet": "4a85f6c4-561c-11e9-8401-0242ac13000b:1-12,
72568575-561c-11e9-914c-0242ac13000b:1-65", 
                        "threadId": null
                    }, 
                    "workers": [
                        {
                            "lastApplied": {
                                "applyTime": 0.021354, 
                                "endTimestamp": "2019-04-03 15:42:30.877398", 
                                "immediateCommitTimestamp": "", 
                                "immediateCommitToEndTime": null, 
                                "originalCommitTimestamp": "2019-04-03 15:42:30.854594", 
                                "originalCommitToEndTime": 0.022804, 
                                "retries": 0, 
                                "startTimestamp": "2019-04-03 15:42:30.856044", 
                                "transaction": "72568575-561c-11e9-914c-0242ac13000b:65"
                            }, 
                            "threadId": 54
                        }
                    ]
                }
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "172.19.0.11:3306"
}

To summarize

Q: How do I monitor the status & the configuration of my cluster?

A: Use status() or status({extended:true}) or status({queryMembers:true})

References

Thanks for using MySQL!


MySQL JSON Document Store

April 2, 2019

Introduction

MySQL is the most popular Open Source database!
An ACID-compliant (Atomicity, Consistency, Isolation, Durability) relational database that allows you, among other things, to manage your data with the powerful and proven SQL language, and to take care of your data integrity with transactions, foreign keys, …
But you already know that 🙂

JavaScript Object Notation, better known as JSON, is a lightweight and very popular data-interchange format, used for storing and exchanging data.
A JSON document is a standardized object that can represent structured data, and the structure is implicit in the document.
Anyway, I bet you know that too!

Starting with MySQL 5.7.8, you can handle JSON documents in a “relational way”, using SQL queries and storing them with the native MySQL JSON data type.
MySQL also provides a large set of JSON functions.
I hope you were aware of that!
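
As a quick reminder of that “relational way” (just a minimal sketch, with hypothetical table and column names):

CREATE TABLE articles (
  id  INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  doc JSON
);

INSERT INTO articles (doc)
VALUES ('{"title": "MySQL Document Store", "code": 42}');

-- ->> is shorthand for JSON_UNQUOTE(JSON_EXTRACT(...))
SELECT doc->>'$.title' AS title FROM articles;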

You should be interested in:

Note:

I would recommend having a closer look at the JSON_TABLE function, which extracts data from a JSON document and returns it as a relational table… It’s just amazing!
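
A minimal JSON_TABLE sketch (made-up data, just to show the shape of the call):

SELECT jt.name, jt.class
FROM JSON_TABLE(
  '[{"name": "Orycteropus afer", "class": "Mammal"},
    {"name": "Lacerta agilis", "class": "Reptile"}]',
  '$[*]' COLUMNS (
    name  VARCHAR(30)  PATH '$.name',
    class VARCHAR(100) PATH '$.class'
  )
) AS jt;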

However MySQL 8.0 provides another way to handle JSON documents, actually in a “Not only SQL” (NoSQL) approach…
In other words, if you need or want to manage JSON documents (collections) in a non-relational manner, with CRUD (Create/Read/Update/Delete) operations, then you can use MySQL 8.0!
Did you know that?

MySQL Document Store Architecture

Let’s have a quick overview of the architecture.

MySQL Document Store Architecture

  • X Plugin – The X Plugin enables MySQL to use the X Protocol and uses Connectors and the Shell to act as clients to the server.
  • X Protocol – The X Protocol is a new client protocol built on top of the Protobuf library, and works for both CRUD and SQL operations.
  • X DevAPI – The X DevAPI is a new, modern, async developer API for CRUD and SQL operations on top of X Protocol. It introduces Collections as new Schema objects. Documents are stored in Collections and have their dedicated CRUD operation set.
  • MySQL Shell – The MySQL Shell is an interactive Javascript, Python, or SQL interface supporting development and administration for the MySQL Server. You can use the MySQL Shell to perform data queries and updates as well as various administration operations.
  • MySQL Connectors – Connectors that support the X Protocol and enable you to use X DevAPI in your chosen language (Node.js, PHP, Python, Java, .NET, C++, …).

Write applications using the X DevAPI

As a disclaimer, I am not a developer, so sorry, no fancy code in this blog post.
However, the good news is that I can show you where you’ll find the best MySQL developer resources ever 🙂 that is:

https://insidemysql.com/

And to start, I recommend focusing on the following articles:

And of course the newest articles as well.
Furthermore, another resource that would be useful to you is the

X DevAPI User Guide

Use Document Store with MySQL Shell

If you are a DBA, an OPS engineer or, obviously, a developer, the simplest way to use (or test) MySQL Document Store is with MySQL Shell.

MySQL Shell is an integrated development & administration shell where all MySQL products will be available through a common scripting interface.
If you don’t know it yet, please download it.
Trust me, you are going to love it!

MySQL Shell

MySQL Shell key features are:

  • Scripting for Javascript, Python, and SQL mode
  • Supports MySQL Standard and X Protocols
  • Document and Relational Models
  • CRUD Document and Relational APIs via scripting
  • Traditional Table, JSON, Tab Separated output results formats
  • Both Interactive and Batch operations

Note:

MySQL Shell is also a key component of MySQL InnoDB Cluster. In this context, it allows you to deploy and manage a MySQL Group Replication cluster.

See my MySQL InnoDB Cluster tutorial.

First steps with MySQL Shell

Let’s connect to the MySQL Server with MySQL Shell (mysqlsh)

$ mysqlsh root@myHost
MySQL Shell 8.0.15
... snip ...
Your MySQL connection id is 15 (X protocol)
Server version: 8.0.15 MySQL Community Server - GPL
No default schema selected; type \use <schema> to set one.

We must be inside an X session in order to use MySQL as a document store. Luckily there is no extra step, because it’s the default in MySQL 8.0. Note that the default “X” port is 33060.
You can check that you are inside an X session (and thus using the X Protocol):

MySQL myHost:33060+ JS> session
<Session:root@myHost:33060>

MySQL myHost:33060+ JS> \status
...snip...

Session type:                 X

Default schema:               
Current schema:               
Server version:               8.0.15 MySQL Community Server - GPL
Protocol version:             X protocol
...snip...

If you are connected through a classic session, you’ll get the following output (note “<ClassicSession” instead of “<Session”):

MySQL myHost:3306 JS> session
<ClassicSession:root@myHost:3306>

You can find out your X Protocol port by checking the mysqlx_port variable.
I’ll switch to the MySQL Shell SQL mode to execute my SQL command:

MySQL myHost:3306 JS> \sql
Switching to SQL mode... Commands end with ;

MySQL myHost:3306 SQL> SHOW VARIABLES LIKE 'mysqlx_port';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| mysqlx_port   | 33060 |
+---------------+-------+

Then reconnect to the server using the right port (33060 by default) and you should be fine :

MySQL myHost:3306 SQL> \connect root@myHost:33060 
...snip...
Your MySQL connection id is 66 (X protocol)
Server version: 8.0.15 MySQL Community Server - GPL
No default schema selected; type \use <schema> to set one.

MySQL myHost:33060+ SQL> \js
Switching to JavaScript mode...

MySQL myHost:33060+ JS> session
<Session:root@myHost:33060>

CRUD

We are going to create a schema (demo) where we will do our tests

MySQL myHost:33060+ JS> session.createSchema('demo')
<Schema:demo>


MySQL myHost:33060+ JS> \use demo
Default schema `demo` accessible through db.

Note:

The MySQL Shell default language is JavaScript. However, all the steps described in this article can also be done in Python.

e.g.

JS> session.createSchema('demo')

Py> session.create_schema('demo')

Create documents

Create a collection (my_coll1) inside the schema demo and insert documents :

MySQL myHost:33060+ demo JS> db.createCollection('my_coll1');
<Collection:my_coll1>


MySQL myHost:33060+ demo JS> db.my_coll1.add({"title":"MySQL Document Store", "abstract":"SQL is now optional!", "code": "42"})
Query OK, 1 item affected (0.0358 sec)

Trying to add an invalid JSON document raises an error :

MySQL myHost:33060+ demo JS> db.my_coll1.add("This is not a valid JSON document")
CollectionAdd.add: Argument #1 expected to be a document, JSON expression or a list of documents (ArgumentError)

List collections

To get the list of collections belonging to the current schema use getCollections() :

MySQL myHost:33060+ demo JS> db.getCollections()
[
    <Collection:my_coll1>
]

Find documents

Display the content of a collection with find() :

MySQL myHost:33060+ demo JS> db.my_coll1.find()
[
    {
        "_id": "00005c9514e60000000000000053",
        "code": "42",
        "title": "MySQL Document Store",
        "abstract": "SQL is now optional!"
    }
]
1 document in set (0.0029 sec)

Note:

Each document requires an identifier field called _id. The value of the _id field must be unique among all documents in the same collection.
MySQL server sets an _id value if the document does not contain the _id field.

Please read: Understanding Document IDs.

You can execute many operations on your documents. One practical way to get the list of available functions is to press the <TAB> key, to ask for auto-completion, after the dot "."
For example, type db.my_coll1. then press <TAB> twice, and you'll get the following result:

MySQL myHost:33060+ demo JS> db.my_coll1.
add()               count()             dropIndex()         find()              getOne()            getSession()        modify()            remove()            replaceOne()        session
addOrReplaceOne()   createIndex()       existsInDatabase()  getName()           getSchema()         help()              name                removeOne()         schema

You can also use the awesome MySQL Shell built-in help (I strongly recommend my colleague Jesper's article) and please bookmark his blog.
Last but not least, our documentation: X DevAPI User Guide, MySQL Shell JavaScript API Reference & MySQL Shell Python API Reference.

Modify documents

You’ll need the modify() function :

MySQL myHost:33060+ demo JS> db.my_coll1.find("_id='00005c9514e60000000000000053'").fields("code")
[
    {
        "code": "42"
    }
]


MySQL myHost:33060+ demo JS> db.my_coll1.modify("_id='00005c9514e60000000000000053'").set("code","2019")
Query OK, 1 item affected (0.0336 sec)
Rows matched: 1  Changed: 1  Warnings: 0


MySQL myHost:33060+ demo JS> db.my_coll1.find("_id='00005c9514e60000000000000053'").fields("code")
[
    {
        "code": "2019"
    }
]

Remove content from documents

You can also modify the structure of a document by removing a key and its content with modify() and unset().

MySQL myHost:33060+ demo JS> db.my_coll1.add({"title":"Quote", "message": "Strive for greatness"})
Query OK, 1 item affected (0.0248 sec)

MySQL myHost:33060+ demo JS> db.my_coll1.find()
[
    {
        "_id": "00005c9514e60000000000000053",
        "code": "42",
        "title": "MySQL Document Store",
        "abstract": "SQL is now optional!"
    },
    {
        "_id": "00005c9514e60000000000000054",
        "title": "Quote",
        "message": "Strive for greatness"
    }
]
2 documents in set (0.0033 sec)

MySQL myHost:33060+ demo JS> db.my_coll1.modify("_id='00005c9514e60000000000000054'").unset("title")
Query OK, 1 item affected (0.0203 sec)

Rows matched: 1  Changed: 1  Warnings: 0

MySQL myHost:33060+ demo JS> db.my_coll1.find("_id='00005c9514e60000000000000054'")
[
    {
        "_id": "00005c9514e60000000000000054",
        "message": "Strive for greatness"
    }
]

Remove documents

We are missing one last important operation: deleting documents, with remove()

MySQL myHost:33060+ demo JS> db.my_coll1.remove("_id='00005c9514e60000000000000054'")
Query OK, 1 item affected (0.0625 sec)


MySQL myHost:33060+ demo JS> db.my_coll1.find("_id='00005c9514e60000000000000054'")
Empty set (0.0003 sec)

You can also remove all documents in a collection with one command. To do so, use the remove(“true”) method without specifying any search condition.
Obviously it is usually not a good practice…

Import JSON documents

Let's work with a bigger JSON collection.
MySQL Shell provides a very convenient tool, named importJson(), to easily import JSON documents into your MySQL Server, either as a collection or as a table.

MySQL myHost:33060+ demo JS> db.getCollections()
[
    <Collection:my_coll1>
]


MySQL myHost:33060+ demo JS> util.importJson('GoT_episodes.json')
Importing from file "GoT_episodes.json" to collection `demo`.`GoT_episodes` in MySQL Server at myHost:33060

.. 73.. 73
Processed 47.74 KB in 73 documents in 0.1051 sec (694.75 documents/s)
Total successfully imported documents 73 (694.75 documents/s)


MySQL myHost:33060+ demo JS> db.getCollections()
[
    <Collection:GoT_episodes>, 
    <Collection:my_coll1>
]

You can find the JSON file source here.
Note that I had to do an extra step before importing the data:
sed 's/}}},{"id"/}}} {"id"/g' got_episodes.json.BAK > got_episodes.json

By the way you can import data from MongoDB to MySQL \o/

No more excuses to finally get rid of MongoDB 😉

Let’s do some queries…

Display 1 document

MySQL myHost:33060+ demo JS> db.GoT_episodes.find().limit(1)
[
    {
        "id": 4952,
        "_id": "00005c9514e6000000000000009e",
        "url": "http://www.tvmaze.com/episodes/4952/game-of-thrones-1x01-winter-is-coming",
        "name": "Winter is Coming",
        "image": {
            "medium": "http://static.tvmaze.com/uploads/images/medium_landscape/1/2668.jpg",
            "original": "http://static.tvmaze.com/uploads/images/original_untouched/1/2668.jpg"
        },
        "_links": {
            "self": {
                "href": "http://api.tvmaze.com/episodes/4952"
            }
        },
        "number": 1,
        "season": 1,
        "airdate": "2011-04-17",
        "airtime": "21:00",
        "runtime": 60,
        "summary": "<p>Lord Eddard Stark, ruler of the North, is summoned to court by his old friend, King Robert Baratheon, to serve as the King's Hand. Eddard reluctantly agrees after learning of a possible threat to the King's life. Eddard's bastard son Jon Snow must make a painful decision about his own future, while in the distant east Viserys Targaryen plots to reclaim his father's throne, usurped by Robert, by selling his sister in marriage.</p>",
        "airstamp": "2011-04-18T01:00:00+00:00"
    }
]

Looks like data relative to a famous TV show 🙂

All episodes from season 1

MySQL myHost:33060+ demo JS> db.GoT_episodes.find("season=1").fields("name", "summary", "airdate").sort("number")
[
    {
        "name": "Winter is Coming",
        "airdate": "2011-04-17",
        "summary": "<p>Lord Eddard Stark, ruler of the North, is summoned to court by his old friend, King Robert Baratheon, to serve as the King's Hand. Eddard reluctantly agrees after learning of a possible threat to the King's life. Eddard's bastard son Jon Snow must make a painful decision about his own future, while in the distant east Viserys Targaryen plots to reclaim his father's throne, usurped by Robert, by selling his sister in marriage.</p>"
    },
    {
        "name": "The Kingsroad",
        "airdate": "2011-04-24",
        "summary": "<p>An incident on the Kingsroad threatens Eddard and Robert's friendship. Jon and Tyrion travel to the Wall, where they discover that the reality of the Night's Watch may not match the heroic image of it.</p>"
    },
    {
        "name": "Lord Snow",
        "airdate": "2011-05-01",
        "summary": "<p>Jon Snow attempts to find his place amongst the Night's Watch. Eddard and his daughters arrive at King's Landing.</p>"
    },
    {
        "name": "Cripples, Bastards, and Broken Things",
        "airdate": "2011-05-08",
        "summary": "<p>Tyrion stops at Winterfell on his way home and gets a frosty reception from Robb Stark. Eddard's investigation into the death of his predecessor gets underway.</p>"
    },
    {
        "name": "The Wolf and the Lion",
        "airdate": "2011-05-15",
        "summary": "<p>Catelyn's actions on the road have repercussions for Eddard. Tyrion enjoys the dubious hospitality of the Eyrie.</p>"
    },
    {
        "name": "A Golden Crown",
        "airdate": "2011-05-22",
        "summary": "<p>Viserys is increasingly frustrated by the lack of progress towards gaining his crown.</p>"
    },
    {
        "name": "You Win or You Die",
        "airdate": "2011-05-29",
        "summary": "<p>Eddard's investigations in King's Landing reach a climax and a dark secret is revealed.</p>"
    },
    {
        "name": "The Pointy End",
        "airdate": "2011-06-05",
        "summary": "<p>Tyrion joins his father's army with unexpected allies. Events in King's Landing take a turn for the worse as Arya's lessons are put to the test.</p>"
    },
    {
        "name": "Baelor",
        "airdate": "2011-06-12",
        "summary": "<p>Catelyn must negotiate with the irascible Lord Walder Frey.</p>"
    },
    {
        "name": "Fire and Blood",
        "airdate": "2011-06-19",
        "summary": "<p>Daenerys must realize her destiny. Jaime finds himself in an unfamiliar predicament.</p>"
    }
]

First episode of each season

MySQL myHost:33060+ demo JS> db.GoT_episodes.find("number=1").fields("name", "airdate", "season").sort("season")
[
    {
        "name": "Winter is Coming",
        "season": 1,
        "airdate": "2011-04-17"
    },
    {
        "name": "The North Remembers",
        "season": 2,
        "airdate": "2012-04-01"
    },
    {
        "name": "Valar Dohaeris",
        "season": 3,
        "airdate": "2013-03-31"
    },
    {
        "name": "Two Swords",
        "season": 4,
        "airdate": "2014-04-06"
    },
    {
        "name": "The Wars to Come",
        "season": 5,
        "airdate": "2015-04-12"
    },
    {
        "name": "The Red Woman",
        "season": 6,
        "airdate": "2016-04-24"
    },
    {
        "name": "Dragonstone",
        "season": 7,
        "airdate": "2017-07-16"
    },
    {
        "name": "TBA",
        "season": 8,
        "airdate": "2019-04-14"
    }
]
8 documents in set (0.0047 sec)

CRUD Prepared Statements

A common pattern with document stores is to repeatedly execute the same (or similar) kind of simple queries (e.g. "id"-based lookups).
These queries can be accelerated using prepared (CRUD) statements.

For example, if your application often uses the following query:

MySQL myHost:33060+ demo JS> db.GoT_episodes.find("number=1 AND season=1").fields("name", "airdate")
[
    {
        "name": "Winter is Coming",
        "airdate": "2011-04-17"
    }
]

So it’s probably a good idea to use prepared statements.
First we need to prepare the query:

// Prepare a statement using a named parameter
var gotEpisode = db.GoT_episodes.find("number = :episodeNum AND season = :seasonNum").fields("name", "airdate")

Then bind the value to the parameter :

MySQL myHost:33060+ demo JS> gotEpisode.bind('episodeNum', 1).bind('seasonNum', 1)
[
    {
        "name": "Winter is Coming",
        "airdate": "2011-04-17"
    }
]
MySQL myHost:33060+ demo JS> gotEpisode.bind('episodeNum', 7).bind('seasonNum', 3)
[
    {
        "name": "The Bear and the Maiden Fair",
        "airdate": "2013-05-12"
    }
]

Simply powerful!

Index

Indeed, adding relevant indexes is a common practice to improve performance. MySQL Document Store allows you to index keys inside your JSON documents.

Add a composite index on the keys season AND number (the episode number).

MySQL myHost:33060+ demo JS> db.GoT_episodes.createIndex('idxSeasonEpisode', {fields: [{field: "$.season", type: "TINYINT UNSIGNED", required: true}, {field: "$.number", type: "TINYINT UNSIGNED", required: true}]})
Query OK, 0 rows affected (0.1245 sec)

The required: true option means that it is mandatory for all documents to contain at least the keys number and season.
E.g.

MySQL myHost:33060+ demo JS> db.GoT_episodes.add({"name": "MySQL 8 is Great"})
ERROR: 5115: Document is missing a required field


MySQL myHost:33060+ demo JS> db.GoT_episodes.add({"name": "MySQL 8 is Great", "number": 8})
ERROR: 5115: Document is missing a required field


MySQL myHost:33060+ demo JS> db.GoT_episodes.add({"name": "MySQL 8 is Great", "season": 8})
ERROR: 5115: Document is missing a required field

Add an index on the key summary (first 30 characters)

MySQL myHost:33060+ demo JS> db.GoT_episodes.createIndex('idxSummary', {fields: [{field: "$.summary", type: "TEXT(30)"}]})
Query OK, 0 rows affected (0.1020 sec)

Add a unique index on the key id
(not the one generated by MySQL, called _id, which is already indexed as the primary key)

MySQL myHost:33060+ demo JS> db.GoT_episodes.createIndex('idxId', {fields: [{field: "$.id", type: "INT UNSIGNED"}], unique: true})
Query OK, 0 rows affected (0.3379 sec)

The unique: true option means that the value of the key id must be unique across all documents inside the collection, i.e. no duplicate values.
E.g.

MySQL myHost:33060+ demo JS> db.GoT_episodes.add({"id":4952, "number": 42, "season": 42 })
ERROR: 5116: Document contains a field value that is not unique but required to be

You can obviously drop an index, using dropIndex().
E.g. db.GoT_episodes.dropIndex("idxSummary")

Transactions

MySQL Document Store is fully ACID compliant; it relies on InnoDB's proven strength & robustness.

Yes, you got it right, we do care about your data!

You need the session functions below: session.startTransaction(), session.commit() and session.rollback().

Let's see an example with a multi-collection transaction that will be rolled back.

// Start the transaction
session.startTransaction()

MySQL myHost:33060+ demo JS> db.my_coll1.find()
[
    {
        "_id": "00005c9514e60000000000000053",
        "code": "42",
        "title": "MySQL Document Store",
        "abstract": "SQL is now optional!"
    }
]
1 document in set (0.0033 sec)


// Modify a document in collection my_coll1
MySQL myHost:33060+ demo JS> db.my_coll1.modify("_id = '00005c9514e60000000000000053'").unset("code")
Query OK, 1 item affected (0.0043 sec)
Rows matched: 1  Changed: 1  Warnings: 0


//Collection 1 : my_coll1
// Add a new document in my_coll1
MySQL myHost:33060+ demo JS> db.my_coll1.add({"title":"Quote", "message": "Be happy, be bright, be you"})
Query OK, 1 item affected (0.0057 sec)


MySQL myHost:33060+ demo JS> db.my_coll1.find()
[
    {
        "_id": "00005c9514e60000000000000053",
        "title": "MySQL Document Store",
        "abstract": "SQL is now optional!"
    },
    {
        "_id": "00005c9514e600000000000000e7",
        "title": "Quote",
        "message": "Be happy, be bright, be you"
    }
]
2 documents in set (0.0030 sec)



// Collection 2 : GoT_episodes
// Number of documents in GoT_episodes
MySQL myHost:33060+ demo JS> db.GoT_episodes.count()
73


// Remove all the 73 documents from GoT_episodes
MySQL myHost:33060+ demo JS> db.GoT_episodes.remove("true")
Query OK, 73 items affected (0.2075 sec)


// Empty collection
MySQL myHost:33060+ demo JS> db.GoT_episodes.count()
0



// Finally want my previous status back
// Rollback the transaction (if necessary e.g. in case of an error)
MySQL myHost:33060+ demo JS> session.rollback() 
Query OK, 0 rows affected (0.0174 sec)

Tadam!!!
We're back in the past 🙂

MySQL myHost:33060+ demo JS> db.my_coll1.find()
[
    {
        "_id": "00005c9514e60000000000000053",
        "code": "42",
        "title": "MySQL Document Store",
        "abstract": "SQL is now optional!"
    }
]
1 document in set (0.0028 sec)


MySQL myHost:33060+ demo JS> db.GoT_episodes.count()
73

Execute (complex) SQL queries

NoSQL + SQL = MySQL

From the MySQL server's point of view, collections are tables as well, just like regular tables (see the sketch below).
And this is very powerful !!!

Powerful because it allows you, within the same datastore (MySQL), to run CRUD queries and SQL queries on the same dataset.
Powerful because it allows you to have your OLTP CRUD workload and your analytics SQL workload in the same place.
So no need to transfer/sync/… data from one datastore to another anymore!!!
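
To see it for yourself, here is a small sketch: from SQL mode, inspect the table behind a collection. You should find a JSON column named doc plus a generated _id column used as the primary key (the exact definition may vary with the MySQL version):

-- Run from the MySQL Shell SQL mode (\sql)
SHOW CREATE TABLE demo.GoT_episodes\G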

You can run SQL queries using the sql() function:

MySQL myHost:33060+ demo JS> session.sql("SELECT count(*) FROM GoT_episodes")
+----------+
| count(*) |
+----------+
|       73 |
+----------+

You can also do SQL queries just as you have done until now, using the rich set of MySQL JSON functions.
OK let’s have a closer look.

Remember this CRUD query?

MySQL myHost:33060+ demo JS> db.GoT_episodes.find("number=1 AND season=1").fields("name", "airdate")
[
    {
        "name": "Winter is Coming",
        "airdate": "2011-04-17"
    }
]

Its SQL alter ego is :

MySQL myHost:33060+ demo JS> \sql

MySQL myHost:33060+ demo SQL> 
SELECT doc->>"$.name" AS name, doc->>"$.airdate" AS airdate 
FROM GoT_episodes 
WHERE doc->>"$.number" = 1 AND doc->>"$.season" = 1\G
*************************** 1. row ***************************
   name: Winter is Coming
airdate: 2011-04-17

Let’s do some SQL queries…

Number of episodes by season

MySQL myHost:33060+ demo SQL> 
SELECT doc->>"$.season", COUNT(doc->>"$.number") 
FROM GoT_episodes 
GROUP BY doc->>"$.season";
+------------------+-------------------------+
| doc->>"$.season" | count(doc->>"$.number") |
+------------------+-------------------------+
| 1                |                      10 |
| 2                |                      10 |
| 3                |                      10 |
| 4                |                      10 |
| 5                |                      10 |
| 6                |                      10 |
| 7                |                       7 |
| 8                |                       6 |
+------------------+-------------------------+

Episode statistics for each season

MySQL myHost:33060+ demo SQL> 
SELECT DISTINCT
    doc->>"$.season" AS Season,
    max(doc->>"$.runtime") OVER w AS "Max duration",
    min(doc->>"$.runtime") OVER w AS "Min duration",
    AVG(doc->>"$.runtime") OVER w AS "Avg duration"
FROM GoT_episodes
WINDOW w AS (
    PARTITION BY doc->>"$.season"
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
);
+--------+--------------+--------------+--------------+
| Season | Max duration | Min duration | Avg duration |
+--------+--------------+--------------+--------------+
| 1      | 60           | 60           |           60 |
| 2      | 60           | 60           |           60 |
| 3      | 60           | 60           |           60 |
| 4      | 60           | 60           |           60 |
| 5      | 60           | 60           |           60 |
| 6      | 69           | 60           |         60.9 |
| 7      | 60           | 60           |           60 |
| 8      | 90           | 60           |           80 |
+--------+--------------+--------------+--------------+

Statistics on the number of days between episodes

MySQL myHost:33060+ demo SQL> 
SELECT
    doc->>"$.airdate" AS airdate, 
    DATEDIFF(doc->>"$.airdate", lag(doc->>"$.airdate") OVER w) AS "Delta days between episode",
    DATEDIFF(doc->>"$.airdate", first_value(doc->>"$.airdate") OVER w) AS "Total days since 1st episode"
FROM GoT_episodes
    WINDOW w AS (ORDER BY doc->>"$.airdate")
;
+------------+----------------------------+------------------------------+
| airdate    | Delta days between episode | Total days since 1st episode |
+------------+----------------------------+------------------------------+
| 2011-04-17 |                       NULL |                            0 |
| 2011-04-24 |                          7 |                            7 |
| 2011-05-01 |                          7 |                           14 |
| 2011-05-08 |                          7 |                           21 |
| 2011-05-15 |                          7 |                           28 |
| 2011-05-22 |                          7 |                           35 |
| 2011-05-29 |                          7 |                           42 |
| 2011-06-05 |                          7 |                           49 |
| 2011-06-12 |                          7 |                           56 |
| 2011-06-19 |                          7 |                           63 |
| 2012-04-01 |                        287 |                          350 |
| 2012-04-08 |                          7 |                          357 |
| 2012-04-15 |                          7 |                          364 |
| 2012-04-22 |                          7 |                          371 |
| 2012-04-29 |                          7 |                          378 |
| 2012-05-06 |                          7 |                          385 |
| 2012-05-13 |                          7 |                          392 |
| 2012-05-20 |                          7 |                          399 |
| 2012-05-27 |                          7 |                          406 |
| 2012-06-03 |                          7 |                          413 |
| 2013-03-31 |                        301 |                          714 |
| 2013-04-07 |                          7 |                          721 |
| 2013-04-14 |                          7 |                          728 |
| 2013-04-21 |                          7 |                          735 |
| 2013-04-28 |                          7 |                          742 |
| 2013-05-05 |                          7 |                          749 |
| 2013-05-12 |                          7 |                          756 |
| 2013-05-19 |                          7 |                          763 |
| 2013-06-02 |                         14 |                          777 |
| 2013-06-09 |                          7 |                          784 |
| 2014-04-06 |                        301 |                         1085 |
| 2014-04-13 |                          7 |                         1092 |
| 2014-04-20 |                          7 |                         1099 |
| 2014-04-27 |                          7 |                         1106 |
| 2014-05-04 |                          7 |                         1113 |
| 2014-05-11 |                          7 |                         1120 |
| 2014-05-18 |                          7 |                         1127 |
| 2014-06-01 |                         14 |                         1141 |
| 2014-06-08 |                          7 |                         1148 |
| 2014-06-15 |                          7 |                         1155 |
| 2015-04-12 |                        301 |                         1456 |
| 2015-04-19 |                          7 |                         1463 |
| 2015-04-26 |                          7 |                         1470 |
| 2015-05-03 |                          7 |                         1477 |
| 2015-05-10 |                          7 |                         1484 |
| 2015-05-17 |                          7 |                         1491 |
| 2015-05-24 |                          7 |                         1498 |
| 2015-05-31 |                          7 |                         1505 |
| 2015-06-07 |                          7 |                         1512 |
| 2015-06-14 |                          7 |                         1519 |
| 2016-04-24 |                        315 |                         1834 |
| 2016-05-01 |                          7 |                         1841 |
| 2016-05-08 |                          7 |                         1848 |
| 2016-05-15 |                          7 |                         1855 |
| 2016-05-22 |                          7 |                         1862 |
| 2016-05-29 |                          7 |                         1869 |
| 2016-06-05 |                          7 |                         1876 |
| 2016-06-12 |                          7 |                         1883 |
| 2016-06-19 |                          7 |                         1890 |
| 2016-06-26 |                          7 |                         1897 |
| 2017-07-16 |                        385 |                         2282 |
| 2017-07-23 |                          7 |                         2289 |
| 2017-07-30 |                          7 |                         2296 |
| 2017-08-06 |                          7 |                         2303 |
| 2017-08-13 |                          7 |                         2310 |
| 2017-08-20 |                          7 |                         2317 |
| 2017-08-27 |                          7 |                         2324 |
| 2019-04-14 |                        595 |                         2919 |
| 2019-04-21 |                          7 |                         2926 |
| 2019-04-28 |                          7 |                         2933 |
| 2019-05-05 |                          7 |                         2940 |
| 2019-05-12 |                          7 |                         2947 |
| 2019-05-19 |                          7 |                         2954 |
+------------+----------------------------+------------------------------+
73 rows in set (0.0066 sec)

Note:

Hey buddy, aren’t Window Functions very cool?

More here and here.

Drop collections

Use dropCollection() :

MySQL myHost:33060+ demo JS> db.getCollections()
[
    <Collection:GoT_episodes>, 
    <Collection:my_coll1>
]


MySQL myHost:33060+ demo JS> db.dropCollection("my_coll1")
MySQL myHost:33060+ demo JS> db.getCollections()
[
    <Collection:GoT_episodes>
]

Conclusion

Wow!
Probably one of my longest articles, but I wanted to be sure to give you a broad (although not exhaustive) overview of MySQL Document Store, from the point of view of a non-developer.


Now it is your turn to give it a try 🙂

NoSQL + SQL = MySQL

In order to go further

Some useful links:

Thanks for using MySQL!

Follow me on Linkedin

Watch my videos on my YouTube channel and subscribe.

My Slideshare account.

My Speaker Deck account.

Thanks for using HeatWave & MySQL!


MySQL Security – MySQL Enterprise Data Masking and De-Identification

March 19, 2019

When thinking about security within a MySQL installation, you should consider a wide range of possible procedures / best practices and how they affect the security of your MySQL server and related applications. MySQL provides many tools / features / plugins in order to protect your data including some advanced features like Transparent Data Encryption aka TDE,  Audit, Data Masking & De-Identification, Firewall, Password Management, Password Validation Plugin, etc…

MySQL Security

In order to mitigate the effects of data breaches, and therefore the associated risks for your organization's brand and reputation, popular regulations or standards including GDPR, PCI DSS, HIPAA,… recommend (among other things) data masking and de-identification.

According to Wikipedia:

  • Data masking or data obfuscation is the process of hiding original data with modified content (characters or other data.)
  • De-identification is the process used to prevent a person’s identity from being connected with information. For example, data produced during human subject research might be de-identified to preserve research participants’ privacy.

In other words, MySQL Enterprise Data Masking and De-Identification hides sensitive information by replacing real values with substitutes, in order to protect sensitive data while it still looks real and consistent.

This is the topic of the eighth episode of this MySQL Security series (URLs to all the articles at the end of this page).

MySQL Enterprise Data Masking and De-Identification

The simplest way to present this MySQL feature :
A built-in database solution to help organizations protect sensitive data from unauthorized uses

MySQL Enterprise Masking and De-Identification protects sensitive data from unauthorized users.

Note:

MySQL Enterprise Data Masking and De-Identification is an extension included in MySQL Enterprise Edition, a commercial product.

Available in MySQL 8.0, as of 8.0.13 and in MySQL 5.7, as of 5.7.24.

First step, installation.

Installation

MySQL Enterprise Data Masking and De-Identification is implemented as a plugin library file containing a plugin and user-defined functions (UDFs).
As usual, installation is easy:

mysql> 
INSTALL PLUGIN data_masking SONAME 'data_masking.so';
CREATE FUNCTION gen_blacklist RETURNS STRING  SONAME 'data_masking.so';
CREATE FUNCTION gen_dictionary RETURNS STRING  SONAME 'data_masking.so';
CREATE FUNCTION gen_dictionary_drop RETURNS STRING  SONAME 'data_masking.so';
CREATE FUNCTION gen_dictionary_load RETURNS STRING  SONAME 'data_masking.so';
CREATE FUNCTION gen_range RETURNS INTEGER  SONAME 'data_masking.so';
CREATE FUNCTION gen_rnd_email RETURNS STRING  SONAME 'data_masking.so';
CREATE FUNCTION gen_rnd_pan RETURNS STRING  SONAME 'data_masking.so';
CREATE FUNCTION gen_rnd_ssn RETURNS STRING  SONAME 'data_masking.so';
CREATE FUNCTION gen_rnd_us_phone RETURNS STRING  SONAME 'data_masking.so';
CREATE FUNCTION mask_inner RETURNS STRING  SONAME 'data_masking.so';
CREATE FUNCTION mask_outer RETURNS STRING  SONAME 'data_masking.so';
CREATE FUNCTION mask_pan RETURNS STRING  SONAME 'data_masking.so';
CREATE FUNCTION mask_pan_relaxed RETURNS STRING  SONAME 'data_masking.so';
CREATE FUNCTION mask_ssn RETURNS STRING  SONAME 'data_masking.so';

You can check the activation of the data masking plugin:

mysql> 
SELECT PLUGIN_NAME, PLUGIN_STATUS, PLUGIN_VERSION, PLUGIN_LIBRARY, PLUGIN_DESCRIPTION 
FROM INFORMATION_SCHEMA.PLUGINS 
WHERE PLUGIN_NAME='data_masking'\G
*************************** 1. row ***************************
       PLUGIN_NAME: data_masking
     PLUGIN_STATUS: ACTIVE
    PLUGIN_VERSION: 0.1
    PLUGIN_LIBRARY: data_masking.so
PLUGIN_DESCRIPTION: Data masking facilities

Note:

If the plugin and UDFs are used on a master replication server, install them on all slave servers as well to avoid replication problems.

Uninstalling is simple as well: uninstall the plugin and drop the UDFs:

mysql>
UNINSTALL PLUGIN data_masking;
DROP FUNCTION gen_blacklist;
DROP FUNCTION gen_dictionary;
DROP FUNCTION gen_dictionary_drop;
DROP FUNCTION gen_dictionary_load;
DROP FUNCTION gen_range;
DROP FUNCTION gen_rnd_email;
DROP FUNCTION gen_rnd_pan;
DROP FUNCTION gen_rnd_ssn;
DROP FUNCTION gen_rnd_us_phone;
DROP FUNCTION mask_inner;
DROP FUNCTION mask_outer;
DROP FUNCTION mask_pan;
DROP FUNCTION mask_pan_relaxed;
DROP FUNCTION mask_ssn;

Now we’re ready to play!

Data Generation

One of the nice "side features" of MySQL Data Masking and De-Identification is the ability to generate business-relevant datasets. Because it is not always possible to test/simulate your application on your real dataset (indeed, playing with customer credit card or social security numbers is a very bad practice), this feature is very convenient.

Generating Random Data with Specific Characteristics

Several functions are available. Their names start with the gen_ prefix, and you'll find the complete list here.
In this article I'll use the following functions (a combined example follows the list):

  • gen_range() : returns a random integer selected from a given range.
  • gen_rnd_email() : returns a random email address in the example.com domain.
  • gen_rnd_pan() : returns a random payment card Primary Account Number.
  • gen_rnd_us_phone() : returns a random U.S. phone number in the 555 area code not used for legitimate numbers.
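
Here is a small combined sketch; since these functions return random values, your output will differ:

-- A random integer between 1 and 10, a random email address,
-- a random payment card number and a random U.S. phone number
SELECT gen_range(1, 10), gen_rnd_email(), gen_rnd_pan(), gen_rnd_us_phone()\G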

Generating Random Data Using Dictionaries

Sometimes you will need data of better quality. So another way to generate a relevant dataset is to use dictionaries.

Again, several functions are available. They also start with the gen_ prefix, and you'll find the complete list here.
I'll use the following functions :

  • gen_dictionary_load() : Loads a file into the dictionary registry and assigns the dictionary a name to be used with other functions that require a dictionary name argument.
  • gen_dictionary() : Returns a random term from a dictionary.

OK, let's move forward!
In order to use data from a dictionary we must first load the data.

A dictionary is a plain text file, with one term per line:

$ head /dict/mq_cities.txt
Basse-Pointe
Bellefontaine
Case-Pilote
Ducos
Fonds-Saint-Denis
Fort-de-France
Grand'Rivière
Gros-Morne
L'Ajoupa-Bouillon
La Trinité

Then we must load the dictionaries

Note:

The secure_file_priv variable must be set properly (usually in your my.cnf or my.ini).

mysql> SHOW VARIABLES LIKE 'secure_file_priv'\G
*************************** 1. row ***************************
Variable_name: secure_file_priv
        Value: /dict/
1 row in set (0,00 sec)

mysql> SELECT gen_dictionary_load('/dict/Firstnames.txt', 'Firstnames')\G
*************************** 1. row ***************************
gen_dictionary_load('/dict/Firstnames.txt', 'Firstnames'): Dictionary load success
1 row in set (0,20 sec)

mysql> SELECT gen_dictionary_load('/dict/Lastnames.txt', 'Lastnames')\G
*************************** 1. row ***************************
gen_dictionary_load('/dict/Lastnames.txt', 'Lastnames'): Dictionary load success
1 row in set (0,24 sec)

mysql> SELECT gen_dictionary_load('/dict/JobTitles.txt', 'JobTitles')\G
*************************** 1. row ***************************
gen_dictionary_load('/dict/JobTitles.txt', 'JobTitles'): Dictionary load success
1 row in set (0,00 sec)

mysql> SELECT gen_dictionary_load('/dict/BirthDates.txt', 'BirthDates')\G
*************************** 1. row ***************************
gen_dictionary_load('/dict/BirthDates.txt', 'BirthDates'): Dictionary load success
1 row in set (0,00 sec)

mysql> SELECT gen_dictionary_load('/dict/mq_cities.txt', 'mq_Cities')\G
*************************** 1. row ***************************
gen_dictionary_load('/dict/mq_cities.txt', 'mq_Cities'): Dictionary load success
1 row in set (0,00 sec)

Note:

Dictionaries are not persistent. Any dictionary used by applications must be loaded for each server startup.
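
A convenient way to handle that, sketched below, is to put the gen_dictionary_load() calls in a SQL file and point the init_file system variable (in your my.cnf) at that file, so the dictionaries are reloaded at every server startup. The file path used here is just an example:

-- Content of /dict/load_dictionaries.sql, referenced by init_file in my.cnf
SELECT gen_dictionary_load('/dict/Firstnames.txt', 'Firstnames');
SELECT gen_dictionary_load('/dict/Lastnames.txt', 'Lastnames');
SELECT gen_dictionary_load('/dict/mq_cities.txt', 'mq_Cities');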

Now I have all my bricks to build my business-centric test dataset.
For example, I can generate a random email address:

mysql> SELECT gen_rnd_email();
+---------------------------+
| gen_rnd_email()           |
+---------------------------+
| rcroe.odditdn@example.com |
+---------------------------+

Or a random city from my dictionary of the cities of Martinique :

mysql> SELECT gen_dictionary('mq_Cities');
+-------------------------------+
| gen_dictionary('mq_Cities')   |
+-------------------------------+
| Fort-de-France                |
+-------------------------------+

Awesome!

Now let's use these functions to generate some random but business-oriented data.
Below is our test table, called sensitive_data, which contains… sensitive data :

CREATE TABLE sensitive_data(
    emp_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    firstname VARCHAR(100) NOT NULL,
    lastname VARCHAR(100) NOT NULL,
    birth_date date,
    email VARCHAR(100) NOT NULL,
    phone VARCHAR(20),
    jobTitle VARCHAR(50),
    salary INT UNSIGNED,
    city VARCHAR(30),
    credit_card CHAR(19),
    PRIMARY KEY (emp_id))
;

I created a stored procedure (sorry but I’m a DBA) to fill my table with data. However a script in your favorite programming language could be a better choice:

DELIMITER //
DROP PROCEDURE IF EXISTS add_rows;
CREATE PROCEDURE add_rows( IN numRow TINYINT UNSIGNED)
BEGIN
    DECLARE cpt TINYINT UNSIGNED DEFAULT 0;
    WHILE cpt < numRow DO
        INSERT INTO sensitive_data(firstname, lastname, birth_date, email, phone, jobTitle, salary, city, credit_card)
        SELECT
        gen_dictionary('Firstnames'),
        gen_dictionary('Lastnames'),
        gen_dictionary('BirthDates'),
        gen_rnd_email(),
        gen_rnd_us_phone(),
        gen_dictionary('JobTitles'),
        gen_range(30000, 120000),
        gen_dictionary('mq_Cities'),
        gen_rnd_pan()
        FROM DUAL;
        SET cpt = cpt + 1;
        SELECT sleep(1);
    END WHILE;
END//
DELIMITER ;


-- Call the procedure and insert 10 rows in the table
CALL add_rows(10);


mysql> SELECT firstname, lastname, phone, salary, city FROM sensitive_data;
+-----------+-----------+----------------+--------+------------------+
| firstname | lastname  | phone          | salary | city             |
+-----------+-----------+----------------+--------+------------------+
| Fresh     | Daz       | 1-555-381-3165 |  78920 | Ducos            |
| Doowon    | Vieri     | 1-555-645-3332 |  78742 | Macouba          |
| Marsja    | Speckmann | 1-555-455-3688 |  56526 | Les Trois-Îlets  |
| Carrsten  | Speckmann | 1-555-264-8108 |  51253 | Fort-de-France   |
| Yonghong  | Marrevee  | 1-555-245-0883 |  86820 | Le Lorrain       |
| Shuji     | Magliocco | 1-555-628-3771 |  88615 | Le Marin         |
| Luisa     | Sury      | 1-555-852-7710 | 117957 | Le Morne-Rouge   |
| Troy      | Zobel     | 1-555-805-0270 |  78801 | Bellefontaine    |
| Lunjin    | Pettis    | 1-555-065-0517 |  69782 | Le Prêcheur      |
| Boriana   | Marletta  | 1-555-062-4226 |  70970 | Saint-Joseph     |
+-----------+-----------+----------------+--------+------------------+
10 rows in set (0,00 sec)

It looks like real data, it smells like real data, it sounds like real data but these are not real data. That’s what we wanted 🙂

Data Masking and De-Identification

Many masking functions are available. Their names start with the mask_ prefix, and you'll find the complete list here.
I'll use the following functions :

mask_inner() masks the interior of its string argument, leaving the ends unmasked. Other arguments specify the sizes of the unmasked ends.

SELECT phone, mask_inner(phone, 0, 4) FROM sensitive_data LIMIT 1;
+----------------+-------------------------+
| phone          | mask_inner(phone, 0, 4) |
+----------------+-------------------------+
| 1-555-381-3165 | XXXXXXXXXX3165          |
+----------------+-------------------------+

mask_outer() does the reverse, masking the ends of its string argument, leaving the interior unmasked. Other arguments specify the sizes of the masked ends.

SELECT birth_date, mask_outer(birth_date, 5, 0) FROM sensitive_data LIMIT 1;
+------------+------------------------------+
| birth_date | mask_outer(birth_date, 5, 0) |
+------------+------------------------------+
| 1954-06-08 | XXXXX06-08                   |
+------------+------------------------------+

mask_pan() masks all but the last four digits of the number;
mask_pan_relaxed() is similar but leaves the first six digits, which identify the payment card issuer, unmasked.

SELECT mask_pan(credit_card), mask_pan_relaxed(credit_card) FROM sensitive_data LIMIT 1;
+-----------------------+-------------------------------+
| mask_pan(credit_card) | mask_pan_relaxed(credit_card) |
+-----------------------+-------------------------------+
| XXXXXXXXXXXX4416      | 262491XXXXXX4416              |
+-----------------------+-------------------------------+

Note:

If you deal with U.S. Social Security Numbers, you could also use the mask_ssn() function.

e.g. mysql> SELECT mask_ssn(gen_rnd_ssn());

So how do we mask and de-identify sensitive customer data?


There are different strategies. One is to use views.
That already gives you a first level of security, because you can expose only the columns the business needs and/or filter the rows.
Furthermore you have another level of security, because you can control who can access these data with the relevant privileges, with or without roles (see the sketch below).
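
For instance, a minimal sketch (the role name and schema name below are just examples) granting access to a masked view only, such as the v1_mask view created further down:

-- The role can read the masked view but gets no privilege on sensitive_data
CREATE ROLE IF NOT EXISTS analyst;
GRANT SELECT ON mydb.v1_mask TO analyst;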

Let’s see some examples:

Ex. 1
Mask the firstname (firstname) & the lastname (lastname)

CREATE VIEW v1_mask AS
  SELECT
    mask_inner(firstname, 1, 0) AS firstname,
    mask_outer(lastname, 3, 3) AS lastname,
    salary
  FROM sensitive_data;
SELECT * FROM v1_mask WHERE salary > 100000;
+-----------+----------+--------+
| firstname | lastname | salary |
+-----------+----------+--------+
| LXXXX     | XXXX     | 117957 |
+-----------+----------+--------+

Ex. 2
Mask the credit card number (credit_card)

CREATE VIEW v2_mask AS
  SELECT
    firstname,
    lastname,
    email,
    phone,
    mask_pan(credit_card) AS credit_card
  FROM sensitive_data;  
SELECT email, phone, credit_card 
FROM v2_mask 
WHERE firstname='Fresh' AND lastname='Daz';
+---------------------------+----------------+------------------+
| email                     | phone          | credit_card      |
+---------------------------+----------------+------------------+
| bcnnk.wnruava@example.com | 1-555-381-3165 | XXXXXXXXXXXX4416 |
+---------------------------+----------------+------------------+

Ex. 3
Replace real values of employee id (emp_id) and birth date (birth_date) with random ones.

CREATE VIEW v3_mask AS
  SELECT
    gen_range(1, 1000) AS emp_id,
    FROM_DAYS(gen_range(715000, 731000)) AS birth_date,
    jobTitle,
    salary,
    city 
  FROM sensitive_data;
SELECT DISTINCT
    jobTitle,
    max(salary) OVER w AS Max,
    min(salary) OVER w AS Min,
    AVG(salary) OVER w AS Avg
FROM v3_mask
WINDOW w AS (
    PARTITION BY jobTitle
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
);
+--------------------+--------+-------+------------+
| jobTitle           | Max    | Min   | Avg        |
+--------------------+--------+-------+------------+
| Assistant Engineer |  78920 | 78920 | 78920.0000 |
| Engineer           |  88615 | 88615 | 88615.0000 |
| Manager            |  78801 | 51253 | 65027.0000 |
| Senior Engineer    |  86820 | 70970 | 78895.0000 |
| Staff              |  78742 | 69782 | 74262.0000 |
| Technique Leader   | 117957 | 56526 | 87241.5000 |
+--------------------+--------+-------+------------+

Et voilà!
As a conclusion, MySQL Enterprise Masking and De-Identification enables organizations to:

  • Meet regulatory requirements and data privacy laws
  • Significantly reduce the risk of a data breach
  • Protect confidential information

To conclude this conclusion, I recommend reading the Data Masking in MySQL blog post from the MySQL Server Blog.

MySQL Enterprise Edition

MySQL Enterprise Edition includes the most comprehensive set of advanced features, management tools and technical support to achieve the highest levels of MySQL scalability, security, reliability, and uptime.

It reduces the risk, cost, and complexity in developing, deploying, and managing business-critical MySQL applications.

MySQL Enterprise Edition server Trial Download (Note – Select Product Pack: MySQL Database).

MySQL Enterprise Edition

In order to go further

MySQL Security Series

  1. Password Validation Plugin
  2. Password Management
  3. User Account Locking
  4. The Connection-Control Plugins
  5. Enterprise Audit
  6. Enterprise Transparent Data Encryption (TDE)
  7. Enterprise Firewall
  8. Enterprise Data Masking and De-Identification

Reference Manual

MySQL Security

Blog posts

Thanks for using MySQL!

Follow me on Linkedin

Watch my videos on my YouTube channel and subscribe.

My Slideshare account.

My Speaker Deck account.

Thanks for using HeatWave & MySQL!


MySQL Functional Indexes

March 14, 2019
Sunset in Crete by Olivier DASINI

Since MySQL 5.7 one can put indexes on expressions, aka functional indexes, using generated columns. Basically you first use a generated column to define the functional expression, then index this column (a minimal sketch of this approach is shown below).

Quite useful when dealing with JSON functions, you can find an example here and the documentation there.
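
Here is a minimal sketch of that pre-8.0.13 approach, using the salaries table shown later in this post (the column names come from that example):

-- Define a generated column for the expression, then index that column
ALTER TABLE salaries
  ADD COLUMN to_date_year SMALLINT UNSIGNED
      GENERATED ALWAYS AS (YEAR(to_date)) VIRTUAL,
  ADD INDEX idx_to_date_year (to_date_year);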

Starting with MySQL 8.0.13 we now have an easier way to create functional indexes (or functional key parts, as mentioned in the documentation) \o/

Let’s see how with a quick practical example.

Below is the salaries table structure:

mysql> SHOW CREATE TABLE salaries\G
*************************** 1. row ***************************
       Table: salaries
Create Table: CREATE TABLE `salaries` (
  `emp_no` int(11) NOT NULL,
  `salary` int(11) NOT NULL,
  `from_date` date NOT NULL,
  `to_date` date NOT NULL,
  PRIMARY KEY (`emp_no`,`from_date`),
  CONSTRAINT `salaries_ibfk_1` FOREIGN KEY (`emp_no`) REFERENCES `employees` (`emp_no`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0,00 sec)

It contains some data

mysql> SELECT count(*) FROM salaries;
+----------+
| count(*) |
+----------+
|  2844047 |
+----------+


mysql> SELECT * FROM salaries LIMIT 3;
+--------+--------+------------+------------+
| emp_no | salary | from_date  | to_date    |
+--------+--------+------------+------------+
|  10001 |  60117 | 1986-06-26 | 1987-06-26 |
|  10001 |  62102 | 1987-06-26 | 1988-06-25 |
|  10001 |  66074 | 1988-06-25 | 1989-06-25 |
+--------+--------+------------+------------+

Let’s focus on the following query:
SELECT * FROM salaries WHERE YEAR(to_date)=1985

mysql> SELECT * FROM salaries WHERE YEAR(to_date)=1985;
+--------+--------+------------+------------+
| emp_no | salary | from_date  | to_date    |
+--------+--------+------------+------------+
|  14688 |  42041 | 1985-07-06 | 1985-08-08 |
...snip...
| 498699 |  40000 | 1985-09-25 | 1985-09-28 |
+--------+--------+------------+------------+
89 rows in set (0,80 sec)


mysql> explain SELECT * FROM salaries WHERE YEAR(to_date)=1985\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: salaries
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2838426
     filtered: 100.00
        Extra: Using where

We have a full table scan ( type: ALL), meaning no index is used. Perhaps because there is no index on column to_date… 😉
So let’s add an index on to_date !

mysql> ALTER TABLE salaries ADD INDEX idx_to_date (to_date);
Query OK, 0 rows affected (17,13 sec)
Records: 0  Duplicates: 0  Warnings: 0


mysql> SHOW CREATE TABLE salaries\G
*************************** 1. row ***************************
       Table: salaries
Create Table: CREATE TABLE `salaries` (
  `emp_no` int(11) NOT NULL,
  `salary` int(11) NOT NULL,
  `from_date` date NOT NULL,
  `to_date` date NOT NULL,
  PRIMARY KEY (`emp_no`,`from_date`),
  KEY `idx_to_date` (`to_date`),
  CONSTRAINT `salaries_ibfk_1` FOREIGN KEY (`emp_no`) REFERENCES `employees` (`emp_no`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

And let's run the query again, hoping for a better execution plan:

mysql> explain SELECT * FROM salaries WHERE YEAR(to_date)=1985\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: salaries
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2838426
     filtered: 100.00
        Extra: Using where

Ouch! We still have a full table scan!
The index can't be used because of the use of a function (YEAR()) on the indexed column (to_date).
BTW if you're really surprised, maybe you should read this. 😉

This is the case when you need a functional index!

mysql> ALTER TABLE salaries ADD INDEX idx_year_to_date((YEAR(to_date)));
Query OK, 0 rows affected (20,04 sec)
Records: 0  Duplicates: 0  Warnings: 0

The syntax is very similar to the creation of a "regular" index, although you must be aware of the double parentheses: (( <expression> ))
We can now see our new index, named idx_year_to_date, and the indexed expression year(to_date) :

mysql> SHOW CREATE TABLE salaries\G
*************************** 1. row ***************************
       Table: salaries
Create Table: CREATE TABLE `salaries` (
  `emp_no` int(11) NOT NULL,
  `salary` int(11) NOT NULL,
  `from_date` date NOT NULL,
  `to_date` date NOT NULL,
  PRIMARY KEY (`emp_no`,`from_date`),
  KEY `idx_to_date` (`to_date`),
  KEY `idx_year_to_date` ((year(`to_date`))),
  CONSTRAINT `salaries_ibfk_1` FOREIGN KEY (`emp_no`) REFERENCES `employees` (`emp_no`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci


mysql> SELECT INDEX_NAME, EXPRESSION 
FROM INFORMATION_SCHEMA.STATISTICS 
WHERE TABLE_SCHEMA='employees' 
    AND TABLE_NAME = "salaries" 
    AND INDEX_NAME='idx_year_to_date';
+------------------+-----------------+
| INDEX_NAME       | EXPRESSION      |
+------------------+-----------------+
| idx_year_to_date | year(`to_date`) |
+------------------+-----------------+

Let’s test our query again

mysql> explain SELECT * FROM salaries WHERE YEAR(to_date)=1985\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: salaries
   partitions: NULL
         type: ref
possible_keys: idx_year_to_date
          key: idx_year_to_date
      key_len: 5
          ref: const
         rows: 89
     filtered: 100.00
        Extra: NULL


mysql> SELECT * FROM salaries WHERE YEAR(to_date)=1985;
+--------+--------+------------+------------+
| emp_no | salary | from_date  | to_date    |
+--------+--------+------------+------------+
|  14688 |  42041 | 1985-07-06 | 1985-08-08 |
...snip...
| 498699 |  40000 | 1985-09-25 | 1985-09-28 |
+--------+--------+------------+------------+
89 rows in set (0,00 sec)

Here we go!
Now the query is able to use the index. And in this case we have a positive impact on the execution time.

It is also interesting to note that it is possible to use idx_to_date, the first index created (the non-functional one), if we rewrite the original query:

mysql> EXPLAIN SELECT * 
FROM salaries 
WHERE to_date BETWEEN '1985-01-01' AND '1985-12-31'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: salaries
   partitions: NULL
         type: range
possible_keys: idx_to_date
          key: idx_to_date
      key_len: 3
          ref: NULL
         rows: 89
     filtered: 100.00
        Extra: Using index condition


mysql> SELECT * 
FROM salaries 
WHERE to_date BETWEEN '1985-01-01' AND '1985-12-31';
+--------+--------+------------+------------+
| emp_no | salary | from_date  | to_date    |
+--------+--------+------------+------------+
|  20869 |  40000 | 1985-02-17 | 1985-03-01 |
...snip...
|  45012 |  66889 | 1985-08-16 | 1985-12-31 |
+--------+--------+------------+------------+
89 rows in set (0,00 sec)

This saves an index, meaning fewer indexes for the engine to maintain. Also, speaking of maintenance cost, the cost to maintain a functional index is higher than the cost of a regular one.

On the other side, the execution plan is not as good (the query cost is higher) and obviously you must rewrite the query.

Requirements and restrictions.

A primary key cannot be a functional index:

mysql> CREATE TABLE t1 (i INT, PRIMARY KEY ((ABS(i))));
ERROR 3756 (HY000): The primary key cannot be a functional index

You cannot index non-deterministic functions (RAND(), UNIX_TIMESTAMP(), NOW()…)

mysql> CREATE TABLE t1 (i int, KEY ((RAND(i))));
ERROR 3758 (HY000): Expression of functional index 'functional_index' contains a disallowed function.

SPATIAL and FULLTEXT indexes cannot have functional key parts.

Conclusion

Functional indexes are an interesting and relevant feature; they can be very useful to optimize your queries without rewriting them, especially when dealing with JSON documents and other complex types.

Obviously all the details you must know are in the MySQL documentation: Functional Key Parts
If you are interested in the high-level architecture and the low-level design, please read the worklog.



Thanks for using MySQL!

Follow me on twitter


30 mins with MySQL JSON functions

July 23, 2018

Read this article in French

Note: You may also be interested by 30 mins with JSON in MySQL

Note 2: Handling JSON documents could be also done with MySQL Document Store.

JSON (JavaScript Object Notation) is a popular way of moving data between various systems, including databases. Starting with 5.7, MySQL supports a native JSON data type (stored in an internal binary format for efficiency) and a set of built-in JSON functions that allow you to perform operations on JSON documents.
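
As a minimal sketch (the table name and values here are just for illustration), a native JSON column is declared like any other column type and the document is validated at insert time:

-- Hypothetical example table with a native JSON column
CREATE TABLE events (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  doc JSON
);

-- Invalid JSON would be rejected with an error
INSERT INTO events (doc) VALUES ('{"type": "click", "page": "/home"}');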

This blog post is not a complete overview of the entire MySQL JSON functions set (RTFM instead) but rather an arbitrary presentation of some of them.

Note: MySQL 8 enables an alternative way of working with MySQL, as a document store. (Not covered in this blog post.)

I’m using MySQL 8.0.11, downloadable here.

MySQL native JSON data type

JSON Utility Functions

JSON_PRETTY

Improve readability with JSON_PRETTY

By default, displaying a JSON document in MySQL looks something like this :

SELECT doc FROM restaurants LIMIT 1\G
*************************** 1. row ***************************
doc: {"_id": "564b3259666906a86ea90a99", "name": "Dj Reynolds Pub And Restaurant", "grades": [{"date": {"$date": 1409961600000}, "grade": "A", "score": 2}, {"date": {"$date": 1374451200000}, "grade": "A", "score": 11}, {"date": {"$date": 1343692800000}, "grade": "A", "score": 12}, {"date": {"$date": 1325116800000}, "grade": "A", "score": 12}], "address": {"coord": [-73.98513559999999, 40.7676919], "street": "West   57 Street", "zipcode": "10019", "building": "351"}, "borough": "Manhattan", "cuisine": "Irish", "restaurant_id": "30191841"}

You can have a prettier display with JSON_PRETTY :

SELECT JSON_PRETTY(doc) FROM restaurants LIMIT 1\G
*************************** 1. row ***************************
JSON_PRETTY(doc): {
  "_id": "564b3259666906a86ea90a99",
  "name": "Dj Reynolds Pub And Restaurant",
  "grades": [
    {
      "date": {
        "$date": 1409961600000
      },
      "grade": "A",
      "score": 2
    },
    {
      "date": {
        "$date": 1374451200000
      },
      "grade": "A",
      "score": 11
    },
    {
      "date": {
        "$date": 1343692800000
      },
      "grade": "A",
      "score": 12
    },
    {
      "date": {
        "$date": 1325116800000
      },
      "grade": "A",
      "score": 12
    }
  ],
  "address": {
    "coord": [
      -73.98513559999999,
      40.7676919
    ],
    "street": "West   57 Street",
    "zipcode": "10019",
    "building": "351"
  },
  "borough": "Manhattan",
  "cuisine": "Irish",
  "restaurant_id": "30191841"
}

JSON_STORAGE_SIZE

Return the number of bytes used to store the binary representation of a JSON document with JSON_STORAGE_SIZE.

SELECT max(JSON_STORAGE_SIZE(doc)) FROM restaurants;
+-----------------------------+
| max(JSON_STORAGE_SIZE(doc)) |
+-----------------------------+
|                         916 |
+-----------------------------+

SELECT avg(JSON_STORAGE_SIZE(doc)) FROM restaurants;
+-----------------------------+
| avg(JSON_STORAGE_SIZE(doc)) |
+-----------------------------+
|                    537.2814 |
+-----------------------------+

SELECT min(JSON_STORAGE_SIZE(doc)) FROM restaurants;
+-----------------------------+
| min(JSON_STORAGE_SIZE(doc)) |
+-----------------------------+
|                         255 |
+-----------------------------+

In this collection, the heaviest document is 916 bytes, the lightest is 255 bytes and the average size is 537.2814 bytes.

Note: This is the space used to store the JSON document as it was inserted into the column, prior to any partial updates that may have been performed on it afterwards.
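
Related to this note, a quick extra sketch: JSON_STORAGE_FREE() reports how much space was freed in the binary representation of a JSON column value after a partial, in-place update (it returns 0 if no partial update happened):

SELECT JSON_STORAGE_SIZE(doc), JSON_STORAGE_FREE(doc) FROM restaurants LIMIT 1;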

Functions That Search JSON Values

JSON_EXTRACT (->) / JSON_UNQUOTE / ->> operator

JSON_EXTRACT (or ->) returns data from a JSON document.

JSON_UNQUOTE unquotes a JSON value and returns the result as a utf8mb4 string.

->>, the JSON unquoting extraction operator, is a shortcut for JSON_UNQUOTE(JSON_EXTRACT()).

SELECT JSON_EXTRACT(doc, "$.cuisine") FROM restaurants LIMIT 1\G
*************************** 1. row ***************************
JSON_EXTRACT(doc, "$.cuisine"): "Irish"


SELECT doc->"$.cuisine" FROM restaurants LIMIT 1\G
*************************** 1. row ***************************
doc->"$.cuisine": "Irish"

Both queries above are similar.

If you want the same result but without quotes use ->> or JSON_UNQUOTE(JSON_EXTRACT()) :

SELECT JSON_UNQUOTE(JSON_EXTRACT(doc, "$.cuisine")) FROM restaurants LIMIT 1\G
*************************** 1. row ***************************
JSON_UNQUOTE(JSON_EXTRACT(doc, "$.cuisine")): Irish


SELECT doc->>"$.cuisine" FROM restaurants LIMIT 1\G
*************************** 1. row ***************************
doc->>"$.cuisine": Irish

Both queries above are equivalent.
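
Path expressions can also reach inside arrays (indexes are 0-based). A quick sketch; for the Dj Reynolds document displayed earlier, this would return its first grade, "A" :

-- Extract the grade of the first element of the grades array
SELECT doc->>"$.grades[0].grade" FROM restaurants LIMIT 1\G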

JSON_CONTAINS

Search whether the value of a specified key matches a specified value with JSON_CONTAINS.

SELECT count(*) 
FROM restaurants 
WHERE JSON_CONTAINS(doc, '"Creole"', '$.cuisine');
+----------+
| count(*) |
+----------+
|       24 |
+----------+


SELECT doc->>"$.name" 
FROM restaurants 
WHERE JSON_CONTAINS(doc, '"Creole"', '$.cuisine');
+-----------------------------------------------+
| doc->>"$.name"                                |
+-----------------------------------------------+
| Belvedere Restaurant                          |
| Chez Macoule Restaurant                       |
| Paradise Venus Restaurant                     |
| Heavenly Fritaille Restaurant                 |
| Yolie'S Bar & Restaurant                      |
| Yo-Yo Fritaille                               |
| Kal Bakery & Restaurant                       |
| Bon Appetit Restaurant                        |
| Katou Fin Restaurant                          |
| Alhpa Restaurant                              |
| Lakay Buffet Restaurant                       |
| La Tranquilite Restaurant                     |
| La Caye Restaurant                            |
| Nous Les Amis Restaurant & Bakery             |
| Yoyo Fritaille                                |
| Fresh Crown Restaurant                        |
| Tonel Restaurant & Lounge                     |
| Grace Devine Pastry And Restaurant Restaurant |
| Viva Bubble Tea                               |
| Cafe Creole Restaurant N Bakery               |
| Delly'S Place Restaurant & Fritaille          |
| Creole Plate                                  |
| Chez Nous Restaurant & Fritaille              |
| Combite Creole                                |
+-----------------------------------------------+

JSON_CONTAINS_PATH

Indicate whether a JSON document contains data at a given path or paths with JSON_CONTAINS_PATH.

Let’s insert a dummy document into the restaurants collection :

INSERT INTO restaurants (doc) VALUES ('{"_id": "1234", "name": "Daz Restaurant", "cuisine": "West Indian", "restaurant_id": "4321"}');

How many documents have no grades?

SELECT count(*), JSON_CONTAINS_PATH(doc, 'one', '$.grades') cp 
FROM restaurants 
GROUP BY cp;
+----------+------+
| count(*) | cp   |
+----------+------+
|        1 |    0 |
|    25359 |    1 |
+----------+------+

Ok, only 1. We can easily check the structure of this document :

SELECT JSON_PRETTY(doc) 
FROM restaurants 
WHERE JSON_CONTAINS_PATH(doc, 'one', '$.grades') = 0\G
*************************** 1. row ***************************
JSON_PRETTY(doc): {
  "_id": "1234",
  "name": "Daz Restaurant",
  "cuisine": "West Indian",
  "restaurant_id": "4321"
}
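
Note that the second argument of JSON_CONTAINS_PATH can be 'one' (at least one of the listed paths exists) or 'all' (every listed path must exist). A minimal sketch on a JSON literal shows the difference :

-- 'one' returns 1 because $.a exists; 'all' returns 0 because $.z does not
SELECT JSON_CONTAINS_PATH('{"a": 1, "b": {"c": 2}}', 'one', '$.a', '$.z') AS one_path,
       JSON_CONTAINS_PATH('{"a": 1, "b": {"c": 2}}', 'all', '$.a', '$.z') AS all_paths;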

A bridge between 2 models

To paraphrase David Stokes (MySQL Community Manager) in his book MySQL and JSON – A Practical Programming Guide :

The advantages of traditional relational data and schemaless data are both large. But in some cases, data in a schema needs to be schemaless, or schemaless-data needs to be in a schema. 

Making such a metamorphosis is very easy with MySQL!

Relational to JSON

JSON_OBJECT

Evaluates a list of key-value pairs and returns a JSON object containing those pairs with JSON_OBJECT.

Here is a traditional SQL query with a relational result set, where the values extracted from the JSON document are output as non-JSON data :

SELECT doc->>"$.name" 
FROM restaurants 
WHERE JSON_CONTAINS(doc, '"Creole"', '$.cuisine') 
LIMIT 2;
+-------------------------+
| doc->>"$.name"          |
+-------------------------+
| Belvedere Restaurant    |
| Chez Macoule Restaurant |
+-------------------------+

This result set can be converted into JSON format, in this case a JSON object :

SELECT JSON_OBJECT("Name", doc->>"$.name") 
FROM restaurants 
WHERE JSON_CONTAINS(doc, '"Creole"', '$.cuisine') 
LIMIT 2;
+-------------------------------------+
| JSON_OBJECT("Name", doc->>"$.name") |
+-------------------------------------+
| {"Name": "Belvedere Restaurant"}    |
| {"Name": "Chez Macoule Restaurant"} |
+-------------------------------------+

Another example :

SELECT Name, Population 
FROM City 
WHERE CountryCode='fra' 
ORDER BY Population DESC 
LIMIT 5;
+-----------+------------+
| Name      | Population |
+-----------+------------+
| Paris     |    2125246 |
| Marseille |     798430 |
| Lyon      |     445452 |
| Toulouse  |     390350 |
| Nice      |     342738 |
+-----------+------------+


SELECT JSON_OBJECT("CityName",Name, "CityPop", Population) 
FROM City 
WHERE CountryCode='fra' 
ORDER BY Population DESC 
LIMIT 5;
+-----------------------------------------------------+
| JSON_OBJECT("CityName",Name, "CityPop", Population) |
+-----------------------------------------------------+
| {"CityPop": 2125246, "CityName": "Paris"}           |
| {"CityPop": 798430, "CityName": "Marseille"}        |
| {"CityPop": 445452, "CityName": "Lyon"}             |
| {"CityPop": 390350, "CityName": "Toulouse"}         |
| {"CityPop": 342738, "CityName": "Nice"}             |
+-----------------------------------------------------+

JSON_OBJECTAGG

Takes two column names or expressions and returns a JSON object containing key-value pairs with JSON_OBJECTAGG.

Grouping rows is very often useful. That is why we implemented JSON aggregate functions like this one.

SELECT JSON_OBJECTAGG(Name, CountryCode) 
FROM City  
GROUP BY id 
ORDER BY RAND() 
LIMIT 5;
+-----------------------------------+
| JSON_OBJECTAGG(Name, CountryCode) |
+-----------------------------------+
| {"Reno": "USA"}                   |
| {"Hanam": "KOR"}                  |
| {"Laizhou": "CHN"}                |
| {"Yogyakarta": "IDN"}             |
| {"Tantoyuca": "MEX"}              |
+-----------------------------------+
  • Note
    • It’s usually not a good idea to use ORDER BY RAND(). It works like a charm for small datasets, but it’s a true performance killer with huge datasets.
    • The best practice is to do it in the application or to pre-compute random values in the database (see the sketch after this list).
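
Here is one possible way to pre-compute the random values, as a minimal sketch (the rnd column and the idx_rnd index are hypothetical additions to the City table) :

-- Pre-compute one random value per row and index it
ALTER TABLE City ADD COLUMN rnd DOUBLE;
UPDATE City SET rnd = RAND();
ALTER TABLE City ADD INDEX idx_rnd (rnd);

-- Then order by the pre-computed column instead of calling RAND() at query time
SELECT JSON_OBJECTAGG(Name, CountryCode)
FROM City
GROUP BY id
ORDER BY rnd
LIMIT 5;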

JSON_ARRAY

Evaluate a list of values and return a JSON array containing those values with JSON_ARRAY.
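
On literal values, a trivial sketch :

-- Mixed scalar types are allowed; this returns [1, "abc", null, true]
SELECT JSON_ARRAY(1, "abc", NULL, TRUE);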

The next example is a hierarchical query using a recursive Common Table Expression, aka recursive CTE (WITH syntax) :

WITH RECURSIVE emp_ext (id, name, path) AS ( 
    SELECT id, name, CAST(id AS CHAR(200)) 
    FROM employees 
    WHERE manager_id IS NULL 
    UNION ALL 
    SELECT s.id, s.name, CONCAT(m.path, ",", s.id) 
    FROM emp_ext m 
        JOIN employees s ON m.id=s.manager_id 
) 
SELECT id,name, path FROM emp_ext ORDER BY path;
+------+---------+-----------------+
| id   | name    | path            |
+------+---------+-----------------+
|  333 | Yasmina | 333             |
|  198 | John    | 333,198         |
|   29 | Pedro   | 333,198,29      |
| 4610 | Sarah   | 333,198,29,4610 |
|   72 | Pierre  | 333,198,29,72   |
|  692 | Tarek   | 333,692         |
|  123 | Adil    | 333,692,123     |
+------+---------+-----------------+

JSON format output with JSON_OBJECT & JSON_ARRAY :

WITH RECURSIVE emp_ext (id, name, path) AS ( 
    SELECT id, name, CAST(id AS CHAR(200)) 
    FROM employees 
    WHERE manager_id IS NULL 
    UNION ALL 
    SELECT s.id, s.name, CONCAT(m.path, ",", s.id) 
    FROM emp_ext m 
        JOIN employees s ON m.id=s.manager_id 
) 
SELECT JSON_OBJECT("ID",id, "Name",name, "Path", JSON_ARRAY(path)) 
FROM emp_ext 
ORDER BY path;
+-------------------------------------------------------------+
| JSON_OBJECT("ID",id, "Name",name, "Path", JSON_ARRAY(path)) |
+-------------------------------------------------------------+
| {"ID": 333, "Name": "Yasmina", "Path": ["333"]}             |
| {"ID": 198, "Name": "John", "Path": ["333,198"]}            |
| {"ID": 29, "Name": "Pedro", "Path": ["333,198,29"]}         |
| {"ID": 4610, "Name": "Sarah", "Path": ["333,198,29,4610"]}  |
| {"ID": 72, "Name": "Pierre", "Path": ["333,198,29,72"]}     |
| {"ID": 692, "Name": "Tarek", "Path": ["333,692"]}           |
| {"ID": 123, "Name": "Adil", "Path": ["333,692,123"]}        |
+-------------------------------------------------------------+

JSON_ARRAYAGG

Aggregate a result set as a single JSON array whose elements consist of the rows with JSON_ARRAYAGG.

With this other JSON aggregate function, we will see different combinations of JSON format output :

SELECT CountryCode, JSON_ARRAYAGG(City.Name) 
FROM City 
    JOIN Country ON (City.CountryCode=Country.Code) 
WHERE Continent='Europe' 
GROUP BY 1 
LIMIT 5;
+-------------+--------------------------------------------------------------------------------------------------------------+
| CountryCode | JSON_ARRAYAGG(City.Name)                                                                                     |
+-------------+--------------------------------------------------------------------------------------------------------------+
| ALB         | ["Tirana"]                                                                                                   |
| AND         | ["Andorra la Vella"]                                                                                         |
| AUT         | ["Graz", "Linz", "Salzburg", "Innsbruck", "Wien", "Klagenfurt"]                                              |
| BEL         | ["Antwerpen", "Brugge", "Gent", "Schaerbeek", "Charleroi", "Namur", "Liège", "Mons", "Bruxelles [Brussel]"]  |
| BGR         | ["Šumen", "Sofija", "Stara Zagora", "Plovdiv", "Pleven", "Varna", "Sliven", "Burgas", "Dobric", "Ruse"]      |
+-------------+--------------------------------------------------------------------------------------------------------------+

SELECT JSON_OBJECT("CountryCode",CountryCode), JSON_OBJECT("CityName",JSON_ARRAYAGG(City.Name)) 
FROM City 
    JOIN Country ON (City.CountryCode=Country.Code) 
WHERE Continent='Europe' 
GROUP BY 1 
LIMIT 5;
+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------+
| JSON_OBJECT("CountryCode",CountryCode) | JSON_OBJECT("CityName",JSON_ARRAYAGG(City.Name))                                                                           |
+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------+
| {"CountryCode": "ALB"}                 | {"CityName": ["Tirana"]}                                                                                                   |
| {"CountryCode": "AND"}                 | {"CityName": ["Andorra la Vella"]}                                                                                         |
| {"CountryCode": "AUT"}                 | {"CityName": ["Wien", "Graz", "Linz", "Salzburg", "Innsbruck", "Klagenfurt"]}                                              |
| {"CountryCode": "BEL"}                 | {"CityName": ["Schaerbeek", "Mons", "Namur", "Brugge", "Liège", "Antwerpen", "Charleroi", "Gent", "Bruxelles [Brussel]"]}  |
| {"CountryCode": "BGR"}                 | {"CityName": ["Burgas", "Šumen", "Dobric", "Sliven", "Pleven", "Stara Zagora", "Ruse", "Varna", "Plovdiv", "Sofija"]}      |
+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------+

SELECT JSON_OBJECT("Code",CountryCode, "CityName", JSON_ARRAYAGG(City.Name)) 
FROM City 
    JOIN Country ON (City.CountryCode=Country.Code) 
WHERE Continent='Europe' 
GROUP BY CountryCode 
LIMIT 5;
+-------------------------------------------------------------------------------------------------------------------------------------------+
| JSON_OBJECT("Code",CountryCode, "CityName", JSON_ARRAYAGG(City.Name))                                                                     |
+-------------------------------------------------------------------------------------------------------------------------------------------+
| {"Code": "ALB", "CityName": ["Tirana"]}                                                                                                   |
| {"Code": "AND", "CityName": ["Andorra la Vella"]}                                                                                         |
| {"Code": "AUT", "CityName": ["Graz", "Linz", "Salzburg", "Innsbruck", "Wien", "Klagenfurt"]}                                              |
| {"Code": "BEL", "CityName": ["Bruxelles [Brussel]", "Antwerpen", "Brugge", "Gent", "Schaerbeek", "Charleroi", "Namur", "Liège", "Mons"]}  |
| {"Code": "BGR", "CityName": ["Ruse", "Šumen", "Sofija", "Stara Zagora", "Plovdiv", "Pleven", "Varna", "Sliven", "Burgas", "Dobric"]}      |
+-------------------------------------------------------------------------------------------------------------------------------------------+

JSON to Relational

OK, let’s transform JSON data into relational data!

JSON_TABLE

Extract data from a JSON document and return it as a relational table with the specified columns, using JSON_TABLE.

I highly recommend spending some time in the documentation for this powerful function, which allows you to map JSON data into a temporary relational table and then query that table.

Enough blabla, let’s see some examples :

SELECT GNP 
FROM countryinfo, JSON_TABLE(doc, "$" COLUMNS (GNP int PATH "$.GNP")) AS jst 
WHERE _id='FRA';
+---------+
| GNP     |
+---------+
| 1424285 |
+---------+

SELECT GNP, Name, LifeExpectancy 
FROM countryinfo, JSON_TABLE(doc, "$" COLUMNS (GNP int PATH "$.GNP", Name char(255) PATH "$.Name", LifeExpectancy int PATH "$.demographics.LifeExpectancy")) AS jst 
WHERE _id IN ('FRA', 'USA');
+---------+---------------+----------------+
| GNP     | Name          | LifeExpectancy |
+---------+---------------+----------------+
| 1424285 | France        |             79 |
| 8510700 | United States |             77 |
+---------+---------------+----------------+

SELECT name AS "Creole Cuisine" 
FROM restaurant.restaurants, JSON_TABLE(doc, "$" COLUMNS (name char(100) PATH "$.name", cuisine char(100) PATH "$.cuisine")) AS jst 
WHERE cuisine='Creole';
+-----------------------------------------------+
| Creole Cuisine                                |
+-----------------------------------------------+
| Belvedere Restaurant                          |
| Chez Macoule Restaurant                       |
| Paradise Venus Restaurant                     |
| Heavenly Fritaille Restaurant                 |
| Yolie'S Bar & Restaurant                      |
| Yo-Yo Fritaille                               |
| Kal Bakery & Restaurant                       |
| Bon Appetit Restaurant                        |
| Katou Fin Restaurant                          |
| Alhpa Restaurant                              |
| Lakay Buffet Restaurant                       |
| La Tranquilite Restaurant                     |
| La Caye Restaurant                            |
| Nous Les Amis Restaurant & Bakery             |
| Yoyo Fritaille                                |
| Fresh Crown Restaurant                        |
| Tonel Restaurant & Lounge                     |
| Grace Devine Pastry And Restaurant Restaurant |
| Viva Bubble Tea                               |
| Cafe Creole Restaurant N Bakery               |
| Delly'S Place Restaurant & Fritaille          |
| Creole Plate                                  |
| Chez Nous Restaurant & Fritaille              |
| Combite Creole                                |
+-----------------------------------------------+

JSON_TABLE – Nested Data

Walk down the JSON document path and retrieve nested data.

For example, extract all grades for Hawaiian cuisine restaurants :

SELECT name, cuisine, gradeID, grade 
FROM restaurants,JSON_TABLE(doc, "$" COLUMNS (name char(100) PATH "$.name", cuisine char(100) PATH "$.cuisine", NESTED PATH "$.grades[*]" COLUMNS (gradeID FOR ORDINALITY, grade char(20) PATH "$.grade"))) AS jst 
WHERE cuisine='Hawaiian';
+------------------+----------+---------+-------+
| name             | cuisine  | gradeID | grade |
+------------------+----------+---------+-------+
| Makana           | Hawaiian |       1 | C     |
| Makana           | Hawaiian |       2 | C     |
| Makana           | Hawaiian |       3 | A     |
| Makana           | Hawaiian |       4 | C     |
| Makana           | Hawaiian |       5 | A     |
| General Assembly | Hawaiian |       1 | A     |
| General Assembly | Hawaiian |       2 | A     |
| General Assembly | Hawaiian |       3 | A     |
| General Assembly | Hawaiian |       4 | A     |
| Onomea           | Hawaiian |       1 | A     |
| Onomea           | Hawaiian |       2 | A     |
+------------------+----------+---------+-------+

JSON_TABLE – Missing Data

Specify what to do when data is missing.

Default behavior :

SELECT name, cuisine, borough 
FROM restaurant.restaurants,JSON_TABLE(doc, "$" COLUMNS (name char(100) PATH "$.name", cuisine char(100) PATH "$.cuisine", borough char(100) PATH "$.borough")) AS jst  
LIMIT 2;
+--------------------------------+-------------+-----------+
| name                           | cuisine     | borough   |
+--------------------------------+-------------+-----------+
| Daz Restaurant                 | West Indian | NULL      |
| Dj Reynolds Pub And Restaurant | Irish       | Manhattan |
+--------------------------------+-------------+-----------+

Enforce the default behavior :

SELECT name, cuisine, borough 
FROM restaurant.restaurants,JSON_TABLE(doc, "$" COLUMNS (name char(100) PATH "$.name", cuisine char(100) PATH "$.cuisine", borough char(100) PATH "$.borough" NULL ON EMPTY)) AS jst 
LIMIT 2;
+--------------------------------+-------------+-----------+
| name                           | cuisine     | borough   |
+--------------------------------+-------------+-----------+
| Daz Restaurant                 | West Indian | NULL      |
| Dj Reynolds Pub And Restaurant | Irish       | Manhattan |
+--------------------------------+-------------+-----------+

Raise an error :

SELECT name, cuisine, borough 
FROM restaurant.restaurants,JSON_TABLE(doc, "$" COLUMNS (name char(100) PATH "$.name", cuisine char(100) PATH "$.cuisine", borough char(100) PATH "$.borough" ERROR ON EMPTY)) AS jst 
LIMIT 2;
ERROR 3665 (22035): Missing value for JSON_TABLE column 'borough'

Specify a default value :

SELECT name, cuisine, borough 
FROM restaurant.restaurants,JSON_TABLE(doc, "$" COLUMNS (name char(100) PATH "$.name", cuisine char(100) PATH "$.cuisine", borough char(100) PATH "$.borough" DEFAULT '"<UNKNOWN>"' ON EMPTY)) AS jst 
LIMIT 2;
+--------------------------------+-------------+-----------+
| name                           | cuisine     | borough   |
+--------------------------------+-------------+-----------+
| Daz Restaurant                 | West Indian | <UNKNOWN> |
| Dj Reynolds Pub And Restaurant | Irish       | Manhattan |
+--------------------------------+-------------+-----------+

Wrapup

I’ll stop this introduction to the rich world of MySQL JSON functions here. I presented only a subset of these functions, but it is definitely worth spending some time discovering the entire set, e.g. how to create, modify, and index JSON documents.
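
To give you a taste, here is a minimal sketch of a native JSON column, an in-place modification with JSON_SET, and an index on a JSON field through a generated column (the products table and its columns are purely hypothetical) :

-- Native JSON column, plus an indexed generated column extracted from the document
CREATE TABLE products (
  id INT AUTO_INCREMENT PRIMARY KEY,
  attributes JSON,
  brand VARCHAR(50) AS (attributes->>'$.brand') STORED,
  INDEX idx_brand (brand)
);

-- Modify one field of the document in place
UPDATE products
SET attributes = JSON_SET(attributes, '$.price', 9.99)
WHERE id = 1;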

Furthermore, if your workload does not fit the relational model, you should use the MySQL 8 Document Store, which provides a CRUD API and some other cool stuff. I’ll blog about it soon, so stay tuned!

Anyway, I recommend reading : Top 10 reasons for NoSQL with MySQL.

Misc

Other resources

  • You’ll find some of the sample databases used in this article here.
  • The restaurants collection can be found here.
  • Some books that could be useful : here.

Thanks for using MySQL!
