Read this article in French
There are three pillars of a database architecture:
- Monitoring
- Backup / Restore process
- High Availability
This blog post is about database High Availability; more precisely, about one of the best combos of the moment:
- MySQL 5.7 Group Replication: the only native HA solution for MySQL. It is a single/multi-master, update-everywhere replication plugin for MySQL with built-in automatic distributed recovery, conflict detection and group membership.
- ProxySQL 1.4: probably the best proxy for MySQL.
Note: MySQL Replication is indeed more than a serious alternative for HA; by the way, ProxySQL handles it natively.
Note: ProxySQL has a plethora of features; however, the purpose of this article is its use in a MySQL Group Replication context.
Note: MySQL NDB Cluster is also an HA solution, adapted to distributed computing environments. It is outside the scope of this article.
ProxySQL has been compatible with MySQL Group Replication since version 1.3; see (in French): Configurer ProxySQL pour MySQL Group Replication. However, version 1.4 has native MySQL Group Replication support, which makes it even easier to use these two popular technologies together.
This is what we will see now.
In this article I will make the following assumptions:
MySQL Group Replication
Specifications
- Server version: 5.7.20
- Plugin version: 1.0
- Node 1: mysql_node1, 172.22.0.10:3306
- Node 2: mysql_node2, 172.22.0.20:3306
- Node 3: mysql_node3, 172.22.0.30:3306
These characteristics therefore describe a 3-node MySQL Group Replication cluster, up and running:
node1>
-- MySQL Server version number
SELECT left(version(),6);
+-------------------+
| left(version(),6) |
+-------------------+
| 5.7.20 |
+-------------------+
-- MySQL Group Replication plugin details
SELECT * FROM INFORMATION_SCHEMA.PLUGINS WHERE PLUGIN_NAME LIKE 'group%'\G
*************************** 1. row ***************************
PLUGIN_NAME: group_replication
PLUGIN_VERSION: 1.0
PLUGIN_STATUS: ACTIVE
PLUGIN_TYPE: GROUP REPLICATION
PLUGIN_TYPE_VERSION: 1.1
PLUGIN_LIBRARY: group_replication.so
PLUGIN_LIBRARY_VERSION: 1.7
PLUGIN_AUTHOR: ORACLE
PLUGIN_DESCRIPTION: Group Replication (1.0.0)
PLUGIN_LICENSE: PROPRIETARY
LOAD_OPTION: FORCE_PLUS_PERMANENT
-- MySQL Group Replication member status
SELECT * FROM performance_schema.replication_group_members\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: 3c2039a6-f152-11e7-90da-0242ac16001e
MEMBER_HOST: mysql_node3
MEMBER_PORT: 3306
MEMBER_STATE: ONLINE
*************************** 2. row ***************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: 3c2039a8-f152-11e7-9132-0242ac160014
MEMBER_HOST: mysql_node2
MEMBER_PORT: 3306
MEMBER_STATE: ONLINE
*************************** 3. row ***************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: 3c36e366-f152-11e7-91a5-0242ac16000a
MEMBER_HOST: mysql_node1
MEMBER_PORT: 3306
MEMBER_STATE: ONLINE
MySQL Enterprise Monitor gives us a graphical view of the cluster and its state (click to enlarge):
The cluster is in single-primary mode, i.e. only one node is writable at a time (while the other 2 nodes are read-only).
node1>
-- Is single primary mode activated?
SHOW VARIABLES LIKE 'group_replication_single_primary_mode';
+---------------------------------------+-------+
| Variable_name | Value |
+---------------------------------------+-------+
| group_replication_single_primary_mode | ON |
+---------------------------------------+-------+
-- Who is the primary node?
SELECT MEMBER_ID, MEMBER_HOST, MEMBER_PORT, MEMBER_STATE
FROM performance_schema.replication_group_members
INNER JOIN performance_schema.global_status ON (MEMBER_ID = VARIABLE_VALUE)
WHERE VARIABLE_NAME='group_replication_primary_member'\G
*************************** 1. row ***************************
MEMBER_ID: 3c36e366-f152-11e7-91a5-0242ac16000a
MEMBER_HOST: mysql_node1
MEMBER_PORT: 3306
MEMBER_STATE: ONLINE
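A complementary check, this time from a secondary: in single-primary mode both read-only flags are expected to be ON on the non-primary members. This is just a quick sanity query; nothing else in the setup depends on it:
node2>
-- On a secondary, both values should be 1 (ON) in single-primary mode
SELECT @@global.read_only, @@global.super_read_only;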
MySQL Enterprise Monitor shows us (click to enlarge):
I am going to extend the MySQL 5.7 sys schema with the following script: addition_to_sys.sql
USE sys;
DELIMITER $$
CREATE FUNCTION IFZERO(a INT, b INT)
RETURNS INT
DETERMINISTIC
RETURN IF(a = 0, b, a)$$
CREATE FUNCTION LOCATE2(needle TEXT(10000), haystack TEXT(10000), offset INT)
RETURNS INT
DETERMINISTIC
RETURN IFZERO(LOCATE(needle, haystack, offset), LENGTH(haystack) + 1)$$
CREATE FUNCTION GTID_NORMALIZE(g TEXT(10000))
RETURNS TEXT(10000)
DETERMINISTIC
RETURN GTID_SUBTRACT(g, '')$$
CREATE FUNCTION GTID_COUNT(gtid_set TEXT(10000))
RETURNS INT
DETERMINISTIC
BEGIN
DECLARE result BIGINT DEFAULT 0;
DECLARE colon_pos INT;
DECLARE next_dash_pos INT;
DECLARE next_colon_pos INT;
DECLARE next_comma_pos INT;
SET gtid_set = GTID_NORMALIZE(gtid_set);
SET colon_pos = LOCATE2(':', gtid_set, 1);
WHILE colon_pos != LENGTH(gtid_set) + 1 DO
SET next_dash_pos = LOCATE2('-', gtid_set, colon_pos + 1);
SET next_colon_pos = LOCATE2(':', gtid_set, colon_pos + 1);
SET next_comma_pos = LOCATE2(',', gtid_set, colon_pos + 1);
IF next_dash_pos < next_colon_pos AND next_dash_pos < next_comma_pos THEN
SET result = result +
SUBSTR(gtid_set, next_dash_pos + 1,
LEAST(next_colon_pos, next_comma_pos) - (next_dash_pos + 1)) -
SUBSTR(gtid_set, colon_pos + 1, next_dash_pos - (colon_pos + 1)) + 1;
ELSE
SET result = result + 1;
END IF;
SET colon_pos = next_colon_pos;
END WHILE;
RETURN result;
END$$
CREATE FUNCTION gr_applier_queue_length()
RETURNS INT
DETERMINISTIC
BEGIN
RETURN (SELECT sys.gtid_count( GTID_SUBTRACT( (SELECT
Received_transaction_set FROM performance_schema.replication_connection_status
WHERE Channel_name = 'group_replication_applier' ), (SELECT
@@global.GTID_EXECUTED) )));
END$$
CREATE FUNCTION gr_member_in_primary_partition()
RETURNS VARCHAR(3)
DETERMINISTIC
BEGIN
RETURN (SELECT IF( MEMBER_STATE='ONLINE' AND ((SELECT COUNT(*) FROM
performance_schema.replication_group_members WHERE MEMBER_STATE != 'ONLINE') >=
((SELECT COUNT(*) FROM performance_schema.replication_group_members)/2) = 0),
'YES', 'NO' ) FROM performance_schema.replication_group_members JOIN
performance_schema.replication_group_member_stats USING(member_id));
END$$
CREATE VIEW gr_member_routing_candidate_status AS SELECT
sys.gr_member_in_primary_partition() as viable_candidate,
IF( (SELECT (SELECT GROUP_CONCAT(variable_value) FROM
performance_schema.global_variables WHERE variable_name IN ('read_only',
'super_read_only')) != 'OFF,OFF'), 'YES', 'NO') as read_only,
sys.gr_applier_queue_length() as transactions_behind, Count_Transactions_in_queue as 'transactions_to_cert' from performance_schema.replication_group_member_stats;$$
DELIMITER ;
So I load the functions and views into the cluster via the primary node (mysql_node1):
# Loading extra functions and views to sys schema on the cluster (using a primary node)
$ mysql -u root -p -h mysql_node1 < ./addition_to_sys.sql
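To get a feel for what these helpers compute, a quick sanity check of gtid_count() is possible once the script is loaded. The GTID set below is a made-up example (any valid member UUID would do):
node1>
-- Example only: the ranges 1-5 and 7 contain 6 GTIDs in total, so this should return 6
SELECT sys.gtid_count('aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-5:7') AS gtid_count;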
This script will allow ProxySQL to monitor the state of the cluster nodes. For example:
node1>
-- Status of the primary node
SELECT * FROM sys.gr_member_routing_candidate_status;
+------------------+-----------+---------------------+----------------------+
| viable_candidate | read_only | transactions_behind | transactions_to_cert |
+------------------+-----------+---------------------+----------------------+
| YES | NO | 0 | 0 |
+------------------+-----------+---------------------+----------------------+
node2>
-- Status of a secondary node
SELECT * FROM sys.gr_member_routing_candidate_status;
+------------------+-----------+---------------------+----------------------+
| viable_candidate | read_only | transactions_behind | transactions_to_cert |
+------------------+-----------+---------------------+----------------------+
| YES | YES | 0 | 0 |
+------------------+-----------+---------------------+----------------------+
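For the curious: this view is exactly what ProxySQL polls on each backend to decide where traffic should go. Conceptually, the health check boils down to the query below (the exact column list ProxySQL selects may differ slightly between versions):
node1>
SELECT viable_candidate, read_only, transactions_behind
FROM sys.gr_member_routing_candidate_status;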
On the cluster side, the last configuration step consists of creating the monitoring user that ProxySQL will use (yes, this is related to the previous step :)).
Here again I use the primary of the group:
node1>
-- Create ProxySQL monitoring user inside the cluster
CREATE USER proxysqlmonitor@'%' IDENTIFIED BY 'Very-S3cr3t';
GRANT SELECT ON sys.* TO proxysqlmonitor@'%';
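Since CREATE USER and GRANT are ordinary transactions, Group Replication propagates them to the whole group. If you want to double-check, the grants can be read back from any secondary, e.g.:
node2>
-- The monitoring user should now exist on every member of the group
SHOW GRANTS FOR proxysqlmonitor@'%';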
Let’s configure ProxySQL now!
ProxySQL
Specifications
- Proxy version: 1.4.4
- Admin interface: 172.22.0.2:6032
- Cluster connection: 172.22.0.2:3306
$ proxysql --version
ProxySQL version 1.4.4-139-ga51b040, codename Truls
$ service proxysql status
ProxySQL is running (58).
ProxySQL configuration can be done online, which is obviously a very good thing.
Let’s start by logging into the admin interface, on the default port 6032, with the default admin user and its default password: admin (!). These defaults can and must be changed in real life.
$ mysql -u admin -p -P 6032 --protocol=tcp main
Servers configuration
First step: add the cluster nodes to the proxy:
-- Add 3 nodes of the cluster into the mysql_servers table
proxy>
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (2, 'mysql_node1', 3306), (2, 'mysql_node2', 3306), (2, 'mysql_node3', 3306);
proxy>
select * from mysql_servers;
+--------------+-------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+-------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 2 | mysql_node1 | 3306 | ONLINE | 1 | 0 | 1000 | 0 | 0 | 0 | |
| 2 | mysql_node2 | 3306 | ONLINE | 1 | 0 | 1000 | 0 | 0 | 0 | |
| 2 | mysql_node3 | 3306 | ONLINE | 1 | 0 | 1000 | 0 | 0 | 0 | |
+--------------+-------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
Hostgroups configuration
I gave a brief presentation of the ProxySQL 1.3 objects last time: Configurer ProxySQL pour MySQL Group Replication
Version 1.4 has some differences; the most notable one in our context is the new table mysql_group_replication_hostgroups:
proxy>
SHOW CREATE TABLE main.mysql_group_replication_hostgroups\G
*************************** 1. row ***************************
table: mysql_group_replication_hostgroups
Create Table: CREATE TABLE mysql_group_replication_hostgroups (
writer_hostgroup INT CHECK (writer_hostgroup>=0) NOT NULL PRIMARY KEY,
backup_writer_hostgroup INT CHECK (backup_writer_hostgroup>=0 AND backup_writer_hostgroup<>writer_hostgroup) NOT NULL,
reader_hostgroup INT NOT NULL CHECK (reader_hostgroup<>writer_hostgroup AND backup_writer_hostgroup<>reader_hostgroup AND reader_hostgroup>0),
offline_hostgroup INT NOT NULL CHECK (offline_hostgroup<>writer_hostgroup AND offline_hostgroup<>reader_hostgroup AND backup_writer_hostgroup<>offline_hostgroup AND offline_hostgroup>=0),
active INT CHECK (active IN (0,1)) NOT NULL DEFAULT 1,
max_writers INT NOT NULL CHECK (max_writers >= 0) DEFAULT 1,
writer_is_also_reader INT CHECK (writer_is_also_reader IN (0,1)) NOT NULL DEFAULT 0,
max_transactions_behind INT CHECK (max_transactions_behind>=0) NOT NULL DEFAULT 0,
comment VARCHAR,
UNIQUE (reader_hostgroup),
UNIQUE (offline_hostgroup),
UNIQUE (backup_writer_hostgroup))
The best description I have found is in my brilliant colleague’s blog post: MySQL Group Replication: native support in ProxySQL
I quote:
”
There are many new columns, let’s have a look at their meaning:
- writer_hostgroup: the id of the hostgroup that will contain all the members that are writers
- backup_writer_hostgroup: if the group is running in multi-primary mode, there are multiple writers (read_only=0); if the number of these writers is larger than max_writers, the extra nodes are placed in this backup writer group
- reader_hostgroup: the id of the hostgroup that will contain all the members in read_only
- offline_hostgroup: the id of the hostgroup that will contain the hosts not being online or not being part of the Group
- active: when enabled, ProxySQL monitors the Group and moves the servers accordingly into the appropriate hostgroups
- max_writers: limits the number of nodes in the writer hostgroup in case of a group in multi-primary mode
- writer_is_also_reader: boolean value, 0 or 1; when enabled, a node in the writer hostgroup also belongs to the reader hostgroup
- max_transactions_behind: if the value is greater than 0, it defines how far a node can lag in applying the transactions from the Group, see this post for more info
”
by @Lefred 🙂
Our configuration is as follows:
- writer_hostgroup = 2
- backup_writer_hostgroup = 4
- reader_hostgroup = 3
- offline_hostgroup = 1
- active = 1
- max_writers = 1
- writer_is_also_reader = 1
- max_transactions_behind = 0
Which gives us:
proxy>
INSERT INTO mysql_group_replication_hostgroups (writer_hostgroup,backup_writer_hostgroup,reader_hostgroup,offline_hostgroup,active,max_writers,writer_is_also_reader,max_transactions_behind) VALUES (2,4,3,1,1,1,1,0);
-- Save & load new configuration
SAVE MYSQL SERVERS TO DISK;
LOAD MYSQL SERVERS TO RUNTIME;
proxy>
select * from mysql_group_replication_hostgroups\G
*************************** 1. row ***************************
writer_hostgroup: 2
backup_writer_hostgroup: 4
reader_hostgroup: 3
offline_hostgroup: 1
active: 1
max_writers: 1
writer_is_also_reader: 1
max_transactions_behind: 0
comment: NULL
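To confirm that this configuration really reached the runtime layer (and not only the in-memory main tables), the runtime_ counterpart of the table can be queried as well. A quick check, assuming a ProxySQL 1.4 build that exposes it:
proxy>
SELECT writer_hostgroup, backup_writer_hostgroup, reader_hostgroup, offline_hostgroup, active
FROM runtime_mysql_group_replication_hostgroups;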
Monitoring configuration
A little further up we created a monitoring user in the cluster (remember?).
It is this user that ProxySQL will use to keep track of the state of the various nodes of the cluster:
-- Set user name & password for monitoring module
SET mysql-monitor_username='proxysqlmonitor';
SET mysql-monitor_password='Very-S3cr3t';
-- Save & load new configuration
SAVE MYSQL VARIABLES TO DISK;
LOAD MYSQL VARIABLES TO RUNTIME;
proxy>
select hostgroup_id, hostname, status from runtime_mysql_servers;
+--------------+-------------+--------+
| hostgroup_id | hostname | status |
+--------------+-------------+--------+
| 2 | mysql_node1 | ONLINE |
| 2 | mysql_node3 | ONLINE |
| 2 | mysql_node2 | ONLINE |
+--------------+-------------+--------+
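A simple way to verify that these credentials actually work is to look at the monitor module's ping log; entries with a NULL error mean ProxySQL can reach and authenticate against the backend (table and column names as exposed by ProxySQL 1.4):
proxy>
-- A NULL ping_error means the monitoring user connected successfully
SELECT hostname, port, ping_error
FROM monitor.mysql_server_ping_log
ORDER BY time_start_us DESC LIMIT 3;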
Application user creation
The application connects to the MySQL server with a user and password (in the best case :D).
This user must obviously exist in the different MySQL instances that form the cluster, with the correct MySQL privileges.
This user must also be created in ProxySQL:
proxy>
INSERT INTO mysql_users (username, password, active, default_hostgroup, max_connections) VALUES ('app_user', 'app_pwd', 1, 2, 200);
-- Save & load new configuration
SAVE MYSQL USERS TO DISK;
LOAD MYSQL USERS TO RUNTIME;
proxy>
select * from mysql_users\G
*************************** 1. row ***************************
username: app_user
password: app_pwd
active: 1
use_ssl: 0
default_hostgroup: 2
default_schema: NULL
schema_locked: 0
transaction_persistent: 1
fast_forward: 0
backend: 1
frontend: 1
max_connections: 200
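Optional, and not needed for the failover demo below: since the secondaries end up in the reader hostgroup (3), a classic pair of query rules can route plain SELECTs to the readers while keeping everything else on the writer hostgroup (2). The rule_id values and regexes below are arbitrary; this is only a minimal sketch of the idea:
proxy>
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (100, 1, '^SELECT.*FOR UPDATE$', 2, 1),
       (200, 1, '^SELECT', 3, 1);
-- Save & load new configuration
SAVE MYSQL QUERY RULES TO DISK;
LOAD MYSQL QUERY RULES TO RUNTIME;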
Change the listening port
By default the application would connect to the cluster through ProxySQL using port 6033 (in our case: 172.22.0.2:6033).
In real life, applications usually connect to MySQL using its default port, 3306.
It is therefore possible (though not required) to change the ProxySQL listening port to 3306:
proxy>
SET mysql-interfaces='0.0.0.0:3306;/tmp/proxysql.sock';
SAVE MYSQL VARIABLES TO DISK;
proxy>
SHOW VARIABLES LIKE 'mysql-interfaces'\G
*************************** 1. row ***************************
Variable_name: mysql-interfaces
Value: 0.0.0.0:3306;/tmp/proxysql.sock
Note: for some mysterious reason, this last configuration change is not applied online. In other words, ProxySQL must be restarted for the change to be taken into account.
$ service proxysql restart
Shutting down ProxySQL: DONE!
Starting ProxySQL: DONE!
The ProxySQL configuration for MySQL Group Replication is now complete \o/
I’m taking this opportunity to introduce a new table in version 1.4: mysql_server_group_replication_log.
Useful for monitoring:
proxy>
select * from mysql_server_group_replication_log order by time_start_us desc limit 3\G
*************************** 1. row ***************************
hostname: mysql_node3
port: 3306
time_start_us: 1515079109822616
success_time_us: 1782
viable_candidate: YES
read_only: YES
transactions_behind: 0
error: NULL
*************************** 2. row ***************************
hostname: mysql_node2
port: 3306
time_start_us: 1515079109822292
success_time_us: 1845
viable_candidate: YES
read_only: YES
transactions_behind: 0
error: NULL
*************************** 3. row ***************************
hostname: mysql_node1
port: 3306
time_start_us: 1515079109821971
success_time_us: 1582
viable_candidate: YES
read_only: NO
transactions_behind: 0
error: NULL
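Since the table keeps a rolling history of the checks, it is also handy for spotting monitoring failures after the fact, for instance by filtering on non-NULL errors:
proxy>
-- Any rows here point to a monitoring problem (unreachable node, bad credentials, etc.)
SELECT hostname, port, viable_candidate, read_only, error
FROM mysql_server_group_replication_log
WHERE error IS NOT NULL
ORDER BY time_start_us DESC LIMIT 5;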
Playtime
As a reminder, the workflow is as follows:
- The app connects to ProxySQL (i.e. it only sees and knows the proxy)
- ProxySQL retrieves transactions and redirects them to the primary node of the MySQL Group Replication cluster.
- In case of a primary node crash or shutdown:
  - MySQL Group Replication elects a new primary
  - ProxySQL identifies the new primary and sends transactions to it
The application must therefore be configured to connect to ProxySQL. For example, if my app is Drupal then my settings.php file will look like the following code snippet:
...
$databases['default']['default'] = array (
'database' => 'drupal',
'username' => 'app_user',
'password' => 'app_pwd',
'prefix' => 'drupal_',
'host' => '172.22.0.2',
'port' => '3306',
'namespace' => 'Drupal\\Core\\Database\\Driver\\mysql',
'driver' => 'mysql',
);
...
- User: app_user
- Password: app_pwd
- Port: 3306 (ProxySQL)
- Host: 172.22.0.2 (ProxySQL)
QED!
Let’s simulate all this in the command line!
My app is the mysql command-line tool. To simplify my commands, I first create a configuration file for the mysql client with the following content (by the way, storing passwords in clear text in a file is not recommended):
$ cat /tmp/temp.cnf
[mysql]
user=app_user
password=app_pwd
protocol=tcp
My servers are configured with the report_host variable set (see http://dasini.net/blog/2016/11/08/deployer-un-cluster-mysql-group-replication/).
$ for x in {1..5}; do mysql --defaults-file=/tmp/temp.cnf -h 172.22.0.2 -P 3306 -BNe"SELECT @@report_host;" ; done;
mysql_node1
mysql_node1
mysql_node1
mysql_node1
mysql_node1
The cluster primary node is still mysql_node1 (no database failover yet since the beginning of this article, but we’ll get there).
Now let’s test the failover with a slightly more complex example.
First, let’s create the test.poc table, using the InnoDB storage engine:
proxy>
CREATE SCHEMA test;
CREATE TABLE test.poc (
id tinyint unsigned NOT NULL AUTO_INCREMENT,
host varchar(11),
time timestamp,
PRIMARY KEY (id)
) ENGINE=InnoDB;
The test application will run the following 2 queries:
- INSERT INTO test.poc (host) VALUES (@@report_host);
- SELECT * FROM test.poc;
With these 2 queries, the little script below and a timely kill -9, we will be able to follow the routing process (ProxySQL) and the database failover (MySQL Group Replication).
$ while true; do
> mysql --defaults-file=/tmp/temp.cnf -h 172.22.0.2 -P 3306 -BNe"INSERT INTO test.poc (host) VALUES (@@report_host); SELECT * FROM test.poc;";
> echo;
> sleep 4;
> done
2 mysql_node1 2018-01-04 16:42:31 <=> Inserted row #1
2 mysql_node1 2018-01-04 16:42:31
9 mysql_node1 2018-01-04 16:42:35 <=> Inserted row #2
2 mysql_node1 2018-01-04 16:42:31
9 mysql_node1 2018-01-04 16:42:35
11 mysql_node3 2018-01-04 16:42:43 <=> Inserted row #3 => Failover! New primary is mysql_node3
2 mysql_node1 2018-01-04 16:42:31
9 mysql_node1 2018-01-04 16:42:35
11 mysql_node3 2018-01-04 16:42:43
18 mysql_node3 2018-01-04 16:42:47 <=> Inserted row #4
Transactions 1 & 2 (ids 2 and 9) are run on mysql_node1.
From the third transaction onward (ids 11 and 18), queries are run on the new primary, mysql_node3, because mysql_node1 crashed.
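To see the same failover from ProxySQL's point of view, the runtime servers table can be checked again right after the kill. Once the monitor notices that mysql_node1 is gone, it is expected to move it to the offline hostgroup (1), while the new primary stays in the writer hostgroup (2); the exact statuses observed depend on timing and on the monitor intervals:
proxy>
SELECT hostgroup_id, hostname, status FROM runtime_mysql_servers ORDER BY hostgroup_id;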
Let’s finish this tutorial with pictures.
MySQL Enterprise Monitor
As stated in the introduction, a database architecture should rely on three pillars: monitoring, backup/restore process and high availability.
Below is a brief introduction to MySQL Enterprise Monitor (MEM), the MySQL monitoring tool available with the commercial editions of MySQL (MySQL Enterprise Edition). MEM enables monitoring of MySQL instances and their hosts, notification of potential issues and problems, and advice on how to correct them. You can also monitor and troubleshoot all types of MySQL replication, including MySQL Group Replication.
If you want to try our Enterprise tools, please click here.
Below are some screenshots of the various states of the cluster supervised by MySQL Enterprise Monitor (click to enlarge):
The final word(s)
You already know MySQL Replication and MySQL semi-synchronous replication.
MySQL Group Replication is an additional tool that brings, above all, the built-in “Automatic Database Failover” that was missing in MySQL.
These architectures are most of the time used with a proxy, and ProxySQL is without a doubt one of the best companions for MySQL today.
Note: MySQL provides MySQL InnoDB Cluster, which combines MySQL technologies (MySQL Group Replication, MySQL Router and MySQL Shell) to enable you to create highly available clusters of MySQL server instances.
Tutorial: Tutoriel – Déployer MySQL 5.7 InnoDB Cluster (French)
References
dasini.net
MySQL Group Replication
ProxySQL
Thanks for using MySQL!