MySQL Security – MySQL Enterprise Data Masking and De-Identification

March 19, 2019

When thinking about security within a MySQL installation, you should consider a wide range of possible procedures / best practices and how they affect the security of your MySQL server and related applications. MySQL provides many tools / features / plugins in order to protect your data including some advanced features like Transparent Data Encryption aka TDE,  Audit, Data Masking & De-Identification, Firewall, Password Management, Password Validation Plugin, etc…

MySQL Security

In order to mitigate the effects of data breaches, and therefore the associated risks for your organization’s brand and reputation, popular regulations or standards including GDPR, PCI DSS, HIPAA,… recommand (among others things) data masking and de-identification.

According to Wikipedia:

  • Data masking or data obfuscation is the process of hiding original data with modified content (characters or other data.)
  • De-identification is the process used to prevent a person’s identity from being connected with information. For example, data produced during human subject research might be de-identified to preserve research participants’ privacy.

In other words, MySQL Enterprise Data Masking and De-Identification hides sensitive information by replacing real values with substitutes in order to protect sensitive data while they are still look real and consistent.

This the topic of this eight episode of this MySQL  Security series (URLs to all the articles at the end of this page).

MySQL Enterprise Data Masking and De-Identification

The simplest way to present this MySQL feature :
A built-in database solution to help organizations protect sensitive data from unauthorized uses

MySQL Enterprise Masking and De-identificaiton protects sensitive data from unauthorized users.

Note:

MySQL Enterprise Data Masking and De-Identification is an extension included in MySQL Enterprise Edition, a commercial product.

Available in MySQL 8.0, as of 8.0.13 and in MySQL 5.7, as of 5.7.24.

First step, installation.

Installation

MySQL Enterprise Data Masking and De-Identification, is implemented as a plugin library file containing a plugin and user-defined functions (UDFs).
As usual install is easy:

mysql> 
INSTALL PLUGIN data_masking SONAME 'data_masking.so';
CREATE FUNCTION gen_blacklist RETURNS STRING  SONAME 'data_masking.so';
CREATE FUNCTION gen_dictionary RETURNS STRING  SONAME 'data_masking.so';
CREATE FUNCTION gen_dictionary_drop RETURNS STRING  SONAME 'data_masking.so';
CREATE FUNCTION gen_dictionary_load RETURNS STRING  SONAME 'data_masking.so';
CREATE FUNCTION gen_range RETURNS INTEGER  SONAME 'data_masking.so';
CREATE FUNCTION gen_rnd_email RETURNS STRING  SONAME 'data_masking.so';
CREATE FUNCTION gen_rnd_pan RETURNS STRING  SONAME 'data_masking.so';
CREATE FUNCTION gen_rnd_ssn RETURNS STRING  SONAME 'data_masking.so';
CREATE FUNCTION gen_rnd_us_phone RETURNS STRING  SONAME 'data_masking.so';
CREATE FUNCTION mask_inner RETURNS STRING  SONAME 'data_masking.so';
CREATE FUNCTION mask_outer RETURNS STRING  SONAME 'data_masking.so';
CREATE FUNCTION mask_pan RETURNS STRING  SONAME 'data_masking.so';
CREATE FUNCTION mask_pan_relaxed RETURNS STRING  SONAME 'data_masking.so';
CREATE FUNCTION mask_ssn RETURNS STRING  SONAME 'data_masking.so';

You can check the activation of the data masking plugin:

mysql> 
SELECT PLUGIN_NAME, PLUGIN_STATUS, PLUGIN_VERSION, PLUGIN_LIBRARY, PLUGIN_DESCRIPTION 
FROM INFORMATION_SCHEMA.PLUGINS 
WHERE PLUGIN_NAME='data_masking'\G
*************************** 1. row ***************************
       PLUGIN_NAME: data_masking
     PLUGIN_STATUS: ACTIVE
    PLUGIN_VERSION: 0.1
    PLUGIN_LIBRARY: data_masking.so
PLUGIN_DESCRIPTION: Data masking facilities

Note:

If the plugin and UDFs are used on a master replication server, install them on all slave servers as well to avoid replication problems.

Uninstall is simple as well, uninstall the plugin and drop the UDFs:

mysql>
UNINSTALL PLUGIN data_masking;
DROP FUNCTION gen_blacklist;
DROP FUNCTION gen_dictionary;
DROP FUNCTION gen_dictionary_drop;
DROP FUNCTION gen_dictionary_load;
DROP FUNCTION gen_range;
DROP FUNCTION gen_rnd_email;
DROP FUNCTION gen_rnd_pan;
DROP FUNCTION gen_rnd_ssn;
DROP FUNCTION gen_rnd_us_phone;
DROP FUNCTION mask_inner;
DROP FUNCTION mask_outer;
DROP FUNCTION mask_pan;
DROP FUNCTION mask_pan_relaxed;
DROP FUNCTION mask_ssn;

Now we’re ready to play!

Data Generation

One of the nice “side feature” of MySQL Data Masking and De-Identification is the ability to generate business relevant datasets. Because it is not always possible to test/simulate your application on your real dataset (indeed playing with customer credit card or security social numbers is a very bad practice) this feature is very convenient.

Generating Random Data with Specific Characteristics

Several functions are available. They start with these 4 first characters: gen_ and you’ll find the complete list here.
In this article I’ll use the following functions :

  • gen_range() : returns a random integer selected from a given range.
  • gen_rnd_email() : returns a random email address in the example.com domain.
  • gen_rnd_pan() : returns a random payment card Primary Account Number.
  • gen_rnd_us_phone() : returns a random U.S. phone number in the 555 area code not used for legitimate numbers.

Generating Random Data Using Dictionaries

Sometime you will need data with better quality. So another way to generate a relevant dataset is to use dictionaries.

Again several functions are available. They also start with these 4 first characters: gen_ and you’ll find the complete list here.
I’ll use the following functions :

  • gen_dictionary_load() : Loads a file into the dictionary registry and assigns the dictionary a name to be used with other functions that require a dictionary name argument.
  • gen_dictionary() : Returns a random term from a dictionary.

OK, let’s moving forward!
In order to use data from a dictionary we must first load the data.

A dictionary is a plain text file, with one term per line:

$ head /dict/mq_cities.txt
Basse-Pointe
Bellefontaine
Case-Pilote
Ducos
Fonds-Saint-Denis
Fort-de-France
Grand'Rivière
Gros-Morne
L'Ajoupa-Bouillon
La Trinité

Then we must load the dictionaries

Note:

The secure_file_priv variable must be set properly (usually in your my.cnf or my.ini).

mysql> SHOW VARIABLES LIKE 'secure_file_priv'\G
*************************** 1. row ***************************
Variable_name: secure_file_priv
        Value: /dict/
1 row in set (0,00 sec)

mysql> SELECT gen_dictionary_load('/dict/Firstnames.txt', 'Firstnames')\G
*************************** 1. row ***************************
gen_dictionary_load('/dict/Firstnames.txt', 'Firstnames'): Dictionary load success
1 row in set (0,20 sec)

mysql> SELECT gen_dictionary_load('/dict/Lastnames.txt', 'Lastnames')\G
*************************** 1. row ***************************
gen_dictionary_load('/dict/Lastnames.txt', 'Lastnames'): Dictionary load success
1 row in set (0,24 sec)

mysql> SELECT gen_dictionary_load('/dict/JobTitles.txt', 'JobTitles')\G
*************************** 1. row ***************************
gen_dictionary_load('/dict/JobTitles.txt', 'JobTitles'): Dictionary load success
1 row in set (0,00 sec)

mysql> SELECT gen_dictionary_load('/dict/BirthDates.txt', 'BirthDates')\G
*************************** 1. row ***************************
gen_dictionary_load('/dict/BirthDates.txt', 'BirthDates'): Dictionary load success
1 row in set (0,00 sec)

mysql> SELECT gen_dictionary_load('/dict/mq_cities.txt', 'mq_Cities')\G
*************************** 1. row ***************************
gen_dictionary_load('/dict/mq_cities.txt', 'mq_Cities'): Dictionary load success
1 row in set (0,00 sec)

Note:

Dictionaries are not persistent. Any dictionary used by applications must be loaded for each server startup.

Now I have all my bricks to build my business centric test dataset.
For example I can generate a random email address:

mysql> SELECT gen_rnd_email();
+---------------------------+
| gen_rnd_email()           |
+---------------------------+
| rcroe.odditdn@example.com |
+---------------------------+

Or a random city from my dictionary of the cities of Martinique :

mysql> SELECT gen_dictionary('mq_Cities');
+-------------------------------+
| gen_dictionary('mq_Cities')   |
+-------------------------------+
| Fort-de-France                |
+-------------------------------+

Awesome!

Now let’s use these functions to generate some random but business oriented data.
Below our test table called sensitive_data which contains… sensitive data :

CREATE TABLE sensitive_data(
    emp_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    firstname VARCHAR(100) NOT NULL,
    lastname VARCHAR(100) NOT NULL,
    birth_date date,
    email VARCHAR(100) NOT NULL,
    phone VARCHAR(20),
    jobTitle VARCHAR(50),
    salary INT UNSIGNED,
    city VARCHAR(30),
    credit_card CHAR(19),
    PRIMARY KEY (emp_id))
;

I created a stored procedure (sorry but I’m a DBA) to fill my table with data. However a script in your favorite programming language could be a better choice:

DELIMITER //
DROP PROCEDURE IF EXISTS add_rows;
CREATE PROCEDURE add_rows( IN numRow TINYINT UNSIGNED)
BEGIN
    DECLARE cpt TINYINT UNSIGNED DEFAULT 0;
    WHILE cpt < numRow DO
        INSERT INTO sensitive_data(firstname, lastname, birth_date, email, phone, jobTitle, salary, city, credit_card)
        SELECT
        gen_dictionary('Firstnames'),
        gen_dictionary('Lastnames'),
        gen_dictionary('BirthDates'),
        gen_rnd_email(),
        gen_rnd_us_phone(),
        gen_dictionary('JobTitles'),
        gen_range(30000, 120000),
        gen_dictionary('mq_Cities'),
        gen_rnd_pan()
        FROM DUAL;
        SET cpt = cpt + 1;
        SELECT sleep(1);
    END WHILE;
END//
DELIMITER ;


-- Call the procedure and insert 10 rows in the table
CALL add_rows(10);


mysql> SELECT firstname, lastname, phone, salary, city FROM sensitive_data;
+-----------+-----------+----------------+--------+------------------+
| firstname | lastname  | phone          | salary | city             |
+-----------+-----------+----------------+--------+------------------+
| Fresh     | Daz       | 1-555-381-3165 |  78920 | Ducos            |
| Doowon    | Vieri     | 1-555-645-3332 |  78742 | Macouba          |
| Marsja    | Speckmann | 1-555-455-3688 |  56526 | Les Trois-Îlets  |
| Carrsten  | Speckmann | 1-555-264-8108 |  51253 | Fort-de-France   |
| Yonghong  | Marrevee  | 1-555-245-0883 |  86820 | Le Lorrain       |
| Shuji     | Magliocco | 1-555-628-3771 |  88615 | Le Marin         |
| Luisa     | Sury      | 1-555-852-7710 | 117957 | Le Morne-Rouge   |
| Troy      | Zobel     | 1-555-805-0270 |  78801 | Bellefontaine    |
| Lunjin    | Pettis    | 1-555-065-0517 |  69782 | Le Prêcheur      |
| Boriana   | Marletta  | 1-555-062-4226 |  70970 | Saint-Joseph     |
+-----------+-----------+----------------+--------+------------------+
10 rows in set (0,00 sec)

It looks like real data, it smells like real data, it sounds like real data but these are not real data. That’s what we wanted 🙂

Data Masking and De-Identification

Many masking functions are available. They start with these 5 first characters: mask_ and you’ll find the complete list here.
I’ll use the following functions :

mask_inner() masks the interior of its string argument, leaving the ends unmasked. Other arguments specify the sizes of the unmasked ends.

SELECT phone, mask_inner(phone, 0, 4) FROM sensitive_data LIMIT 1;
+----------------+-------------------------+
| phone          | mask_inner(phone, 0, 4) |
+----------------+-------------------------+
| 1-555-381-3165 | XXXXXXXXXX3165          |
+----------------+-------------------------+

mask_outer() does the reverse, masking the ends of its string argument, leaving the interior unmasked. Other arguments specify the sizes of the masked ends.

SELECT birth_date, mask_outer(birth_date, 5, 0) FROM sensitive_data LIMIT 1;
+------------+------------------------------+
| birth_date | mask_outer(birth_date, 5, 0) |
+------------+------------------------------+
| 1954-06-08 | XXXXX06-08                   |
+------------+------------------------------+

mask_pan() masks all but the last four digits of the number;
mask_pan_relaxed() is similar but does not mask the first six digits that indicate the payment card issuer unmasked.

SELECT mask_pan(credit_card), mask_pan_relaxed(credit_card) FROM sensitive_data LIMIT 1;
+-----------------------+-------------------------------+
| mask_pan(credit_card) | mask_pan_relaxed(credit_card) |
+-----------------------+-------------------------------+
| XXXXXXXXXXXX4416      | 262491XXXXXX4416              |
+-----------------------+-------------------------------+

Note:

If you deal with U.S. Social Security Numbers, you could also use mask_ssn() function.

e.g. mysql> SELECT mask_ssn(gen_rnd_ssn());

So how to masked and de-identified customer sensitive data ?


There are different strategies. One is to use views.
Thus you already have a first level of security because you can choose only the columns the business need and/or filter the rows.
Furthermore you have another level of security because you can control who can access these data with relevant privileges, with or without roles.

Let’s see some examples:

Ex. 1
Mask the firstname (firstname) & the lastname (lastname)

CREATE VIEW v1_mask AS
  SELECT
    mask_inner(firstname, 1, 0) AS firstname,
    mask_outer(lastname, 3, 3) AS lastname,
    salary
  FROM sensitive_data;
SELECT * FROM v1_mask WHERE salary > 100000;
+-----------+----------+--------+
| firstname | lastname | salary |
+-----------+----------+--------+
| LXXXX     | XXXX     | 117957 |
+-----------+----------+--------+

Ex. 2
Mask the credit card number (credit_card)

CREATE VIEW v2_mask AS
  SELECT
    firstname,
    lastname,
    email,
    phone,
    mask_pan(credit_card) AS credit_card
  FROM sensitive_data;  
SELECT email, phone, credit_card 
FROM v2_mask 
WHERE firstname='Fresh' AND lastname='Daz';
+---------------------------+----------------+------------------+
| email                     | phone          | credit_card      |
+---------------------------+----------------+------------------+
| bcnnk.wnruava@example.com | 1-555-381-3165 | XXXXXXXXXXXX4416 |
+---------------------------+----------------+------------------+

Ex. 3
Replace real values of employee id (emp_id) and birth date (birth_date) with random ones.

CREATE VIEW v3_mask AS
  SELECT
    gen_range(1, 1000) AS emp_id,
    FROM_DAYS(gen_range(715000, 731000)) AS birth_date,
    jobTitle,
    salary,
    city 
  FROM sensitive_data;
SELECT DISTINCT
    jobTitle,
    max(salary) OVER w AS Max,
    min(salary) OVER w AS Min,
    AVG(salary) OVER w AS Avg
FROM v3_mask
WINDOW w AS (
    PARTITION BY jobTitle
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
);
+--------------------+--------+-------+------------+
| jobTitle           | Max    | Min   | Avg        |
+--------------------+--------+-------+------------+
| Assistant Engineer |  78920 | 78920 | 78920.0000 |
| Engineer           |  88615 | 88615 | 88615.0000 |
| Manager            |  78801 | 51253 | 65027.0000 |
| Senior Engineer    |  86820 | 70970 | 78895.0000 |
| Staff              |  78742 | 69782 | 74262.0000 |
| Technique Leader   | 117957 | 56526 | 87241.5000 |
+--------------------+--------+-------+------------+

Et voilà!
As a conclusion, MySQL Enterprise Masking and De-Identification enables organization to:

  • Meet regulatory requirements and data privacy laws
  • Significantly reduce the risk of a data breach
  • Protect confidential information

To conclude this conclusion, I recommend to read Data Masking in MySQL blog post from the MySQL Server Blog.

MySQL Enterprise Edition

MySQL Enterprise Edition includes the most comprehensive set of advanced features, management tools and technical support to achieve the highest levels of MySQL scalability, security, reliability, and uptime.

It reduces the risk, cost, and complexity in developing, deploying, and managing business-critical MySQL applications.

MySQL Enterprise Edition server Trial Download (Note – Select Product Pack: MySQL Database).

MySQL Enterprise Edition

In order to go further

MySQL Security Series

  1. Password Validation Plugin
  2. Password Management
  3. User Account Locking
  4. The Connection-Control Plugins
  5. Enterprise Audit
  6. Enterprise Transparent Data Encryption (TDE)
  7. Enterprise Firewall
  8. Enterprise Data Masking and De-Identification

Reference Manual

MySQL Security

Blog posts

Thanks for using MySQL!

Follow me on Linkedin

Watch my videos on my YouTube channel and subscribe.

My Slideshare account.

My Speaker Deck account.

Thanks for using HeatWave & MySQL!

15 Responses to “MySQL Security – MySQL Enterprise Data Masking and De-Identification”

  1. […] Enterprise Data Masking and De-Identification […]

  2. […] Enterprise Data Masking and De-Identification […]

  3. […] Enterprise Data Masking and De-Identification […]

  4. […] Enterprise Data Masking and De-Identification […]

  5. […] Enterprise Data Masking and De-Identification […]

  6. […] to protect your data including some advanced features like Transparent Data Encryption aka TDE, Data Masking & De-Identification, Firewall, Password Management, Password Validation Plugin, […]

  7. […] / plugins in order to protect your data including some advanced features like Audit, TDE, Data Masking & De-Identification, Password Management, Password Validation Plugin, User Account Locking, […]

  8. […] Enterprise Data Masking and De-Identification : Anonymisation et obfuscation de données. […]

  9. […] the MySQL instance for test or development, this step is not covered by this blog but using our masking and De-Identification can be used to solve this step 😉 I have tested above procedures using MySQL 8.0.18. Happy […]

  10. […] your data including some advanced features like Transparent Data Encryption (TDE), Audit, Data Masking & De-Identification, Firewall, Password Expiration Policy, Password Reuse Policy, Password Verification-Required […]

  11. […] your data including some advanced features like Transparent Data Encryption (TDE), Audit, Data Masking & De-Identification, Firewall, Random Password Generation, Password Reuse Policy, Password Verification-Required […]

  12. […] your data including some advanced features like Transparent Data Encryption (TDE), Audit, Data Masking & De-Identification, Firewall, Random Password Generation, Password Expiration Policy, Password Verification-Required […]

  13. […] protect your data including some advanced features like Transparent Data Encryption (TDE), Audit, Data Masking & De-Identification, Firewall, Random Password Generation, Password Expiration Policy, Password Reuse Policy, […]

  14. […] your data including some advanced features like Transparent Data Encryption (TDE), Audit, Data Masking & De-Identification, Firewall, Random Password Generation, Password Expiration Policy, Password Reuse Policy, Password […]

  15. […] protect your data including some advanced features like Transparent Data Encryption (TDE), Audit, Data Masking & De-Identification, Firewall, Random Password Generation, Password Expiration Policy, Password Reuse Policy, Password […]