MySQL Security – MySQL Enterprise Data Masking and De-Identification

March 19, 2019

When thinking about security within a MySQL installation, you should consider a wide range of possible procedures / best practices and how they affect the security of your MySQL server and related applications. MySQL provides many tools / features / plugins in order to protect your data including some advanced features like Transparent Data Encryption aka TDE,  Audit, Data Masking & De-Identification, Firewall, Password Management, Password Validation Plugin, etc…

MySQL Security

In order to mitigate the effects of data breaches, and therefore the associated risks for your organization’s brand and reputation, popular regulations or standards including GDPR, PCI DSS, HIPAA,… recommand (among others things) data masking and de-identification.

According to Wikipedia:

  • Data masking or data obfuscation is the process of hiding original data with modified content (characters or other data.)
  • De-identification is the process used to prevent a person’s identity from being connected with information. For example, data produced during human subject research might be de-identified to preserve research participants’ privacy.

In other words, MySQL Enterprise Data Masking and De-Identification hides sensitive information by replacing real values with substitutes in order to protect sensitive data while they are still look real and consistent.

This the topic of this eight episode of this MySQL  Security series (URLs to all the articles at the end of this page).

MySQL Enterprise Data Masking and De-Identification

The simplest way to present this MySQL feature :
A built-in database solution to help organizations protect sensitive data from unauthorized uses

MySQL Enterprise Masking and De-identificaiton protects sensitive data from unauthorized users.

Note:

MySQL Enterprise Data Masking and De-Identification is an extension included in MySQL Enterprise Edition, a commercial product.

Available in MySQL 8.0, as of 8.0.13 and in MySQL 5.7, as of 5.7.24.

First step, installation.

Installation

MySQL Enterprise Data Masking and De-Identification, is implemented as a plugin library file containing a plugin and user-defined functions (UDFs).
As usual install is easy:

You can check the activation of the data masking plugin:

Note:

If the plugin and UDFs are used on a master replication server, install them on all slave servers as well to avoid replication problems.

Uninstall is simple as well, uninstall the plugin and drop the UDFs:

Now we’re ready to play!

Data Generation

One of the nice “side feature” of MySQL Data Masking and De-Identification is the ability to generate business relevant datasets. Because it is not always possible to test/simulate your application on your real dataset (indeed playing with customer credit card or security social numbers is a very bad practice) this feature is very convenient.

Generating Random Data with Specific Characteristics

Several functions are available. They start with these 4 first characters: gen_ and you’ll find the complete list here.
In this article I’ll use the following functions :

  • gen_range() : returns a random integer selected from a given range.
  • gen_rnd_email() : returns a random email address in the example.com domain.
  • gen_rnd_pan() : returns a random payment card Primary Account Number.
  • gen_rnd_us_phone() : returns a random U.S. phone number in the 555 area code not used for legitimate numbers.

Generating Random Data Using Dictionaries

Sometime you will need data with better quality. So another way to generate a relevant dataset is to use dictionaries.

Again several functions are available. They also start with these 4 first characters: gen_ and you’ll find the complete list here.
I’ll use the following functions :

  • gen_dictionary_load() : Loads a file into the dictionary registry and assigns the dictionary a name to be used with other functions that require a dictionary name argument.
  • gen_dictionary() : Returns a random term from a dictionary.

OK, let’s moving forward!
In order to use data from a dictionary we must first load the data.

A dictionary is a plain text file, with one term per line:

Then we must load the dictionaries

Note:

The secure_file_priv variable must be set properly (usually in your my.cnf or my.ini).

Note:

Dictionaries are not persistent. Any dictionary used by applications must be loaded for each server startup.

Now I have all my bricks to build my business centric test dataset.
For example I can generate a random email address:

Or a random city from my dictionary of the cities of Martinique :

Awesome!

Now let’s use these functions to generate some random but business oriented data.
Below our test table called sensitive_data which contains… sensitive data :

I created a stored procedure (sorry but I’m a DBA) to fill my table with data. However a script in your favorite programming language could be a better choice:

It looks like real data, it smells like real data, it sounds like real data but these are not real data. That’s what we wanted 🙂

Data Masking and De-Identification

Many masking functions are available. They start with these 5 first characters: mask_ and you’ll find the complete list here.
I’ll use the following functions :

mask_inner() masks the interior of its string argument, leaving the ends unmasked. Other arguments specify the sizes of the unmasked ends.

mask_outer() does the reverse, masking the ends of its string argument, leaving the interior unmasked. Other arguments specify the sizes of the masked ends.

mask_pan() masks all but the last four digits of the number;
mask_pan_relaxed() is similar but does not mask the first six digits that indicate the payment card issuer unmasked.

Note:

If you deal with U.S. Social Security Numbers, you could also use mask_ssn() function.

e.g. mysql> SELECT mask_ssn(gen_rnd_ssn());

So how to masked and de-identified customer sensitive data ?


There are different strategies. One is to use views.
Thus you already have a first level of security because you can choose only the columns the business need and/or filter the rows.
Furthermore you have another level of security because you can control who can access these data with relevant privileges, with or without roles.

Let’s see some examples:

Ex. 1
Mask the firstname (firstname) & the lastname (lastname)

Ex. 2
Mask the credit card number (credit_card)

Ex. 3
Replace real values of employee id (emp_id) and birth date (birth_date) with random ones.

Et voilà!
As a conclusion, MySQL Enterprise Masking and De-Identification enables organization to:

  • Meet regulatory requirements and data privacy laws
  • Significantly reduce the risk of a data breach
  • Protect confidential information

To conclude this conclusion, I recommend to read Data Masking in MySQL blog post from the MySQL Server Blog.

MySQL Enterprise Edition

MySQL Enterprise Edition includes the most comprehensive set of advanced features, management tools and technical support to achieve the highest levels of MySQL scalability, security, reliability, and uptime.

It reduces the risk, cost, and complexity in developing, deploying, and managing business-critical MySQL applications.

MySQL Enterprise Edition server Trial Download (Note – Select Product Pack: MySQL Database).

MySQL Enterprise Edition

In order to go further

MySQL Security Series

  1. Password Validation Plugin
  2. Password Management
  3. User Account Locking
  4. The Connection-Control Plugins
  5. Enterprise Audit
  6. Enterprise Transparent Data Encryption (TDE)
  7. Enterprise Firewall
  8. Enterprise Data Masking and De-Identification

Reference Manual

MySQL Security

Blog posts

Thanks for using MySQL!

Follow me on twitter

Watch my videos on my YouTube channel and subscribe.

My Slideshare account.

My Speaker Deck account.

Thanks for using MySQL!

15 Responses to “MySQL Security – MySQL Enterprise Data Masking and De-Identification”

  1. […] Enterprise Data Masking and De-Identification […]

  2. […] Enterprise Data Masking and De-Identification […]

  3. […] Enterprise Data Masking and De-Identification […]

  4. […] Enterprise Data Masking and De-Identification […]

  5. […] Enterprise Data Masking and De-Identification […]

  6. […] to protect your data including some advanced features like Transparent Data Encryption aka TDE, Data Masking & De-Identification, Firewall, Password Management, Password Validation Plugin, […]

  7. […] / plugins in order to protect your data including some advanced features like Audit, TDE, Data Masking & De-Identification, Password Management, Password Validation Plugin, User Account Locking, […]

  8. […] Enterprise Data Masking and De-Identification : Anonymisation et obfuscation de données. […]

  9. […] the MySQL instance for test or development, this step is not covered by this blog but using our masking and De-Identification can be used to solve this step 😉 I have tested above procedures using MySQL 8.0.18. Happy […]

  10. […] your data including some advanced features like Transparent Data Encryption (TDE), Audit, Data Masking & De-Identification, Firewall, Password Expiration Policy, Password Reuse Policy, Password Verification-Required […]

  11. […] your data including some advanced features like Transparent Data Encryption (TDE), Audit, Data Masking & De-Identification, Firewall, Random Password Generation, Password Reuse Policy, Password Verification-Required […]

  12. […] your data including some advanced features like Transparent Data Encryption (TDE), Audit, Data Masking & De-Identification, Firewall, Random Password Generation, Password Expiration Policy, Password Verification-Required […]

  13. […] protect your data including some advanced features like Transparent Data Encryption (TDE), Audit, Data Masking & De-Identification, Firewall, Random Password Generation, Password Expiration Policy, Password Reuse Policy, […]

  14. […] your data including some advanced features like Transparent Data Encryption (TDE), Audit, Data Masking & De-Identification, Firewall, Random Password Generation, Password Expiration Policy, Password Reuse Policy, Password […]

  15. […] protect your data including some advanced features like Transparent Data Encryption (TDE), Audit, Data Masking & De-Identification, Firewall, Random Password Generation, Password Expiration Policy, Password Reuse Policy, Password […]