30 mins with MySQL Query Rewriter

February 25, 2016


TL;DR

Sometimes DBAs have to deal with problematic queries and cannot tackle the problem at the source (queries generated by ORMs, third-party apps,… or of unknown origin…).

MySQL 5.7 provides:

  • Pre-parse and post-parse query rewrite APIs
    • Users can write their own plug-ins
    • They eliminate many legacy use cases for proxies
  • With the post-parse query rewrite plugin, you can:
    • Rewrite problematic queries without the need to make application changes
    • Add hints
    • Modify the join order

 

Query Rewrite Plugin APIs

Paraphrasing this MySQL Server Blog post:

MySQL now offers two APIs for writing query rewrite plugins.

The pre-parse rewrite plugin API is for when you know exactly – as in character-by-character exactly – what the offending queries look like. It has a hook to intercept the query string right before it's parsed.

The post-parse rewrite plugin API comes in right after parsing and acts on the parse tree. It offers the basic functionality to walk over the parsed query, which is a lot more efficient than dealing with a string.

This blog post is about the Rewriter plugin, a post-parse query rewrite plugin included in the MySQL 5.7 distribution (as of MySQL 5.7.6).

 

Rewriter plugin

Installation and checks

You are in MySQL world, so that’s obviously very easy! 🙂

To install the Rewriter query rewrite plugin, run install_rewriter.sql located in the share directory of your MySQL installation.
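e.g. (the exact path depends on your installation; the one below is an assumption):

shell> mysql -u root -p < /usr/local/mysql/share/install_rewriter.sql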

 

You can now check that everything is OK:
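A minimal check is to look at the rewriter_enabled system variable:

mysql> SHOW GLOBAL VARIABLES LIKE 'rewriter_enabled';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| rewriter_enabled | ON    |
+------------------+-------+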

New database objects were also created

 

The rewrite_rules table in the query_rewrite database provides persistent storage for the rules that the Rewriter plugin uses to decide whether to rewrite statements.
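Its columns are roughly the following (check the manual of your exact version for the authoritative definition):

mysql> SHOW COLUMNS FROM query_rewrite.rewrite_rules;
-- id, pattern, pattern_database, replacement, enabled, message,
-- pattern_digest, normalized_pattern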

 

You can enable and disable the plugin at runtime:
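For instance:

mysql> SET GLOBAL rewriter_enabled = OFF;
mysql> SET GLOBAL rewriter_enabled = ON;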

 

Obviously it is also possible to enable the plugin in the configuration file (my.cnf | my.ini):
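Something along these lines (assuming the plugin itself is already installed):

[mysqld]
rewriter_enabled = ON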

 

 

Rewrite a query

The post-parse rewrite plugin works only with SELECT statements.

You can try non-SELECT statements, but when you flush the rules you'll see the following error message in the rewrite_rules.message column:

 

Rewriter plugin usage is very simple. Let's start with a dummy example: transforming a SELECT n into a SELECT n+1 (n is an integer).

Patterns

 -> SELECT 1      # Input

 <= SELECT 2    # Output

Add the rewrite rule

To add rules for the Rewriter plugin, add rows to the rewrite_rules table.
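For our dummy example (the ? parameter marker is explained below):

mysql> INSERT INTO query_rewrite.rewrite_rules (pattern, replacement)
    -> VALUES ('SELECT ?', 'SELECT ? + 1');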

 

The rewrite_rules table content is:

 

Rewriter's stats show:
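They are exposed as status variables:

mysql> SHOW GLOBAL STATUS LIKE 'Rewriter%';

Look at Rewriter_number_loaded_rules, Rewriter_number_reloads, Rewriter_number_rewritten_queries and Rewriter_reload_error.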

 

Flush the rewrite rule

Then invoke the flush_rewrite_rules() stored procedure to load the rules from the table into the plugin.
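That is:

mysql> CALL query_rewrite.flush_rewrite_rules();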

When the plugin reads each rule from the rules table, it computes a normalized form (digest) from the pattern and a digest hash value, and updates the normalized_pattern and pattern_digest columns:

 

Within a pattern template, ? characters act as parameter markers that match data values. Parameter markers can be used only where data values should appear, not for SQL keywords, identifiers, and so forth. The ? characters should not be enclosed within quotation marks.

 

If the pattern parsing fails, the stored procedure raises an error:

You'll find more details in the query_rewrite.rewrite_rules.message column,

e.g.

Rewriter's stats now show:

 

Rewritten query

So we added and flushed the rule; we can now execute a query matching the pattern and see the behaviour:
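With our rule loaded, the session looks like this (a sketch; the exact output may differ slightly):

mysql> SELECT 1;
+-------+
| 1 + 1 |
+-------+
|     2 |
+-------+
1 row in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
  Level: Note
   Code: 1105
Message: Query 'SELECT 1' rewritten to 'SELECT 1 + 1' by a query rewrite plugin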

 

Magic!!!

The query was rewritten "on the fly". However, some interesting details are hiding in the warning.

All we need to know is in Note 1105:

Rewriter’s stats are updated accordingly:

 

To disable an existing rule, modify its enabled column and reload the table into the plugin:
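For example, to disable rule number 1:

mysql> UPDATE query_rewrite.rewrite_rules SET enabled = 'NO' WHERE id = 1;
mysql> CALL query_rewrite.flush_rewrite_rules();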

 

You can also delete rules by removing table rows:
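For example (don't forget to flush afterwards):

mysql> DELETE FROM query_rewrite.rewrite_rules WHERE id = 1;
mysql> CALL query_rewrite.flush_rewrite_rules();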

 

OK folks! Let's see some more relevant examples.

 

Rewrite plugin examples

Ex 1

Rewrite a join into a subquery; i.e. for performance reasons you must rewrite the query, but you don't have access to the app.

Patterns

 -> SELECT count(distinct emp_no) FROM employees.employees INNER JOIN employees.salaries USING(emp_no) WHERE DATEDIFF(to_date, from_date) < {integer};

 <= SELECT count(emp_no) FROM employees.employees WHERE emp_no IN ( SELECT emp_no FROM employees.salaries WHERE DATEDIFF(to_date, from_date) < {integer});

 

Without the rule

 

Add the rewrite rule
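A sketch of the rule, with the {integer} literal replaced by a ? parameter marker:

mysql> INSERT INTO query_rewrite.rewrite_rules (pattern, replacement) VALUES (
    -> 'SELECT count(distinct emp_no) FROM employees.employees INNER JOIN employees.salaries USING(emp_no) WHERE DATEDIFF(to_date, from_date) < ?',
    -> 'SELECT count(emp_no) FROM employees.employees WHERE emp_no IN (SELECT emp_no FROM employees.salaries WHERE DATEDIFF(to_date, from_date) < ?)');
mysql> CALL query_rewrite.flush_rewrite_rules();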

 

With the rule

The query time drops from 12.93 s to 3.77 s.

 

 

Ex 2

Limit query execution time; i.e. add the MySQL 5.7 hint /*+ MAX_EXECUTION_TIME(X) */ so that the query cannot run for more than X milliseconds.

Patterns

 -> SELECT count(distinct emp_no) FROM employees.employees INNER JOIN employees.salaries USING(emp_no) WHERE salary = {integer};

 <= SELECT /*+ MAX_EXECUTION_TIME(10000)*/ count(distinct emp_no) FROM employees.employees INNER JOIN employees.salaries USING(emp_no) WHERE salary = {integer};

 

Without the rule

 

Add the rewrite rule
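Same mechanics, this time injecting the hint into the replacement:

mysql> INSERT INTO query_rewrite.rewrite_rules (pattern, replacement) VALUES (
    -> 'SELECT count(distinct emp_no) FROM employees.employees INNER JOIN employees.salaries USING(emp_no) WHERE salary = ?',
    -> 'SELECT /*+ MAX_EXECUTION_TIME(10000)*/ count(distinct emp_no) FROM employees.employees INNER JOIN employees.salaries USING(emp_no) WHERE salary = ?');
mysql> CALL query_rewrite.flush_rewrite_rules();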

 

With the rule

 

 

Ex 3

Schema evolution; i.e. a table column was added (or dropped) but you can't change the query (or not immediately).

Patterns

 -> SELECT first_name, last_name FROM employees.employees WHERE year(hire_date) = 2000;

 <= SELECT first_name, last_name, birth_date FROM employees.employees WHERE year(hire_date) = 2000;

 

Without the rule

 

Add the rewrite rule
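Here the pattern contains only literal values, so no parameter marker is needed:

mysql> INSERT INTO query_rewrite.rewrite_rules (pattern, replacement) VALUES (
    -> 'SELECT first_name, last_name FROM employees.employees WHERE year(hire_date) = 2000',
    -> 'SELECT first_name, last_name, birth_date FROM employees.employees WHERE year(hire_date) = 2000');
mysql> CALL query_rewrite.flush_rewrite_rules();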

 

With the rule

 

Other ideas?

– Index hints: the optimizer doesn't use the best index, so you can rewrite the query with USE | FORCE | IGNORE INDEX.

– Prevent SELECTs with "infinite" results by rewriting the query to add LIMIT 1000 or whatever (a sketch follows below).

Much like the mysql text client's --safe-update option.
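A sketch for the LIMIT idea (db1.big_table is a hypothetical table):

mysql> INSERT INTO query_rewrite.rewrite_rules (pattern, replacement) VALUES (
    -> 'SELECT * FROM db1.big_table',
    -> 'SELECT * FROM db1.big_table LIMIT 1000');
mysql> CALL query_rewrite.flush_rewrite_rules();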

The sky is the limit 🙂

 

The Rewriter plugin does not replace proper application code, server optimisation, or finely tuned queries.

Nevertheless, it can be very useful.

 

Give it a try, it's definitely worth more than 30 minutes.

 

 

Going further

Post-parse query rewrite plugin

 

Pre-parse query rewrite plugin

 

Thank you for using MySQL!


30 mins with JSON in MySQL

November 17, 2015


Note: This article is inspired by Morgan Tocker‘s talk MySQL 5.7 + JSON.

As you may know, MySQL 5.7 is GA and has more than 150 new features. One of them is a native JSON data type and JSON functions: "Allows for efficient and flexible storage, search and manipulation of schema-less data. Enhancements include a new internal binary format, support for easy integration within SQL, and index management on the JSON Documents using generated columns".

Sounds interesting! Let’s take half an hour to have a foretaste…

 

Get JSON documents

First let's get data in JSON format. Mirco Zeiss provides a really big JSON file representing San Francisco's subdivision parcels (from SF Open Data) at https://github.com/zemirco/sf-city-lots-json

To use these data some tweaks are necessary:
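Something along these lines (a sketch; I assume we want one feature document per line so that the file can be loaded into a table):

shell> git clone https://github.com/zemirco/sf-city-lots-json.git
# then strip the FeatureCollection wrapper and the separating commas,
# so that each feature becomes a standalone JSON document on its own line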

Looks good!

Note: The size of JSON documents stored in JSON columns is limited to the value of the max_allowed_packet system variable. (While the server manipulates a JSON value internally in memory, it can be larger; the limit applies when the server stores it.).

 

Our JSON documents will be stored in an InnoDB table: features
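A minimal definition could look like this (the id column is an assumption; feature is the column name used throughout this post):

mysql> CREATE TABLE features (
    ->   id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    ->   feature JSON NOT NULL
    -> ) ENGINE=InnoDB;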

 

Another way to store JSON documents is to put them in a string (VARCHAR or TEXT).

Let’s see if there are some differences between JSON documents stored in a string or in a JSON column.

 

Note: TEXT type is not large enough to handle our JSON data. (ERROR 1406 (22001): Data too long for column ‘feature’ at row 17360). LONGTEXT will do the job.
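So the string-based twin table could look like this (same assumptions as above):

mysql> CREATE TABLE features_TEXT (
    ->   id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    ->   feature LONGTEXT NOT NULL
    -> ) ENGINE=InnoDB;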

 

Populate tables

Note: In order to have a better idea of query execution times on my old (and not so stable) laptop with a small MySQL config (e.g. Buffer pool = 128MB), I ran the queries many times using mysqlslap: mysqlslap -c1 -i <N> { Concurrency = 1 / Iteration > 20 (depending on the query duration) }

So most of the time I'll show the output provided by mysqlslap, e.g. "Minimum number of seconds to run all queries: 59.392 seconds",

instead of the regular query output, e.g. "Query OK, 206560 rows affected (59.39 sec)".

 

Copy JSON data into features

Minimum number of seconds to run all queries: 59.392 seconds

 

Copy JSON data into features_TEXT

Minimum number of seconds to run all queries: 39.784 seconds

 

Loading 206560 records into the respective tables shows a performance difference: about 50% slower for the JSON column compared to TEXT (59.392 s vs 39.784 s).

However, be aware that MySQL JSON data type provides:

  • Automatic validation of JSON documents stored in JSON columns. Meaning that invalid documents produce an error.
  • Optimized storage format. JSON documents stored in JSON columns are converted to an internal format that permits quick read access to document elements. When the server later must read a JSON value stored in this binary format, the value need not be parsed from a text representation. The binary format is structured to enable the server to look up subobjects or nested values directly by key or array index without reading all values before or after them in the document.

 

There is nothing comparable with the TEXT data type; in other words, these features have a cost. Logical and fair!

 

Let's have a look at the table metadata:
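e.g. via INFORMATION_SCHEMA (run from the schema that holds the two tables):

mysql> SELECT TABLE_NAME, TABLE_ROWS, ROUND(DATA_LENGTH/1024/1024) AS 'DATA (MB)'
    -> FROM information_schema.TABLES
    -> WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME LIKE 'features%';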

 

 

An interesting point here: the LONGTEXT data type consumes more space than the JSON data type (optimized storage format), about 20% more.

 

Return data from a JSON document

MySQL 5.7 provides a bunch of JSON functions.

JSON_EXTRACT returns data from a JSON document. Furthermore, since MySQL 5.7.9 you can use inlined JSON path expressions, which simplify queries that deal with JSON data and make them more human-readable:

e.g.

JSON_EXTRACT(col, “$.json_field”) is similar to col->”$.json_field”
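e.g., on our data, these two queries are equivalent:

mysql> SELECT JSON_EXTRACT(feature, "$.type") FROM features LIMIT 1;
mysql> SELECT feature->"$.type" FROM features LIMIT 1;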

 

So how about retrieving our JSON documents?

Table with JSON data type

Minimum number of seconds to run all queries: 4.470 seconds

 

Table with TEXT data type

Minimum number of seconds to run all queries: 29.365 seconds

 

Getting these documents implies a full table scan (no surprise).

However, we can see the power of the MySQL JSON internal format, which permits quick read access to document elements.

In this example the query execution time is about 7 times faster with the JSON data type compared to TEXT.

 

Generated column

JSON columns cannot be indexed. BUT you can work around this restriction by creating an index on a generated column that extracts a scalar value from the JSON column. Generated columns can either be materialized (stored) or non-materialized (virtual).

Creating a generated column is quite easy. And a VIRTUAL one is costless, because column values are not stored but are evaluated when rows are read, immediately after any BEFORE triggers.

A virtual column takes no storage. It's usually what you'll need in real life.

 

Last but not least, it's only a metadata change, so adding a virtual column is fast (no table rebuild). It only requires a quick system table update that registers the new metadata.
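A sketch of such a column on our features table (the VARCHAR size is an assumption; feature_type is the column name used later in this post):

mysql> ALTER TABLE features
    ->   ADD COLUMN feature_type VARCHAR(30)
    ->   GENERATED ALWAYS AS (json_unquote(feature->"$.type")) VIRTUAL;

The same can be done on features_TEXT, using JSON_EXTRACT(feature, "$.type") in the expression.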

 

 

Note: The disadvantage of such an approach is that values are stored twice; once as the value of the generated column and once in the index.

 

New table descriptions are now:

 

 

Let's have a look at the table metadata:

 

Identical!

As expected, the data lengths are respectively the same.

 

Is there any cost difference between selecting JSON documents via the virtual column and via the JSON_EXTRACT function?

Minimum number of seconds to run all queries: 2.790 seconds

 

Minimum number of seconds to run all queries: 25.933 seconds

 

Obviously the QEP is the same: Full Table Scan (FTS).

Anyway, 2 comments:

  • The MySQL JSON internal format is still more efficient than the TEXT data type: in this example, query execution time is about 9 times faster with JSON (2.790 s vs 25.933 s).
  • In this example, the FTS on the virtual generated column (feature_type) is faster than using the json_extract function on the JSON document in the SELECT clause (from 4.470 s to 2.790 s).

 

Create indexes on generated column

As of MySQL 5.7.8, InnoDB supports secondary indexes on virtual columns.

Adding or dropping a secondary index on a virtual column is an in-place operation.
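For example (the index name is hypothetical):

mysql> ALTER TABLE features ADD INDEX idx_feature_type (feature_type);

and likewise on features_TEXT.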

 

 

New table descriptions are:

 

 

Let's have another look at the table metadata:

 

 

The index on the feature_type column is materialized. Its size is approximately 6 MB.

 

Now because of this index, the query should be more efficient:
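e.g. a query like this one can now use the index on feature_type (a sketch, not necessarily the exact query behind the figures below):

mysql> SELECT COUNT(*) FROM features WHERE feature_type = 'Feature';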

Minimum number of seconds to run all queries: 0.178 seconds

 

Minimum number of seconds to run all queries: 0.178 seconds

 

As expected, the optimizer uses the index (feature_type) and the query execution time is much better in both cases (from 2.790 s to 0.178 s for the JSON column).

 

Wrapup

MySQL 5.7 implements native JSON data type support and provides a set of functions that allows you to create, search, and modify JSON values, and to return JSON attribute values as well. That's good, and I guess many developers will be happy to use this new feature.

Generated columns are also an interesting feature. They could be used, among other things, to simulate functional indexes, as a materialized cache for often-used expressions… or, as we did, to provide index management on JSON documents.

 

Give it a try, it's definitely worth more than 30 minutes.

 

Want to know more about MySQL 5.7?

 

Going further

MySQL Documentation

The JSON data type

https://dev.mysql.com/doc/refman/5.7/en/json.html

 

JSON Functions

https://dev.mysql.com/doc/refman/5.7/en/json-functions.html

 

CREATE TABLE and Generated Columns

http://dev.mysql.com/doc/refman/5.7/en/create-table.html#create-table-generated-columns

 

MySQL Server Blog

Native JSON Data Type and Binary Format

http://mysqlserverteam.com/json-labs-release-native-json-data-type-and-binary-format/

 

JSON functions

http://mysqlserverteam.com/json-labs-release-json-functions-part-1-manipulation-json-data/

http://mysqlserverteam.com/mysql-5-7-lab-release-json-functions-part-2-querying-json-data/

 

Inline JSON Path Expressions in MySQL 5.7

http://mysqlserverteam.com/inline-json-path-expressions-in-mysql-5-7/

 

Getting Started With MySQL & JSON on Windows

http://mysqlserverteam.com/getting-started-with-mysql-json-on-windows/

 

Effective Functional Indexes in InnoDB

http://mysqlserverteam.com/json-labs-release-effective-functional-indexes-in-innodb/

 

MySQL 5.7

What Is New in MySQL 5.7

https://dev.mysql.com/doc/refman/5.7/en/mysql-nutshell.html

 

Complete list of new features in MySQL 5.7

http://www.thecompletelistoffeatures.com/

 


Free ebooks on Packt Publishing website

February 28, 2015

 

Packt Publishing started a Free Learning Campaign, providing a free ebook every day until 6 March 2015.

How? Follow this link: http://bit.ly/1DoRno5


Generate html with the mysql client

September 12, 2013

I'm a big fan of the MySQL command-line tool, i.e. the default text client modestly named mysql. I use it every day because you can do almost everything with it (DML, DDL, DCL, administrative tasks,…).

It has many options, including generating XML or HTML output.

 

So you can easily script it and generate an HTML document:
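e.g. with the -H (--html) option; the query and the output file are just examples:

shell> mysql -H -e "SELECT User, Host FROM mysql.user" > users.html

The -X (--xml) option does the same for XML output.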

 

 

It’s very simple and it’s valid HTML but it is not valid XHTML!

Why, in 2013, does a tool generate HTML rather than XHTML?


MySQL Tech Day @Paris on Oct 10, 2013

September 10, 2013

MySQL Tech Day @Paris


We're happy to announce that MySQL Tech Day will take place in Paris on Oct 10, 2013 at Oracle's main office. It'll be a full-day event, giving you an occasion to hear directly from Oracle developers about most of the improvements made recently within MySQL 5.6 and 5.7 development.

 

The agenda is the following:

  • Overview: MySQL Innovation @Oracle

  • MySQL Performance: Latest improvements in MySQL 5.6 & 5.7

  • MySQL Optimizer: What is new, what is coming

  • 50 tips to boost MySQL Performance

  • MySQL Performance Schema: Overview / HOWTO

  • MySQL Performance Schema: Hands-on (live lab)

 

Don't miss such an opportunity to get the hot news about MySQL improvements directly from the source! It'll be just two weeks after MySQL Connect, so there will be really a lot of news! So book this date in your agenda right now, and don't forget to register for the event – attending is totally free, but places are limited, so registration is mandatory to secure enough seats.

 

To register, please contact Alexis Bensa:

  • by email: alexis.bensa(@)oracle.com

  • or directly by phone: 01.57.60.27.96

 

Oracle's main office address: Portes de La Defense, 15 boulevard Charles de Gaulle.

 

Further details will be communicated over time.

update (Oct 2): schedules


ALTER Optimization

August 14, 2012

One of my colleagues asked me: « Is an ALTER TABLE with many specifications faster than one ALTER TABLE per specification? »

The answer seems quite intuitive; anyway, figures are better than a long speech…

 

Context :

  • MySQL Community Server 5.5.22
  • InnoDB table with 8 539 238 rows.
  • 17 columns with INT, CHAR, VARCHAR, TEXT, DATETIME, … fields

 

 

Adding 3 indexes (with a single command)
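The statement looked something like this (only IdxPaysLangueCreation is known from the output further below; the other two index names and columns are hypothetical):

mysql5.5> ALTER TABLE lien
       ->   ADD KEY IdxPaysLangueCreation (Pays,Langue,Creation),
       ->   ADD KEY IdxPays (Pays),
       ->   ADD KEY IdxLangue (Langue);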

 

 

Total: 146.43 sec

 

 

Deleting the index (with a single command)
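Again a sketch, dropping all three in one statement (same hypothetical index names as above):

mysql5.5> ALTER TABLE lien
       ->   DROP KEY IdxPaysLangueCreation,
       ->   DROP KEY IdxPays,
       ->   DROP KEY IdxLangue;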

 

 

Note: With MySQL 5.5, dropping secondary indexes (not the PK) is costless.

 

 

Adding 3 indexes (with 3 commands)

 

 

 

mysql5.5> ALTER TABLE lien ADD KEY IdxPaysLangueCreation (Pays,Langue,Creation);

Query OK, 0 rows affected (1 min 7.24 sec)

Records: 0 Duplicates: 0 Warnings: 0
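The other two commands follow the same shape (same hypothetical index names as above):

mysql5.5> ALTER TABLE lien ADD KEY IdxPays (Pays);
mysql5.5> ALTER TABLE lien ADD KEY IdxLangue (Langue);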

 

Total: 151.16 sec (4% slower)

 

Of course, more data to move implies a wider gap between the 2 methods.

 

 

 

Deleting the index (with 3 commands)
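i.e. three separate statements (same hypothetical index names as above):

mysql5.5> ALTER TABLE lien DROP KEY IdxPaysLangueCreation;
mysql5.5> ALTER TABLE lien DROP KEY IdxPays;
mysql5.5> ALTER TABLE lien DROP KEY IdxLangue;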

 

 

 

 

Total: 1.77 sec

 

QED

 

 

Miscellaneous

 

 

 

 
