30 mins with JSON in MySQL
Note: This article is inspired by Morgan Tocker's talk MySQL 5.7 + JSON.
Note 2: You may also be interested in 30 mins with MySQL JSON functions
Note 3: Handling JSON documents can also be done with MySQL Document Store.
As you may know, MySQL 5.7 is GA and has more than 150 new features. One of them is a native JSON data type and a set of JSON functions: “Allows for efficient and flexible storage, search and manipulation of schema-less data. Enhancements include a new internal binary format, support for easy integration within SQL, and index management on the JSON Documents using generated columns”.
Sounds interesting! Let’s take half an hour to have a foretaste…
Get JSON documents
First, let's get some data in JSON format. Mirco Zeiss provides a really big JSON file representing San Francisco's subdivision parcels (from SF Open Data) at https://github.com/zemirco/sf-city-lots-json
To use this data, some tweaks are necessary:
$ grep "^{ \"type" citylots.json > properties.json

$ head -n1 properties.json
{
  "type": "Feature",
  "properties": {
    "MAPBLKLOT": "0001001",
    "BLKLOT": "0001001",
    "BLOCK_NUM": "0001",
    "LOT_NUM": "001",
    "FROM_ST": "0",
    "TO_ST": "0",
    "STREET": "UNKNOWN",
    "ST_TYPE": null,
    "ODD_EVEN": "E"
  },
  "geometry": {
    "type": "Polygon",
    "coordinates": [
      [
        [ -122.422003528252475, 37.808480096967251, 0.0 ],
        [ -122.422076013325281, 37.808835019815085, 0.0 ],
        [ -122.421102174348633, 37.808803534992904, 0.0 ],
        [ -122.421062569067274, 37.808601056818148, 0.0 ],
        [ -122.422003528252475, 37.808480096967251, 0.0 ]
      ]
    ]
  }
}
Looks good!
Note: The size of JSON documents stored in JSON columns is limited to the value of the max_allowed_packet system variable. (While the server manipulates a JSON value internally in memory, it can be larger; the limit applies when the server stores it.).
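If in doubt, the current limit can be checked, and if needed raised, as in the sketch below (the 64 MB value is just an arbitrary example, not a recommendation):

-- Check the limit that applies to JSON documents stored in JSON columns
SHOW VARIABLES LIKE 'max_allowed_packet';

-- Raise it if needed (value in bytes; 64 MB is an arbitrary example,
-- and the new value only applies to connections opened afterwards)
SET GLOBAL max_allowed_packet = 67108864;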
Our JSON documents will be stored in an InnoDB table: features
CREATE TABLE features (
  id int(11) NOT NULL AUTO_INCREMENT,
  feature json NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;
Another way to store JSON documents is to put them in a string (VARCHAR or TEXT).
Let's see whether there are any differences between JSON documents stored in a string and in a JSON column.
CREATE TABLE features_TEXT (
  id int(11) NOT NULL AUTO_INCREMENT,
  feature longtext NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;
Note: The TEXT type is not large enough to handle our JSON data (ERROR 1406 (22001): Data too long for column 'feature' at row 17360). LONGTEXT will do the job.
Populate tables
Note: In order to get a better idea of query execution times on my old (and not so stable) laptop with a small MySQL config (e.g. buffer pool = 128MB), I ran the queries many times using mysqlslap: mysqlslap -c1 -i <N> { Concurrency = 1 / Iterations > 20 (depending on the query duration) }
So most of the time I'll show the output provided by mysqlslap, e.g. “Minimum number of seconds to run all queries: 59.392 seconds”, instead of the regular query output, e.g. “Query OK, 206560 rows affected (59.39 sec)”.
Copy the JSON data into features
LOAD DATA INFILE 'properties.json' INTO TABLE features (feature);
Minimum number of seconds to run all queries: 59.392 seconds
Copy the JSON data into features_TEXT
LOAD DATA INFILE 'properties.json' INTO TABLE features_TEXT (feature);
Minimum number of seconds to run all queries: 39.784 seconds
Loading the 206560 records into the respective tables shows a clear performance difference: the load is about 40% slower into the JSON column than into the TEXT column.
However, be aware that MySQL JSON data type provides:
- Automatic validation of JSON documents stored in JSON columns. Meaning that invalid documents produce an error.
- Optimized storage format. JSON documents stored in JSON columns are converted to an internal format that permits quick read access to document elements. When the server later must read a JSON value stored in this binary format, the value need not be parsed from a text representation. The binary format is structured to enable the server to look up subobjects or nested values directly by key or array index without reading all values before or after them in the document.
Nothing comparable exists for the TEXT data type; in other words, these features have a cost, which is fair enough.
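As a quick illustration of the automatic validation (a minimal sketch, not part of the measurements below), inserting a malformed document fails on the JSON column but is silently accepted by the LONGTEXT column:

-- Malformed document (missing closing brace): rejected by the JSON column
-- with an "Invalid JSON text" error
INSERT INTO features (feature) VALUES ('{"type": "Feature"');

-- The same string is accepted by the LONGTEXT column, no validation happens
INSERT INTO features_TEXT (feature) VALUES ('{"type": "Feature"');

-- Clean up the bogus row so it does not skew the following tests
DELETE FROM features_TEXT WHERE feature = '{"type": "Feature"';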
Let's have a look at the table metadata:
SHOW TABLE STATUS LIKE 'features'\G
*************************** 1. row ***************************
           Name: features
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 184218
 Avg_row_length: 1250
    Data_length: 230326272  #220 MB
   Index_length: 0
      Data_free: 3145728
SHOW TABLE STATUS LIKE 'features_TEXT'\G
*************************** 1. row ***************************
           Name: features_TEXT
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 188784
 Avg_row_length: 1370
    Data_length: 258654208  #247 MB
   Index_length: 0
      Data_free: 4194304
Interesting point here: the LONGTEXT data type consumes more space than the JSON data type (optimized storage format), about 12% more (247 MB vs 220 MB).
Return data from a JSON document
MySQL 5.7 provides a bunch of JSON functions.
JSON_EXTRACT returns data from a JSON document. Furthermore, since MySQL 5.7.9 you can use inlined JSON path expressions, which simplify queries that deal with JSON data and make them more human-readable:
e.g.
JSON_EXTRACT(col, "$.json_field") is equivalent to col->"$.json_field"
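For instance, on the dataset loaded above, both forms below return the same attribute (a sketch; id = 1 is simply the first document shown earlier):

-- Classic function call
SELECT JSON_EXTRACT(feature, "$.properties.STREET") FROM features WHERE id = 1;

-- Inlined JSON path expression, same result: "UNKNOWN" (JSON quoting kept)
SELECT feature->"$.properties.STREET" FROM features WHERE id = 1;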
So how about retrieving our JSON documents?
Table with JSON data type
EXPLAIN SELECT DISTINCT feature->"$.type" AS json_extract FROM features\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: features
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 182309
     filtered: 100.00
        Extra: Using temporary
Minimum number of seconds to run all queries: 4.470 seconds
Table with TEXT data type
EXPLAIN SELECT DISTINCT feature->"$.type" AS json_extract FROM features_TEXT\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: features_TEXT
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 177803
     filtered: 100.00
        Extra: Using temporary
Minimum number of seconds to run all queries: 29.365 seconds
Retrieving these documents implies a full table scan (no surprise).
However we can see the power of the MySQL JSON internal format that permits quick read access to document elements.
In this example the query is about 7 times faster with the JSON data type than with TEXT (4.470 s vs 29.365 s).
Generated column
JSON columns cannot be indexed. BUT you can work around this restriction by creating an index on a generated column that extracts a scalar value from the JSON column. Generated columns can either be materialized (stored) or non-materialized (virtual).
Creating a generated column is quite easy, and a VIRTUAL one is costless because column values are not stored: they are evaluated when rows are read, immediately after any BEFORE triggers.
A virtual column takes no storage. It’s usually what you’ll need in real life.
Last but not least, it is only a metadata change, so adding a virtual column is fast (no table rebuild): it only requires a quick system table update that registers the new metadata.
ALTER TABLE features ADD feature_type VARCHAR(30) AS (feature->"$.type") VIRTUAL;
Query OK, 0 rows affected (0.08 sec)
Records: 0  Duplicates: 0  Warnings: 0
ALTER TABLE features_TEXT ADD feature_type VARCHAR(30) AS (feature->"$.type") VIRTUAL;
Query OK, 0 rows affected (0.10 sec)
Records: 0  Duplicates: 0  Warnings: 0
Note: The disadvantage of such an approach is that values are stored twice; once as the value of the generated column and once in the index.
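For comparison, a materialized (STORED) version of the same column would look like the sketch below. It is not executed here, the tests in this article stick with VIRTUAL columns, and the column name feature_type_stored is just an example:

-- Hypothetical STORED variant: the value is computed on INSERT/UPDATE and
-- written in the row, which consumes storage and requires a table rebuild
ALTER TABLE features ADD feature_type_stored VARCHAR(30) AS (feature->"$.type") STORED;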
New table descriptions are now:
CREATE TABLE features (
  id int(11) NOT NULL AUTO_INCREMENT,
  feature json NOT NULL,
  feature_type varchar(30) GENERATED ALWAYS AS (feature->"$.type") VIRTUAL,
  PRIMARY KEY (id)
) ENGINE=InnoDB
CREATE TABLE features_TEXT (
  id int(11) NOT NULL AUTO_INCREMENT,
  feature longtext NOT NULL,
  feature_type varchar(30) GENERATED ALWAYS AS (feature->"$.type") VIRTUAL,
  PRIMARY KEY (id)
) ENGINE=InnoDB
Let's have a look at the table metadata:
ANALYZE TABLE features, features_TEXT;

SHOW TABLE STATUS LIKE 'features'\G
*************************** 1. row ***************************
           Name: features
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 184218
 Avg_row_length: 1250
    Data_length: 230326272  #220 MB
   Index_length: 0
      Data_free: 314572
SHOW TABLE STATUS LIKE 'features_TEXT'\G
*************************** 1. row ***************************
           Name: features_TEXT
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 188784
 Avg_row_length: 1370
    Data_length: 258654208  #247 MB
   Index_length: 0
      Data_free: 4194304
Identical!
As expected, the data length of each table is unchanged.
Is there any cost difference between selecting the JSON attribute through the virtual column and through the JSON_EXTRACT function?
EXPLAIN SELECT DISTINCT feature_type FROM features\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: features
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 195195
     filtered: 100.00
        Extra: Using temporary
Minimum number of seconds to run all queries: 2.790 seconds
EXPLAIN SELECT DISTINCT feature_type FROM features_TEXT\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: features_TEXT
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 171004
     filtered: 100.00
        Extra: Using temporary
Minimum number of seconds to run all queries: 25.933 seconds
Obviously the query execution plan (QEP) is the same: a full table scan (FTS).
Anyway, two comments:
- The MySQL JSON internal format is still more efficient than the TEXT data type: in this example the query is roughly 9 times faster with JSON (2.790 s vs 25.933 s).
- In this example the FTS on the virtual generated column (feature_type) is faster than using the JSON_EXTRACT function on the JSON document in the SELECT clause (from 4.470 s down to 2.790 s).
Create indexes on generated columns
As of MySQL 5.7.8, InnoDB supports secondary indexes on virtual columns.
Adding or dropping a secondary index on a virtual column is an in-place operation.
ALTER TABLE features ADD INDEX (feature_type);
Query OK, 0 rows affected (5.04 sec)
Records: 0  Duplicates: 0  Warnings: 0
ALTER TABLE features_TEXT ADD INDEX (feature_type);
Query OK, 0 rows affected (27.89 sec)
Records: 0  Duplicates: 0  Warnings: 0
New table descriptions are:
CREATE TABLE features (
  id int(11) NOT NULL AUTO_INCREMENT,
  feature json NOT NULL,
  feature_type varchar(30) GENERATED ALWAYS AS (feature->"$.type") VIRTUAL,
  PRIMARY KEY (id),
  KEY feature_type (feature_type)
) ENGINE=InnoDB
CREATE TABLE features_TEXT (
  id int(11) NOT NULL AUTO_INCREMENT,
  feature longtext NOT NULL,
  feature_type varchar(30) GENERATED ALWAYS AS (feature->"$.type") VIRTUAL,
  PRIMARY KEY (id),
  KEY feature_type (feature_type)
) ENGINE=InnoDB
Let's have another look at the table metadata:
ANALYZE TABLE features, features_TEXT;

SHOW TABLE STATUS LIKE 'features'\G
*************************** 1. row ***************************
           Name: features
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 180400
 Avg_row_length: 1276
    Data_length: 230326272  #220 MB
   Index_length: 5783552  #6 MB
      Data_free: 5242880
SHOW TABLE STATUS LIKE 'features_TEXT'\G
*************************** 1. row ***************************
           Name: features_TEXT
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 192445
 Avg_row_length: 1344
    Data_length: 258654208  #247 MB
   Index_length: 5783552  #6 MB
      Data_free: 2097152
The index on the feature_type column is materialized. Its size is approximately 6 MB.
Now because of this index, the query should be more efficient:
EXPLAIN SELECT DISTINCT feature_type FROM features\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: features
         type: index
possible_keys: feature_type
          key: feature_type
      key_len: 33
          ref: NULL
         rows: 193763
     filtered: 100.00
        Extra: Using index
Minimum number of seconds to run all queries: 0.178 seconds
EXPLAIN SELECT DISTINCT feature_type FROM features_TEXT\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: features_TEXT
         type: index
possible_keys: feature_type
          key: feature_type
      key_len: 33
          ref: NULL
         rows: 208134
     filtered: 100.00
        Extra: Using index
Minimum number of seconds to run all queries: 0.178 seconds
As expected, the optimizer uses the index (feature_type) and the query execution time is much better in both cases (from 2.790 s down to 0.178 s for the JSON column).
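The index is not only useful for this DISTINCT scan; an equality lookup on the extracted attribute can now use it as well. A minimal sketch (note the embedded double quotes: the -> operator keeps the JSON quoting, so the generated column stores the value as "Feature", quotes included):

-- The filter on the JSON attribute is resolved through the secondary index
-- on the generated column
EXPLAIN SELECT COUNT(*) FROM features WHERE feature_type = '"Feature"';

SELECT COUNT(*) FROM features WHERE feature_type = '"Feature"';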
Wrapup
MySQL 5.7 implements native JSON data type support and provides a set of functions to create, search and modify JSON values, as well as to return JSON attribute values. That's good, and I guess many developers will be happy to use this new feature.
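A few of these functions in action, as a quick taste (a sketch on throw-away literal values, not on the dataset used above):

-- Create a JSON value
SELECT JSON_OBJECT('type', 'Feature', 'block', '0001');

-- Search: returns the path of the matching element ("$.type")
SELECT JSON_SEARCH('{"type": "Feature"}', 'one', 'Feature');

-- Modify an attribute value
SELECT JSON_SET('{"type": "Feature"}', '$.type', 'Polygon');

-- Return an attribute value
SELECT JSON_EXTRACT('{"type": "Feature"}', '$.type');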
Generated columns are also an interesting feature. Among other things, they can be used to simulate functional indexes, as a materialized cache for frequently used expressions, or, as we did here, to provide index management on JSON documents.
Give it a try, it is definitely worth more than 30 minutes.
Want to know more about MySQL 5.7?
- December 8th in Paris: join the Oracle MySQL Tech Tour Paris
- Other locations: Oracle MySQL Tech Tour
Going further
MySQL Documentation
The JSON data type
https://dev.mysql.com/doc/refman/5.7/en/json.html
JSON Functions
https://dev.mysql.com/doc/refman/5.7/en/json-functions.html
CREATE TABLE and Generated Columns
http://dev.mysql.com/doc/refman/5.7/en/create-table.html#create-table-generated-columns
MySQL Server Blog
Native JSON Data Type and Binary Format
http://mysqlserverteam.com/json-labs-release-native-json-data-type-and-binary-format/
JSON functions
http://mysqlserverteam.com/json-labs-release-json-functions-part-1-manipulation-json-data/
http://mysqlserverteam.com/mysql-5-7-lab-release-json-functions-part-2-querying-json-data/
https://mysqlserverteam.com/new-json-functions-in-mysql-5-7-22/
Inline JSON Path Expressions in MySQL 5.7
http://mysqlserverteam.com/inline-json-path-expressions-in-mysql-5-7/
Getting Started With MySQL & JSON on Windows
http://mysqlserverteam.com/getting-started-with-mysql-json-on-windows/
Effective Functional Indexes in InnoDB
http://mysqlserverteam.com/json-labs-release-effective-functional-indexes-in-innodb/
MySQL 5.7
What Is New in MySQL 5.7
https://dev.mysql.com/doc/refman/5.7/en/mysql-nutshell.html
Complete list of new features in MySQL 5.7
http://www.thecompletelistoffeatures.com/
Thanks for using MySQL!
Cloud Solutions Architect at Oracle
MySQL Geek, author, blogger and speaker
I have an insatiable hunger for learning.
—–
Blog: www.dasini.net/blog/en/
Twitter: https://twitter.com/freshdaz
SlideShare: www.slideshare.net/freshdaz
Youtube: https://www.youtube.com/channel/UC12TulyJsJZHoCmby3Nm3WQ
—–