MySQL Functional Indexes
Since MySQL 5.7 one can put indexes on expressions, aka functional indexes, using generated columns. Basically you first need to use the generated column to define the functional expression, then indexed this column.
Quite useful when dealing with JSON functions, you can find an example here and the documentation there.
Starting with MySQL 8.0.13 we have now an easiest way to create functional indexes (or functional key parts as mentioned in the documentation) \o/
Let’s see how with a quick practical example.
Below salaries table structure:
mysql> SHOW CREATE TABLE salaries\G
*************************** 1. row ***************************
Table: salaries
Create Table: CREATE TABLE `salaries` (
`emp_no` int(11) NOT NULL,
`salary` int(11) NOT NULL,
`from_date` date NOT NULL,
`to_date` date NOT NULL,
PRIMARY KEY (`emp_no`,`from_date`),
CONSTRAINT `salaries_ibfk_1` FOREIGN KEY (`emp_no`) REFERENCES `employees` (`emp_no`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0,00 sec)
It contains some data
mysql> SELECT count(*) FROM salaries;
+----------+
| count(*) |
+----------+
| 2844047 |
+----------+
mysql> SELECT * FROM salaries LIMIT 3;
+--------+--------+------------+------------+
| emp_no | salary | from_date | to_date |
+--------+--------+------------+------------+
| 10001 | 60117 | 1986-06-26 | 1987-06-26 |
| 10001 | 62102 | 1987-06-26 | 1988-06-25 |
| 10001 | 66074 | 1988-06-25 | 1989-06-25 |
+--------+--------+------------+------------+
Let’s focus on the following query:
SELECT * FROM salaries WHERE YEAR(to_date)=1985
mysql> SELECT * FROM salaries WHERE YEAR(to_date)=1985;
+--------+--------+------------+------------+
| emp_no | salary | from_date | to_date |
+--------+--------+------------+------------+
| 14688 | 42041 | 1985-07-06 | 1985-08-08 |
...snip...
| 498699 | 40000 | 1985-09-25 | 1985-09-28 |
+--------+--------+------------+------------+
89 rows in set (0,80 sec)
mysql> explain SELECT * FROM salaries WHERE YEAR(to_date)=1985\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: salaries
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 2838426
filtered: 100.00
Extra: Using where
We have a full table scan ( type: ALL
), meaning no index is used. Perhaps because there is no index on column to_date… 😉
So let’s add an index on to_date !
mysql> ALTER TABLE salaries ADD INDEX idx_to_date (to_date);
Query OK, 0 rows affected (17,13 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> SHOW CREATE TABLE salaries\G
*************************** 1. row ***************************
Table: salaries
Create Table: CREATE TABLE `salaries` (
`emp_no` int(11) NOT NULL,
`salary` int(11) NOT NULL,
`from_date` date NOT NULL,
`to_date` date NOT NULL,
PRIMARY KEY (`emp_no`,`from_date`),
KEY `idx_to_date` (`to_date`),
CONSTRAINT `salaries_ibfk_1` FOREIGN KEY (`emp_no`) REFERENCES `employees` (`emp_no`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
And run again the query with the hope of a better execution plan
mysql> explain SELECT * FROM salaries WHERE YEAR(to_date)=1985\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: salaries
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 2838426
filtered: 100.00
Extra: Using where
Ouch! Still have a full table scan !
The index can’t be used because of the use of a function (YEAR()) on the indexed column (to_date).
BTW if you’re really surprise, maybe you should read this. 😉
This is the case when you need a functional index!
mysql> ALTER TABLE salaries ADD INDEX idx_year_to_date((YEAR(to_date)));
Query OK, 0 rows affected (20,04 sec)
Records: 0 Duplicates: 0 Warnings: 0
The syntax is very similar of the creation of a “regular” index. Although you must be aware of the double parentheses: (( <expression> ))
We can now see our new index named idx_year_to_date and the indexed expression year(to_date) :
mysql> SHOW CREATE TABLE salaries\G
*************************** 1. row ***************************
Table: salaries
Create Table: CREATE TABLE `salaries` (
`emp_no` int(11) NOT NULL,
`salary` int(11) NOT NULL,
`from_date` date NOT NULL,
`to_date` date NOT NULL,
PRIMARY KEY (`emp_no`,`from_date`),
KEY `idx_to_date` (`to_date`),
KEY `idx_year_to_date` ((year(`to_date`))),
CONSTRAINT `salaries_ibfk_1` FOREIGN KEY (`emp_no`) REFERENCES `employees` (`emp_no`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
mysql> SELECT INDEX_NAME, EXPRESSION
FROM INFORMATION_SCHEMA.STATISTICS
WHERE TABLE_SCHEMA='employees'
AND TABLE_NAME = "salaries"
AND INDEX_NAME='idx_year_to_date';
+------------------+-----------------+
| INDEX_NAME | EXPRESSION |
+------------------+-----------------+
| idx_year_to_date | year(`to_date`) |
+------------------+-----------------+
Let’s test our query again
mysql> explain SELECT * FROM salaries WHERE YEAR(to_date)=1985\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: salaries
partitions: NULL
type: ref
possible_keys: idx_year_to_date
key: idx_year_to_date
key_len: 5
ref: const
rows: 89
filtered: 100.00
Extra: NULL
mysql> SELECT * FROM salaries WHERE YEAR(to_date)=1985;
+--------+--------+------------+------------+
| emp_no | salary | from_date | to_date |
+--------+--------+------------+------------+
| 14688 | 42041 | 1985-07-06 | 1985-08-08 |
...snip...
| 498699 | 40000 | 1985-09-25 | 1985-09-28 |
+--------+--------+------------+------------+
89 rows in set (0,00 sec)
Here we go!
Now the query is able to use the index. And in this case we have a positive impact on the execution time.
It is also interesting to note that it is possible to use idx_to_date, the first index created (the non functional one) if we can rewrite the original query:
mysql> EXPLAIN SELECT *
FROM salaries
WHERE to_date BETWEEN '1985-01-01' AND '1985-12-31'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: salaries
partitions: NULL
type: range
possible_keys: idx_to_date
key: idx_to_date
key_len: 3
ref: NULL
rows: 89
filtered: 100.00
Extra: Using index condition
mysql> SELECT *
FROM salaries
WHERE to_date BETWEEN '1985-01-01' AND '1985-12-31'
+--------+--------+------------+------------+
| emp_no | salary | from_date | to_date |
+--------+--------+------------+------------+
| 20869 | 40000 | 1985-02-17 | 1985-03-01 |
...snip...
| 45012 | 66889 | 1985-08-16 | 1985-12-31 |
+--------+--------+------------+------------+
89 rows in set (0,00 sec)
This saves an index, I mean less indexes to maintain for the engine. Also speaking of maintenance cost, the cost to maintain a functional index is higher than the cost of a regular one.
In the other side the execution plan is less good (query cost higher) and obviously you must rewrite the query.
Requirements and restrictions.
A primary key cannot be a functional index:
mysql> CREATE TABLE t1 (i INT, PRIMARY KEY ((ABS(i))));
ERROR 3756 (HY000): The primary key cannot be a functional index
You can not index non-deterministic functions (RAND(), UNIX_TIMESTAMP(), NOW()…)
mysql> CREATE TABLE t1 (i int, KEY ((RAND(i))));
ERROR 3758 (HY000): Expression of functional index 'functional_index' contains a disallowed function.
SPATIAL and FULLTEXT indexes cannot have functional key parts.
Conclusion
Functional index is an interesting and a relevant feature, it could be very useful to optimize your queries without rewrite them and especially when dealing with JSON documents and other complex types.
Obviously all the details you must know are in the MySQL documentation: Functional Key Parts
If you interested in the high level architecture and the low level design please read the workload.
Thanks for using MySQL!
Cloud Solutions Architect at Oracle
MySQL Geek, author, blogger and speaker
I’m an insatiable hunger of learning.
—–
Blog: www.dasini.net/blog/en/
Twitter: https://twitter.com/freshdaz
SlideShare: www.slideshare.net/freshdaz
Youtube: https://www.youtube.com/channel/UC12TulyJsJZHoCmby3Nm3WQ
—–
[…] MySQL Functional Indexes […]
It would be nice if MariaDB will implements similar solution
Well MariaDB is a forked of MySQL 5.1.
First available source code of 5.1 was in 2005…
Remember what was the #1 movie in the box office in 2005? : http://dasini.net/blog/2017/07/19/mysql-first-public-releases/
IMHO that does not make much sense to hope that these 2 products will behave in the same way nor have the same features.
Anyway MySQL 8 is Great! 🙂
Thanks for your comment.
Olivier
Hi,
thanks for your post.
I would like to understand why Function-based index is not used in my case.
Thanks and best regards.
Michel Ramirez
select count(*) from TSK_PLANNING_PLN => 4957
select count(*) from TSK_PLANNING_PLN where PLN_END_DATE IS NULL => 1660
Index => ALTER TABLE TSK_PLANNING_PLN ADD INDEX IND_MRO_PLN_END_DATE_COALESCE((COALESCE(`PLN_END_DATE`,STR_TO_DATE(‘9999/01/01′,’%Y/%m/%d’))));
Request :
EXPLAIN SELECT *
FROM
TSK_PLANNING_PLN pln USE INDEX (IND_MRO_PLN_END_DATE_COALESCE)
WHERE
COALESCE(PLN_END_DATE,STR_TO_DATE(‘9999/01/01′,’%Y/%m/%d’)) >= STR_TO_DATE(‘2020/12/21′,’%Y/%m/%d’)
AND pln.pln_start_date 1505 lines.
Sorry i forgot to show you, explain plan
id|select_type|table|partitions|type|possible_keys |key|key_len|ref|rows|filtered|Extra |
–|———–|—–|———-|—-|—————————–|—|——-|—|—-|——–|———–|
1|SIMPLE |pln | |ALL |IND_MRO_PLN_END_DATE_COALESCE| | | |5032| 11.26|Using where|
Best regard.
Hello,
well there are many reason why the optimizer would not use an index.
One of them is because the optimizer think it is more efficient to not use it (usually because of the column cardinality).
If you try your query with FORCE INDEX instead of USE INDEX what is the EXPLAIN result ?
If you think the statistics are not accurate you can also run ANALYZE TABLE
Cheers,
Olivier