MySQL Recovery

February 23, 2021

If you are dealing with data, and you most probably are if you are reading this, one of your biggest fears is not being able to retrieve it. In a world where data surrounds us, it is critical to be able to retrieve it quickly and consistently.

Thus, it is always a good idea to have high-availability settings in place to avoid losing your data.

However, most of the time we may wish, or need, to save the database and its data ourselves and be a DBA hero. That is not an easy task, and it would often be smoother to simply perform a backup-restore. Sadly, that is not always an option.

So, this is what we will be facing in this article: what to do when there is data corruption in MySQL, and the steps we need to perform to try to save our database.

Before any task or operation, it is always a good idea to copy the existing data files at the OS level:

First, let’s stop MySQL:
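A minimal sketch, assuming a systemd-based installation where the service is named mysqld (on Debian/Ubuntu it is usually mysql):

    # stop the MySQL server before touching any data files
    sudo systemctl stop mysqld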

And then:
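For example, assuming the default datadir /var/lib/mysql and a /backup directory with enough free space (both paths are assumptions, adjust to your environment):

    # copy the whole data directory while the server is stopped, preserving ownership and permissions
    sudo cp -rp /var/lib/mysql /backup/mysql-data-$(date +%F)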

Before attempting to bring MySQL back up, it is a good idea to execute innochecksum to check the status of your database, or of a specific table.

A few words on innochecksum, in case you are not familiar with it:

innochecksum prints checksums for InnoDB files. This tool reads an InnoDB tablespace file, calculates the checksum for each page, compares the calculated checksum to the stored checksum, and reports mismatches, which indicate damaged pages. It was originally developed to speed up verifying the integrity of tablespace files after power outages but can also be used after file copies. Because checksum mismatches cause InnoDB to deliberately shut down a running server, it may be preferable to use this tool rather than waiting for an in-production server to encounter the damaged pages.

So, in order to check a table, you may use a simple command like:
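A basic invocation might look like this, assuming a hypothetical table mydb.mytable under the default datadir:

    innochecksum --verbose --log=/tmp/log.txt /var/lib/mysql/mydb/mytable.ibd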

The status will be logged to the file /tmp/log.txt; if the file doesn’t exist, innochecksum will create it for you. The output will look something like the following:

page::41354; log sequence number:first = ; second =
Page:: uncorrupted

innochecksum offers a variety of options; the most useful ones let you choose a starting or an ending page, to save time:
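For example, checking only pages 1 through 1000 of the same hypothetical file (the page range is just an illustration):

    innochecksum --start-page=1 --end-page=1000 --log=/tmp/log.txt /var/lib/mysql/mydb/mytable.ibd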

Please be aware that innochecksum can only be executed against a database that is not running. Otherwise, you will receive a lock error.

Once you have checked the tables and discovered the problematic table or tables, it is time to attempt to start up the server and see what we can do. Very importantly, it is imperative to add the following to my.cnf:
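A sketch of the relevant section of my.cnf (the values shown here are a starting point; innodb_force_recovery should be increased gradually, as explained below):

    [mysqld]
    log_error_verbosity = 3
    innodb_force_recovery = 1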

log_error_verbosity: This system variable specifies the verbosity for handling events intended for the error log. I always prefer to have this set, as it is very helpful for identifying various issues and, at the same time, for checking progress in the error log file.

innodb_force_recovery: The crash recovery mode, typically only changed in serious troubleshooting situations. Possible values range from 0 to 6. If the server comes up with a value of 3 or below, the data can most probably be recovered. Always start with 1 and increase the value one step at a time if the server does not come up with your previous choice.

Now that we have these settings in our configuration, it is time to start up the server and hope for the best:
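Again assuming a systemd service named mysqld, and that the error log lives at /var/log/mysqld.log (an assumption; check the log_error setting in your configuration):

    sudo systemctl start mysqld
    # follow the error log while the server comes up
    sudo tail -f /var/log/mysqld.log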

If we are lucky and the server does manage to come up with a value between 1 and 3, it is time to take a full dump of the corrupted table(s). If you are using version 8.0.22 or above, you may take the dump via MySQL Shell:
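For example, using util.dumpTables(), available in MySQL Shell 8.0.22 and later (the user, schema mydb, table mytable, and dump directory are assumptions):

    # single-table dump with MySQL Shell; adjust user, schema, table, and output path
    mysqlsh root@localhost --js -e 'util.dumpTables("mydb", ["mytable"], "/backup/mytable_dump")'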

Otherwise, you may use the old way:
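With mysqldump, a single-table dump might look like this (credentials, schema, and file name are assumptions):

    mysqldump -u root -p mydb mytable > /backup/mytable.sql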

Once the backup is completed successfully, it is time to drop the corrupted table:
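Still using the hypothetical mydb.mytable:

    DROP TABLE mydb.mytable;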

And then restore it from the dump we have taken:
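Assuming the mysqldump file from the previous step (if you dumped with MySQL Shell instead, util.loadDump() can load that dump):

    mysql -u root -p mydb < /backup/mytable.sql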

Once the restoration is done, we need to restart the MySQL server, but this time without the innodb_force_recovery parameter. Simply edit your configuration file and comment the line out:
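In my.cnf:

    [mysqld]
    log_error_verbosity = 3
    # innodb_force_recovery = 1

and then restart the service (for example, sudo systemctl restart mysqld).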

Hopefully, this will work and our database will be up and running with all its data intact! Unfortunately, this is not always the case, and sometimes the database will not come up at all for us to retrieve our data. In situations like these, we need to restore from an existing backup or from a replica, if one is available.

To avoid facing serious trouble with your data, consider having in place one (or, better yet, all) of the following:

  • Full backups (on a daily, weekly, or monthly basis)
  • Incremental backups
  • Binlog backups
  • In sync replicas
  • InnoDB Cluster

Additionally, in the case of a system crash, it is always helpful to have certain parameters set to maintain data consistency. Let’s view the most basic ones:

  • innodb_flush_log_at_trx_commit = 1 :

Controls the balance between strict ACID compliance for commit operations and higher performance that is possible when commit-related I/O operations are rearranged and done in batches.
You can achieve better performance by changing the default value but then you can lose transactions in a crash.
The default setting of 1 is required for full ACID compliance. Logs are written and flushed to disk at each transaction commit.

  • sync_binlog = 1 :

Enables synchronization of the binary log to disk before transactions are committed. This is the safest setting but can have a negative impact on performance
due to the increased number of disk writes. In the event of a power failure or operating system crash, transactions that are missing from the binary log are only in a prepared state.
This permits the automatic recovery routine to roll back the transactions, which guarantees that no transaction is lost from the binary log.

  • innodb_doublewrite = 1 :

The doublewrite buffer is a storage area where InnoDB writes pages flushed from the buffer pool before writing the pages to their proper positions in the InnoDB data files.
If there is an operating system, storage subsystem, or unexpected mysqld process exit in the middle of a page write, InnoDB can find a good copy of the page from the doublewrite buffer
during crash recovery.

  • relay_log_recovery = 1 :

If enabled, this variable enables automatic relay log recovery immediately following server startup. The recovery process creates a new relay log file, initializes the SQL thread position
to this new relay log, and initializes the I/O thread to the SQL thread position. Reading of the relay log from the source then continues.

This global variable is read-only at runtime. Its value can be set with the --relay-log-recovery option at replica server startup, which should be used following an unexpected halt of a replica to ensure that no possibly corrupted relay logs are processed, and must be used in order to guarantee a crash-safe replica.

  • relay_log_purge = 1 :

Disabling purging of relay logs when enabling the --relay-log-recovery option risks data consistency and is therefore not crash-safe.

  • innodb_directories :

Another parameter that can assist with tablespace discovery during crash recovery is innodb_directories. Tablespace discovery during crash recovery relies on the innodb_directories setting to identify tablespaces referenced in the redo logs. innodb_directories can be defined at startup, and it can also be used when moving tablespaces while the server is offline.

On my.cnf:
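For example, listing the directories separated by semicolons (the paths below are placeholders):

    [mysqld]
    innodb_directories = "/var/lib/mysql;/data/mysql-tablespaces"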

At this point, it is wise to mention that the recovery time also depends on innodb_log_file_size.

Generally, the combined size of the log files should be large enough that the server can smooth out peaks and troughs in workload activity, which often means that there is enough redo log space to handle more than an hour of write activity. The larger the value, the less checkpoint flush activity is required in the buffer pool, saving disk I/O.

Larger log files also make crash recovery slower.
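A rough, commonly used way to estimate how much redo your workload generates (and therefore how large the combined log files should be) is to compare the InnoDB log sequence number at two points in time; a sketch, with an arbitrary 60-second interval:

    -- the "Log sequence number" in the output is a byte offset into the redo log
    SHOW ENGINE INNODB STATUS\G
    SELECT SLEEP(60);
    SHOW ENGINE INNODB STATUS\G
    -- the difference between the two numbers is roughly the redo bytes written per minute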

Conclusion

MySQL does offer stability; however, it is really hard to avoid corruption entirely or to never face a server crash at some point. If safety measures are in place, take a deep breath and dive in to save your data.

When you crash and burn, you have to pick yourself up and go on and hope to make up for it.

Burt Reynolds
