MySQL collation complexities leads to backup incompatibilities

Created on 15 February 2022, almost 3 years ago
Updated 26 July 2024, 5 months ago

Problem/Motivation

When backing up a site in Ubuntu 20.04 and then importing it into Windows, a status message is seen "The file could not be imported."
However on the source system and other Ubuntu based systems it works perfectly.

Steps to reproduce

- Install Drupal 9 on Ubuntu 20.04 with LAMP

- Install backup_migrate

- Configure private files directory in settings.php

- Enable backup_migrate

- Take a backup and save to private files directory or make a download

- Keeping the file system of the site, install it in a clean database in XAMPP, Windows.

- When the site is ready enable the backup_migrate module and go to the module configuration to restore the database. Either way in the Restore or Saved Backup tabs.

- Attempt to restore

- Backup fails with on-screen message: "The file could not be imported." No error messages are visible in logs.

Proposed resolution

I tried to import the compressed backup file directly from phpMyAdmin and got the following error:
#1273 - Unknown Collation: 'utf8mb4_0900_ai_ci'.

Then I unzipped the file and edited the .mysql file replacing 'utf8mb4_0900_ai_ci' for 'utf8_general_ci'.
Update the compressed using 7zip.

I tried to import again from phpMyAdmin and got the following error:
#1253 - COLLATION 'utf8_general_ci' is not valid for CHARACTER SET 'utf8mb4'.

Then I unzipped the file and edited the .mysql file replacing 'utf8mb4' for 'utf8'.
Update the compressed using 7zip.

I tried to import again from phpMyAdmin and was successfully imported

However, when I visited the site nothing had changed.
Then I go to the module configuration to restore the database in the Restore tab uploading the modified file.

It took a while to run and the site returned a fatal PHP error. Then I went to update.php and the message "Restore complete" was displayed. I continued to run update.php and when it finished I happily got my site back without any errors.

It was quite cumbersome but it worked out in the end.

Remaining tasks

I would like to suggest to the developers to use this as a guideline to improve cross-platform compatibility and to add a Mysql collation incompatibility exception to display this kind of error in a more detailed way. Then it would be necessary some option to modify the collation to one compatible with the future destination of the site when making the backup. Thank you all for this great project and I hope it will continue to improve.

Translated with www.DeepL.com/Translator (free version)

πŸ› Bug report
Status

Closed: won't fix

Version

5.1

Component

Miscellaneous

Created by

πŸ‡¨πŸ‡ΊCuba dariemlazaro Havana

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • πŸ‡¨πŸ‡¦Canada xmacinfo Canada

    Previously, utf8mb4_general_ci was the default collation. Because the utf8mb4_0900_ai_ci collation is now the default, new tables have the ability to store characters outside the Basic Multilingual Plane by default. Emojis can now be stored by default.

    β€” https://www.monolune.com/articles/what-is-the-utf8mb4_0900_ai_ci-collation

    I got hit by this issue a couple of times and now I learn that utf8mb4_0900_ai_ci is the new default.

    Backup and Migrate should not fail here and automatically convert default collation from utf8mb4_0900_ai_ci and utf8mb4_general_ci and vice versa. The implementation might not be simple. Automated conversion should be limited only to the former and new default MySQL collations.

    MySQL snippets that may help

    # For each database:
    ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
    # For each table:
    ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
    # For each column:
    ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
    # The above code is for 'VARCHAR' column. The exact statement depends on the column type, maximum length, and other properties.
    

    Ther is also the possibility of doing a database backup directly to another collation using the --default-character-set flag.

  • πŸ‡ΊπŸ‡ΈUnited States crutch

    Thank you for this solution. Just moved over to aws/rds using 8.0.35 last week. Found this issue when trying to restore to local dev to do some work. Had 379 tables using utf8mb4_0900_ai_ci. Followed the steps above and it restored successfully.

    Before going live, we restored from old hosting to aws with no issues. So does that mean it translates/converts from utf8mb4_general_ci to utf8mb4_0900_ai_ci with no issue, but not the opposite direction?

  • Status changed to Closed: won't fix 5 months ago
  • πŸ‡¨πŸ‡¦Canada xmacinfo Canada

    Being bold here. I am closing this ticket as there are two solutions.

    Solution 1: Make sure both MySQL server are using the same version. E.g. MySQL 8.x. There should not be any collation collision when running the same major version.

    Solution 2:: Use a workaround with search/replace in a text editor. See comment #7 above.

    Note that with Solution 2 you may need to unzip the file, search/replace the collection, save the file and gzip it for Backup and Migrate to process it correctly.

Production build 0.71.5 2024