I’ve been running into several problems restoring MySQL backups. The backups come from a different environment than the one I’m restoring into, so I have to strip out the commands in them that require superuser privileges.
The problem is that whenever I try to remove those commands, I get UTF-8 decoding errors, because the files contain lots of invalid byte sequences.
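One way around the decoding errors is to never decode the dump at all: open it in binary mode and edit it as bytes, so invalid UTF-8 sequences pass through unchanged. The sketch below assumes the dump was written by mysqldump and that the privileged parts look like `DEFINER=`user`@`host`` clauses and `SET @@GLOBAL.GTID_PURGED` / `SET @@SESSION.SQL_LOG_BIN` statements; `strip_privileged` is a hypothetical helper, so check the patterns against what your dump actually contains.

```python
import re

# Assumption: mysqldump-style output. These patterns are illustrative; adjust
# them to the statements your restore actually rejects.
DEFINER_CLAUSE = re.compile(rb"DEFINER=`[^`]*`@`[^`]*`\s*")
PRIVILEGED_SET = re.compile(rb"^\s*SET @@(?:GLOBAL\.GTID_PURGED|SESSION\.SQL_LOG_BIN)\b")


def strip_privileged(dump: bytes) -> bytes:
    """Remove DEFINER clauses and privileged SET statements from a dump.

    Everything is handled as bytes, so invalid UTF-8 in the data never
    triggers a UnicodeDecodeError and is written back byte-for-byte.
    """
    out = []
    skipping = False
    for line in dump.splitlines(keepends=True):
        if skipping or PRIVILEGED_SET.match(line):
            # Drop the statement, including continuation lines, up to its ';'.
            skipping = not line.rstrip().endswith(b";")
            continue
        # "CREATE DEFINER=`root`@`localhost` PROCEDURE" -> "CREATE PROCEDURE"
        out.append(DEFINER_CLAUSE.sub(b"", line))
    return b"".join(out)
```

To use it, open the input with "rb" and the output with "wb" and write `strip_privileged(src.read())`; this reads the whole dump into memory, so very large files would need a line-by-line variant of the same loop.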
Why would MySQL encode a backup as UTF-8 if the data isn’t actually UTF-8? This feels like bad design to me.
Not only are there several character sets that look like Unicode (MySQL’s utf8 is really the 3-byte utf8mb3, while utf8mb4 is full 4-byte UTF-8), but the character set in MySQL can also differ between the session, the client, the server, the database, the table and the column. All six can use different encodings.
Just make sure all of them use the same 4-byte Unicode character set, utf8mb4. Differing collations are fine when backing up, because collation only matters when comparing or sorting strings.
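To see why the 4-byte set matters: UTF-8 needs exactly 4 bytes for code points above U+FFFF (emoji, for example), and those are the characters MySQL’s 3-byte utf8mb3 cannot store. The hypothetical helper below is a minimal sketch that lists such characters in a piece of text, which can help spot data that will break or be mangled if any of the six levels is not utf8mb4.

```python
def needs_utf8mb4(text: str) -> list[str]:
    """Return the characters in text that a 3-byte utf8mb3 column cannot hold.

    Code points above U+FFFF are exactly the ones UTF-8 encodes in 4 bytes.
    Comparing ord() avoids encoding, so lone surrogates don't raise here.
    """
    return [ch for ch in text if ord(ch) > 0xFFFF]


print(needs_utf8mb4("caf\u00e9 \U0001F600"))  # the emoji needs utf8mb4, "é" does not
```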