I’ve been running into several problems with restoring MySQL backups. Namely, the backups come from an environment other than the one I’m working in and I’m forced to remove superuser commands contained in the backups.

The problem is that whenever I try to remove those commands, I get UTF-8 decoding errors because the files contain loads of invalid byte sequences.

Why would MySQL encode a backup as UTF-8 if the data isn’t actually UTF-8? This feels like bad design to me.
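One way to sidestep the decoding errors is to never decode the dump at all and filter it as raw bytes. The following is a minimal sketch, not a definitive tool: the function names and the statement prefixes in `PRIVILEGED_PREFIXES` are assumptions chosen for illustration, and it only handles statements that fit on a single line.

```python
from typing import Iterable, Iterator, Tuple

# Assumed examples of statements that need elevated privileges.
# Replace these with whatever your own dumps actually contain.
PRIVILEGED_PREFIXES: Tuple[bytes, ...] = (
    b"SET @@GLOBAL.GTID_PURGED",
    b"SET @@SESSION.SQL_LOG_BIN",
)


def strip_privileged_lines(
    lines: Iterable[bytes],
    prefixes: Tuple[bytes, ...] = PRIVILEGED_PREFIXES,
) -> Iterator[bytes]:
    """Yield every line that does not start with one of the given prefixes.

    Lines are compared as bytes, so invalid UTF-8 sequences elsewhere in
    the dump are passed through untouched instead of raising an error.
    """
    for line in lines:
        if line.lstrip().startswith(prefixes):
            continue  # drop the privileged statement
        yield line


def filter_dump(src_path: str, dst_path: str) -> None:
    """Copy a dump file, dropping privileged single-line statements."""
    # Binary mode on both sides: no decoding, no re-encoding.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.writelines(strip_privileged_lines(src))
```

Calling `filter_dump("backup.sql", "backup.clean.sql")` (both file names hypothetical) would write a copy with the matching lines removed. Statements that span several lines would need a smarter matcher than this line-by-line check.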

  • Björn Tantau@swg-empire.de
    4 days ago

    There is no such thing* as a UTF-8 file. It’s just text encoded in some way. It’s only a UTF-8 file if everything in it is encoded as UTF-8, which is evidently not the case here.

    You can even tell MySQL to export perfectly valid UTF-8 text encoded as ISO 8859-1 and import it into a UTF-8 table without any trouble (apart, maybe, from characters that cannot be encoded in ISO 8859-1).

    *Yes, technically there could be a BOM at the beginning but almost no tool uses that and most get confused by it. And it would still not force any data written to it to be UTF-8.
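To illustrate the point that a file is just bytes in some encoding, here is a small sketch showing the same text encoded two ways. The helper `is_valid_utf8` is a hypothetical name introduced for this example, not part of any MySQL tooling.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


text = "Björn"

# The same text, stored with two different encodings.
utf8_bytes = text.encode("utf-8")         # ö becomes the two bytes C3 B6
latin1_bytes = text.encode("iso-8859-1")  # ö becomes the single byte F6

print(is_valid_utf8(utf8_bytes))    # True
print(is_valid_utf8(latin1_bytes))  # False: F6 cannot start a UTF-8 sequence

# A file that mixes both is not valid UTF-8 as a whole, whatever
# encoding the tool that wrote it was asked to use.
mixed = utf8_bytes + b" " + latin1_bytes
print(is_valid_utf8(mixed))         # False
```

Nothing in the file itself records which encoding was intended, so a single mis-encoded value is enough to make a strict UTF-8 decoder reject the whole dump.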

    • folekaule@lemmy.world
      3 days ago

      The Unicode standard permits a BOM in UTF-8 files but neither requires nor recommends one. UTF-8 does not need it.

      I’ve only seen Microsoft tools adding that, and it breaks some parsers.

      Please don’t add a BOM to UTF-8 files unless you have a specific need for one.
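For anyone who has to deal with a file that already carries a BOM, here is a short sketch of how it shows up and how it can be removed at the byte level. The helper name `strip_utf8_bom` is an assumption made for this example.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


payload = "SELECT 1;".encode("utf-8")
with_bom = codecs.BOM_UTF8 + payload  # BOM_UTF8 is the bytes EF BB BF

# A plain UTF-8 decode keeps the BOM as a U+FEFF character at the start,
# which is what tends to confuse parsers.
print(repr(with_bom.decode("utf-8")))      # '\ufeffSELECT 1;'

# Python's "utf-8-sig" codec drops a leading BOM when decoding and
# leaves BOM-less input unchanged.
print(repr(with_bom.decode("utf-8-sig")))  # 'SELECT 1;'
print(repr(payload.decode("utf-8-sig")))   # 'SELECT 1;'

print(strip_utf8_bom(with_bom) == payload)  # True
```

Note that removing a BOM says nothing about the rest of the file: the remaining bytes can still be invalid UTF-8.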

    • undefined@lemmy.hogru.chOP
      3 days ago

      Right, but if you’re telling the software to encode a file as UTF-8, maybe the software should actually encode it as UTF-8.