I am dealing with a module (CMIS) that is trying to insert content into the database with the wrong charset. I was getting the following two errors:
Warning: htmlspecialchars(): Invalid multibyte sequence in argument in check_plain() (line 1545 of /var/www/drupal/includes/bootstrap.inc).
PDOException: in field_sql_storage_field_storage_write() (line 448 of /var/www/drupal/modules/field/modules/field_sql_storage/field_sql_storage.module).
However, the exception handling code in errors.inc combined with check_plain() means that the error message is completely lost in the PDOException: htmlspecialchars() returns an empty string when its input contains an invalid multibyte sequence, so the whole message is discarded, not just the offending bytes. Note that the exception above has no message or trace.
That is, in _drupal_decode_exception() (includes/errors.inc), if $exception->args contains invalid characters, such as raw file content, the exception displayed in the logs is simply:
PDOException: in field_sql_storage_field_storage_write()
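The empty message matches how htmlspecialchars() treats invalid input. The following is a minimal sketch outside Drupal, assuming check_plain() calls htmlspecialchars() with ENT_QUOTES and 'UTF-8' (which is what the warning above suggests). The byte string is taken from the MySQL error quoted further down; the variable names are only for illustration.

```php
<?php
// Bytes from the failing insert (start of a binary file, not valid UTF-8).
$bytes = "\xD0\xCF\x11\xE0\xA1\xB1";

// A message built the way the PDOException branch builds it: useful text
// followed by a dump of the query arguments.
$message = 'SQLSTATE[HY000]: General error: 1366 ...; ' . print_r(array($bytes), TRUE);

// Assumed equivalent of check_plain(): no ENT_IGNORE / ENT_SUBSTITUTE flag,
// so one invalid sequence anywhere makes the whole result an empty string.
$escaped = htmlspecialchars($message, ENT_QUOTES, 'UTF-8');
var_dump($escaped === '');

// For comparison only: ENT_SUBSTITUTE (PHP 5.4+) replaces invalid sequences
// with U+FFFD instead of discarding everything.
$substituted = htmlspecialchars($message, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
var_dump(strlen($substituted) > 0);
```

So the useful part of the message is valid; it is only the appended argument dump that causes the entire string to be dropped.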
If I remove print_r($exception->args, TRUE) from the $message, then the exception becomes (infinitely more useful):
PDOException: SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xD0\xCF\x11\xE0\xA1\xB1...' for column 'body_value' at row 1: INSERT INTO {field_data_body} (entity_type, entity_id, revision_id, bundle, delta, language, body_value, body_summary, body_format) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6, :db_insert_placeholder_7, :db_insert_placeholder_8); in field_sql_storage_field_storage_write() (line 448 of C:\hubnet\drupal\modules\field\modules\field_sql_storage\field_sql_storage.module).
This is definitely due to the if ($exception instanceof PDOException) block, because if I remove it, the exception becomes (still useful):
PDOException: SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xD0\xCF\x11\xE0\xA1\xB1...' for column 'body_value' at row 1 in PDOStatement->execute() (line 2139 of C:\hubnet\drupal\includes\database\database.inc).
I'm not sure what the best solution is, since in most cases you'd want to keep the arguments. Is there a way to check that a given string is valid w.r.t. a given charset before passing it to check_plain() - perhaps with drupal_validate_utf8()?
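One way this could look is sketched below: validate the argument dump before appending it, and fall back to a placeholder when it is not valid UTF-8. This is only a sketch, not a patch. example_validate_utf8() is a stand-in that mirrors what I understand drupal_validate_utf8() to do, so that the snippet runs outside Drupal; example_pdo_message() and the placeholder text are hypothetical names, not existing core functions.

```php
<?php
// Stand-in for drupal_validate_utf8() (assumption: same regex-based check).
function example_validate_utf8($text) {
  if (strlen($text) == 0) {
    return TRUE;
  }
  // With the /u modifier, preg_match() fails on invalid UTF-8 input.
  return preg_match('/^./us', $text) == 1;
}

// Hypothetical helper: builds the PDOException message, but only appends
// the argument dump if it will survive htmlspecialchars().
function example_pdo_message($message, $query_string, array $args) {
  $message .= ': ' . $query_string . '; ';
  $dump = print_r($args, TRUE);
  if (example_validate_utf8($dump)) {
    $message .= $dump;
  }
  else {
    // Keep the SQLSTATE text and query; drop only the unprintable arguments.
    $message .= '[arguments omitted: invalid UTF-8]';
  }
  // Same escaping that check_plain() is assumed to apply.
  return htmlspecialchars($message, ENT_QUOTES, 'UTF-8');
}

// Usage: binary content as an argument no longer wipes out the message.
$bad = example_pdo_message(
  'SQLSTATE[HY000]: General error: 1366 Incorrect string value',
  'INSERT INTO {field_data_body} (body_value) VALUES (:db_insert_placeholder_0)',
  array(':db_insert_placeholder_0' => "\xD0\xCF\x11\xE0\xA1\xB1")
);
print $bad . "\n";
```

With this approach, valid arguments are still logged as before, which covers the common case where you want to keep them.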
This seems to almost be a duplicate of "Error messages not generated in UTF8 in some situations" - but this is occurring on two different platforms (Linux, Windows) with MySQL.