Simple XML broken with UTF-16LE

Created on 1 May 2019
Updated 27 July 2023

All my migrations previously worked with XML files encoded in UTF-16LE, but they suddenly broke after upgrading to Migrate Plus 4.2, failing with:

Drupal\migrate\MigrateException: Fatal Error 73: expected '>'
Line: 542
Column: 20
File:  in Drupal\migrate_plus\Plugin\migrate_plus\data_parser\SimpleXml->openSourceUrl() (line 51 of modules/contrib/migrate_plus/src/Plugin/migrate_plus/data_parser/SimpleXml.php).

It turns out that issue #3046753, "Make XML parser more resilient", added a trim() call on the fetched data before it is passed to simplexml_load_string() in SimpleXml::openSourceUrl():

```php
protected function openSourceUrl($url) {
  // Clear XML error buffer. Other Drupal code that executed during the
  // migration may have polluted the error buffer and could create false
  // positives in our error check below. We are only concerned with errors
  // that occur from attempting to load the XML string into an object here.
  libxml_clear_errors();

  $xml_data = $this->getDataFetcherPlugin()->getResponseContent($url);
  $xml = simplexml_load_string(trim($xml_data));
  foreach (libxml_get_errors() as $error) {
    $error_string = self::parseLibXmlError($error);
    throw new MigrateException($error_string);
  }
  $this->registerNamespaces($xml);
  $xpath = $this->configuration['item_selector'];
  $this->matches = $xml->xpath($xpath);
  return TRUE;
}
```
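To see what that trim() call does to a UTF-16LE payload, the standalone sketch below builds a tiny UTF-16LE document by hand and trims it. The helper ascii_to_utf16le() is hypothetical and exists only for this demo (it is valid for ASCII input only); a real migration would get these bytes from the data fetcher.

```php
<?php

// Hypothetical demo helper: encodes an ASCII-only string as UTF-16LE by
// following each byte with a 0x00 high byte. Not valid for non-ASCII input.
function ascii_to_utf16le(string $ascii): string {
  $out = '';
  for ($i = 0, $n = strlen($ascii); $i < $n; $i++) {
    $out .= $ascii[$i] . "\x00";
  }
  return $out;
}

// A tiny UTF-16LE document: byte order mark FF FE, then the markup.
$xml = "\xFF\xFE" . ascii_to_utf16le('<root><item>1</item></root>');

// trim() operates on bytes and by default strips " \t\n\r\0\x0B" from both
// ends. The leading 0xFF is left alone, but the trailing 0x00 (the high
// byte of the final '>') is removed.
$trimmed = trim($xml);

printf("before: %d bytes, ends with %s\n", strlen($xml), bin2hex(substr($xml, -2)));
printf("after:  %d bytes, ends with %s\n", strlen($trimmed), bin2hex(substr($trimmed, -2)));
// before: 56 bytes, ends with 3e00
// after:  55 bytes, ends with 003e
```

The trimmed payload has an odd number of bytes, so its last UTF-16 code unit is incomplete before libxml ever sees it.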

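The function trim() is not safe for multibyte-encoded strings such as UTF-16LE: it works byte by byte and, by default, strips NUL bytes ("\0") as well as ASCII whitespace. In UTF-16LE the closing '>' is stored as the two bytes 0x3E 0x00, so trim() removes the trailing 0x00 and leaves a truncated, odd-length document, which is consistent with the "expected '>'" error above. SimpleXML itself handles multibyte data perfectly well, so I don't think it is necessary to call trim() before simplexml_load_string() at all. If your XML has blank lines before the XML declaration (<?xml ...?>), it is not well-formed and requires special treatment anyway. Adding trim() to the generic parser prevents it from working properly with UTF-16-encoded data.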
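If the resilience added by #3046753 is worth keeping for UTF-8 feeds, one possible approach (a sketch, not the project's actual fix) is to skip trimming when the payload starts with a UTF-16 or UTF-32 byte order mark. The function name trim_xml_payload() is hypothetical. It relies on the XML 1.0 rule that UTF-16 entities must begin with a BOM; BOM-less UTF-16 input would still be trimmed and would still break.

```php
<?php

/**
 * Hypothetical helper (not part of Migrate Plus): trims surrounding
 * whitespace only when the payload is not a UTF-16 or UTF-32 document.
 *
 * XML 1.0 requires UTF-16 entities to begin with a byte order mark, so a
 * wide-encoding BOM is used as the signal to leave the bytes untouched.
 */
function trim_xml_payload(string $data): string {
  // BOMs for UTF-16LE/UTF-32LE (FF FE ...), UTF-16BE (FE FF) and UTF-32BE.
  $wide_boms = ["\xFF\xFE", "\xFE\xFF", "\x00\x00\xFE\xFF"];
  foreach ($wide_boms as $bom) {
    if (strncmp($data, $bom, strlen($bom)) === 0) {
      // A byte-wise trim() could strip 0x00 bytes that belong to characters.
      return $data;
    }
  }
  // UTF-8 or single-byte data: every byte of a multibyte UTF-8 sequence is
  // >= 0x80, so trim() can only remove whole ASCII whitespace/NUL characters.
  return trim($data);
}

$utf16 = "\xFF\xFE<\x00a\x00/\x00>\x00";
var_dump(trim_xml_payload($utf16) === $utf16);        // bool(true)
var_dump(trim_xml_payload("\n  <a/>\n") === '<a/>'); // bool(true)
```

In openSourceUrl() the call would then read simplexml_load_string(trim_xml_payload($xml_data)). Simply dropping trim(), as proposed above, remains the smaller change.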

🐛 Bug report
Status: Needs review
Version: 6.0
Component: Plugins
Created by: sonnykt (Melbourne, Australia)
