Document that source IDs can only contain alphanumeric characters or underscores

Created on 4 August 2023, over 1 year ago
Updated 5 September 2024, 5 months ago

Problem/Motivation

CSV files can have column names with spaces. This produces unintended effects when using the default ID map plugin: \Drupal\migrate\Plugin\migrate\id_map\Sql

On PHP 8.1, you get the following warning when rolling back a migration that has a column name with a space.
[warning] Undefined array key "Some Header with Space" Sql.php:232

That happens in \Drupal\migrate\Plugin\migrate\id_map\Sql::getSourceIdsHash

This is because the column is used an alias for the field to be added in the SELECT statement. And aliases are spaced in `\Drupal\Core\Database\Connection::escapeAlias`. Per the method description: “Escapes an alias name string. Force all alias names to be strictly alphanumeric-plus-underscore”.

Steps to reproduce

  1. In a site running PHP 8.1 or higher, create a CSV migration that has a column name with a space used as ID. For example: "Project ID".
  2. Import the migration.
  3. Rollback the migration.

Proposed resolution

Document that column names have to be alphanumeric characters or underscores.

Remaining tasks

  • Create and review patch.
  • Evaluate if we need to mention that this happens when the ID map uses or extends \Drupal\migrate\Plugin\migrate\id_map\Sql.
  • Shall we file a similar issue in https://www.drupal.org/project/migrate_plus since this could also happen in JSON files which that module supports?
  • Shall we create an issue in Drupal core to document this as part of `\Drupal\migrate\Plugin\MigrateSourceInterface::getIds`?

User interface changes

None.

API changes

None.

Data model changes

None.

Notes

This issue was discussed in slack https://drupal.slack.com/archives/C226VLXBP/p1689078305424729

For reference, below is a comment with some of the research into the issue.

  1. In \Drupal\migrate\Plugin\migrate\id_map\Sql::deleteDestination there is a call to \Drupal\migrate\Plugin\migrate\id_map\Sql::lookupSourceId
  2. In there, Each source key is added to the query via $query->addField('map', $id_map_field_name, $source_field_name);
  3. Later in the same method, there is a $query->execute() which eventually reaches \Drupal\Core\Database\Query\Select::__toString
  4. When the query is assembling the list of (query) fields to be added in the SELECT statement via the foreach ($this->fields as $field) loop, there is a call to \Drupal\Core\Database\Connection::escapeAlias
  5. And in there, the alias is escaped using preg_replace('/[^A-Za-z0-9_]+/', '', $field). Per the method description: “Escapes an alias name string. Force all alias names to be strictly alphanumeric-plus-underscore”. In there Some Value gets transformed to SomeValue
  6. Back in deleteDestination , there is a call to $this->getSourceIdsHash($source_id_values)
  7. In \Drupal\migrate\Plugin\migrate\id_map\Sql::getSourceIdsHash, the $source_id_values argument is keyed by the escaped field alias. But the call to $this->sourceIdFields() returns the unescaped raw source key.
  8. So, in $source_id_values[$field_name] , the raw Some Value does not exists as a key. By then, the key has changed to SomeValue. And this is what triggers the error.
  9. And this probably has other unintended consequences. Also in getSourceIdsHash , there is a call to hash which receives $source_id_values as an argument. At this point, I stopped investigating.
📌 Task
Status

Fixed

Version

3.0

Component

Documentation

Created by

🇳🇮Nicaragua dinarcon

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

  • Issue created by @dinarcon
  • Status changed to Needs review over 1 year ago
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 7.4 & MySQL 8
    last update over 1 year ago
    14 pass
  • 🇳🇮Nicaragua dinarcon

    Attaching documentation patch for review. Shall we also mention this in the README.md file?

  • 🇺🇸United States mlncn Minneapolis, MN, USA

    Very impressive investigation and documentation! Would it be possible to throw a clear error when encountering a space… since someone (probably me) won't read the documentation and still be confused when the "inexplicable" error happens?

  • Status changed to Fixed over 1 year ago
  • heddn Nicaragua

    Added some validation and tests and landed this. Thanks for the documentation and heads up that this is a problem.

  • Automatically closed - issue fixed for 2 weeks with no activity.

  • Status changed to Fixed 11 months ago
  • 🇫🇷France b2f

    We should have address this problem to the root, we don't need to remove such important existing functionalities.

    I patched lookupSourceId Sql.php method in Core so that source keys spaces (or any other special characters for that matter?) are not removed anymore:

    https://www.drupal.org/project/drupal/issues/3422603 🐛 Fixing source IDs with spaces in Sql.php Fixed

  • 🇺🇸United States caesius

    This update should be reverted per the above comment and issue 💬 Revert requirement that IDs not include spaces Needs work

    I'll also note that per the issue title and description, the resolution should have been to "Document that column names have to be alphanumeric characters or underscores" and yet the actual implementation prevents the use of spaces in column names entirely. This breaks backward compatibility with migrations that were created before this change was implemented and does not document an upgrade path.

Production build 0.71.5 2024