Problem/Motivation
In Drupal, a content entity is allowed to have a string field as its ID. The string is not required to contain only ASCII characters; it may contain any UTF-8 characters. Edit, see comment #7: Actually, non-ASCII is the default; the developer must explicitly set the is_ascii setting in order to limit the ID to ASCII characters.
However, when a content entity has a string ID, the Field API forces the entity_id column to be of type varchar_ascii, which MySQL implements as a VARCHAR with ASCII encoding.
Therefore, saving an entity with non-ASCII characters in its ID throws a DatabaseException whenever any field with a dedicated table receives a value. If, on the other hand, the entity type has no fields with dedicated tables (or the specific entity being saved has no values for any of these fields), the entity is saved without problems.
A similar problem exists for the ..._target_id column of entity reference fields.
According to the Database API:
"A special 'varchar_ascii' type is also available for limiting machine name field to US ASCII characters."
According to this statement, using varchar_ascii for the field columns is wrong. I cannot think of a reason why an entity ID must follow the same rule as machine names.
Steps to reproduce
- Create a custom module, and define a custom content entity type by following any relevant tutorial, such as https://www.drupal.org/docs/drupal-apis/entity-api/creating-a-content-en...
- In your entity type's class that extends ContentEntityBase, when implementing public static function baseFieldDefinitions, which declares the base fields, override the definition of your ID field as
  $fields["id"] = BaseFieldDefinition::create('string')->setSetting('is_ascii', FALSE) ...
- Install the module and check your entity type's table in the database: You will see that the column for the ID is a VARCHAR with encoding UTF-8 (utf8mb4)
- Add a field of any type to your entity type, for example a string one
- Check your field's table in the database: You will see that the entity_id column is a VARCHAR(128) with ASCII encoding.
- Try to create an entity with a non-ascii character in its ID, without setting any value on the field you created above: The entity is saved without problems.
- Try to either create another entity with a non-ascii character in its ID, or just update the previous one, but this time set any value to the other field: You will get a DatabaseException
Problem: The entity type's table is allowed to have a row with non-ASCII characters in the ID column, but the field's table is not allowed to have the same ID in its entity_id column.
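For reference, the entity type from the steps above could look roughly like the sketch below. The module name, class name, entity type ID and label (example_string_id, ExampleStringId) are hypothetical and only serve the illustration; the explicit max_length of 128 is likewise an assumption, chosen to match the length of the entity_id column.

```php
<?php

namespace Drupal\example_string_id\Entity;

use Drupal\Core\Entity\ContentEntityBase;
use Drupal\Core\Entity\EntityTypeInterface;
use Drupal\Core\Field\BaseFieldDefinition;
use Drupal\Core\StringTranslation\TranslatableMarkup;

/**
 * Hypothetical content entity type whose ID is a non-ASCII string.
 *
 * @ContentEntityType(
 *   id = "example_string_id",
 *   label = @Translation("Example string ID entity"),
 *   base_table = "example_string_id",
 *   entity_keys = {
 *     "id" = "id",
 *   },
 * )
 */
class ExampleStringId extends ContentEntityBase {

  /**
   * {@inheritdoc}
   */
  public static function baseFieldDefinitions(EntityTypeInterface $entity_type) {
    $fields = parent::baseFieldDefinitions($entity_type);

    // Replace the default integer ID with a string ID. 'is_ascii' is set
    // explicitly here, although FALSE is already the default for string
    // fields.
    $fields['id'] = BaseFieldDefinition::create('string')
      ->setLabel(new TranslatableMarkup('ID'))
      ->setSetting('max_length', 128)
      ->setSetting('is_ascii', FALSE);

    return $fields;
  }

}
```

With this definition the base table gets a utf8mb4 ID column, while any dedicated field table created afterwards still gets an ASCII entity_id column.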
The problem seems to be caused in \Drupal\Core\Entity\Sql\SqlContentEntityStorageSchema::getDedicatedTableSchema, in this code snippet:
    $id_definition = $this->fieldStorageDefinitions[$entity_type->getKey('id')];
    if ($id_definition->getType() == 'integer') {
      $id_schema = [
        'type' => 'int',
        'unsigned' => TRUE,
        'not null' => TRUE,
        'description' => 'The entity id this data is attached to',
      ];
    }
    else {
      $id_schema = [
        'type' => 'varchar_ascii',
        'length' => 128,
        'not null' => TRUE,
        'description' => 'The entity id this data is attached to',
      ];
    }
Here the condition only checks whether the ID field is an integer. In every other case, the code assumes that the ID is at most 128 characters long and contains only ASCII characters.
Proposed resolution
Use varchar instead of varchar_ascii.
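As a minimal sketch of what this change amounts to, the standalone function below mirrors the ID branch of getDedicatedTableSchema with varchar swapped in. The function name example_entity_id_column_schema is hypothetical and does not exist in core; an actual patch would change the else branch in place and would probably also need to handle tables that already exist.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for the ID branch of getDedicatedTableSchema().
 *
 * @param string $id_field_type
 *   The field type of the entity ID, e.g. 'integer' or 'string'.
 *
 * @return array
 *   The schema definition for the entity_id column.
 */
function example_entity_id_column_schema(string $id_field_type): array {
  if ($id_field_type === 'integer') {
    return [
      'type' => 'int',
      'unsigned' => TRUE,
      'not null' => TRUE,
      'description' => 'The entity id this data is attached to',
    ];
  }

  return [
    // Proposed change: 'varchar' instead of 'varchar_ascii', so that the
    // column accepts the same characters as the ID column of the base table.
    'type' => 'varchar',
    'length' => 128,
    'not null' => TRUE,
    'description' => 'The entity id this data is attached to',
  ];
}

$schema = example_entity_id_column_schema('string');
echo $schema['type'], PHP_EOL; // prints "varchar"
```

The length of 128 is kept from the existing code; whether it should follow the max_length setting of the ID field is left open here.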
Remaining tasks
User interface changes
None, as far as I can tell.
API changes
TBD
Data model changes
TBD
Release notes snippet
TBD