- Issue created by @danielkim7755
- 🇬🇧 United Kingdom 2dareis2do
I have this issue as well. When using an RSS feed it is common to use the guid as the ID, and a guid is a string.
Here is an example item where a guid is used:
<item>
  <title>Roots: (Overseas) and Is Any Body Home? at Streatham Space Project - London Theatre 1</title>
  <link>https://news.google.com/rss/articles/CBMipAFBVV95cUxNTENNa3EtZ182WEJURjRpZmJXelc3YklsLXFIWEludWlqNVBYZFBUbF81UW0xRDNwVW1WYVJGNDFmV3NTSWlBeXgwcGJCcExRc1EwUHJYZTlTV0x6M2lOQjROa21PYjNKTlhQbnZlNkZKd3FJRG5LYnFaSEZYRFhsc3l2bHZwbk9UZFpqb1BkY3kxeDZnSXd1b0lkMkJoM2NWd1UzWQ?oc=5</link>
  <guid isPermaLink="false">CBMipAFBVV95cUxNTENNa3EtZ182WEJURjRpZmJXelc3YklsLXFIWEludWlqNVBYZFBUbF81UW0xRDNwVW1WYVJGNDFmV3NTSWlBeXgwcGJCcExRc1EwUHJYZTlTV0x6M2lOQjROa21PYjNKTlhQbnZlNkZKd3FJRG5LYnFaSEZYRFhsc3l2bHZwbk9UZFpqb1BkY3kxeDZnSXd1b0lkMkJoM2NWd1UzWQ</guid>
  <pubDate>Wed, 01 May 2024 07:00:00 GMT</pubDate>
  <description><a href="https://news.google.com/rss/articles/CBMipAFBVV95cUxNTENNa3EtZ182WEJURjRpZmJXelc3YklsLXFIWEludWlqNVBYZFBUbF81UW0xRDNwVW1WYVJGNDFmV3NTSWlBeXgwcGJCcExRc1EwUHJYZTlTV0x6M2lOQjROa21PYjNKTlhQbnZlNkZKd3FJRG5LYnFaSEZYRFhsc3l2bHZwbk9UZFpqb1BkY3kxeDZnSXd1b0lkMkJoM2NWd1UzWQ?oc=5" target="_blank">Roots: (Overseas) and Is Any Body Home? at Streatham Space Project</a>&nbsp;&nbsp;<font color="#6f6f6f">London Theatre 1</font></description>
  <source url="https://www.londontheatre1.com">London Theatre 1</source>
</item>
From what I can see, the HTML validation pattern only accepts d:d or d,d where d is a digit. It also does not accept multiple IDs separated by a space.
- 🇬🇧 United Kingdom 2dareis2do
Also, long IDs are restricted to 255 characters or so. Many guids are longer.
Removing the pattern from the form does allow you to use non-numeric IDs.
- 🇬🇧 United Kingdom 2dareis2do
Regarding the regular expression, I believe it is like so:
^[0-9]+(:[0-9]+)?(,?[0-9]+(:[0-9]+)?)*$
I believe this can be rewritten like so to also support upper and lower case letters:
^[0-9A-Za-z]+(:[0-9A-Za-z]+)?(,?[0-9A-Za-z]+(:[0-9A-Za-z]+)?)*$
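To check what the two patterns above accept, here is a minimal standalone sketch. It assumes PHP's preg_match() behaves closely enough to the browser's pattern check for these inputs, and the helper name id_list_matches() is made up for illustration; it is not part of migrate_tools.

```php
<?php
declare(strict_types=1);

// Digits-only pattern, as quoted above (PCRE delimiters added).
const NUMERIC_PATTERN = '/^[0-9]+(:[0-9]+)?(,?[0-9]+(:[0-9]+)?)*$/';
// Proposed pattern that also allows ASCII letters.
const ALNUM_PATTERN = '/^[0-9A-Za-z]+(:[0-9A-Za-z]+)?(,?[0-9A-Za-z]+(:[0-9A-Za-z]+)?)*$/';

/**
 * Hypothetical helper: TRUE if $value matches $pattern.
 */
function id_list_matches(string $pattern, string $value): bool {
  return preg_match($pattern, $value) === 1;
}

$samples = [
  '12',
  '1:2,3:4',
  'CBMipAFBVV95cUxN',
  'abc_def',
  'tag:flickr.com,2004:/photo/53927832048',
];

foreach ($samples as $sample) {
  printf(
    "%s numeric=%s alnum=%s\n",
    $sample,
    id_list_matches(NUMERIC_PATTERN, $sample) ? 'yes' : 'no',
    id_list_matches(ALNUM_PATTERN, $sample) ? 'yes' : 'no'
  );
}
```

Only the first two samples match the digits-only pattern. The third matches only the widened pattern, and the last two match neither because of the underscore, dot, and slash characters.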
- 🇬🇧 United Kingdom 2dareis2do
Looking at the field, I can see the limit is set to 255 characters:
$form['options']['idlist'] = [
  '#type' => 'textfield',
  '#title' => $this->t('ID List'),
  '#maxlength' => 255,
  '#size' => 60,
  '#pattern' => '^[0-9]+(' . MigrateTools::DEFAULT_ID_LIST_DELIMITER . '[0-9]+)?(,?[0-9]+(' . MigrateTools::DEFAULT_ID_LIST_DELIMITER . '[0-9]+)?)*$',
  '#description' => $this->t('Comma-separated list of IDs to process.'),
  '#states' => [
    'enabled' => [
      ':input[name="operation"]' => [['value' => 'import'], 'or', ['value' => 'rollback']],
    ],
  ],
];
For my purposes, I think it makes sense to bump this limit considerably. In fact, it might make more sense to change this to a text area so that a user can see what has been pasted.
- 🇬🇧 United Kingdom 2dareis2do
OK, it looks like #pattern is not supported by text area: https://www.drupal.org/node/195303
- 🇬🇧 United Kingdom 2dareis2do
Updating the patch to also support underscores as part of the string.
- 🇬🇧 United Kingdom 2dareis2do
The key can also contain a hyphen (-), e.g.
CBMi-AFBVV95cUxOR0FZOVowd1U2ZTA5X0diS0RqcjBCT2hqS1NNQ3Y4UkFlUF9IUHkzTTFGUFF6ZV9Ja3daTEV5UXlEaGZvNE5Hc2ZqVTAzVndiUG1CckUxYjBkcGgzdW9NM2J4cUV3dnZKUTE3N0VtUHNzTjlRZXRwZldIMGFKdlBVMWNHT2kzcmxuaWhWeXBiZEZHeG5QWlpCV0Nja2xManhfcEpOQWFlLWcxVVJGb2ZuN2wzcGZoRlNMQjlIRWZSNW12N2c0Yl8wV0lnbThtVGo1bWFQM1BRamwwVVZBSDFiN29EWlQ4TmRmOXFBblBOQzRLM2hHX213SNIB_gFBVV95cUxPSnZudGI2elVjVTRtQVRpNElkRjJPRFBqRFhVNHVEVGxObUlsZzVXRjNlSmZUdHlsbDl6cWg0OHh2T1dUMVNjWXJRNmtVNTJvT2ZpUml5bXc1eV9lUC15V1U1TUFFWUR3bTlEZTlwcGVqZkpUY2hObVNDYm5uSTlTTENMWTFfbE5CQi1ObVhVTlE4N0ppWFNnN3RxOEFCSjVpc2d0UkxXbmxkRG1vZ21mZ3lmd2Q1UkRVN0RoSHNfRUExaGlEMENxWUE0RVpFRXZrdGo5Y1hxcjZmZ3pGakJUWEhzc2VneTJWTnUxSk01RUtoa3B3ZzlRUWtIYVBaUQ
Updated the regular expression.
- 🇬🇧 United Kingdom 2dareis2do
Looking at the Flickr API, they use the following syntax for the guid:
tag:flickr.com,2004:/photo/53927832048
Here is an example:
https://api.flickr.com/services/feeds/photos_public.gne?tags=streatham&f...
So a couple of things here:
- Use of : and , in the key. These are currently used as delimiters for entering multiple values.
- Use of . and / in the key. These characters are not currently recognised.
Furthermore, as mentioned previously, the use of a single-line input is restrictive, especially when entering multiple values.
I am thinking it might be better to use a multi-line input (text area), although this does not currently support #pattern. That said, I think we should accept virtually any value as a key, and maybe use a newline to demarcate multiple entries. That would also be a UX improvement.
- 🇬🇧 United Kingdom 2dareis2do
OK, if I remove the HTML5 pattern check, I can see the Flickr ID gets split into an array of arrays, e.g.
0 = array(2)
  0 = "tag"
  1 = "flickr.com"
1 = array(2)
  0 = "2004"
  1 = "/photo/53729127947"
So what seems to happen is that the string is first exploded by , and then each resulting string is split by :
The code for this is in the main MigrateTools class, e.g.
<?php

declare(strict_types = 1);

namespace Drupal\migrate_tools;

/**
 * Utility functionality for use in migrate_tools.
 */
class MigrateTools {

  /**
   * Default ID list delimiter.
   */
  public const DEFAULT_ID_LIST_DELIMITER = ':';

  /**
   * Build the list of specific source IDs to import.
   *
   * @param array $options
   *   The migration executable options.
   *
   * @return array
   *   The ID list.
   */
  public static function buildIdList(array $options): array {
    $options += [
      'idlist' => NULL,
      'idlist-delimiter' => self::DEFAULT_ID_LIST_DELIMITER,
    ];
    $id_list = [];
    if (is_scalar($options['idlist'])) {
      // Split the list into individual IDs on commas.
      $id_list = explode(',', (string) $options['idlist']);
      // Split each ID into its key parts on the configured delimiter.
      array_walk($id_list, function (&$value) use ($options): void {
        $value = str_getcsv($value, $options['idlist-delimiter']);
      });
    }
    return $id_list;
  }

}
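The split described above can be reproduced outside Drupal. This is a rough standalone sketch of the same explode() plus str_getcsv() logic, not the module's code; the function name split_id_list() is made up for illustration, and the enclosure and escape arguments are passed explicitly only to avoid relying on defaults.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical standalone copy of the splitting logic, for illustration.
 *
 * @return array<int, array<int, string|null>>
 *   One entry per comma-separated ID, each split on the delimiter.
 */
function split_id_list(string $id_list, string $delimiter = ':'): array {
  // First split the whole list into IDs on commas.
  $rows = explode(',', $id_list);
  foreach ($rows as &$row) {
    // Then split each ID into key parts on the delimiter.
    $row = str_getcsv($row, $delimiter, '"', '\\');
  }
  unset($row);
  return $rows;
}

// A Flickr-style guid is treated as two IDs with two key parts each.
print_r(split_id_list('tag:flickr.com,2004:/photo/53927832048'));
```

Because both , and : appear inside the Flickr guid, a single ID ends up as two rows of two parts each, which will not match the single string key of the source.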
- 🇬🇧 United Kingdom 2dareis2do
Modified the patch to:
- Use a text area to enter IDs
- Keep the existing behaviour as the default (use of , and :)
- Add an option to disable the default and use one entry per line
See the attached screenshot for an example of how this looks.
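For anyone who wants to see the idea without applying the patch, here is a rough sketch of what one-entry-per-line parsing could look like. It is not the code from the patch. The function name split_id_list_by_line() is made up, and wrapping each ID in a one-element array assumes a single-column source key, to mirror the shape that buildIdList() returns.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: treat every non-empty line as one complete ID.
 *
 * @return array<int, array<int, string>>
 *   One single-element array per ID (assumes a single-column source key).
 */
function split_id_list_by_line(string $input): array {
  // \R matches any newline sequence (\n, \r\n or \r).
  $lines = preg_split('/\R/', $input) ?: [];
  $ids = [];
  foreach ($lines as $line) {
    $line = trim($line);
    // Skip blank lines, e.g. a trailing newline after pasting.
    if ($line !== '') {
      // No splitting on , or : so those characters stay part of the ID.
      $ids[] = [$line];
    }
  }
  return $ids;
}

$pasted = "tag:flickr.com,2004:/photo/53927832048\r\n\n  abc-def_1 \n";
print_r(split_id_list_by_line($pasted));
```

With this approach the Flickr guid stays intact as a single ID, since only newlines separate entries.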