I have a Drupal 10.1 site that uses the Migrate module to import RSS feed items and also look up an og:image tag from the feed item's source URL.
This setup also requires remote_stream_wrapper to import a remote file entity.
When importing images using the DOM process plugin, I have the following process pipeline:
field_media_image:
  -
    plugin: migrate_process_html
    source: link
  -
    plugin: dom
    method: import
    # log_messages: false
  -
    plugin: dom_select
    selector: //meta[@property="og:image"]/@content
  -
    plugin: skip_on_empty
    method: process
    message: 'Meta Field og:image is missing'
  -
    plugin: extract
    index:
      - 0
  -
    plugin: skip_on_condition
    method: row
    condition:
      plugin: not:matches
      regex: /^(https?:\/\/)/i
    message: 'We only want a string if it starts with http(s)://'
However, some remote image URLs start with https:////. These URLs pass the check above as valid and are imported successfully. But when Drupal tries to render these files, for example in a teaser or default display, the result is the following exception:
The website encountered an unexpected error. Please try again later.
InvalidArgumentException: The URI 'https:////m.files.bbci.co.uk/modules/bbc-morph-sport-seo-meta/1.23.3/images/bbc-sport-logo.png' is malformed. in Drupal\Core\Url::fromUri() (line 286 of core/lib/Drupal/Core/Url.php).
Drupal\Core\File\FileUrlGenerator->generate() (Line: 246)
Drupal\image\Plugin\Field\FieldFormatter\ImageFormatter->viewElements() (Line: 89)
Drupal\Core\Field\FormatterBase->view() (Line: 76)
Drupal\Core\Field\Plugin\Field\FieldFormatter\EntityReferenceFormatterBase->view() (Line: 265)
Drupal\Core\Entity\Entity\EntityViewDisplay->buildMultiple() (Line: 339)
Drupal\Core\Entity\EntityViewBuilder->buildComponents() (Line: 281)
Drupal\Core\Entity\EntityViewBuilder->buildMultiple() (Line: 238)
Drupal\Core\Entity\EntityViewBuilder->build()
call_user_func_array() (Line: 111)
Drupal\Core\Render\Renderer->doTrustedCallback() (Line: 788)
Drupal\Core\Render\Renderer->doCallback() (Line: 377)
Drupal\Core\Render\Renderer->doRender() (Line: 204)
Drupal\Core\Render\Renderer->render() (Line: 474)
Drupal\Core\Template\TwigExtension->escapeFilter() (Line: 124)
__TwigTemplate_9389c3ff9b0808d2f7f2ed1006f046b0->doDisplay() (Line: 394)
Twig\Template->displayWithErrorHandling() (Line: 367)
Twig\Template->display() (Line: 379)
Twig\Template->render() (Line: 40)
Twig\TemplateWrapper->render() (Line: 53)
twig_render_template() (Line: 372)
Drupal\Core\Theme\ThemeManager->render() (Line: 436)
Drupal\Core\Render\Renderer->doRender() (Line: 449)
Drupal\Core\Render\Renderer->doRender() (Line: 204)
Drupal\Core\Render\Renderer->render() (Line: 474)
Drupal\Core\Template\TwigExtension->escapeFilter() (Line: 107)
__TwigTemplate_398f8481f4b8ac91d449f82143cb4dab->doDisplay() (Line: 394)
Twig\Template->displayWithErrorHandling() (Line: 367)
Twig\Template->display() (Line: 379)
Twig\Template->render() (Line: 40)
Twig\TemplateWrapper->render() (Line: 53)
twig_render_template() (Line: 372)
Drupal\Core\Theme\ThemeManager->render() (Line: 436)
Drupal\Core\Render\Renderer->doRender() (Line: 204)
Drupal\Core\Render\Renderer->render() (Line: 238)
Drupal\Core\Render\MainContent\HtmlRenderer->Drupal\Core\Render\MainContent\{closure}() (Line: 583)
Drupal\Core\Render\Renderer->executeInRenderContext() (Line: 239)
Drupal\Core\Render\MainContent\HtmlRenderer->prepare() (Line: 128)
Drupal\Core\Render\MainContent\HtmlRenderer->renderResponse() (Line: 90)
Drupal\Core\EventSubscriber\MainContentViewSubscriber->onViewRenderArray()
call_user_func() (Line: 111)
Drupal\Component\EventDispatcher\ContainerAwareEventDispatcher->dispatch() (Line: 171)
Symfony\Component\HttpKernel\HttpKernel->handleRaw() (Line: 74)
Symfony\Component\HttpKernel\HttpKernel->handle() (Line: 58)
Drupal\Core\StackMiddleware\Session->handle() (Line: 48)
Drupal\Core\StackMiddleware\KernelPreHandle->handle() (Line: 106)
Drupal\page_cache\StackMiddleware\PageCache->pass() (Line: 85)
Drupal\page_cache\StackMiddleware\PageCache->handle() (Line: 48)
Drupal\Core\StackMiddleware\ReverseProxyMiddleware->handle() (Line: 51)
Drupal\Core\StackMiddleware\NegotiationMiddleware->handle() (Line: 51)
Drupal\Core\StackMiddleware\StackedHttpKernel->handle() (Line: 704)
Drupal\Core\DrupalKernel->handle() (Line: 19)
The URL also renders in the browser without issue; I have attached a screenshot of this. If I choose to open it in a new tab, the browser seems to remove a // from the URL. Opening it as shown seems to redirect to https://localhost//m.files.bbci.co.uk/... with a "can't connect to server" message.
https://// on its own also seems to redirect to localhost for me locally, with a file-not-found error.
I have added a browser screenshot that seems to show the asset being rendered without issue in the DOM edit screen.
More images are attached showing the setup.
OK, I think the issue stems from my regular expression, which only checks that the URL starts with http(s):// and does not catch the extra slashes that make it malformed:
  -
    plugin: skip_on_condition
    method: row
    condition:
      plugin: not:matches
      regex: /^(https?:\/\/)/i
    message: 'We only want a string if it starts with http(s)://'
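One possible way to tighten the check is to require that the character right after http(s):// is not another slash. This is only a sketch of that idea, reusing the same skip_on_condition step as above; the adjusted pattern and the reworded message are my assumptions and have not been run against the feed. It only guards against extra leading slashes, not against every kind of malformed URL.

```yaml
  -
    plugin: skip_on_condition
    method: row
    condition:
      plugin: not:matches
      # Assumed pattern: scheme, "//", then any character except "/".
      # 'https:////m.files.bbci.co.uk/...' would no longer match, so the row is skipped.
      regex: '/^https?:\/\/[^\/]/i'
    message: 'We only want a string that starts with http(s):// followed by a host'
```

With this pattern, https://m.files.bbci.co.uk/... still matches and is kept, while https:////m.files.bbci.co.uk/... fails the match and the row is skipped before a file entity is created.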
In fact there may be a use case for a process plugin to use FILTER_VALIDATE_URL which does seem to work as expected here.