With this plugin we can follow a redirect, e.g.:
process:
  'body/value':
    -
      plugin: migrate_process_html
      source: link
      jsredirect: false # optional, defaults to true
    -
      plugin: dom
      method: import
    -
      plugin: dom_select
      selector: //meta[@property="og:image"]/@content
    -
      plugin: skip_on_empty
      method: row
      message: 'Field image is missing'
    -
      plugin: extract
      index:
        - 0
    -
      plugin: skip_on_condition
      method: row
      condition:
        plugin: not:matches
        regex: /^(https?:\/\/)[\w\d]/i
      message: 'We only want a string if it starts with http(s)://[\w\d]'
    -
      plugin: file_remote_url
In particular:
plugin: migrate_process_html
source: link
jsredirect: false # optional, defaults to true
This redirect-following functionality would be valuable as a separate plugin. One reason is that the resolved URL may be useful for future lookups.
Background
Many RSS feeds now redirect via Google, e.g.:
https://news.google.com/rss/articles/CBMidGh0dHBzOi8vd3d3LnN0YW5kYXJkLmN...
which at the time of writing redirects to:
https://www.standard.co.uk/homesandproperty/where-to-live/london-leavers...
However, after some time Google may remove this initial link, and the resource will no longer be available.
Furthermore, any referrals to the end site are inevitably routed via Google. This is undesirable because, for one, it obfuscates the true source of the referral, i.e. your website.
In addition, users following those links are always routed via Google, with an additional cookie-compliance step etc.
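As a rough illustration of what resolving such a link involves, here is a minimal Python sketch that pulls a redirect target out of an intermediate page's HTML. It is not taken from migrate_process_html, and how that plugin detects JS redirects is not shown here. The names find_redirect_target, _MetaRefreshParser and _JS_LOCATION are hypothetical, and the sketch assumes the redirect is expressed as either a meta refresh tag or a simple location assignment. Real pages may use other mechanisms that this would miss.

```python
import re
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class _MetaRefreshParser(HTMLParser):
    """Collects the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.target: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attr = {k.lower(): (v or "") for k, v in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=https://example.com/page".
        match = re.search(r"url\s*=\s*['\"]?([^'\";]+)", attr["content"] if "content" in attr else "", re.I)
        if match:
            self.target = match.group(1).strip()


# Assumed pattern (hypothetical): matches simple forms such as
#   window.location = "..."   location.href = '...'   window.location.replace("...")
_JS_LOCATION = re.compile(
    r"""(?:window\.|document\.)?location(?:\.href)?\s*(?:=|\.(?:replace|assign)\s*\()\s*["']([^"']+)["']""",
    re.I,
)


def find_redirect_target(html: str, base_url: str) -> Optional[str]:
    """Return the absolute redirect URL found in html, or None if there is none."""
    parser = _MetaRefreshParser()
    parser.feed(html)
    target = parser.target
    if target is None:
        # Fall back to a JavaScript location assignment anywhere in the page.
        match = _JS_LOCATION.search(html)
        target = match.group(1) if match else None
    # Resolve relative targets against the URL the page was fetched from.
    return urljoin(base_url, target) if target else None
```

Called with the fetched page body and the feed item's link as base_url, a helper like this would return the destination URL, which could then be stored instead of the intermediate link.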
My feeling is that the redirect-following step could be separated out into its own plugin, perhaps something like migrate_process_js_link.
So the above migration config could be changed to look like this:
process:
  'body/value':
    -
      plugin: migrate_process_js_link
      source: link
    -
      plugin: migrate_process_html
    -
      plugin: dom
      method: import
    -
      plugin: dom_select
      selector: //meta[@property="og:image"]/@content
    -
      plugin: skip_on_empty
      method: row
      message: 'Field image is missing'
    -
      plugin: extract
      index:
        - 0
    -
      plugin: skip_on_condition
      method: row
      condition:
        plugin: not:matches
        regex: /^(https?:\/\/)[\w\d]/i
      message: 'We only want a string if it starts with http(s)://[\w\d]'
    -
      plugin: file_remote_url
Also, in a different context:
process:
  title: title
  'field_feed_item_description/format':
    plugin: default_value
    default_value: full_html
  'field_feed_item_description/value': summary
  'field_web_link/uri': link
  'field_web_link/title': title
Could be replaced by:
process:
  title: title
  'field_feed_item_description/format':
    plugin: default_value
    default_value: full_html
  'field_feed_item_description/value': summary
  'field_web_link/uri':
    -
      plugin: migrate_process_js_link
      source: link
  'field_web_link/title': title
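For reference, the skip_on_condition step in the body/value pipelines above skips a row unless the extracted value starts with http(s):// followed by a word character. A small Python sketch of the same check may help when deciding which values would survive. The function name should_skip is hypothetical, and the pattern is a direct translation of the regex in the config, assuming Python's re treats it the same way as the PHP pattern for these inputs.

```python
import re

# Python translation of the config's /^(https?:\/\/)[\w\d]/i pattern.
_HTTP_URL = re.compile(r"^(https?://)[\w\d]", re.IGNORECASE)


def should_skip(value: str) -> bool:
    """Mirror 'not:matches': skip the row when the value does NOT match."""
    return _HTTP_URL.search(value) is None
```

Under this check, protocol-relative values such as //cdn.example.com/a.jpg and non-HTTP schemes would cause the row to be skipped, while http and https URLs in any letter case would pass.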