Separate out js redirect option

Created on 22 January 2024, 5 months ago

This plugin can already follow a JavaScript redirect, controlled by the jsredirect option.

e.g.

process:
  'body/value':
   -
     plugin: migrate_process_html
     source: link
     jsredirect: false # optional, defaults to true
   -
     plugin: dom
     method: import
   -
     plugin: dom_select
     selector: //meta[@property="og:image"]/@content
   -
     plugin: skip_on_empty
     method: row
     message: 'Field image is missing'
   -
     plugin: extract
     index:
       - 0
   -
     plugin: skip_on_condition
     method: row
     condition:
       plugin: not:matches
       regex: /^(https?:\/\/)[\w\d]/i
     message: 'We only want a string if it starts with http(s)://[\w\d]'
   -
     plugin: file_remote_url

In particular:

     plugin: migrate_process_html
     source: link
     jsredirect: false # optional, defaults to true
   

This functionality would be more valuable if it were separated out into its own plugin. One reason is that the resolved URL could then be reused for any later lookups.

Background

Many RSS feeds now redirect via Google:

e.g.

https://news.google.com/rss/articles/CBMidGh0dHBzOi8vd3d3LnN0YW5kYXJkLmN...

which at the time of writing redirects to:

https://www.standard.co.uk/homesandproperty/where-to-live/london-leavers...

However, after some time Google may remove this initial link, and the resource will no longer be available.

Furthermore, any referrals to the end site are inevitably routed via Google. This is undesirable because it obfuscates the true source of the referral, i.e. your website.

In addition, users following those links are always routed via Google, with an additional cookie compliance step and so on.

My feeling is that this could be separated out and used as a separate plugin, perhaps named something like migrate_process_link (the examples below use migrate_process_js_link).

So the above migration config could be changed to look like this:

process:
  'body/value':
   -
     plugin: migrate_process_js_link
     source: link
   -
     plugin: migrate_process_html
   -
     plugin: dom
     method: import
   -
     plugin: dom_select
     selector: //meta[@property="og:image"]/@content
   -
     plugin: skip_on_empty
     method: row
     message: 'Field image is missing'
   -
     plugin: extract
     index:
       - 0
   -
     plugin: skip_on_condition
     method: row
     condition:
       plugin: not:matches
       regex: /^(https?:\/\/)[\w\d]/i
     message: 'We only want a string if it starts with http(s)://[\w\d]'
   -
     plugin: file_remote_url

Also, in a different context:

process:
  title: title
  'field_feed_item_description/format':
    plugin: default_value
    default_value: full_html
  'field_feed_item_description/value': summary
  'field_web_link/uri': link
  'field_web_link/title': title

could be replaced by:

process:
  title: title
  'field_feed_item_description/format':
    plugin: default_value
    default_value: full_html
  'field_feed_item_description/value': summary
  'field_web_link/uri':
   -
     plugin: migrate_process_js_link
     source: link
  'field_web_link/title': title
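
As a rough sketch of the "future lookups" point, the two contexts above could share a single resolved URL. This assumes the proposed migrate_process_js_link plugin exists (it is only a suggestion in this issue) and that migrate_process_html keeps its jsredirect option; _resolved_link is an illustrative pseudo-field name, referenced with Migrate's '@' prefix.

```yaml
process:
  # Hypothetical plugin: resolve the redirect once and keep the
  # result in a pseudo-field so later steps can reuse it.
  _resolved_link:
    plugin: migrate_process_js_link
    source: link
  # Store the final URL rather than the intermediate Google one.
  'field_web_link/uri': '@_resolved_link'
  'field_web_link/title': title
  'body/value':
   -
     plugin: migrate_process_html
     source: '@_resolved_link'
     # Assumption: the link is already resolved, so do not follow it again.
     jsredirect: false
   -
     plugin: dom
     method: import
   # Remaining steps (dom_select, skip_on_empty, ...) as in the first example.
```

Resolving the link once would mean the stored link field and the fetched HTML both refer to the same final URL.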

✨ Feature request
Status

Fixed

Version

1.0

Component

Code

Created by

🇬🇧 United Kingdom 2dareis2do
