Prevent search engines from indexing /media/oembed URLs

Created on 8 July 2022, over 2 years ago
Updated 3 August 2023, over 1 year ago

Problem/Motivation

I noticed /media/oembed URLs are being crawled by search engines (and a tool I use for accessibility checks).
Shouldn't these URLs be excluded? Perhaps by adding a noindex meta tag to them and/or a nofollow in \Drupal\media\Plugin\Field\FieldFormatter\OEmbedFormatter?

Steps to reproduce

Render any oEmbed resource using the "oEmbed content" formatter and notice there is no nofollow.
Visit any /media/oembed URL and notice there is no noindex meta tag.
As a result, these pages will be indexed.
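To confirm the second step without reading the page source by hand, you can parse the HTML returned by a /media/oembed URL and look for a robots meta tag. The following is a minimal sketch that uses only the Python standard library. The names RobotsMetaParser and has_noindex and the sample markup are hypothetical, and the sketch only inspects <meta name="robots"> tags in the markup you pass in.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()  # lower-cased directives such as "noindex"

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names, but not values.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").strip().lower() != "robots":
            return
        for directive in attributes.get("content", "").split(","):
            self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the markup carries a robots meta tag that forbids indexing."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    # "none" is shorthand for "noindex, nofollow".
    return bool(parser.directives & {"noindex", "none"})


if __name__ == "__main__":
    # Hypothetical markup: paste the HTML returned by a /media/oembed URL instead.
    without_meta = '<html><head></head><body><iframe src="https://example.com/embed"></iframe></body></html>'
    with_meta = '<html><head><meta name="robots" content="noindex, nofollow"></head><body></body></html>'
    print(has_noindex(without_meta))  # False
    print(has_noindex(with_meta))     # True
```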

Proposed resolution

Adding a simple noindex meta tag to those pages should be enough.

Remaining tasks

/

User interface changes

/

API changes

/

Data model changes

/

Release notes snippet

/

✨ Feature request
Status

Active

Version

9.5

Component
Media

Created by

🇧🇪 Belgium herved

  • Needs change record

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • Search engines (specifically Google) will still show these pages in the index: because the change in #3271222 means Google can no longer crawl them, it never learns that it should not add them to the index. I think this issue still holds value, especially since the idea was to keep the oEmbed nodes *out of the index*.

  • Status changed to Active over 1 year ago
  • 🇧🇪 Belgium herved

    Interesting. I'm no SEO expert, but it looks like you're right.
    If I understand correctly, a Disallow rule in robots.txt doesn't prevent the page from being indexed; only the noindex meta tag does.
    And if the "noindex, nofollow" meta tag is present, I believe there is no point in also disallowing these URLs in robots.txt; in fact, the Disallow rule would keep crawlers from ever seeing the tag.

    Does that mean #3271222's implementation could be replaced by the one proposed in this issue?
    I'm reopening this issue, just to be sure.
    Any opinions, @phenaproxima, @alexpott?

    Thanks
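The interaction discussed in these comments can be sketched with Python's standard urllib.robotparser module. If robots.txt disallows /media/oembed, a compliant crawler never fetches those pages, so a noindex meta tag on them is never read. The robots.txt text below is an assumption about the kind of rule #3271222 relies on, not a copy of it, and robots_parser and noindex_is_seen are hypothetical helper names.

```python
from urllib.robotparser import RobotFileParser

# Assumption: a robots.txt rule of this shape is what blocks crawling of oEmbed pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /media/oembed
"""


def robots_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def noindex_is_seen(parser: RobotFileParser, url: str, page_has_noindex: bool) -> bool:
    """A compliant crawler only reads a page's robots meta tag if it may fetch the page."""
    return parser.can_fetch("*", url) and page_has_noindex


if __name__ == "__main__":
    blocked = robots_parser(ROBOTS_TXT)
    open_site = robots_parser("User-agent: *\nDisallow:\n")
    url = "https://example.com/media/oembed?url=https%3A//example.org/video"
    # With the Disallow rule, the noindex tag on the page is never read.
    print(noindex_is_seen(blocked, url, page_has_noindex=True))    # False
    # Without it, the crawler fetches the page and sees the noindex tag.
    print(noindex_is_seen(open_site, url, page_has_noindex=True))  # True
```

Python's parser matches this rule as a simple path prefix. Real crawlers may implement robots.txt matching in more detail, but for a plain prefix rule like this one the outcome should be the same.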