Disallow crawling paths under /node by default in robots.txt

Created on 26 January 2025

Problem/Motivation

  1. The vast majority of Drupal sites use Pathauto, as shown by the reported install counts for January 2025:

    Drupal core: 723,408
    Pathauto:    514,780

    From https://www.drupal.org/project/usage

  2. Getting paths such as /node/100 indexed instead of the human-readable URL alias /my-alias is bad for SEO ...

Therefore, it makes sense to disallow all paths under /node from getting crawled by default.

There may be reasons why a site wants to allow paths under /node to be crawled, but such sites are in the minority, and they can edit robots.txt to allow this, for example with the RobotsTxt module (https://www.drupal.org/project/robotstxt); a sketch follows.
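As a sketch of such an override, assuming the new Disallow rules proposed below are in place, a site could serve an edited robots.txt (for example via the RobotsTxt module's UI) that re-allows crawling. Allow is an extension to the original robots.txt format, but it is widely supported by major crawlers:

    # Site-specific override: re-allow crawling of node paths
    Allow: /node/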

Steps to reproduce

Search for your site in a search engine and see that paths such as /node/100 are getting indexed instead of the intended human-readable URL aliases such as /my-alias, harming SEO.
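One quick, illustrative way to check (example.com is a placeholder; site: and inurl: are standard Google search operators):

    site:example.com inurl:node

Any raw /node/NNN results returned by this query were indexed instead of their aliases.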

Proposed resolution

Disallow all paths under /node from getting crawled by default.
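A minimal sketch of what the addition could look like, mirroring the existing "Paths (clean URLs)" and "Paths (no clean URLs)" sections in core's robots.txt; the exact placement and comments here are assumptions, not a final patch:

    # Paths (clean URLs)
    Disallow: /node/
    # Paths (no clean URLs)
    Disallow: /index.php/node/

Because robots.txt rules are prefix matches, the trailing slash in /node/ limits the rule to paths under /node without also blocking the /node front-page listing or unrelated paths that merely start with "node".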

Remaining tasks

Update the robots.txt file

User interface changes

none

API changes

none

Data model changes

none

Release notes snippet

TBD

📌 Task
Status: Active
Version: 11.0
Component: base system
Created by: ressa 🇩🇰 (Copenhagen, Denmark)

Issue tags:
  • Needs backport to D7

    After being applied to the 8.x branch, it should be considered for backport to the 7.x branch. Note: This tag should generally remain even after the backport has been written, approved, and committed.
