Already encoded URLs are double-encoded, creating broken redirects

Created on 25 January 2021, almost 4 years ago
Updated 22 May 2023, over 1 year ago

Problem/Motivation

When add a redirect using the form, the "to" URL is url encoded before being saved. This is problem if pasting in a URL that is already encoded, as it double encodes it, breaking the redirect.

Steps to reproduce

  1. Visit redirect form to add a new redirect
  2. For the "from" path, enter "test123"
  3. For the "to" path, enter "/sites/default/files/some%20file%20with%20spaces.txt"
  4. Save
  5. In redirect listing table, observe the redirect was saved as "/sites/default/files/some%2520file%2520with%2520spaces.txt" - it was URL encoded again. Visiting the redirect doesn't work.

This scenario is common when a user is creating a redirect by copying and pasting a URL that they want to redirect to on their site, and that URL has some encodings in it already to handle things like spaces.

Note that the Redirect module is not explicitly performing the URL encoding. Drupal's internal URL processing is. Redirect module uses a Link field internally to handle storage of the "to" URL. Internally, it stores the "to" URL as this: "internal:/sites/default/files/some%20file%20with%20spaces.txt". When the RedurectRequestSubscriber asks the redirect entity for the URL to redirect to, the Link field returns it as a Url object using Url::fromUri with argument "internal:/sites/default/files/some%20file%20with%20spaces.txt". Drupal url encodes this in UnroutedUrlAssembler's buildLocalUrl method.

Proposed resolution

Redirect module could run the user input through rawurldecode before saving it. It should assume that URLs entered may already be encoded. If the URL that's entered is NOT encoded, I don't think there is any harm in running it through a decode.

What's also interesting here is that the Url object doesn't return an encoded URL if it thinks it's an external URL. I think the logic behind this is that if a user entered an external URL, it's likely already encoded and shouldn't be re-encoded. But if it's an internal URL reference, it needs to be encoded.

Remaining tasks

User interface changes

API changes

Data model changes

πŸ› Bug report
Status

Needs work

Version

1.0

Component

Code

Created by

πŸ‡ΊπŸ‡ΈUnited States bkosborne New Jersey, USA

Live updates comments and jobs are added and updated live.
  • Needs tests

    The change is currently missing an automated test that fails when run with the original code, and succeeds when the bug has been fixed.

Sign in to follow issues

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • πŸ‡ΊπŸ‡ΈUnited States dpagini

    This patch fixes a very narrow use case of a space being encoded to %20, but does not address all encoded characters. For example, a ( or ) get encoded to %28 and %29. That makes me think there should be a more global solution to handle this...?

  • Status changed to Needs work over 1 year ago
  • πŸ‡¨πŸ‡­Switzerland berdir Switzerland

    Yes, maybe this should do a urldecode() like the hash issue, and should have a test.

    The code can also be simplifed to use the entity api properly, you almost never want to use getValue() and instead loop over the items.

    Plus, it doesn't really belong in save() but buildEntity(). Wondering if this shouldn't be handled in the core link widget, how does that behave in such a case?

  • πŸ‡²πŸ‡°Macedonia meri_atanasovska

    Patch #3 looks like a great solution, patch works perfectly.

  • πŸ‡ΊπŸ‡ΈUnited States chrisjlock

    Option to check additional hash instead of changing value saved to db.

  • Open in Jenkins β†’ Open on Drupal.org β†’
    Core: 10.1.x + Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    62 pass, 2 fail
  • Open in Jenkins β†’ Open on Drupal.org β†’
    Core: 9.5.x + Environment: PHP 8.0 & MySQL 5.7 (--ignore-platform-reqs)
    last update over 1 year ago
    63 pass
Production build 0.71.5 2024