UrlHelper::isValid() should handle hyphens, '-', correctly

Created on 19 April 2015, over 10 years ago
Updated 29 May 2025, about 2 months ago

This issue has been reported privately to the security team but it was decided to handle this as public security improvement since no direct vulnerability is involved. This issue was reported by mvonfrie โ†’ .

Problem/Motivation

valid_url() report http://- as valid URL:
$valid = valid_url('http://-', true);

Urls look like this after the scheme (protocol) according to RFC 3986, section 3.2:

authority   = [ userinfo "@" ] host [ ":" port ]
host        = IP-literal / IPv4address / reg-name

reg-name should be

Important quote from below research:

The labels must follow the rules for ARPANET host names. They must
start with a letter, end with a letter or digit, and have as interior
characters only letters, digits, and hyphen.

UrlHelper::isValid() should reject a hostname that starts, ends, or is only a hyphen.

Proposed resolution

Improve the regex in UrlHelper::isValid() (valid_url() in Drupal 7).

Remaining tasks


Review
Commit

User interface changes

none.

API changes

none.

Original comments:

#1 benjy commented April 2, 2015 at 6:35am
Component:		ยป Code
This just seems like a bug to me at this point and could easily be handled in public.
Comment #2 Pere Orga commented April 2, 2015 at 3:43pm
I have tried a few browsers and all think URL http://- is valid.

http://-: http://i.imgur.com/zjaPlzv.png
http://localhost: http://i.imgur.com/UzXV4Xe.png
Comment #3 benjy commented April 3, 2015 at 3:57am
From: https://www.ietf.org/rfc/rfc3986.txt

Scheme names consist of a sequence of characters beginning with a
letter and followed by any combination of letters, digits, plus
("+"), period ("."), or hyphen ("-"). Although schemes are case-
insensitive, the canonical form is lowercase and documents that
specify schemes must do so with lowercase letters. An implementation
should accept uppercase letters as equivalent to lowercase in scheme
names (e.g., allow "HTTP" as well as "http") for the sake of
robustness but should only produce lowercase scheme names for
consistency.

scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
So the example with the hyphen is valid because technically it could be part of the scheme?
Comment #4 benjy commented April 3, 2015 at 4:09am
OK, more reading, my first comment wasn't the reason.

So as this issue reports, the format after the scheme is: host        = IP-literal / IPv4address / reg-name

And reg-name is made up of: reg-name      = *( unreserved / pct-encoded / sub-delims )

And unreserved is: unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

Which explains the valid url when starting with a hyphen.
Comment #5 mvonfrie commented April 4, 2015 at 4:07pm
Yes, but exactly before the definition of

reg-name    = *( unreserved / pct-encoded / sub-delims )

the RFC says

Such a name consists of a sequence of domain labels separated by ".",
each domain label starting and ending with an alphanumeric character
and possibly also containing "-" characters.
This is not covered by the formal syntax specification. That means every host name can contain hyphens, but neither start nor end with them or just be a hyphen:

The labels must follow the rules for ARPANET host names. They must
start with a letter, end with a letter or digit, and have as interior
characters only letters, digits, and hyphen.

https://tools.ietf.org/html/rfc1035, 2.3.1
So in my understanding http://- is not a valid url as - is not a valid hostname.
Comment #6 Pere Orga commented April 4, 2015 at 7:36pm
In any case I also think this could be discussed in public. It doesn't look like any harm could be done.
Comment #7 benjy commented April 6, 2015 at 6:11am
+1 for public from me.
๐Ÿ› Bug report
Status

Needs work

Version

11.0 ๐Ÿ”ฅ

Component

other

Created by

pere orga Catalonia

Live updates comments and jobs are added and updated live.
  • Security improvements

    It makes Drupal less vulnerable to abuse or misuse. Note, this is the preferred tag, though the Security tag has a large body of issues tagged to it. Do NOT publicly disclose security vulnerabilities; contact the security team instead. Anyone (whether security team or not) can apply this tag to security improvements that do not directly present a vulnerability e.g. hardening an API to add filtering to reduce a common mistake in contributed modules.

  • Needs issue summary update

    Issue summaries save everyone time if they are kept up-to-date. See Update issue summary task instructions.

Sign in to follow issues

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • First commit to issue fork.
Production build 0.71.5 2024