Problem/Motivation
If a page has a URL like /blog?tag=security&page=4, the canonical URL will be rendered as /blog?tag=security&amp;page=4.
This causes problems with SEO and crawlers because they depend on canonical links being valid. It also makes it impossible to meet Google's recommendations on how to indicate paginated content properly. This is why this issue is initially marked Major.
To reproduce this:
- Create a SimplyTest.me project using the Metatag module
- Add the Metatag field to the basic page content type
- Create a basic page and use the Metatag advanced setting to set the canonical URL to /blog?tag=security&page=4
- Open the page in Chrome and view the page source (Note: Inspect will hide the &amp; value).
- The canonical href will have been encoded: the & appears as &amp;.
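The effect seen in the page source can be illustrated outside Drupal. The following is a minimal sketch in Python, not the HtmlTag code itself: `render_link_tag` is a hypothetical helper that escapes every attribute value, which is the behaviour this issue describes.

```python
from html import escape


def render_link_tag(rel: str, href: str) -> str:
    """Build a <link> tag, escaping every attribute value.

    Hypothetical helper that mimics the behaviour described in this
    issue; it is not Drupal's actual implementation.
    """
    return f'<link rel="{escape(rel, quote=True)}" href="{escape(href, quote=True)}" />'


canonical = "/blog?tag=security&page=4"
print(render_link_tag("canonical", canonical))
# prints: <link rel="canonical" href="/blog?tag=security&amp;page=4" />
```

The ampersand separating the query parameters is the only character changed, which matches what "view source" shows in the steps above.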
Proposed resolution
The underlying problem is in the HtmlTag code. It creates the tag markup using the Attribute class's __toString method. This method escapes all attribute names and values.
The fix here is to handle link tag href attributes and script tag src attributes as special cases in the HtmlTag code.
The href and src attributes should not be passed to the Attribute class. Instead, their values should be validated with UrlHelper::isValid() and, if valid, added to the tag directly. If they fail validation, they should be fully escaped. All other attributes should go through the Attribute class.
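The proposed handling could look roughly like the sketch below. It is written in Python purely to show the control flow and is not a patch. `is_valid_url` is a hypothetical stand-in for UrlHelper::isValid() that only checks for characters permitted by RFC 3986 (the real Drupal check differs), and `render_attributes` is an assumed name for the part of HtmlTag that builds the attribute string.

```python
import re
from html import escape

# Hypothetical stand-in for UrlHelper::isValid(): accept only characters
# that RFC 3986 permits in a URL. The real Drupal validation is different.
_URL_CHARS = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+")

# Tag/attribute pairs treated as special cases in the proposal.
_URL_ATTRIBUTES = {("link", "href"), ("script", "src")}


def is_valid_url(value: str) -> bool:
    """Return True if the whole value consists of URL-safe characters."""
    return _URL_CHARS.fullmatch(value) is not None


def render_attributes(tag: str, attributes: dict) -> str:
    """Render attributes for a tag, leaving valid link/script URLs unescaped."""
    parts = []
    for name, value in attributes.items():
        if (tag, name.lower()) in _URL_ATTRIBUTES and is_valid_url(value):
            # Valid URL on a special-cased attribute: add it directly.
            rendered = value
        else:
            # Everything else, including URLs that fail validation,
            # is fully escaped.
            rendered = escape(value, quote=True)
        parts.append(f'{escape(name, quote=True)}="{rendered}"')
    return " ".join(parts)


print(render_attributes("link", {"rel": "canonical", "href": "/blog?tag=security&page=4"}))
# prints: rel="canonical" href="/blog?tag=security&page=4"
```

Because the stand-in validator rejects double quotes and angle brackets, a value that could break out of the attribute still falls through to the escaping branch. Any real patch would need to confirm that UrlHelper::isValid() gives the same guarantee.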
Remaining tasks
- Write the patch
- Write some tests
User interface changes
None
API changes
None
Data model changes
None