The XSS filter should allow more HTML entities

Created on 10 September 2019, about 5 years ago
Updated 14 February 2023, almost 2 years ago

Problem/Motivation

Xss::filter() does not recognize HTML entities starting with &#x1 as well-formed.
Example: &#x1D537; (𝔷), &#x1D459; (𝑙), &#x1D465; (𝑥), &#x1F0A1; (🂡)
For more entities see https://www.compart.com/en/unicode/html

// Decimal numeric entities.
$string = preg_replace('/&amp;#([0-9]+;)/', '&#\1', $string);
// Hexadecimal numeric entities.
$string = preg_replace('/&amp;#[Xx]0*((?:[0-9A-Fa-f]{2})+;)/', '&#x\1', $string);

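For example, '&#127137;' (decimal) and '&#x1F0A1;' (hexadecimal) both encode the ace of spades symbol (🂡). Both should be valid.
The current regular expressions match '&#127137;' but do not match '&#x1F0A1;': the hexadecimal pattern only accepts hex digits in pairs, so the five-digit value 1F0A1 is rejected.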
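The mismatch can be reproduced in isolation. The sketch below is not the core implementation: `restore_numeric_entities()` is a hypothetical helper that copies only the two numeric-entity patterns, and it assumes (as in Xss::filter()) that every ampersand has been escaped to `&amp;` before the patterns run.

```php
<?php
// Hypothetical helper: mirrors only the numeric-entity step, nothing else
// from the real filter.
function restore_numeric_entities(string $string): string {
  // Assumption: all ampersands are defused first.
  $string = str_replace('&', '&amp;', $string);
  // Decimal numeric entities.
  $string = preg_replace('/&amp;#([0-9]+;)/', '&#\1', $string);
  // Hexadecimal numeric entities (current pattern: hex digits in pairs only).
  $string = preg_replace('/&amp;#[Xx]0*((?:[0-9A-Fa-f]{2})+;)/', '&#x\1', $string);
  return $string;
}

// Decimal form is restored.
var_dump(restore_numeric_entities('&#127137;'));
// Five hex digits: no match, so the entity stays escaped as &amp;#x1F0A1;
var_dump(restore_numeric_entities('&#x1F0A1;'));
// Padded to six hex digits, the same code point is restored.
var_dump(restore_numeric_entities('&#x01F0A1;'));
```

The third call suggests the failure depends on the number of hex digits after any leading zeros being odd, rather than on the leading 1 itself.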

Proposed resolution

Update the hexadecimal regular expression so that it also accepts entities starting with &#x1.
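One possible shape for such a change, offered only as a starting point for discussion and not as a reviewed patch: replace the paired-digit group with "one or more hex digits". The helper name `restore_numeric_entities_relaxed()` is hypothetical, and the sketch again assumes ampersands were escaped to `&amp;` beforehand. Whether accepting every hexadecimal code point is safe is exactly the open security question.

```php
<?php
// Hypothetical variant, not a committed fix: the hexadecimal pattern accepts
// one or more hex digits instead of requiring them in pairs.
function restore_numeric_entities_relaxed(string $string): string {
  // Assumption: all ampersands are defused first.
  $string = str_replace('&', '&amp;', $string);
  // Decimal numeric entities (unchanged).
  $string = preg_replace('/&amp;#([0-9]+;)/', '&#\1', $string);
  // Hexadecimal numeric entities, any number of digits; leading zeros dropped.
  $string = preg_replace('/&amp;#[Xx]0*([0-9A-Fa-f]+;)/', '&#x\1', $string);
  return $string;
}

// Five hex digits are now restored.
var_dump(restore_numeric_entities_relaxed('&#x1F0A1;'));
// Missing semicolon: still left escaped.
var_dump(restore_numeric_entities_relaxed('&#x1F0A1'));
// Non-hex characters: still left escaped.
var_dump(restore_numeric_entities_relaxed('&#xZZ;'));
```

This variant does not cap the number of digits or check that the value is a valid Unicode code point; a real patch might want to consider both.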

Remaining tasks

Review for security consequences.

User interface changes

None.

API changes

None.

Data model changes

None.

🐛 Bug report
Status

Needs work

Version

10.1 ✨

Component
Base

Last updated about 5 hours ago

Created by

🇨🇿 Czech Republic martin_klima

  • Needs framework manager review

    It is used to alert the framework manager core committer(s) that an issue significantly impacts (or has the potential to impact) multiple subsystems or represents a significant change or addition in architecture or public APIs, and their signoff is needed (see the governance policy draft for more information). If an issue significantly impacts only one subsystem, use Needs subsystem maintainer review instead, and make sure the issue component is set to the correct subsystem.

Comments & Activities

  • 🇺🇸 United States smustgrave

    Tagging for framework manager and security review for their thoughts on this.

  • Status changed to Needs work almost 2 years ago
  • 🇺🇸 United States greggles Denver, Colorado, USA

    Moving to needs work - in the issue summary and comments I don't see any research about what characters are or aren't safe from the perspective of causing XSS. Ideally there is a definitive source we can reference to help us figure that out. The resources I know that had that kind of information are now offline so I'm not totally sure where to go in 2023.
