Html::cleanCssIdentifier Strips Valid Escaped Characters

Created on 15 October 2017, over 7 years ago
Updated 3 May 2023, almost 2 years ago

Problem/Motivation

Escaped characters in CSS classes are valid. Several popular CSS/Sass libraries use escaped characters (particularly ones which use the BEM methodology) to denote special uses or characteristics of a CSS class.

For instance, the inuitcss CSS library uses the @ character in CSS classes for identifying responsive breakpoint classes and the forward slash / character for fractional grid column classes.

Example CSS with Escaped Characters

.u-1\/1 {
  // CSS here.
}

.u-1\/2\/@tablet {
  // CSS here.
}

.u-1\/3\/@desktop {
  // CSS here.
}

Example Markup Using CSS classes with escaped characters

<div class="o-layout__item u-1/1 u-1/2@tablet u-1/3@desktop">

The Html::cleanCssIdentifier function strips escaped characters however, making Drupal incompatible with CSS which uses them.

Drupal Example of Problem

In a twig template, using the clean_class filter (which uses Html::cleanCssIdentifier), such as:

{% set classes = [
  'o-layout__item',
  ('u-1/1'|clean_class),
  ('u-1/2@tablet'|clean_class),
  ('u-1/3@desktop'|clean_class),
  ('u-1/3@desktop'|clean_class)
  ]
%}
<div{{ attributes.addClass(classes) }}>I'm a div with grid classes</div>

produces the following markup:

<div class="o-layout__item u-1-1 u-1-2tablet u-1-3desktop">I'm a div with grid classes</div>

when the markup should instead be:

<div class="o-layout__item u-1/1 u-1/2@tablet u-1/3@desktop">I'm a div with grid classes</div>

Proposed resolution

Update the regex used in the current Html::cleanCssIdentifier

    // Valid characters in a CSS identifier are:
    // - the hyphen (U+002D)
    // - a-z (U+0030 - U+0039)
    // - A-Z (U+0041 - U+005A)
    // - the underscore (U+005F)
    // - 0-9 (U+0061 - U+007A)
    // - ISO 10646 characters U+00A1 and higher
    // We strip out any character not in the above list.
    $identifier = preg_replace('/[^\x{002D}\x{0030}-\x{0039}\x{0041}-\x{005A}\x{005F}\x{0061}-\x{007A}\x{00A1}-\x{FFFF}]/u', '', $identifier);
    // Identifiers cannot start with a digit, two hyphens, or a hyphen followed by a digit.
    $identifier = preg_replace([
    '/^[0-9]/',
    '/^(-[0-9])|^(--)/'
    ], ['_', '__'], $identifier);
    return $identifier;

to allow for escaped characters.

πŸ› Bug report
Status

Needs work

Version

10.0 ✨

Component
BaseΒ  β†’

Last updated about 17 hours ago

Created by

πŸ‡ΊπŸ‡ΈUnited States pcate

Live updates comments and jobs are added and updated live.
  • CSS

    It involves the content or handling of Cascading Style Sheets.

  • Needs frontend framework manager review

    Used to alert the fron-tend framework manager core committer(s) that a front-end focused issue significantly impacts (or has the potential to impact) multiple subsystems or represents a significant change or addition in architecture or public APIs, and their signoff is needed (see the governance policy for more information). If an issue significantly impacts only one subsystem, use Needs subsystem maintainer review instead, and make sure the issue component is set to the correct subsystem.

Sign in to follow issues

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • πŸ‡ΊπŸ‡ΈUnited States scotwith1t Birmingham, AL

    Incredibly important patch for anyone using the USWDS framework and many others. Thanks @heddn and others! Not sure what else is needed to get this to land, other than passing tests, but hope to see this one in a release soon! :)

  • last update almost 2 years ago
    28,325 pass, 16 fail
  • last update almost 2 years ago
    28,302 pass, 18 fail
  • πŸ‡ΊπŸ‡ΈUnited States scotwith1t Birmingham, AL

    Just an additional note that this has a side effect of changing how clean_class twig filter works. For example, we had a block which had:

    {% set classes = [
      'block',
      'block-' ~ configuration.provider|clean_class,
      'block-' ~ plugin_id|clean_class,
    ] %}
    

    Previously, the plugin_id had its : stripped out and the css was written and applied to that class. After this post, the class name changed because clean_class no longer stripped that out. Not a huge deal, but something I hadn't anticipated and had to do a (very minor) css update to address. I would assume there could be other unexpected side effects for other places/modules that use the cleanCssIdentifier method of this class.

  • πŸ‡§πŸ‡¬Bulgaria nikolabintev

    I am uploading a patch that can be applied to the latest stable Drupal 10 version: 10.2.4

  • πŸ‡³πŸ‡ΏNew Zealand gareth.poole

    This patch seems to work with 10.3

  • πŸ‡ΊπŸ‡ΈUnited States liberatr Portland, OR

    The patch in #46 removes the comment that mentions period:
    // - the period (U+002E)

    It is still present in regex, just not in comment.

  • πŸ‡ΊπŸ‡ΈUnited States sea2709 Texas

    IMHO, I agree that we should allow more characters in CSS classes like Tailwind CSS classes with colon, brackets, etc.
    I'm concerned when we makes changes for the function cleanCssIdentifier, since this function is used to clean the classes generating by Drupal.

    For example, I place a field block in layout builder, that field block has a class like block-field-block:paragraph:grid:field-column if we allow colon in cleanCssIdentifier

    I wonder if generated classes should have colons, and it might impact an existing site which has many elements being styled based on the generated classes. We should think twice before applying changes for cleanCssIdentifier.

  • πŸ‡ΊπŸ‡ΈUnited States Chris Burge

    @sea2709 - That's exactly the issue. We can't change Html::cleanCssIdentifier without causing CSS bugs.

    #13

    I'm wondering if the best way to solve this issue is to use a configuration flag/setting similar to the $conf['allow_css_double_underscores'] setting in D7 that allowed BEM double underscore CSS selectors.

    I think the behavior should be configurable. Maybe a $setting to define additional characters?

  • πŸ‡§πŸ‡ͺBelgium yazzbe

    > I think the behavior should be configurable. Maybe a $setting to define additional characters?

    That is a great idea!

    Just like in the block_class module where it is also configurable to allow special characters in classes.

  • πŸ‡ΊπŸ‡ΈUnited States liberatr Portland, OR

    Rerolled the 10.3 patch from #46 to include the missing comment.

  • πŸ‡ΊπŸ‡ΈUnited States ibbwebguy

    I believe the order of unicode characters is wrong and will result in unwanted characters being valid. The correct order should be:

    $identifier = preg_replace('/[^\x{002D}\x{002E}\x{002F}\x{0030}-\x{0039}\x{003A}\x{0040}\x{0041}-\x{005A}\x{005F}\x{0061}-\x{007A}\x{00A1}-\x{FFFF}]/u', '', $identifier);

    Alternately, the syntax could be shortened to the following, but then the alphanumeric ranges would not be specifically listed.
    $identifier = preg_replace('/[^\x{002D}-\x{003A}\x{0040}-\x{005A}\x{005F}\x{0061}-\x{007A}\x{00A1}-\x{FFFF}]/u', '', $identifier);

  • πŸ‡ΊπŸ‡ΈUnited States ibbwebguy

    Rerolled the 10.3 patch from #51 to include the correct unicode regex.

Production build 0.71.5 2024