Improve StringItem::generateSampleValue()

Created on 31 October 2019, about 5 years ago
Updated 5 April 2023, over 1 year ago

Problem/Motivation

Sample entity value generation generates long strings for text fields. This can cause the layout to break, as noted in #3016507: [PP-1] Break long text strings in layout edit

Proposed resolution

Rather than use \Drupal\Component\Utility\Random::word to generate a single (potentially very long), generate this with multiple calls to word() each of which have a sensible length that better resembles the words we ingest day to day.

Remaining tasks

User interface changes

None-ish.

API changes

None

Data model changes

None

📌 Task
Status

Fixed

Version

10.1

Component
Field 

Last updated 2 days ago

Created by

🇺🇸United States eclipsegc

Live updates comments and jobs are added and updated live.
  • Blocks-Layouts

    Blocks and Layouts Initiative. See the #2811175 Add layouts to Drupal issue.

  • Field UX

    Usability improvements related to the Field UI

Sign in to follow issues

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • 🇫🇮Finland lauriii Finland

    Tried to address the feedback on the MR. These changes should make the results appear a little bit more consistent. The minimum length is still an arbitrary value but I documented it as such. I'm not sure there's much else we can do about that.

  • 🇺🇸United States tim.plunkett Philadelphia
  • Status changed to Needs work almost 2 years ago
  • 🇺🇸United States bnjmnm Ann Arbor, MI
    1. +++ b/core/lib/Drupal/Core/Field/Plugin/Field/FieldType/StringItem.php
      @@ -72,7 +72,37 @@ public function getConstraints() {
      +    // When maximum length is more than 10, set an arbitrary minimum length that
      +    // is long enough to resemble a real value.
      

      This doesn't seem accurate anymore, or maybe just not in the ideal place?

    2. +++ b/core/lib/Drupal/Core/Field/Plugin/Field/FieldType/StringItem.php
      @@ -72,7 +72,37 @@ public function getConstraints() {
      +    $min_length = 10;
      +    if ($max_length < 10) {
      +      $min_length = $max_length;
      +    }
      

      We can save some space with a ternary

    3. +++ b/core/lib/Drupal/Core/Field/Plugin/Field/FieldType/StringItem.php
      @@ -72,7 +72,37 @@ public function getConstraints() {
      +      $string .= " $random_value";
      

      Small space saver could be $string .= " {$random->sentences(1)}";

    4. +++ b/core/lib/Drupal/Core/Field/Plugin/Field/FieldType/StringItem.php
      @@ -72,7 +72,37 @@ public function getConstraints() {
      +      $words = explode(' ', $string);
      +      $string = '';
      +      while (mb_strlen($string) < $length) {
      +        $string .= array_shift($words) . ' ';
      +      }
      

      This could still result in $string being longer than $length as it is concatenated before the next check, which could potentially take it past max length depending on the random result. The normalizing basically addresses that but this would further reduce the chances of the string ending in the middle of a word.

       $words = explode(' ', $string);
        $string = array_reduce($words, fn($acc, $item) => 
           mb_strlen("$acc $item") > $length ? $acc : "$acc $item", 
        '');
      
    5. +++ b/core/tests/Drupal/Tests/Core/Field/StringItemTest.php
      @@ -0,0 +1,48 @@
      +    return [
      +      '32' => [32],
      +      '255' => [255],
      +      '4' => [4],
      +      '64' => [64],
      

      Since theres different behavior for max length's under 10, should that be covered here?

    6. Is there any risk of the random max length being too low to be useful since it can potentially be as low as the min length? Perhaps the min could be set to be no lower than a given percentage of the max? Open to other solutions OR an explanation why this isn't a concern.
  • Status changed to Needs review almost 2 years ago
  • 🇫🇮Finland lauriii Finland

    Addressed feedback from #36. I updated the minimum length to 15 characters or 10% of the maximum length, whichever is larger. I also adjusted the length to have slightly more bias towards short values when the maximum length is equal or greater than 255 (the default value) to have more bias towards short values. That seems reasonable given that many people rely on the default value for fields where the value is usually much shorter.

    I didn't implement #36.4 exactly as @bnjmnm suggested because the proposed change wouldn't guarantee that the minimum length is met.

  • Status changed to RTBC almost 2 years ago
  • 🇺🇸United States bnjmnm Ann Arbor, MI

    #37 addresses all of my concerns and implements my suggestions better than what I proposed. The test coverage also looks good.

  • Status changed to Needs work almost 2 years ago
  • 🇷🇴Romania amateescu
    +++ b/core/modules/workspaces/src/Entity/Workspace.php
    @@ -83,7 +83,7 @@ public static function baseFieldDefinitions(EntityTypeInterface $entity_type) {
    -      ->addPropertyConstraints('value', ['Regex' => ['pattern' => '/^[a-z0-9_]+$/']]);
    +      ->addPropertyConstraints('value', ['Regex' => ['pattern' => '/^[a-zA-Z0-9_]+$/']]);
    

    What is the reason for this change?

  • 🇦🇺Australia larowlan 🇦🇺🏝.au GMT+10

    Yeah, for #40 that change was reverted and fixed in https://git.drupalcode.org/project/drupal/-/merge_requests/1109/diffs?co... and https://git.drupalcode.org/project/drupal/-/merge_requests/1109/diffs?co... but it seems the revert was lost and that change has snuck back in.

    1. +++ b/core/lib/Drupal/Core/Field/Plugin/Field/FieldType/StringItem.php
      @@ -72,7 +72,47 @@ public function getConstraints() {
      +    $values['value'] = rtrim(mb_substr($string, 0, $max_length - 1), ' .') . '.';
      

      Is a full-stop likely to be present at the end of a title field? this feels like overreach

    2. +++ b/core/modules/workspaces/src/Entity/Workspace.php
      @@ -83,7 +83,7 @@ public static function baseFieldDefinitions(EntityTypeInterface $entity_type) {
      -      ->addPropertyConstraints('value', ['Regex' => ['pattern' => '/^[a-z0-9_]+$/']]);
      +      ->addPropertyConstraints('value', ['Regex' => ['pattern' => '/^[a-zA-Z0-9_]+$/']]);
      

      Per above, this shouldn't be needed with the change to ::getModifiedEntityForPostTesting you have already

  • Status changed to Needs review almost 2 years ago
  • 🇫🇮Finland lauriii Finland

    #41.1: Good point! I changed the logic so that if the string contains a single sentence, the string doesn't end with a full stop.

    #41.2 👍

  • Status changed to RTBC almost 2 years ago
  • 🇦🇺Australia larowlan 🇦🇺🏝.au GMT+10

    Thanks, glad to get this Layout builder UX issue fixed.

    This is ready assuming bot agrees

  • 🇷🇴Romania amateescu
    +++ b/core/modules/workspaces/tests/src/Functional/EntityResource/WorkspaceJsonAnonTest.php
    @@ -28,4 +28,23 @@ class WorkspaceJsonAnonTest extends WorkspaceResourceTestBase {
    +    $values['id'] = preg_replace('/[^a-zA-Z0-9_]/', '', $values['id']);
    

    This still needs to be updated to match the (now unchanged) pattern used by the ID field of the Workspace entity type.

  • 🇫🇮Finland lauriii Finland

    Thanks @amateescu!

    • bnjmnm committed 3fc94dc6 on 10.1.x
      Issue #3091478 by Tim Bozeman, lauriii, malcomio, EclipseGc, larowlan,...
  • 🇺🇸United States bnjmnm Ann Arbor, MI

    Committed to 10.1.x. Out of perhaps excess caution, I'd like confirm tests pass on 10.0.x before cherry picking to that branch.

    • bnjmnm committed 829cd7ff on 10.0.x
      Issue #3091478 by Tim Bozeman, lauriii, malcomio, EclipseGc, larowlan,...
  • Status changed to Fixed almost 2 years ago
  • 🇺🇸United States bnjmnm Ann Arbor, MI

    Cherry-picked to 10.0.x

  • Status changed to Needs work almost 2 years ago
  • 🇬🇧United Kingdom alexpott 🇪🇺🌍

    This change has resulted in random fails in core/tests/Drupal/KernelTests/Core/Entity/CreateSampleEntityTest.php - if I run this test 50 times locally after this fix it will fail but before it does not. And we're seeing this on DrupalCI - see https://www.drupal.org/pift-ci-job/2600975

    • alexpott committed 100aaec1 on 10.1.x
      Revert "Issue #3091478 by Tim Bozeman, lauriii, malcomio, EclipseGc,...
    • alexpott committed 7cff4a62 on 10.0.x
      Revert "Issue #3091478 by Tim Bozeman, lauriii, malcomio, EclipseGc,...
  • 🇬🇧United Kingdom alexpott 🇪🇺🌍

    I think this might have also broken \Drupal\Tests\user\Kernel\UserEntityTest::testUserValidation on postgres - see https://www.drupal.org/pift-ci-job/2600951 - i'm not 100% sure because I can't reproduce locally but it makes sense because that test generates sample values and I've not see a random test fail in there before.

  • 🇦🇺Australia larowlan 🇦🇺🏝.au GMT+10

    I think we should revert it then

  • 🇦🇺Australia larowlan 🇦🇺🏝.au GMT+10

    Oh, you did 🤦 sorry

  • 🇬🇧United Kingdom alexpott 🇪🇺🌍

    The problem is caused by \Drupal\user\UserNameItem::generateSampleValue()

    The change can result in usernames ending with a space which is going to cause an error. I think we should not call the parent here but do something different.

  • 🇬🇧United Kingdom alexpott 🇪🇺🌍

    I suggest we change \Drupal\user\UserNameItem::generateSampleValue() to

      /**
       * {@inheritdoc}
       */
      public static function generateSampleValue(FieldDefinitionInterface $field_definition) {
        $random = new Random();
        // User names larger than 60 characters won't pass validation.
        $max_length = min(UserInterface::USERNAME_MAX_LENGTH, $field_definition->getSetting('max_length'));
        $min_length = min(15, $field_definition->getSetting('max_length'));
        $values['value'] = $random->word(mt_rand($min_length, $max_length));
        return $values;
      }
    
  • Status changed to Needs review almost 2 years ago
  • 🇫🇮Finland lauriii Finland

    This should generate values for the username that more closely resemble real usernames. Not all usernames consist of two words like my algorithm does but it still seems better to generate the name using the sentences generator since it provides random words that are not just arbitrary characters put together. Instead of one word, I'm using two just to allow a bit more variety in the length.

  • Status changed to Needs work almost 2 years ago
  • 🇬🇧United Kingdom alexpott 🇪🇺🌍

    One issue for me with the approach in #60 is that you'll never get usernames that are anywhere near the max length. where before they'd be anywhere between 1 and 60 characters.

  • 🇬🇧United Kingdom alexpott 🇪🇺🌍
    +++ b/core/lib/Drupal/Core/Field/Plugin/Field/FieldType/StringItem.php
    @@ -72,7 +72,50 @@ public function getConstraints() {
    +    // If the length of the generated string is longer than the maximum length,
    +    // re-create the string word by word to reduce the chances of the string
    +    // ending middle of a word.
    +    if (mb_strlen($string) > $max_length) {
    +      $words = explode(' ', $string);
    +      $string = array_reduce($words, function ($current, $item) use ($length) {
    +        return mb_strlen($current) > $length ? $current : "$current$item ";
    +      }, '');
    +    }
    +
    +    // Normalize the string to be shorter than the maximum length.
    +    $normalized_value = rtrim(mb_substr($string, 0, $max_length - 1), ' .');
    +
    +    // Ensure that the string ends with a full stop if there are multiple
    +    // sentences.
    +    $values['value'] = $normalized_value . (str_contains($normalized_value, '.') ? '.' : '');
    

    This is overly complex and slower compared to

    if (mb_strlen($string) > $max_length) {
      $string = substr($string, 0, $length);
      $string = substr($string, 0, strrpos($string, ' '));
    }
    

    And it is also means you need to normalise... if you did the above you could do...

    // The early return for <= 15
    
    // Reduce the max length to allow us to add a fullstop.
    $max_length -= 1;
    
    // All the code till we start trying to reduce the length if necessary...
    
    if (mb_strlen($string) > $max_length) {
      $string = substr($string, 0, $length);
      $string = substr($string, 0, strrpos($string, ' '));
    }
    
    $string = rtrim($string, ' .')
    
    $values['value'] = rtrim($string, ' .') . '.';
    

    See https://3v4l.org/0tqIa

    This would also mean that single sentences end with a fullstop. Whereas no they don't. If that is by design then we can re-implement some of the odd normalisation logic.

  • Status changed to Needs review almost 2 years ago
  • 🇫🇮Finland lauriii Finland

    I was trying to optimize the previous solution so that we always return string that is at least the length that was calculated for the sample value. This seems unnecessary and allows us to simplify the code.

    Tried to address both #62 and #61.

  • 🇬🇧United Kingdom alexpott 🇪🇺🌍

    I've run the core/tests/Drupal/KernelTests/Core/Entity/CreateSampleEntityTest.php test over 100 times locally and there are no fails so #63 is no vulnerable to the random fails caused by #45

  • 🇬🇧United Kingdom alexpott 🇪🇺🌍

    I think that maybe we need to improve how this method works to take into account any constraints there are on the field. There are some issues with current approach. The workspaces fix is only implemented on the test level and therefore is generate a sample workspace ID outside of this it's not going to be valid. That's because \Drupal\Core\Field\Plugin\Field\FieldType\StringItem::generateSampleValue() is only taking care of the max length constraint and nothing else. This is also why the username constraint failed.

    I think in order to generate correct sample values we have to take constraints into account somehow. An extremely tricky constraint is anything based on UniqueFieldConstraint - I think there's a decent chance that the new formulation might generate a higher percentage of same text due to the differences in how \Drupal\Component\Utility\Random::sentences() and \Drupal\Component\Utility\Random::word() work. Ugh.

  • 🇫🇮Finland lauriii Finland

    Maybe this could be acceptable? It doesn't look like the other random generators are taking into account all constraint validators either. Also, there are hard coded assumptions like this in other default value generators. See \Drupal\Core\Field\Plugin\Field\FieldType\LanguageItem::generateSampleValue for example.

  • Status changed to Needs work almost 2 years ago
  • 🇬🇧United Kingdom alexpott 🇪🇺🌍

    I think #66 is a good first step - taking the unique constraint into account makes sense. Let's proceed here with that.

  • Status changed to Needs review almost 2 years ago
  • 🇫🇮Finland lauriii Finland

    Thanks @alexpott! This should fix the test failures.

  • 🇬🇧United Kingdom longwave UK

    So the test failure is a bit weird:

      1x: Method "Symfony\Component\Validator\Constraint::validatedBy()" might add "string" as a native return type declaration in the future. Do the same in child class "Drupal\Core\Validation\Plugin\Validation\Constraint\UniqueFieldConstraint" now to avoid errors or add an explicit @return annotation to suppress this message.
    

    Not sure why the debug class loader is only now throwing this forward-compatibility warning. We have seen similar in 📌 Use PHP attributes instead of doctrine annotations Fixed where the proposed resolution is a wider deprecation skip and opened 📌 Handle newly added deprecation skip of "%Method "[^"]+" might add "[^"]+" as a native return type declaration in the future. Do the same in (child class|implementation) "Drupal\\[^"]+" now to avoid errors or add an explicit @return annotation to suppress" Postponed to deal with solving that, but that is quite a sledgehammer approach to me and risks us missing other deprecations in other issues.

    A simple fix for this case is just to do what the error says. We can't add a return type because this class might have been extended, but we can add the @return annotation, so that's what I've done in this patch.

  • Status changed to Needs work almost 2 years ago
  • 🇺🇸United States bnjmnm Ann Arbor, MI

    All I spotted were doc nits. I'm going to dig a little further to see if there are any uses of \Drupal\Core\Field\Plugin\Field\FieldType\StringItem::generateSampleValue that might need to be considered, but given that @alexpott has already provided this a good look, there's a good chance that will be fine.

    1. +++ b/core/modules/user/src/UserNameItem.php
      @@ -28,9 +29,38 @@ public function isEmpty() {
      +    // Usernames may or may not consist of spaces. When they consist of space,
      +    // they consist of 2 or 3 words.
      

      This initially confused me. Maybe something like "Begin with a username that includes a space and is either 2 or 3 words. This may change depending on which of the random conditions below return TRUE."

    2. +++ b/core/modules/user/src/UserNameItem.php
      @@ -28,9 +29,38 @@ public function isEmpty() {
      +    // Sometimes usernames may consist of words written with capital letter,
      +    // sometimes not.
      

      Perhaps something like "50% of the time, the words used by the username will be capitalized." The "Sometimes" reads to me as if it's reacting to something that exists, vs what it's actually doing: creating it.

    3. +++ b/core/modules/user/src/UserNameItem.php
      @@ -28,9 +29,38 @@ public function isEmpty() {
      +    // word with a period in the middle of the username.
      

      Can this get a simliar rephrasing: "50% of the time the userame will... etc"

    4. +++ b/core/tests/Drupal/Tests/Core/Field/StringItemTest.php
      @@ -0,0 +1,60 @@
      +    foreach ([TRUE, FALSE] as $unique) {
      +      $definition = $this->prophesize(FieldDefinitionInterface::class);
      +      $constraints = $unique ? [$this->prophesize(UniqueFieldConstraint::class)] : [];
      +      $definition->getConstraint('UniqueField')->willReturn($constraints);
      +      $definition->getSetting('max_length')->willReturn($max_length);
      

      Can there be a quick explanation for what's being checked differently based on $unique? I assume $unique needs to equal the max value, and the first few lines in the condition are just adding the constraint then confirming the constraint was added successfully?

  • 🇺🇸United States bnjmnm Ann Arbor, MI

    With the changes here \Drupal\Core\Field\Plugin\Field\FieldType\UriItem::generateSampleValue creates sample URLs with spaces, so that likely needs to be addressed too.

  • Status changed to Needs review almost 2 years ago
  • 🇫🇮Finland lauriii Finland

    This should address feedback from #72 and #73.

  • @timplunkett opened merge request.
  • 🇺🇸United States tim.plunkett Philadelphia

    Switched to MR, fixed the namespace mismatch

  • Status changed to RTBC almost 2 years ago
  • 🇺🇸United States smustgrave

    Feedback appears to be addressed.

    • larowlan committed 15cde9a5 on 10.1.x
      Issue #3091478 by lauriii, Tim Bozeman, malcomio, tim.plunkett,...
  • Status changed to Fixed over 1 year ago
  • 🇦🇺Australia larowlan 🇦🇺🏝.au GMT+10

    Committed and pushed to 10.1.x, 🤞 we don't get random fails this time 💪

    • larowlan committed abc96bad on 10.1.x
      Revert "Issue #3091478 by lauriii, Tim Bozeman, malcomio, tim.plunkett,...
  • Status changed to Needs work over 1 year ago
  • 🇦🇺Australia larowlan 🇦🇺🏝.au GMT+10

    This introduced another random fail in \Drupal\Tests\user\Kernel\Migrate\MigrateUserStubTest

    In HEAD

    phpunit -c app/core app/core/modules/user/tests/src/Kernel/Migrate/MigrateUserStubTest.php  --repeat=100 --stop-on-failure
    Bootstrapped tests in 0 seconds.
    PHPUnit 9.6.5 by Sebastian Bergmann and contributors.
    
    Testing 
    ....F
    
    Time: 00:03.845, Memory: 10.00 MB
    
    There was 1 failure:
    
    1) Drupal\Tests\user\Kernel\Migrate\MigrateUserStubTest::testStub
    Failed asserting that an object is empty.
    
    
    

    After revert

    phpunit -c app/core app/core/modules/user/tests/src/Kernel/Migrate/MigrateUserStubTest.php  --repeat=100 --stop-on-failure
    Bootstrapped tests in 0 seconds.
    PHPUnit 9.6.5 by Sebastian Bergmann and contributors.
    
    Testing 
    ...............................................................  63 / 100 ( 63%)
    .....................................                           100 / 100 (100%)
    
    Time: 01:16.619, Memory: 10.00 MB
    
    OK (100 tests, 40400 assertions)
    
    
  • 🇦🇺Australia larowlan 🇦🇺🏝.au GMT+10

    🐛 Improve string field sample value generation Closed: duplicate is a similar issue with a different approach, it adds Random::phrase() - worth looking to see if we can combine efforts?

  • Status changed to Needs review over 1 year ago
  • 🇫🇮Finland lauriii Finland

    I don't think the approach from 🐛 Improve string field sample value generation Closed: duplicate would help.

    I added commit that addresses the random fail. It's actually pretty nice because it also allows us to remove a todo comment 😇

    phpunit --configuration /Users/lauri.eskola/Projects/drupal/core/phpunit.xml core/modules/user/tests/src/Kernel/Migrate/MigrateUserStubTest.php --repeat=100 --stop-on-failure
    PHPUnit 9.5.28 by Sebastian Bergmann and contributors.
    
    Warning:       Your XML configuration validates against a deprecated schema.
    Suggestion:    Migrate your XML configuration using "--migrate-configuration"!
    
    Testing
    ...............................................................  63 / 100 ( 63%)
    .....................................                           100 / 100 (100%)
    
    Time: 07:22.825, Memory: 10.00 MB
    
    OK (100 tests, 40400 assertions)
    
  • 🇳🇿New Zealand quietone

    With the change in #86 we have the name generated twice, once in \EntityContentBase::processStubRow and then again in \Drupal\user\Plugin\migrate\destination\EntityUser::processStubRow. Is there no way to do it once?

  • 🇫🇮Finland lauriii Finland

    With the change in #86 we have the name generated twice, once in \EntityContentBase::processStubRow and then again in \Drupal\user\Plugin\migrate\destination\EntityUser::processStubRow. Is there no way to do it once?

    That's true. What's worth noting is that the property was already set twice with majority of the sample values, meaning that this shouldn't cause issues. Also, calling ::generateSampleValue is relatively cheap so I'm not too worried that the value is generated twice.

  • Status changed to RTBC over 1 year ago
  • 🇺🇸United States bnjmnm Ann Arbor, MI

    The solution to addressing the most recent revert-causing-issue seems solid. It was due to a user name item in Migrate not using the generateSampleValue() that fits the requirement for a UserNameItem. The workaround there previously could become out of sync with UserNameItem::generateSampleValue(), but now that it's called directly it should be a more solid solution.

    The rest of the issue has been vetted several times over. I grepped through the code looking for any other potential surprises and did not find any. The migrate/userNameItem slipped through previously as it was not using the expected class to for generateSampleValue(). Lets go for round 3.

  • Status changed to Needs work over 1 year ago
  • 🇦🇺Australia larowlan 🇦🇺🏝.au GMT+10

    Left a comment in the MR - tl;dr I think we should add an issue to fix https://git.drupalcode.org/project/drupal/-/blob/10.1.x/core/modules/mig... and either postpone this on that issue, or add a @todo here pointing to that to remove the duplicate call once the second issue is fixed.

  • Status changed to RTBC over 1 year ago
  • 🇫🇮Finland lauriii Finland

    Opened follow-up and added a @todo comment 👍

  • 🇦🇺Australia larowlan 🇦🇺🏝.au GMT+10

    🤞 Third time lucky

    Committed to 10.1.x - thanks everyone

  • Status changed to Fixed over 1 year ago
    • larowlan committed 036bf79c on 10.1.x
      Issue #3091478 by lauriii, Tim Bozeman, malcomio, tim.plunkett,...
  • Automatically closed - issue fixed for 2 weeks with no activity.

Production build 0.71.5 2024