- Issue created by @cosmicdreams
 
It appears that TrimWhitespace's use of the "u" regular expression modifier can cause it to return NULL for content.
TrimWhitespace makes 4 preg_replace() calls that use the "u" modifier to parse incoming text.
  protected function processFieldValue(&$value, $type) {
    if (!$this->getDataTypeHelper()->isTextType($type, ['text', 'string'])) {
      return $value;
    }
    $preserve = $value;
    $value = str_replace(" ", '', $value);
    // Remove multiple spaces.
    $value = preg_replace('/( {2,})+/imu', ' ', $value);
    // Remove spaces before punctuation.
    $value = preg_replace('/\s+([!?.,])/imu', "$1", $value);
    // Remove any space at the start of a string.
    $value = preg_replace('/^\s+/imu', '', $value);
    // Remove any non-printable characters.
    $value = preg_replace('/[[:^print:]]/imu', '', $value);
    $value = trim($value);
  }
When $value is a string that isn't valid UTF-8, preg_replace() with the "u" modifier returns NULL, so $value ends up NULL.
I'm not sure exactly how to rig up a test for this, but if you ever process content that isn't UTF-8 encoded, the TrimWhitespace filter will turn every provided value into NULL.
I don't understand how I'm delivering non-UTF-8 text to indexing, but I don't think I've done anything particularly strange to get here.
I wonder if the module should check the encoding of the incoming string and only use the "u" modifier when the string is valid UTF-8.
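One possible way to reproduce this outside the module, as a minimal sketch: the sample string below is made up for illustration (a lone "\xE9" byte, which is "é" in ISO-8859-1 but not a valid UTF-8 sequence), and the pattern is the first one from the filter.

```php
<?php
// Hypothetical sample input: "\xE9" on its own is not valid UTF-8.
$latin1 = "caf\xE9  au lait";

// Same pattern as the filter's "Remove multiple spaces" step, with "u".
$with_u = preg_replace('/( {2,})+/imu', ' ', $latin1);
// Capture the error code right away; later preg_* calls reset it.
$error = preg_last_error();

// Same pattern without "u": the subject is treated as plain bytes.
$without_u = preg_replace('/( {2,})+/im', ' ', $latin1);

var_dump($with_u);                        // NULL
var_dump($error === PREG_BAD_UTF8_ERROR); // bool(true)
```

Checking preg_last_error() for PREG_BAD_UTF8_ERROR is one way to confirm that the NULL comes from the encoding rather than from the pattern itself.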
https://www.php.net/manual/en/function.mb-detect-encoding.php
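A rough sketch of what such a check could look like, not a proposed patch. The function name trim_whitespace_sketch() is hypothetical, and it uses mb_check_encoding() rather than mb_detect_encoding(), on the assumption that the mbstring extension is available and that a yes/no validity check is enough here. It also keeps the previous value whenever preg_replace() returns NULL. The "non-printable characters" step is left out on purpose, since how that pattern should behave without "u" on single-byte encodings is a separate question.

```php
<?php
/**
 * Hypothetical helper: applies the filter's whitespace patterns, adding
 * the "u" modifier only when the input is valid UTF-8.
 */
function trim_whitespace_sketch(string $value): string {
  // Assumption: mbstring is installed.
  $u = mb_check_encoding($value, 'UTF-8') ? 'u' : '';
  $steps = [
    // Remove multiple spaces.
    ['/( {2,})+/im' . $u, ' '],
    // Remove spaces before punctuation.
    ['/\s+([!?.,])/im' . $u, '$1'],
    // Remove any space at the start of a line.
    ['/^\s+/im' . $u, ''],
  ];
  foreach ($steps as [$pattern, $replacement]) {
    $result = preg_replace($pattern, $replacement, $value);
    // preg_replace() returns NULL on error; keep the previous value then.
    if ($result !== NULL) {
      $value = $result;
    }
  }
  return trim($value);
}
```

With this, a non-UTF-8 value such as "caf\xE9  au lait !" would come back with its whitespace cleaned up instead of being turned into NULL.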
Get consensus about this fix.
Active
1.0
Code