Word count is incorrect due to concatenation and splitting

Created on 6 April 2023, over 1 year ago
Updated 21 April 2023, over 1 year ago

Problem/Motivation

The word count is incorrect due to the following issues:

  1. The field values are concatenated without a delimiter. This causes an off-by-one error for each field concatenated.
    Example:
    • Field 1 text: "<p>one</p>"
    • Field 2 text: "<p>two</p>"
    • After concatenation: "<p>one</p><p>two</p>"
    • After strip_tags for word count: "onetwo"
    • BUG: word count is 1.
    • Expected: word count should be 2.
  2. The combined text is split on whitespace, and each resulting array item is assumed to be a "word". This assumption fails when the text has leading spaces, trailing spaces, newline characters, or other non-visible characters.
    Example:
    • After concatenation: "<p> one</p> <p>two </p>" - Note the space before "one" and the space after "two"
    • After strip_tags for word count: " one two "
    • BUG: word count is 4. The preg_split on \s+ returns 4 items: "", "one", "two", "".
    • Expected: word count should be 2.
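
Both miscounts can be reproduced in isolation. The snippet below is a minimal sketch, not the module's actual code: the variable names are hypothetical, and it assumes the count is taken as count(preg_split('/\s+/', ...)) on the strip_tags output, as the examples above describe.

```php
<?php
// Hypothetical field values, taken from the examples above.
$fields = ['<p>one</p>', '<p>two</p>'];

// Issue 1: concatenating without a delimiter fuses adjacent words.
$joined = strip_tags(implode('', $fields));
$count_joined = count(preg_split('/\s+/', $joined));

// Issue 2: leading and trailing whitespace produce empty array items.
$padded = strip_tags('<p> one</p> <p>two </p>');
$count_padded = count(preg_split('/\s+/', $padded));

// $count_joined is 1 and $count_padded is 4; both should be 2.
```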

Proposed resolution

Proposed fixes for the issues above:

  1. Add a space when combining the field values.
  2. Trim and filter the words array after splitting. This is not ideal, but it keeps the regex simple and leaves cleanup to a separate step. The preg_split call could instead be given the PREG_SPLIT_NO_EMPTY flag; however, that only removes purely empty items, not items containing other characters that trim would strip.
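
The two fixes might look roughly like the following. This is a sketch under assumptions: count_words is a hypothetical helper name, and the committed patch may be structured differently.

```php
<?php
/**
 * Hypothetical helper illustrating both proposed fixes.
 *
 * @param string[] $field_values
 *   Raw (possibly HTML) field values to count words in.
 */
function count_words(array $field_values): int {
  // Fix 1: join with a space so words from adjacent fields do not fuse.
  $text = strip_tags(implode(' ', $field_values));

  // Split on runs of whitespace; fall back to an empty array on failure.
  $words = preg_split('/\s+/', $text) ?: [];

  // Fix 2: trim each item, then drop the ones that end up empty.
  $words = array_filter(
    array_map('trim', $words),
    function (string $word): bool {
      return $word !== '';
    }
  );

  return count($words);
}
```

The explicit comparison against '' is deliberate: calling array_filter without a callback would also discard the string "0", which is a legitimate word.
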
🐛 Bug report
Status

Fixed

Version

1.0

Component

Code

Created by

🇺🇸 United States recrit
