Automatic filter regexp can be optimized

Created on 27 July 2023, over 1 year ago

Problem/Motivation

Currently, the regexp built for N matching patterns of length L has complexity (L+2)*N, but it could be 2+L*N instead.

Steps to reproduce

Consider test case "A foo is a bar" and patterns ['foo', 'bar'] as in MatcherTest::providerHandle.

The regex101 debugger shows the two matches found after 38 steps and the conclusion of no third match 11 steps later, so 49 steps.

Proposed resolution

Move the word boundaries out of the individual matches in the capturing group.

The regex101 debugger shows the two matches found after 30 steps and the conclusion of no third match 6 steps later, so 36 steps vs 49 or a 27% reduction in work done. The saving is likely to be proportional to the number of matches.

Remaining tasks

Apply change, ensure it breaks no case.

User interface changes

None, test only.

API changes

None, test only.

Data model changes

None, test only.

🐛 Bug report
Status

Fixed

Version

1.0

Component

Code

Created by

🇫🇷France fgm Paris, France

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

Production build 0.71.5 2024