oEmbed legacy regex results in 'Catastrophic backtracking'

Created on 27 October 2015, almost 10 years ago
Updated 26 November 2024, 10 months ago

We are currently experiencing problems with oEmbed when the body contains certain HTML.

I have tracked down the problem to line 14 in oembed_legacy.inc:

$text = preg_replace_callback("[regex]", '_oembed_preg_parse', $text);

The above regex seems to be a bit too hard to complete when the body contains something like the following:

<p style=" color:#c9c8cd; font-family:Arial,sans-serif; font-size:14px; line-height:17px; margin-bottom:0; margin-top:8px; overflow:hidden; padding:8px 0 7px; text-align:center; text-overflow:ellipsis; white-space:nowrap;">
</p>

(The inline styling is not ours; this markup comes from embedding an Instagram post.)
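In PHP, preg_replace_callback() returns null when PCRE aborts (for example after hitting pcre.backtrack_limit), so assigning the result straight back to $text can throw away the body. A minimal sketch of guarding against that follows. safe_replace_callback() is a hypothetical helper name used for illustration; it is not part of oembed_legacy.inc.

```php
<?php
/**
 * Hypothetical helper: runs preg_replace_callback() and keeps the original
 * text if PCRE reports an error instead of returning a result.
 */
function safe_replace_callback(string $pattern, callable $callback, string $text): string {
  $result = preg_replace_callback($pattern, $callback, $text);
  if ($result === null) {
    // PREG_BACKTRACK_LIMIT_ERROR is the code PHP reports when
    // pcre.backtrack_limit was exceeded; other codes are logged as-is.
    $reason = preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR
      ? 'backtrack limit reached'
      : 'PCRE error code ' . preg_last_error();
    error_log('Regex replacement failed: ' . $reason);
    // Fall back to the unmodified text rather than returning null.
    return $text;
  }
  return $result;
}

// Usage with an assumed, trivial pattern and callback.
$body = safe_replace_callback('`\bfoo\b`', function (array $m) {
  return strtoupper($m[0]);
}, 'foo bar');
```

This only prevents the body from being lost; it does not make the regex itself any cheaper.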

It seems to fail because the regex cannot match when no URL is present. The engine therefore tries every possible way of matching the preceding markup, backtracking each time no URL is found, which can add up to a very large number of steps. Adding a ? after the URL-matching group lets the match succeed when no URL is found, avoiding a lot of the backtracking. With a more complex body field (the entire embed code), it also fails at group 6. The same fix applies there: adding a ? after the [ \n\r\t\)] in that group fixes it.
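The effect described above can be reproduced in isolation. The sketch below does not use the module's regex: it uses two simplified, hypothetical patterns that keep only the `<p(?:\s[^>]*)*>` alternative and a single URL scheme, once mandatory and once optional, and runs them against an assumed test string with 30 style declarations. JIT is switched off and pcre.backtrack_limit is set explicitly so that the outcome does not depend on local settings.

```php
<?php
// Simplified, hypothetical patterns (not the module's regex): the opening-tag
// alternative followed by a mandatory or an optional URL scheme.
$mandatory = '`<p(?:\s[^>]*)*>(http://)`i';
$optional  = '`<p(?:\s[^>]*)*>(http://)?`i';

// Assumed test input: an opening tag with many whitespace-separated
// declarations and no URL after it.
$subject = '<p style="' . str_repeat(' margin:0;', 30) . '">' . "\n</p>";

// Make the run independent of local configuration.
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '100000');

/**
 * Hypothetical helper: describes the outcome of one preg_match() call.
 */
function describe(string $pattern, string $subject): string {
  $result = preg_match($pattern, $subject);
  if ($result === false) {
    return preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR
      ? 'backtrack limit reached'
      : 'other PCRE error';
  }
  return $result === 1 ? 'match' : 'no match';
}

echo 'mandatory URL: ', describe($mandatory, $subject), "\n";
echo 'optional URL:  ', describe($optional, $subject), "\n";
```

With these settings the mandatory variant should stop at the backtrack limit, because `\s[^>]*` can split the whitespace-separated attribute text in a very large number of ways before giving up, while the optional variant succeeds on the first attempt.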

So the proposed regex is: `(^|<p(?:\s[^>]*)*>|<li(?:\s[^>]*)*>|<br(?:\s[^>]*)*>|[ \n\r\t\(])((http://|https://|ftp://|mailto:|smb://|afp://|file://|gopher://|news://|ssl://|sslv2://|sslv3://|tls://|tcp://|udp://)([a-zA-Z0-9@:%_+*~#?&=.,/;-]*[a-zA-Z0-9@:%_+*~#&=/;-]))?([.,?!]*?)(?=($|</p>|</li>|<br\s*/?>|[ \n\r\t\)]?))`i

Also, setting the case-insensitive flag (i) and also listing A-Z in the character classes seems redundant.

So the improved regex would be:
`(^|<p(?:\s[^>]*)*>|<li(?:\s[^>]*)*>|<br(?:\s[^>]*)*>|[ \n\r\t\(])((http://|https://|ftp://|mailto:|smb://|afp://|file://|gopher://|news://|ssl://|sslv2://|sslv3://|tls://|tcp://|udp://)([a-z0-9@:%_+*~#?&=.,/;-]*[a-z0-9@:%_+*~#&=/;-]))?([.,?!]*?)(?=($|</p>|</li>|<br\s*/?>|[ \n\r\t\)]?))`i
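As a quick check, the improved regex can be run against the markup from this report and against a paragraph that does contain a URL. In the sketch below, find_urls() and the second test string are assumptions made for illustration, and the callback is a stand-in that only records group 2; it is not _oembed_preg_parse.

```php
<?php
// The improved regex from this issue, copied verbatim (backtick delimiters, "i" flag).
$pattern = '`(^|<p(?:\s[^>]*)*>|<li(?:\s[^>]*)*>|<br(?:\s[^>]*)*>|[ \n\r\t\(])((http://|https://|ftp://|mailto:|smb://|afp://|file://|gopher://|news://|ssl://|sslv2://|sslv3://|tls://|tcp://|udp://)([a-z0-9@:%_+*~#?&=.,/;-]*[a-z0-9@:%_+*~#&=/;-]))?([.,?!]*?)(?=($|</p>|</li>|<br\s*/?>|[ \n\r\t\)]?))`i';

/**
 * Hypothetical helper: returns the URLs captured in group 2,
 * or null if PCRE reported an error.
 */
function find_urls(string $pattern, string $text): ?array {
  $urls = [];
  $result = preg_replace_callback($pattern, function (array $m) use (&$urls) {
    // Group 2 is empty (or absent) when the optional URL part did not match.
    if (isset($m[2]) && $m[2] !== '') {
      $urls[] = $m[2];
    }
    // Return the match unchanged so the text is not modified.
    return $m[0];
  }, $text);
  return $result === null ? null : $urls;
}

// Markup from the issue: inline styles, no URL.
$embed = '<p style=" color:#c9c8cd; font-family:Arial,sans-serif; font-size:14px; line-height:17px; margin-bottom:0; margin-top:8px; overflow:hidden; padding:8px 0 7px; text-align:center; text-overflow:ellipsis; white-space:nowrap;">' . "\n</p>";
// Assumed example containing a URL.
$withUrl = '<p>See http://example.com/a for details</p>';

var_dump(find_urls($pattern, $embed));
var_dump(find_urls($pattern, $withUrl));
```

One consequence of making the URL part optional is that the callback also runs for matches that contain no URL (for example a lone space or an opening tag), so a callback used with this regex would need to return the matched text unchanged in that case, as the stand-in above does.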

I will create a patch and attach it in a moment.

πŸ› Bug report
Status

Closed: outdated

Version

1.0

Component

Code

Created by

🇩🇰 Denmark mian3010
