Lock down $form_id matching

Created on 2 October 2019, over 5 years ago
Updated 16 August 2024, 8 months ago

Follow-up to #2912575: From with 'research_' in machine name is not protected β†’ .

When honeypot is configured to protect all forms, the following code is used to whitelist some core and contrib forms:

    // Don't protect system forms - only admins should have access, and system
    // forms may be programmatically submitted by drush and other modules.
    if (preg_match('/[^a-zA-Z]system_/', $form_id) === 0 && preg_match('/[^a-zA-Z]search_/', $form_id) === 0 && preg_match('/[^a-zA-Z]views_exposed_form_/', $form_id) === 0) {
      honeypot_add_form_protection($form, $form_state, array('honeypot', 'time_restriction'));
    }

Additionally the following code is used to match webforms:

      // For webforms use a special check for variable form ID.
      elseif ($protect_form_id == 'webforms' && (strpos($form_id, 'webform_client_form') !== FALSE)) {
        honeypot_add_form_protection($form, $form_state, array('honeypot', 'time_restriction'));
      }

Since hook_form_alter() is also called for the base form id, a matching of variable form ids seems unnecessary. Additionally, the first case can be condensed into a single regex pattern.

The goal of this issue is to narrow down form matching to simplify the code and reduce unwanted side effects.

πŸ“Œ Task
Status

Needs work

Version

2.2

Component

Code

Created by

πŸ‡©πŸ‡ͺGermany ciss

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • πŸ‡ΊπŸ‡ΈUnited States tr Cascadia
  • πŸ‡ΊπŸ‡ΈUnited States tr Cascadia

    OK, I opened a MR because I was intending to try to re-roll this patch against the current 2.2.x development branch, but there are all sorts of problems.

    I don't think I can make any progress on this because a lot of it is no longer relevant, and the remaining pieces are going to need a lot more discussion because they will definitely change the behavior of this module.

    So the two pieces that I believe are no longer needed are:

    1. Protection for programmatically submitted forms is already turned off - no need to do it again in this patch. See #2677126: Honeypot should not be triggered on programmatically submitted forms. β†’
    2. Webform now does its own hook_form_alter() so I don't think there is any need for the Webform-specific code changes.

    Then there is the replacement of the regular expressions with the check to see if the path is an admin path. I don't really think that does the same thing as the regular expression. But that stuff isn't well documented so I did some research:
    The current patterns for form_ids to exclude from Honeypot protection are:

    /[^a-zA-Z]system_/
    /[^a-zA-Z]search_/
    /[^a-zA-Z]views_exposed_form_/
    1. Initially this was just 'system_', which was added in #1253480: Drush calls to enable or disable modules raise time limit warnings β†’
    2. Then 'search_' was added by πŸ› Disable Honeypot on search forms Fixed
    3. Then 'views_exposed_form_' was added by #1498940: Views exposed auto submit filter doesn't work with Honeypot β†’
    4. And then the simple string matching was converted to regular expressions in #2912575: From with 'research_' in machine name is not protected β†’ .
    5. (In fact this current issue was opened as a follow up to #2912575: From with 'research_' in machine name is not protected β†’ )

    My proposal in #10 was to use:

    $route = \Drupal::routeMatch()->getRouteObject();
    $is_admin = \Drupal::service('router.admin_context')->isAdminRoute($route);

    instead of the regular expressions, which does identify admin paths in D8+ (and I think it's good to exclude those), but I don't think this serves the same purpose as the regular expressions above.

    The patch in #7 suggests:

    // Exclude admin forms, with the exception of node forms.
    $is_system_form = path_is_admin(current_path()) && ($base_form_id !== 'node_form');
    
    if (empty($form_state['programmed']) && !$is_system_form && !in_array($base_form_id, $excluded_forms)) {

    which is also quite different than the regular expressions above. And it adds 'programmed' forms which are not needed as I said because they are handled elsewhere. And it adds 'node_form' which should be discussed.

    What is the current justification for the use of base_path?

    So yes, I do want to get rid of that regexp, but we should be VERY clear and explicit about what we intend to exclude and why. And that means at a minimum writing a test to make sure the correct paths are excluded and nothing else. If we still need to exclude those same forms for the same reasons they're there for, then checking if it's an admin path isn't good enough. A search form can appear on a non-admin page, so if we want to exclude search forms for instance we have to do more than check if the path is an admin patch.

    And unless I'm totally mistaken, the above regular expressions are just plain WRONG. Here's a test script:

    <?php
    
    $form_ids = [
      '/admin/config/system/cron' => 'system_cron_settings',
      '/admin/modules/uninstall' => 'system_modules_uninstall',
      '/node/5/edit' => 'node_article_edit_form',
      '/node/5' => 'comment_form',
      '/planets' => 'solarsystem_form',
      '/moons' => 'solar_system_form',
    ];
    
    foreach ($form_ids as $path => $form_id) {
      // Regular expressions from honeypot.module.
      if (preg_match('/[^a-zA-Z]system_/', $form_id) === 0 && preg_match('/[^a-zA-Z]search_/', $form_id) === 0 && preg_match('/[^a-zA-Z]views_exposed_form_/', $form_id) === 0) {
        print_r("Honeypot protection will be added to $path \n");
      }
    }
    

    My expectation is that only the first two should be excluded from Honeypot protection. Instead, they ALL are excluded, EXCEPT for the last one. This test tells me that the regular expression DOESN'T EXCLUDE what it means to exclude. It's just wrong.

    I really don't see how this could have been so broken for so long (six years!) so I would like a sanity check from the community before I totally redo the regular expressions and replace them with something more sensible.

Production build 0.71.5 2024