Consider to remove exception handling from search_api_views_data

Created on 21 July 2017, over 7 years ago
Updated 2 April 2024, about 1 year ago

Problem/Motivation

Catching exceptions in search_api_views_data leads to incomplete Views data definitions and therefore partially broken handlers. It happens in one of my projects because there is a deadlock with cache_config. The root cause of this issue is clearly not in Search API, but it breaks the whole site. I think it's better not to catch the exceptions, so that an incomplete definition is never written into the cache system.
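A simplified sketch of the pattern being criticized, with hypothetical helper names (this is not the actual module code): an exception thrown while building one table definition is swallowed, so a partial $data array is returned and then cached by Views as if it were complete.

```php
<?php

/**
 * Implements hook_views_data().
 *
 * Hypothetical sketch of the problematic pattern:
 * _search_api_build_views_table() is a placeholder name.
 */
function search_api_views_data() {
  $data = [];
  foreach (\Drupal\search_api\Entity\Index::loadMultiple() as $index) {
    try {
      // Building the definition may hit the database (e.g. a deadlock
      // on cache_config during a parallel cache rebuild).
      $data['search_api_index_' . $index->id()] = _search_api_build_views_table($index);
    }
    catch (\Exception $e) {
      // Swallowing the exception means Views caches whatever was
      // collected so far: an incomplete definition.
      watchdog_exception('search_api', $e);
    }
  }
  return $data;
}
```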

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

🐛 Bug report
Status

Needs review

Version

1.0

Component

Views integration

Created by

🇩🇪Germany webflo


Merge Requests

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • 🇩🇪Germany mrshowerman Munich

    Re-roll.

  • 🇬🇧United Kingdom jacobupal Leeds

    How's this patch working on Drupal 10?

  • 🇫🇷France nicolas bouteille

    Hello,

    I am experiencing this as well.
    What I don't really understand is why "not catching the exceptions" would prevent this from happening…?
    From what I understand, this error happens because of simultaneous calls to search_api_views_data.
    In my case as well, this happens when a cache rebuild is in progress while someone, at the same time, is trying to access the page of an indexed entity. search_api_views_data seems to be triggered in both situations, leading to simultaneous inserts into the cache_config table, leading to a deadlock exception and an incomplete definition in the cache system, which can be resolved with a new cache rebuild (provided there is no new conflict this time).
    I don't understand the suggested solution of not catching exceptions… it will not prevent the deadlock from happening, nor will it prevent the data from being incomplete in the cache system?
    What am I missing…?

    Personally, I am happy to receive exceptions that let me know something bad happened.
    What I would want, though, is to understand how I could prevent this from happening.

    I don't master yet what search_api_views_data does and when / how often it can be called…
    I sometimes make a full cache rebuild on our production website without putting the whole site into maintenance mode. Until recently, I never had this type of error even with Search API in use. But this new deadlock problem with Search API is quite problematic until one can rebuild caches again so I'd like to understand what I should do or not do to prevent this.

    Thank you!

  • 🇧🇪Belgium nils.destoop

    The solution will not prevent your deadlock from happening. It will only make sure that your views data cache is not broken when the exception occurs.
    By catching it, the caching system does not know that something went wrong and thinks the returned data is complete, while in reality it is missing all Views plugins that still needed to be scanned.

    The deadlock is not caused by views data itself, but by a combination of lots of config, leading to a very big set of cache data that needs to be written during a full cache rebuild.

  • 🇩🇪Germany Anybody Porta Westfalica

    Patch is >2 years old now. Should someone turn #33 into a MR to push things forward? What do you think how to proceed here @drunken monkey?

  • 🇦🇹Austria drunken monkey Vienna, Austria

    Patch is >2 years old now. Should someone turn #33 into a MR to push things forward? What do you think how to proceed here @drunken monkey?

    I’d still like feedback on my suggestion from #23: Would re-trying the whole function maybe help?
    In any case, seems most people agree that, in the end, we should rather throw an exception than return incomplete Views data, so we can go with that when retrying failed (or right away, if people don’t think a retry is a good option).

    Created an MR with a draft of what I’d have in mind. Would be great if a few people gave it a try (especially people whose site often encounters this problem) to see if just re-calling the function maybe fixes the problem at least some of the time?
    Also, more general feedback on this problem and whether not catching the exception would be a good idea would still be welcome.
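    The retry idea could look roughly like the following. This is a hypothetical sketch, not the MR's actual code; _search_api_views_data_inner() is a placeholder name for the real data-building logic. On failure, it retries once, then rethrows instead of letting Views cache partial data.

    ```php
    <?php

    /**
     * Implements hook_views_data().
     */
    function search_api_views_data() {
      $attempts = 2;
      while (TRUE) {
        try {
          // Placeholder for the real data-building code.
          return _search_api_views_data_inner();
        }
        catch (\Drupal\Core\Database\DatabaseException $e) {
          if (--$attempts <= 0) {
            // Let the exception bubble up so Views does not cache an
            // incomplete definition.
            throw $e;
          }
          // Otherwise, fall through and retry the whole function once.
        }
      }
    }
    ```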

  • 🇫🇷France nicolas bouteille

    After digging a bit more:

    It seems that Index::loadMultiple() in search_api_views_data() is responsible for writing into cache_config if the indexes are no longer in cache.

    Also, it seems that search_api_views_data() (like all hook_views_data implementations) will only be called if both views_data:[language] and views_data:views:[language] entries are missing from table cache_default.

    When the caches are being emptied, any visitor accessing a page will thus trigger search_api_views_data(), which will then try to write into cache_config because the indexes are missing from cache.

    Drush cr will also call search_api_views_data() in its process.

    Most probable scenario: drush cr gets here first and puts the lock on cache_config, leading to the "deadlock found" error when search_api_views_data() gets executed by visitors requesting pages. This is not so bad, since drush cr will eventually call search_api_views_data() anyway, so we could indeed not throw the exception.

    Worst-case scenario: a visitor triggers search_api_views_data() and puts a lock on cache_config, preventing drush cr from correctly rebuilding the caches. This is bad and should be avoided.

    Suggested solution:
    I think we should only allow the execution of search_api_views_data() if triggered by drush cr or /admin/flush and stop its execution if triggered by a regular page request.

    Here is the custom code I will be adding right now to my project in order to do so. What do you think?

    <?php
    if (php_sapi_name() === 'cli' && in_array('./vendor/bin/drush', $_SERVER['argv'])) {
      // Case "drush cr": do not block execution.
    }
    elseif (\Drupal::routeMatch()->getRouteName() === 'admin_toolbar_tools.flush') {
      // Case /admin/flush: do not block execution.
    }
    else {
      // Triggered by a visitor requesting a page: stop execution to make
      // sure no lock gets in the way of cache rebuilding.
      return [];
    }
    ?>
    
  • 🇫🇷France nicolas bouteille

    I would like to suggest one more thing: if Index::loadMultiple is indeed the code responsible for writing in cache_config when the indexes are not in cache (because search_api_index are configuration entities and not content ones…) Then I think maybe we'd better not use loadMultiple to retrieve index data inside search_api_views_data() which is a hook that will be called when caches are rebuilt. More direct database requests would avoid lock conflicts here on cache_config.

    That being said, I still believe our best bet here is to prevent the execution of search_api_views_data() unless it is being triggered by drush cr or admin/flush because even if the lock conflicts are avoided for cache_config, there could be new lock conflicts when multiple simultaneous calls to search_api_views_data() try to write the same data at the same place.

    For the record, I now understand better why we are having these lock conflicts on our production website when we thought rebuilding cache under maintenance mode would prevent them from happening. We have light ajax calls triggered on our pages every 20s to store our student progress. Normally these ajax calls never trigger search_api_views_data() even if all cache tables are emptied.
    However, I noticed that when the site is under maintenance and all caches are emptied, calling one of these ajax urls will trigger search_api_views_data() this time. Normally, search_api_views_data() should be triggered only once... but if multiple ajax calls are requested quite simultaneously on a production website with lots of visitors, they will all trigger search_api_views_data and generate multiple deadlock errors in the logs. This is what we faced during our last deployment. The code I suggested above should protect us from that in the future I hope.

  • 🇦🇹Austria drunken monkey Vienna, Austria

    Suggested solution:
    I think we should only allow the execution of search_api_views_data() if triggered by drush cr or /admin/flush and stop its execution if triggered by a regular page request.

    If we do that, won’t Views just cache that empty array as the new Views data for the Search API, breaking all search views?

    Also, I’m pretty sure other code is loading config entities during their hook_views_data() implementation as well, so if Views or Drupal cannot deal with that (reliably) I don’t think us working around that problem makes much sense – it should be fixed directly in Core.

  • 🇫🇷France nicolas bouteille

    If we do that, won’t Views just cache that empty array as the new Views data for the Search API, breaking all search views?

    The most important thing, IMHO, is to prevent search_api_views_data from putting a lock on cache_config that would prevent drush cr from correctly rebuilding caches. Knowing that drush cr also calls search_api_views_data anyway, this is why I suggested this. But I have to confess that, for my suggestion, I assumed search_api_views_data would only be called when caches are being rebuilt, because when testing, the only way I could make a classic page request trigger search_api_views_data was by emptying the cache tables. If there is even a slight chance that hook_views_data implementations are called outside of a cache rebuild, what you predict would indeed happen: the empty array could override the search data…
    In that case, we should first try to detect whether the caches are actually being rebuilt, and only then block the execution triggered by classic page requests.

    Also, I’m pretty sure other code is loading config entities during their hook_views_data() implementation as well

    I had the exact same thought, but the other way around... I wondered: how come search API is the only one of our module to ever cause deadlock on cache_config in their hook_views_data implementation? So that's why I started to question the implementation and the use of Index::loadMultiple in that context.
    I think this context is pretty tricky indeed. The fact that on a live website with many visitors, even with the site under maintenance hook_views_data can be called many times simultaneously is problematic.
    More generally speaking though, I think maybe ideally the code responsible for rebuilding Search API data available for Views should not be triggered by visitors page call ever. Should all hook_views_data implementations never get triggered by visitors? Or should Search API rely on something else than this hook to rebuild its data for views...?

  • 🇧🇪Belgium borisson_ Mechelen, 🇧🇪

    The most important IMHO is to prevent search_api_views_data to put a lock on cache_config that would prevent drush cr from correctly rebuilding caches.

    I talked to @webflo today about this issue. We think that putting a lock, in addition to removing the return, is probably a really good way to ensure that we always have a valid views definition.

    We never want to have a silently broken site.
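    A lock-based variant along these lines could be sketched with Drupal's lock service. Again, this is only a sketch of the idea being discussed, and _search_api_views_data_inner() is a placeholder name for the real data-building code:

    ```php
    <?php

    /**
     * Implements hook_views_data().
     */
    function search_api_views_data() {
      $lock = \Drupal::lock();
      // Serialize concurrent rebuilds of the Search API views data so
      // that simultaneous requests do not race on cache_config.
      if (!$lock->acquire('search_api_views_data', 30.0)) {
        // Another process is building the data; wait for it to finish,
        // then try once more.
        $lock->wait('search_api_views_data');
        if (!$lock->acquire('search_api_views_data', 30.0)) {
          // Fail loudly rather than let Views cache partial data.
          throw new \RuntimeException('Could not acquire lock for building Search API views data.');
        }
      }
      try {
        // Placeholder for the real data-building code.
        return _search_api_views_data_inner();
      }
      finally {
        $lock->release('search_api_views_data');
      }
    }
    ```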

  • 🇫🇷France nicolas bouteille

    Ok it seems there might be some confusion about my suggestion here.
    I am not saying that search_api_views_data should not put a lock on cache_config when it needs to write inside that table.
    My main point is that, when you know for sure that someone very important is about to write really important stuff inside a book, you don't want to take away the book and prevent them from doing it just so that you can write down something they were going to write anyway.
    So, when we know that caches are being rebuilt, that a lot of important stuff is about to be written into cache_config, and that the cache rebuilding process is about to call search_api_views_data anyway, we should not allow any other process to trigger search_api_views_data at the same time, thus blocking any writing inside cache_config. Especially not a visitor requesting a simple page, and especially not while in maintenance mode, which is what we are facing right now.

    The problem here is not a silently broken site; it is a broken site that you cannot repair, because even in maintenance mode drush cr always fails due to the locks put on cache_config by search_api_views_data triggered by visitors requesting pages. The more visitors we have, the higher the risk of never being able to complete a drush cr on a production site.

    So I agree that my code would need an additional condition to make sure caches are being rebuilt before preventing any process other than drush cr from going through simultaneously.

  • 🇦🇹Austria drunken monkey Vienna, Austria

    @borisson_: OK, so what would you (or @webflo) say is the best way forward here?

  • 🇧🇪Belgium borisson_ Mechelen, 🇧🇪

    I have to be honest, I don't really remember all of that conversation, but I hope @webflo does.

  • Pipeline finished with Success
    about 1 month ago
    Total: 459s
    #431927
  • 🇩🇪Germany mrshowerman Munich

    Rebased.
