Taxonomy imports with relationships may hang

Created on 14 August 2023, over 1 year ago
Updated 31 July 2024, 4 months ago

Problem/Motivation

I think I've ended up with two issues under the same umbrella, but am not 100% positive:

  • When two terms with the same name share different parents, the import can become unable to resolve these and appears to hang (only applies to safe import, I believe).
  • Other term hierarchy dependent upon the order of creation of ancestors of a term also may not be handled properly and will hang (test patch included for this case, applies to all import methods, not just safe).

Steps to reproduce

Create a taxonomy tree with two identically-named terms under different parents. Export these to structure_sync.data.yml and then import. After creating all items (except the 2nd one of the same name), it will hang indefinitely. Also reference the test case where the order of generation of ancestors can impact processing.

Proposed resolution

I'm not sure the best way to actually resolve, but I see where the issue is - at least when tracing through importTaxonomiesSafe:

When a term contains a parent that is not yet present, it is added to the $tidsLeft. Once a subsequent pass through completes and adds the first child, all subsequent queries for the other child will find the first one, and prevent it from ever being removed from $tidsLeft. There is also the case if things being added to $tidsLeft, despite them already being present in $tidsDone.

I got the same resulting apparent hang (unending loop) when doing a "Full" run, but did not trace that code path any further - while I see some UUID's can be used, I don't think those were included.

Remaining tasks

Patch to identify/resolve infinite loop situation. It may well be possible to resolve this by adding the parent ID to the taxonomy_term query (resolving the parent first w/$newTids, of course), but I'm not sure of any side-effects of this presently.

At worst, it's simple enough to identify the case where the term was "found" (the one with a different parent) but also still exists in $tidsLeft in order to report the issue and break out of the loop or just remove it from $tidsLeft to let the loop complete.

As for not re-adding things to tidsLeft (when they are in tidsDone) this is already in the patch I am currently using/have attached.

User interface changes

None

API changes

Data model changes

πŸ› Bug report
Status

RTBC

Version

2.0

Component

Code

Created by

πŸ‡ΊπŸ‡ΈUnited States wolcen

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @wolcen
  • πŸ‡ΊπŸ‡ΈUnited States wolcen

    Adding likely-related issue report. Sorry - hadn't noticed that one before (wasn't searching for dead lock, doh!)

  • πŸ‡ΊπŸ‡ΈUnited States wolcen

    So - having been bitten by this again, I dug a little deeper. I can now confirm that this will happen on the "full" side as well and have created a patch that will reproduce the issue in your tests.

    This can happen regardless of duplicates, specifically, when the order of parent term id's is not correctly sorted.

    I did review the other referenced patch in 3378470 and we're seeing the same thing: the issue comes down cases such as this where you have to be careful not to add things to tidsLeft when they are already in tidsDone, which is exactly what happens with this test.

  • πŸ‡ΊπŸ‡ΈUnited States wolcen

    Adding patch that basically does the same as #3378470, but without causing as much diff noise to make the change more obvious.

  • Open on Drupal.org β†’
    Core: 10.1.4 + Environment: PHP 7.4 & MySQL 8
    last update 12 months ago
    Waiting for branch to pass
  • Open in Jenkins β†’ Open on Drupal.org β†’
    Core: 10.1.x + Environment: PHP 8.2 & MySQL 8
    last update 12 months ago
    5 pass, 2 fail
  • πŸ‡ΊπŸ‡ΈUnited States rwohleb

    Appears to be related to https://www.drupal.org/project/structure_sync/issues/3177106 ✨ Support terms with the same name in a tree structure and multi-parent terms Needs work .

  • πŸ‡ΊπŸ‡ΈUnited States wolcen
  • Status changed to Needs work 12 months ago
  • πŸ‡¨πŸ‡¦Canada mparker17 UTC-4

    Some investigation shows that the error reported in this ticket is the cause of the failing \Drupal\Tests\structure_sync\FunctionalJavascript\StructureSyncMenuLinksTest::testMenuLinksExportImportUsingAdmin() in the 2.x branch. The test times out, but looking at the the artifacts and manually testing shows the batch job seems to just hanging there and never completing on the front-end. Checking the database logs, there are errors.

    Unfortunately, this patch doesn't fully fix the test, and I can confirm the batch job gets stuck when I perform the tests manually.

    The errors I get in the log after applying this patch are...

      • Channel: menu_link_content
      • Severity: Error
      • Location: /batch?_format=json&id=2&op=do
      • Referrer: /batch?id=2&op=start
      • Message: Drupal\Core\Database\IntegrityConstraintViolationException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '48062c1c-132b-476e-bc3c-8511fb8896e4' for key 'menu_link_content_field__uuid__value': INSERT INTO "menu_link_content" ("revision_id", "bundle", "uuid", "langcode") VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3); Array ( [:db_insert_placeholder_0] => [:db_insert_placeholder_1] => menu_link_content [:db_insert_placeholder_2] => 48062c1c-132b-476e-bc3c-8511fb8896e4 [:db_insert_placeholder_3] => en ) in Drupal\mysql\Driver\Database\mysql\ExceptionHandler->handleExecutionException() (line 43 of /app/web/core/modules/mysql/src/Driver/Database/mysql/ExceptionHandler.php)
      • Channel: php
      • Severity: Error
      • Location: /batch?_format=json&id=2&op=do
      • Referrer: /batch?id=2&op=start
      • Message: Drupal\Core\Entity\EntityStorageException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '48062c1c-132b-476e-bc3c-8511fb8896e4' for key 'menu_link_content_field__uuid__value': INSERT INTO "menu_link_content" ("revision_id", "bundle", "uuid", "langcode") VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3); Array ( [:db_insert_placeholder_0] => [:db_insert_placeholder_1] => menu_link_content [:db_insert_placeholder_2] => 48062c1c-132b-476e-bc3c-8511fb8896e4 [:db_insert_placeholder_3] => en ) in Drupal\Core\Entity\Sql\SqlContentEntityStorage->save() (line 817 of /app/web/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php).

    I wasn't able to apply the patch in πŸ› Dead lock when importing terms. Active ; it has diverged too much from the latest changes to the 2.x branch.

    I'm promoting this issue to "Major", as it:

    1. Triggers a PHP error through [normal use of] the user interface
    2. Renders one feature [menu link import] unusable with no workaround.
    3. Breaks a regression test on 2.x HEAD

    I'm going to mark the other issue(s) as a duplicate of this one: this patch is cleaner and easier to understand.

    @wolcen I think a merge request would show test results better, but I'm happy to collaborate with patches as well. Do you have a preference? If not, or if you'd prefer a merge request, I can open one.

  • πŸ‡¨πŸ‡¦Canada spiderman Halifax, NS

    I seem to be running across this in a client project as well, where importing a vocabulary that has multiple "Other" terms across a handful of top-level topics. The import taxonomies process (we usually run via drush) seems to all work, but just hangs at the end. We also import a small handful of simple menu links into the "main" menu via Structure Sync.

    Applying this patch appears to resolve the hanging at the end of import, and I *don't* see the errors @mparker17 notes from the DrupalCI. This would seem to indicate there's something distinct about the menu fixtures created in the test scenario, but I'm not sure what that would be.

  • πŸ‡ΊπŸ‡ΈUnited States wolcen

    @mparker17 - sorry for the lag in response here. By all means, a merge request or whatever you prefer would be great - feel free to take over on this!

    Let me know if I can possibly clarify things from my perspective/if you have any questions for me.

    I've not had the time recently to dedicate to it - the patch has also been working for our production use case where it had been hanging, but I'll try to get some time free to bring this up and re-run the tests and see if I can reproduce your results.

  • πŸ‡¨πŸ‡¦Canada mparker17 UTC-4

    Awesome! I've created a merge request!

  • πŸ‡ΊπŸ‡ΈUnited States wolcen

    Seeing that fix on the AJAX CI failure (for which I am grateful - I'd had NO clue how to fix that!) I rebased to re-run CI here.

  • Pipeline finished with Success
    10 months ago
    Total: 210s
    #82929
  • Status changed to Needs review 10 months ago
  • πŸ‡ΊπŸ‡ΈUnited States wolcen
  • Status changed to RTBC 5 months ago
  • πŸ‡ΊπŸ‡ΈUnited States mlncn Minneapolis, MN, USA

    Confirmed that the merge request fixes this! Taxonomy import was running, i presume in circles, for more than an hour before i ended drush, added this merge request, and then it imported fine.

  • Pipeline finished with Success
    5 months ago
    Total: 179s
    #195817
  • πŸ‡¨πŸ‡¦Canada blanca.esqueda

    I was having the 'hang' issue too.
    It ended up being missing parents.

    The same condition needed to be fixed for more than one round, to avoid infinite loops because the terms left array was always having values.
    Added a patch to review https://www.drupal.org/project/structure_sync/issues/3464919#comment-157... πŸ› Taxonomy imports with missing parent creates an infinite loop (hang) Needs review , It takes into consideration one than more level of nesting.

Production build 0.71.5 2024