Taxonomy import with a missing parent creates an infinite loop (hang)

Created on 31 July 2024, 5 months ago

Problem/Motivation

When the structure_sync yml file contains a term whose parent is not in the file under the same vocabulary, the import enters an infinite loop and hangs.

Steps to reproduce

Change the parent id of any record in the structure_sync yml file to an id that is not defined in the file (this applies at any nesting level, not only the second).
Run the taxonomy import; it will cycle without stopping.

Proposed resolution

On the first round of the taxonomy import, listLeft is filled with the records that still need to be imported because their parent has to be processed first.
On the following rounds (there can be more than one, because there can be more than one nesting level):
* Fix the condition that adds records to listLeft.
---> Check whether the term is already in listDone; without this check the term is added to listLeft again, causing an infinite loop.
---> Check whether the term has already failed to import before adding it to listLeft again, to avoid an infinite loop.
NOTE: the infinite loop happens because the batch does not end until listLeft is empty.
* Add a condition that removes a term from listLeft if its parent is missing, puts the record in the listFailed array, and adds a message explaining why the record failed.
---> Check that it is after the first run, because the first run is when all the unprocessed items are added to the listLeft array.
---> Check that the parent is not zero; 0 is never added to the listLeft or listDone arrays.
---> Check that the parent is not in listDone; if it is, the parent has been processed and the record should stay to be processed in a later round.
---> Check that the parent is not in listLeft; if it is, the parent has not been processed yet, and the record should stay until the parent is processed in a later round.
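The checks above can be sketched as a small standalone simulation. This is only an illustration under stated assumptions, not the module's code or the attached patch: the function name simulate_import() and the tid => parent input map are hypothetical, and only the listDone / listLeft / listFailed names come from this issue.

```php
<?php

/**
 * Simulates the import rounds. Illustrative only, not the module's code.
 *
 * @param array<int, int> $terms
 *   Hypothetical input: map of term ID => parent term ID (0 = no parent).
 *
 * @return array{done: int[], failed: array<int, string>}
 */
function simulate_import(array $terms): array {
  $listDone = [];
  $listLeft = [];
  $listFailed = [];
  $firstRun = TRUE;

  do {
    foreach ($terms as $tid => $parent) {
      // Never revisit a term that was already imported or already failed.
      if (in_array($tid, $listDone, TRUE) || isset($listFailed[$tid])) {
        continue;
      }
      if ($parent === 0 || in_array($parent, $listDone, TRUE)) {
        // Root term, or parent already imported: import it now.
        $listDone[] = $tid;
        unset($listLeft[$tid]);
      }
      elseif (!$firstRun && !isset($listLeft[$parent])) {
        // After the first round, a parent that is neither done nor queued
        // can never arrive, so the term is failed instead of re-queued.
        $listFailed[$tid] = "missing parent term $parent";
        unset($listLeft[$tid]);
      }
      else {
        // The parent may still be imported in a later round.
        $listLeft[$tid] = TRUE;
      }
    }
    $firstRun = FALSE;
  } while ($listLeft !== []);

  return ['done' => $listDone, 'failed' => $listFailed];
}

// Terms 1 and 2 are made up; 16, 23 and 96 reuse IDs from this issue.
$result = simulate_import([1 => 0, 2 => 1, 16 => 41700, 23 => 41700, 96 => 23]);
print_r($result['failed']);
```

With this input, terms 1 and 2 are imported, and terms 16, 23 and 96 end up in listFailed, so listLeft empties and the loop stops. The sketch does not guard against two terms naming each other as parent; that case is outside what this issue describes.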

Extra:
This piece of code doesn't make sense (it is not needed and does nothing): it compares progress (# of terms) against max (# of vocabularies).
Even if the condition is fixed to compare like with like (max vs. progress in vocabularies, or max vs. progress in terms), finished never reaches 1 under that condition. And even if the code is fixed so that finished does reach 1 at some point, it won't matter: the code keeps looping, and the import only finishes when the looping ends, at which point finished already has the value 1 that is set after the loop.

$context['sandbox']['progress']++;
if ($context['sandbox']['progress'] != $context['sandbox']['max']) {
   $context['finished'] = $context['sandbox']['progress'] / $context['sandbox']['max'];
}
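For contrast, here is a minimal sketch of how a progress ratio behaves when progress and max count the same unit and the operation does one unit of work per call. It is not taken from the module: example_import_operation() and the do/while stand-in for the batch runner are hypothetical, and only the $context keys mirror the snippet above.

```php
<?php

/**
 * Hypothetical multipass operation: handles one term per call.
 */
function example_import_operation(array $terms, array &$context): void {
  if (!isset($context['sandbox']['progress'])) {
    $context['sandbox']['progress'] = 0;
    // Count terms, the same unit that 'progress' counts.
    $context['sandbox']['max'] = count($terms);
  }
  if ($context['sandbox']['progress'] < $context['sandbox']['max']) {
    $context['results'][] = $terms[$context['sandbox']['progress']];
    $context['sandbox']['progress']++;
  }
  // Reaches 1 once every term is handled, or when there is nothing to do.
  $context['finished'] = $context['sandbox']['max'] > 0
    ? $context['sandbox']['progress'] / $context['sandbox']['max']
    : 1;
}

// Stand-in for the batch runner: call again until 'finished' reaches 1.
$context = ['sandbox' => [], 'results' => [], 'finished' => 0];
$calls = 0;
do {
  example_import_operation(['a', 'b', 'c'], $context);
  $calls++;
} while ($context['finished'] < 1);
```

Here the ratio only matters because the function returns between units of work. When all the looping happens inside a single call, as described above, an intermediate value of finished is never observed.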

Remaining tasks

Create the patch for review.

User interface changes

none

API changes

none

Data model changes

none

🐛 Bug report
Status

Needs review

Version

2.0

Component

Code

Created by

🇨🇦Canada blanca.esqueda


Comments & Activities

  • Issue created by @blanca.esqueda
  • 🇨🇦Canada blanca.esqueda

    Patch attached.
    It handles missing ancestor parents and the infinite loop that occurs when a parent hasn't been imported yet.

    Examples of the notices logged when a record fails to import because of a missing ancestor parent:

    [notice] Failed to Import "Journal article(16)" into pub_types - missing parent term 41700
    [notice] Failed to Import "Abstract(23)" into pub_types - missing parent term 41700
    [notice] Failed to Import "Book Chapter(96)" into pub_types - missing parent term 23
    

    * Term 16 failed to import because parent term 41700 doesn't exist.
    * Term 23 failed to import because parent term 41700 doesn't exist.
    * Term 96 failed to import because, although its parent term 23 exists, term 23 itself failed to import due to its missing parent, so the parent for term 96 is invalid too.

  • Status changed to Needs review 5 months ago