Reduce redundant batch runner

Created on 16 September 2023, about 1 year ago

Problem/Motivation

With the batch processor, I found the batch always runs twice.

Steps to reproduce

Create new translation job and run the translation with OpenAI translator.

🐛 Bug report
Status

Active

Version

1.0

Component

Code

Created by

🇹🇼Taiwan amourow

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @amourow
  • Merge request !5Fix redundant batch process → (Open) created by amourow
  • Status changed to Needs review about 1 year ago
  • 🇹🇭Thailand AlfTheCat

    Hi @amourow, I tried to apply the patch but (and this is perhaps caused by me applying the diff patch of Make the translator able to translate longer HTML content Fixed ) my terminal reports that the patch is already applied.

    When I reinstall the module and try the patch it fails:

    sudo -u www-data patch -p1 < 5.diff
    patching file src/Plugin/tmgmt/Translator/OpenAiTranslator.php
    Hunk #1 FAILED at 351.
    1 out of 1 hunk FAILED -- saving rejects to file src/Plugin/tmgmt/Translator/OpenAiTranslator.php.rej
    patching file tmgmt_openai.install

    The batch process does indeed seem to run in a funny way. I attached a brief recording of it.

    I suppose it will be easier when we have the dev branch available in composer 💬 Can we make 1.0.x-dev available for composer Needs work , thanks for opening that issue.

  • 🇹🇼Taiwan amourow

    @AlfTheCat

    The screen recording is funny.

    Did you make one translation request or you did multiple at once?

    Now the 1.0.x-dev can be installed via composer, would you try only include the MR patch from this issue?

  • 🇹🇭Thailand AlfTheCat

    @aumoro yeah it sure is! I requested translations for a single language but for multiple menu links in that video. I've configured the number of jobs to run at cron to 3, in the openai provider settings. I think that's where the "1 of 3" comes from but what exactly is happening under the hood is hard to tell.

    I'll try this again in a different environment and using the dev branch now that it's available through composer, which is great news!

  • 🇹🇭Thailand AlfTheCat

    I now used the dev branch and then applied this patch, which worked this time. However the batch still seems to run strangely.

    It seem like, if I submit 10 nodes that have 7 translations and 5 translatable fields, the batch shows the progress of translating the fields. So it says 1/5 instead of 1/10 nodes. I think this is why it starts over again before the progress bar completes. I don't know if it's also double-submitting the field values.

    Hope this helps.

  • 🇹🇼Taiwan amourow

    The scope of batch process in this module is for one translation and one node.

    In your case, the batch progress shows you 1/5 is correct to me.
    If you have a long text field with long text in it, the batch process may have more steps, because it will split the text into chunks and call the OpenAI API in separated steps.

    TMGMT job has it's own batch process for multiple translations and nodes, I believe.
    Let's focus on one translation one node here.

  • 🇹🇭Thailand AlfTheCat

    OK thanks, that clarifies things. The batch running the way of my video is a bit counter-intuitive and confusing but I understand now we're not there yet :). I tested this now against the latest dev but I run into 🐛 Drupal\Core\Entity\EntityStorageException: Update existing 'tmgmt_job_item' entity while changing the ID is not supported Active , which also happens without applying the patch so that must be related to the dev version. I rolled back to RC2 and it works again.

  • 🇹🇭Thailand AlfTheCat

    Interesting observation, in my case actually the duplicate jobs are only created if I request translations via the main TMGMT dashboard at admin/tmgmt/sources, using RC2.

    If I request them from the node translation tab, no duplicate jobs are created.

  • 🇹🇭Thailand AlfTheCat

    I did a fresh install and using the latest dev, and the patch from this issue. The duplication no longer occurs when translating either single entities to a single language, or to multiple languages and multiple entities from the main TMGMT Sources UI. It works well from the "Translate" tab on entities too.

    The batch process is a bit confusing when translating multiple entities, and perhaps also a bit when translating a single entity because the progress bar will run all the way back to "Completed 1/*." after first completing all but the final iteration. When it runs again, the job is already marked as finished in the TMGMT Jobs overview, and temporarily in the Sources overview the entities show up as untranslated (not in progress or any other status) and translations do not yet appear until the process finishes the second run.

    I suppose the batch api has its limitations outside the scope of this issue, so the issue itself seems to be resolved :)

  • 🇹🇼Taiwan amourow

    @AlfTheCat, thanks for the updates, good to know it resolved the redundant batch. I finally have time to review the issues now.

    I hope someone can also have a look here and let the maintainer know this is RTBC.

  • 🇹🇭Thailand AlfTheCat

    Hi @amourow, awesome! Also looking forward to additional testers :)

  • 🇫🇷France nicolasgraph Strasbourg

    Hi @amourow, I think the queue worker and the typo fix in the .install should not be in the current merge request as it is not related to the issue. There was an interesting talk from @xjm on code review and good contributing practices in DrupalCon Lille; slides are available here : https://drive.google.com/file/d/1o6wIX75wXI_XE80NFj_QRJwkM578sFnS/view.
    I also suggest you take a look at the tmgmt_deepl module and the way the queue worker is handled as for now you do not create queue items at all as far as I can see. And the module also use the batch API.

  • 🇫🇷France nicolasgraph Strasbourg

    @amourow, I can't find batch processes running twice on 1.0.x-dev.
    Can you give it a try?

  • Issue was unassigned.
  • Status changed to Needs work 3 months ago
  • 🇬🇧United Kingdom scott_euser

    Hmmm it seems the merge request has diverged quite a bit from the issue summary. Not against that but would be helpful to have an updated issue summary so it could be easier tested to eventually get it over to the maintainer for final review.

  • 🇬🇧United Kingdom scott_euser

    Moving this over to https://www.drupal.org/project/ai_tmgmt - I will work on porting your MR over to that and testing. Thanks!

  • 🇬🇧United Kingdom scott_euser

    🐛 Same Job processed/checkout twice by the JobCheckoutManager Needs work is the solution for the original issue

    BUT the MR is useful as it provides a QueueWorker. I have updated the issue summary to reflect the code being proposed here.

  • 🇬🇧United Kingdom scott_euser

    scott_euser changed the visibility of the branch tmgmt_openai-3387691 to hidden.

  • 🇬🇧United Kingdom scott_euser

    Okay I have done the following:

    1. Updated the MR to target the new module.
    2. Switched the job start to first queue items then immediately batch process the queue to address 🐛 "Zombie" job items are left unfinished in the queue if translation doesn't complete Active . See my notes there.

    Would appreciate it if one of you could please review.

  • 🇹🇭Thailand AlfTheCat

    Hi all, great to see this module moved into the ai ecosystem and excited about all the work being done.

    I tried to patch the module again with the plain diff from MR !2 and getting a partial failure:

    sudo -u www-data patch -p1 < 2.diff
    patching file .cspell-project-words.txt
    Hunk #1 FAILED at 8.
    1 out of 1 hunk FAILED -- saving rejects to file .cspell-project-words.txt.rej
    patching file src/Plugin/QueueWorker/AiTranslatorWorker.php
    patching file src/Plugin/tmgmt/Translator/AiTranslator.php
    
  • 🇬🇧United Kingdom scott_euser

    Resolved merge conflict to 1.0.x branch

  • Assigned to scott_euser
  • Status changed to Needs review 23 days ago
  • 🇬🇧United Kingdom scott_euser

    Okay this is now ready, I can confirm after a few minutes cron picks it up. TMGMT cron has to run first, then subsequently queue from this module. Queue can be controlled via for example Simple Cron module (has a setting to enable more fine-grained control of queue workers).

    Updated issue summary

  • 🇹🇭Thailand AlfTheCat

    Hi Scott,
    I've applied the patch from !MR2 again but I'm still getting duplicate jobs.

    I've tested as follows:

    1 - I requested 5 translations for a single node. I checked the AI logger log, and it looked good, no duplicates.

    2 - I then checked the Sources page, and it showed there were completed translations for the node, but also inactive jobs for each language. So for each language there are 2 jobs.

    3 - I ran TMGMT cron, and the sources page showed me that now the first language had its second job marked as "in progress". That progress is stuck, it doesn't complete.

    I attached a screenshot of the final result.

    Thanks for the hard work on this, it will be amazing when this is working.

  • 🇬🇧United Kingdom scott_euser

    Thanks for testing! The duplicate Jobs is from TMGMT core https://www.drupal.org/project/tmgmt/issues/3198609 🐛 Same Job processed/checkout twice by the JobCheckoutManager Needs work - to avoid duplicate jobs you should feed back there.

    I haven't tried with more than 1 translation languages actually, looking your screenshot 1 language got translated, others not. I'll take a look at that one.

  • 🇬🇧United Kingdom scott_euser

    Tried it now with many languages, many chunks.

    One issue fixed
    I did spot 1 issue: the cron was not able to get started for quite a while which can lead to the job appearing stuck. This is because continuous jobs were claiming the queue items yet could not process themselves. Eventually they'd expire and cron could take over.

    I fixed that so any continuous job just adds the item to the queue for processing.

    Frequency of cron
    BUT if you don't use something like Simple Cron to greatly increase the frequency, it'll take a long time to process all the chunks from all the languages. So eventually all your items would have been processed, but if you use drupal core defaults, e.g. 1 or 3 hours for cron, it can take a long time.

    So steps here to have greater control of the frequency:

    1. Enable Simple Cron module
    2. Under Simple Cron settings tick 'Queue workers override'
    3. Change the frequency of the cron jobs for 'The Translation Management Core module cron' and 'Queue worker: AI TMGMT translate queue worker'

    You can also manually run the jobs using the 'Run' button simple cron provides.

    1. Run 'The Translation Management Core module cron' once
    2. Run 'Queue worker: AI TMGMT translate queue worker' as many times as needed until all chunks are processed. A number of chunks are run each time, but cron decides that based on how long its taking

    And of course applying the patch at https://www.drupal.org/project/tmgmt/issues/3198609 🐛 Same Job processed/checkout twice by the JobCheckoutManager Needs work will mean half the amount of work to do.

  • 🇬🇧United Kingdom scott_euser

    Keen to get this in: if someone can do a final review would be much appreciated!

  • 🇹🇭Thailand AlfTheCat

    Hi Scott, been very busy the past few days but will test tomorrow morning. Been on top of my list, so looking forward to it :)

  • 🇬🇧United Kingdom scott_euser

    Awesome thank you!

  • 🇹🇭Thailand AlfTheCat

    Hi Scott, I've tested again and I'm still getting the same result. When requesting a manual translation for a content type that is also covered by a continuous job (the continuous job being created after the node was created), I get a duplicate job that remains in the "processing" stage.

    Just to make sure I have the right code, I'm using the "plain diff" link at the top of this issue to generate a patch (https://git.drupalcode.org/project/ai_tmgmt/-/merge_requests/2.diff).

    I noticed is that I don't have the "AI TMGMT translate queue worker" available. I use the ultimate cron module. I can run the Translation Management Core module cron but I don't have an option of running the other one. I'm using the latest dev version of the module.

    Hope this helps, thanks for the quick updates, I'm ready to test further.

  • 🇬🇧United Kingdom scott_euser

    Yeah it's simple cron not ultimate cron that has the ability to manually trigger the queue worker. Hope that helps!

  • 🇬🇧United Kingdom scott_euser

    Updating issue summary to cover 🐛 Translating more than one source mixes up translations Active as there is too much code overlap to do that in two separate MRs.

    Screencast of this solved now in this MR:

  • 🇹🇭Thailand AlfTheCat

    Hi Scott, great work, the 🐛 Translating more than one source mixes up translations Active issue is resolved on my side with the latest patch. The continuous job is still triggered however, so after manual translation is finished, a new job is still created for each translation.

  • 🇬🇧United Kingdom scott_euser

    Thanks for retesting!

    The continuous job is still triggered however, so after manual translation is finished, a new job is still created for each translation.

    I suspect that that is also TMGMT core; if you have a continuous job set in Deepl and you manually create a job, do you have both? I think yes, because if I understand the logic in TMGMT right, the Jobs are independent of each other:

    1. You have a continuous Job that the content item matches the criteria for: it gets Job Items created
    2. You created a new independent Job, so it gets Job Items attached as you selected them

    Possibly helpful:

    But I don't think its something we can address here; the Job and Job Item creation happens before ai_tmgmt module kicks in, we only respond it once TMGMT core has done its thing.

  • 🇹🇭Thailand AlfTheCat

    Thanks very much, that clarifies it. I'll run a test with Deepl but I guess it's a TMGMT core issue, indeed.

    I didn't know the TMGMT Extension Suite module existed, I'll check that one out too.

    Another interesting option might be to use ECA instead of continuous jobs for automated translations as it would cover the edge case of 🐛 WSOD when inserting content governed by a continuous job for an entity that uses Automators running via cron Active as well, because with ECA it's possible to add an "field is empty" condition, and the "field value has changed" condition is useful as too. ECA models can run on queue so that might be a better approach for continuous translations that require extra logic.

    Currently ECA is not yet able to create translations, but ECA Content: Create translation of an existing entity Active is in the queue and perhaps it can make it into ECA 2.1.

  • 🇬🇧United Kingdom scott_euser

    I don't know much about ECA myself but sounds sensible. If there is anything this module needs to change to be ECA compatible (that isn't TMGMT generic, otherwise should be raised there) I think we should keep it as a separate issue.

    So I believe that means all bits you have raised are either resolved or not related to this particular module and this is now ready to merge, correct?

    In any case thank you for all the time you put in to testing this. It's very valuable and much appreciated contribution!

  • 🇹🇭Thailand AlfTheCat

    Thank you Scott for the kind words and certainly for the brilliant work. It's been my pleasure to do the testing.

    Just yesterday I was able to run translations on 10 nodes at a time, each with 7 languages and it worked perfectly. This is really a game changing feature when it comes to translating sites with a lot of content and a lot of languages. The fact that AI can be leveraged to translate content is one thing, but to actually be able to do it in a scalable fashion is a huge challenge in and of itself. Your work here and that of the others involved is a massive contribution to Drupal's future in my opinion.

  • Pipeline finished with Skipped
    2 days ago
    #335970
    • scott_euser committed 5330f512 on 1.0.x
      Issue #3387691 by scott_euser, amourow, alfthecat, nicolasgraph: Convert...
  • 🇬🇧United Kingdom scott_euser

    Great thanks for confirming, and yes great momentum all around!

Production build 0.71.5 2024