Postpone URL generation until later in context upload process

Created on 23 October 2024, 3 months ago

Problem/Motivation

The context upload queue generates a URL at the time of creating the translation job. This URL is an absolute URL that only really works on the prod/live site. If we grab the DB from the prod site and pull it down to localdev, the absolute URL won't work. Is there any way to generate the absolute URLs when processing the queue item?

Right now, we get `[error] Got empty context for https://example.com/en/node/9376 url.` because we can't authenticate from localdev to the live site and retrieve the relevant context.
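
To make the request concrete, here is a minimal sketch of what "generate the absolute URL when processing the queue item" could look like. It is not the connector's code (the connector is PHP; Python is used only for illustration), and `ContextQueueItem`, `resolve_context_url`, and the base URLs are hypothetical names. The idea is that the queue item stores only a site-relative path, and the worker joins it with the base URL of whichever environment processes the item.

```python
from dataclasses import dataclass
from urllib.parse import urljoin


@dataclass(frozen=True)
class ContextQueueItem:
    """Hypothetical queue payload: a site-relative path, not an absolute URL."""
    job_item_id: int
    relative_path: str  # e.g. "/en/node/9376"


def resolve_context_url(item: ContextQueueItem, base_url: str) -> str:
    """Build the absolute URL at processing time from the current base URL."""
    # Normalise slashes so the path is appended to the base URL
    # instead of replacing any path component the base URL may have.
    return urljoin(base_url.rstrip("/") + "/", item.relative_path.lstrip("/"))


item = ContextQueueItem(job_item_id=1, relative_path="/en/node/9376")

# The same queued item resolves differently depending on where it is processed.
print(resolve_context_url(item, "https://example.com"))
print(resolve_context_url(item, "http://localhost:8080"))
```

Under these assumptions, an item queued on prod and processed on localdev would point at the local site, which is the behaviour asked for here.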

Steps to reproduce

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

📌 Task
Status

Active

Version

9.0

Component

Code

Created by

heddn Nicaragua


Comments & Activities

  • Issue created by @heddn
  • What is your process of submitting content? Context is generated when you submit content from Drupal to Smartling. Why is it important to submit content from prod/live but submit context from local/dev env? Just want to understand your case.

    If you need to test context, just submit content from your local/dev env; it will generate a link to your env. If your local/dev env points to the same Smartling project as prod/live (and your Smartling bucket names, which can be checked at the bottom of the provider config page, are the same on both envs, which is likely since you copy the database), then you have 2 problems:
    1. Files with different content but the same names are pushed to Smartling, so content gets overwritten (the Smartling project may contain content from the local/dev env mixed with prod/live content).
    2. Your local/dev instance may run cron and download translated content to your local/dev env, so prod/live never downloads the translated content because it was already downloaded on another instance.

    Make sure your local/dev env uses a different Smartling project. I mention this because your flow with context sounds like this case.

    > If we grab the DB from the prod site and pull it down to localdev, the absolute URL won't work. Is there any way to generate the absolute URLs when processing the queue item?

    We could rewrite URL generation and do it while processing the queue item, but to me it sounds like a misuse of the connector. I don't see a reason or benefit in doing this, because submitting content and its related context must happen from one env.

  • heddn Nicaragua

    Queued items execute in a cron-backed queue. If not all of the items are processed successfully before we grab a db dump from the live site, the remaining queued items get pulled down locally when we grab the db.sql.gz file. At that point, the URL and everything else is different. The absolute URL that was generated when setting up the items in the queue is now wrong.

  • I wouldn't consider this an issue. It doesn't affect your translation process in any way. You just have error logs, which is expected because you run prod data processing on a completely different environment.

    Look at the issue from another angle: say we implement this change. It would then be incorrect to take the URL from prod, "switch" the domain to your local one, grab this context, and upload it to a completely different Smartling project (because you need different projects for different envs). Of course, the annoying error log would disappear, but the whole process would make no sense.

  • heddn Nicaragua

    Always pulling context from the local environment seems to make more sense but if we just need to ignore errors around this when syncing data from live back to uat, we can do that.

  • heddn Nicaragua

    Another thought about this whole URL generation thing. I just opened an internal issue for us to work on Workspaces support. Since we are actively using workspaces to create our content, we can't just use the URL of the node to get context in cron. You also have to have the active workspace set. Instead of postponing all the data collection for context until a cron job, what if this was all done at time of job submission? You wouldn't even need to worry about running any of this context user logic at all. Because the logged in user can see his/her content.

  • heddn Nicaragua

    Stream of consciousness: we might still have an issue. TMGMT supports "add to cart", which lets you check out a bunch of items later. Comment in #6 assumes that all content is in the currently active workspace, which might or might not be the case. We almost want to do the context gathering at time of "add to cart", not at time of job submission.

  • heddn Nicaragua

    https://www.drupal.org/project/tmgmt_workspaces now exists as a thing. This means we'll need to have the context much _earlier_, not later.

  • > Always pulling context from the local environment seems to make more sense

    It makes sense for queue items generated on the local env. It doesn't make sense for data generated on another env. Context still works if you submit content from your local env. I think you can always remove context-related queue items from the queue table during your db sync process.

    > Instead of postponing all the data collection for context until a cron job, what if this was all done at time of job submission?

    This was specifically done through the queue to avoid slowing down the request translation process (which is already quite slow due to the number of API calls). In theory, it can be done right away: you can decorate `RequestTranslationSubscriber` and override `onUploadRequest` so that it uploads right away instead of putting the item in the queue. But if you have a TMGMT Job with 100500 TMGMT Job Items, you will upload those contexts sequentially and will fail 100% of the time due to the max execution timeout. Otherwise it would require rewriting the whole submission process to run as a Drupal batch operation; I don't know.

    > You wouldn't even need to worry about running any of this context user logic at all. Because the logged in user can see his/her content.

    In your particular case, yes. But I can imagine a separate user with a special role that can only submit context for translation (different clients have different flows). So, again, this user switching logic was implemented with the intention that clients will create a separate user with special permissions assigned for context.

    > We almost want to do the context gathering at time of "add to cart", not at time of job submission.

    Context is assigned to a Smartling source file. I think that when content is added to a cart, those TMGMT jobs do not have translators assigned yet, so we can't get a filename yet. At least this is what I see for now.

    This looks like a patch to TMGMT to be honest, why separate module?
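
One reply above suggests removing context-related queue items from the queue table during the db sync. A minimal sketch of that cleanup follows, assuming a table laid out like Drupal's database-backed `queue` table (`item_id`, `name`, `data`, `expire`, `created`). The queue name `smartling_context_upload` is a guess for illustration only and should be checked against the real site. An in-memory SQLite database stands in for the imported dump, and Python is used only to keep the example self-contained.

```python
import sqlite3

# Hypothetical queue name; verify the actual name in your site's queue table.
CONTEXT_QUEUE_NAME = "smartling_context_upload"


def purge_queue(conn: sqlite3.Connection, queue_name: str) -> int:
    """Delete every item of one named queue and return the number removed."""
    cur = conn.execute("DELETE FROM queue WHERE name = ?", (queue_name,))
    conn.commit()
    return cur.rowcount


# Stand-in for a freshly imported prod database dump.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE queue (item_id INTEGER PRIMARY KEY, name TEXT, data BLOB, "
    "expire INTEGER, created INTEGER)"
)
conn.executemany(
    "INSERT INTO queue (name, data, expire, created) VALUES (?, ?, 0, 0)",
    [
        (CONTEXT_QUEUE_NAME, b"https://example.com/en/node/1"),
        (CONTEXT_QUEUE_NAME, b"https://example.com/en/node/2"),
        ("some_other_queue", b"unrelated item"),
    ],
)

# Run once after the dump is imported, before cron processes the queue locally.
removed = purge_queue(conn, CONTEXT_QUEUE_NAME)
remaining = conn.execute("SELECT COUNT(*) FROM queue").fetchone()[0]
print(removed, remaining)
```

Only the items of the named queue are dropped, so other queued work in the synced database is left alone.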
