Prevent creating new revisions each time a

Created on 12 September 2024, 3 months ago

Problem/Motivation

SchedulerManager::publish() uses hook_scheduler_publish_process() to allow other modules to handle the publishing of a scheduled entity.

According to the documentation a hook implementation should return -1 if an error has occurred and Scheduler should abandon processing this entity with no further action and move on to the next one.

scheduler_content_moderation_integration_scheduler_publish_process() returns -1 if the scheduled content moderation state transition does not (any more) exist.

SchedulerManager::publish() however still saves the entity, resulting in a new revision any time the scheduler cron runs; this new revision is still scheduled to be published...

If you run the scheduler cron each minute, you get (60 * 24) 1440 obsolete new revisions. If the content manager fails to notice that the scheduling failed for 1 month, you get 43.200+ obsolete new revisions. On our servers when a node has 5000+ revisions and you try to edit that node, or that node is reindexed by search_api, things start to fail.

Steps to reproduce

  1. Enable scheduler and scheduler_content_moderation_integration
  2. Enable a content moderation workflow for a content type (e.g. article) with a draft, pending_review and published state. First allow transition to the published state from both the draft and the pending_review states.
  3. Enable scheduler for the article content type
  4. Create an article and save it as draft while scheduling it for publishing
  5. Before the scheduled timestamp (or at least before the next scheduler cron) update the content moderation workflow and remove the draft->published from the allowed transitions.
  6. Run the scheduler cron
  7. Notice that every time you run the scheduler cron, the article gets a new revision

Proposed resolution

Stop saving the entity if a hook_scheduler_publish_process implementation returns -1.

Remaining tasks

User interface changes

API changes

Data model changes

πŸ› Bug report
Status

Active

Version

2.0

Component

Code

Created by

πŸ‡³πŸ‡±Netherlands casey

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @casey
  • πŸ‡³πŸ‡±Netherlands casey

    While looking at SchedulerManager::publish(), I also noticed that if multiple hook_scheduler_publish_process() implementations exist and the first return a "-1" as an error occured, the processing is not stopped. This would mean that any next hook implementation can (and likely will) publish/save the entity.

    I am not sure if this is intended/wanted behavior. Maybe this should be altered into some kind of chain-of responsibility pattern; as soon as a publish_process implementation returns 1 or -1, the processing is considered to be finished.

  • πŸ‡³πŸ‡±Netherlands casey
  • πŸ‡¬πŸ‡§United Kingdom jonathan1055

    Thanks for reporting this and giving such a detailed description and steps to reproduce. I follow them and see what I get, then work from there.

  • Pipeline finished with Failed
    3 months ago
    Total: 661s
    #284197
  • Status changed to Needs review 3 months ago
  • πŸ‡³πŸ‡±Netherlands casey

    I've created a MR that aborts (un)publishing process hook invocations on success or failure result and stops saving entity on failure.

    Note that this is a breaking change, I do however think it wonΒ΄t break anything in practice, and rather fixes issues due to unexpected behavior.

  • Status changed to Needs work 3 months ago
  • πŸ‡¬πŸ‡§United Kingdom jonathan1055

    Thanks for working on this. By "breaking change" I presume you are referring to the failures in the API tests.

    I have not actually been able to reporduce the problem. Initially I did not have the Scheduler option "create new revision on publishing" ticked, so I added that and re-did the steps. But I still only get one revision (two in total) even after running cron several times. The log has the error "Publishing failed for 'the title'. scheduler_content_moderation_integration_scheduler_publish_process returned a failure code." after each cron, but no extra revision is created. There must be one other thing in the STR that is missing.

  • Pipeline finished with Success
    3 months ago
    Total: 972s
    #284413
  • Status changed to Needs review 3 months ago
  • πŸ‡³πŸ‡±Netherlands casey

    By "breaking change" I presume you are referring to the failures in the API tests.

    I meant that aborting hook invocations once one implementation returns "1" or "-1" is a (breaking) API change; with semantic versioning in mind, this change should introduce a new major version. I argued however that this change in practice should not be a problem, and even/rather fixes issues due to unexpected behavior (for sites that have multiple publish_process hook implementations).

    The failing tests are however also a result of these API changes; the MR alters SchedulesManager::publish() and SchedulesManager::unpublish() so it no longer saves (and therefor no longer enforces the (un)publish_on to be restored) the entity on failure of a publish_process hook implementation.

    I do think the (un)publish_process hook implementation should ensure the (un)publish_on is kept/restored when it fails to publish the entity (if that is the intended behavior). I altered the failing test so it passes; actually it were the (un)publish_process hook implementations of the test module that caused to tests to fail, so I altered the hook implementations.

  • πŸ‡³πŸ‡±Netherlands casey

    I have not actually been able to reproduce the problem.

    Hmm, that's too bad. Not sure what is missing. To easily reproduce the problem you could alter the scheduler_content_moderation_integration_scheduler_publish_process (or any other publish_process hook implementation) so it always returns "-1". This way you should get a new revision every time you run the scheduler cron.

  • πŸ‡§πŸ‡ͺBelgium fernly

    We have the exact same issue. Only our transitions still exist. It's some nodes that have an unpublish_state (from scheduler_content_moderation_integration) which contains NULL for an unknown reason. That results in many many revisions.

    Whether or not this is reproducable, the issue remains that when -1 is returned in hook_scheduler_publish_process(), the entity is still saved. You can always test this issue by forcing the entity to have an unpublish_state with value NULL, for example setting that value in hook_node_presave() for every node.

Production build 0.71.5 2024