commerce_log table is not cleared up

Created on 31 May 2023, almost 2 years ago

Describe your bug or feature request.

I've noticed that the commerce_log table just grows and grows, nothing seems to clean it up.

Specifically, if you delete orders, then the corresponding log entries are not removed either.

Is this 'by design' or just an oversight? I mean, the standard Drupal watchdog wouldn't get cleaned up either, for example: if something that was logged in it was deleted the logs relating to that entity wouldn't get deleted. In fact, you'd want to log the fact that it was deleted :)

I guess, is the commerce_log table essentially a big textual log, that might reference orders etc. or is it actually part of the order?

I'm happy to either provide a patch that'll (optionally) remove log entries on entity deletions or make a contrib project that can add this functionality for sites that want it, but wanted to see if you'd accept it as part of commerce core first?

If a bug, provide steps to reproduce it from a clean install.

Add an item to a cart, delete the order/cart, observe that the log entry for adding something to the cart is now in the commerce_log table.

โœจ Feature request
Status

Active

Version

3.0

Component

Log

Created by

๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom steven jones

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

  • Issue created by @steven jones
  • ๐Ÿ‡ท๐Ÿ‡ธSerbia bojanz

    Feels like an oversight from my perspective.

  • ๐Ÿ‡ฎ๐Ÿ‡ฑIsrael jsacksick

    I think it's an oversight, the log table grows really quickly on many of the projects I work on I disabled the cart related logging (add to cart, remove from cart etc..) otherwise on sites with high traffic the table can quickly reach GBS.

    Additionally, I also have cron jobs responsible for cleaning up the table from time to time (e.g events older than x etc)...
    I believe we need to expose several settings from the UI (the ability to toggle the logging of events for example).

    There's actually quite some data that's not cleaned up whenever an order is deleted, I think we need to add this for logs... But also a UI for configuring for how long logs are kept.

    The "challenge" might be that you can't just rely on the log created time as you may have log events created a month ago related to an order that is still a cart for instance?

  • Status changed to Needs work almost 2 years ago
  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom steven jones

    I've added a start on this here:

    https://git.drupalcode.org/issue/commerce-3363849/-/tree/3363849-commerc...

    At the moment, simply implementing the hook to remove entries when deletes happen, and adding an update hook to remove already orphaned log messages.

  • ๐Ÿ‡ฎ๐Ÿ‡ฑIsrael jsacksick

    This could potentially hang on sites with millons of orders though.

  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom steven jones

    This could potentially hang on sites with millons of orders though.

    Yeah, agreed, it's not great, and it doesn't correctly invoke any hooks when deleting the entries, because they are technically entities too.

    Alternatives that I can think of:

    • Provide a Drush command to perform one-off maintenance to the commerce_log table
    • Provide documentation to perform one-off maintenance to the commerce_log table
    • Do the deletes on cron, removing a maximum number of entities per cron run

    Any other ideas or thoughts?

  • ๐Ÿ‡ฎ๐Ÿ‡ฑIsrael jsacksick

    Maybe the drush command makes sense here as the cron job isn't going to be necessary anymore once this is fixed as there shouldn't be oprhaned log entities anymore once this is fixed right?

  • ๐Ÿ‡บ๐Ÿ‡ธUnited States rszrama

    If we wanted a future proof solution here, I think we should at least delete logs for entities that get deleted, but it would also be nice to add the following:

    1. Provide a UI for disabling certain logs (e.g. add to cart logs) if they aren't important to a site.
    2. Provide a way to delete logs based on category after X time (e.g. maybe I want add to cart logs but can happily delete them 1 week after an order is placed).
    3. Provide a way to "archive" but not delete logs - essentially, denormalize them and save them in a JSON blob after X time.

    The challenge with archiving, of course, is we then need an alternate view of the activity log on orders once we've denormalized and archived the log entries. We also have to decide if admin comments on an order should still be permitted after that point ... perhaps it's fine to continue accruing new logs even after a batch has been archived, and we'd just need to be able to move the new ones to the archive after a period of time?

  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom steven jones

    How about we make the scope of this issue really narrow, so that we stop commerce_log messages being completely orphaned, as they are today.

    I'll remove the update hook that retrospectively removes orphans, because you can't easily solve that problem generally, and if you can make some assumptions you can do it all with a single (albeit long running) SQL query all by yourself.

  • Open in Jenkins โ†’ Open on Drupal.org โ†’
    Core: 9.5.x + Environment: PHP 8.0 & MySQL 5.7
    last update almost 2 years ago
    783 pass
  • @steven-jones opened merge request.
  • Jumping in with a ping so that this might float to the top. I think the default setup should be to choose what entities you want to log.

    I agree, certainly log history for deleted entities should be deleted.

  • ๐Ÿ‡บ๐Ÿ‡ธUnited States TomTech
  • ๐Ÿ‡บ๐Ÿ‡ธUnited States rszrama

    Encountered this on another client project who thought they'd sanitized a data set but didn't realize their log table was full of information. I agree, we can at least get the entity delete hook in place, but I think we could also write a more intelligent update hook that executes by default unless disabled by a setting (like we've done for expensive updates in the past), leaving it to sites that suppress the update to clean up their log table on their own time.

    A single DELETE query could use a WHERE NOT EXISTS clause vs. trying to construct an array of unknowable large numbers of order IDs. Something like:

    DELETE FROM {commerce_log} WHERE NOT EXISTS(
        SELECT * FROM {commerce_order} co WHERE co.order_id = {commerce_log}.source_entity_id
    );
    
  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom steven jones

    The MR has this already: https://git.drupalcode.org/project/commerce/-/merge_requests/163/diffs?c... I reverted it, but I suppose it could be kept and wrapped in a flag as you suggest.

  • ๐Ÿ‡บ๐Ÿ‡ธUnited States TomTech

    The reverted database update selects all order ids, then uses it in the delete statement.

    That would work with a nominal number of orders, but on sites with hundreds of thousands, or millions of orders, this would not be performant, presuming it completed at all.

    The proposed singular not exists delete statement in #3363849-13: commerce_log entries are not removed on source entity delete โ†’ is much more performant, and could handle millions of rows.

    For some sites that do have a larger number of orders, they might prefer to handle this manually in a separate task than during database update execution. This is where the flag that skips the execution is useful.

    The preferred approach would have the not exists statement wrapped in the execution flag.

  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom steven jones

    The version in the MR builds the same sort of single SQL query as your proposed NOT EXISTS query, does it not? Itโ€™s a single, nested query, not two queries.

    Iโ€™ve not put both statements through a query planner but would be surprised if they donโ€™t basically boil down to doing the same thing.

Production build 0.71.5 2024