- Issue created by @steven jones
- ๐ฎ๐ฑIsrael jsacksick
I think it's an oversight, the log table grows really quickly on many of the projects I work on I disabled the cart related logging (add to cart, remove from cart etc..) otherwise on sites with high traffic the table can quickly reach GBS.
Additionally, I also have cron jobs responsible for cleaning up the table from time to time (e.g events older than x etc)...
I believe we need to expose several settings from the UI (the ability to toggle the logging of events for example).There's actually quite some data that's not cleaned up whenever an order is deleted, I think we need to add this for logs... But also a UI for configuring for how long logs are kept.
The "challenge" might be that you can't just rely on the log created time as you may have log events created a month ago related to an order that is still a cart for instance?
- Status changed to Needs work
almost 2 years ago 1:54pm 5 June 2023 - ๐ฌ๐งUnited Kingdom steven jones
I've added a start on this here:
https://git.drupalcode.org/issue/commerce-3363849/-/tree/3363849-commerc...
At the moment, simply implementing the hook to remove entries when deletes happen, and adding an update hook to remove already orphaned log messages.
- ๐ฎ๐ฑIsrael jsacksick
This could potentially hang on sites with millons of orders though.
- ๐ฌ๐งUnited Kingdom steven jones
This could potentially hang on sites with millons of orders though.
Yeah, agreed, it's not great, and it doesn't correctly invoke any hooks when deleting the entries, because they are technically entities too.
Alternatives that I can think of:
- Provide a Drush command to perform one-off maintenance to the commerce_log table
- Provide documentation to perform one-off maintenance to the commerce_log table
- Do the deletes on cron, removing a maximum number of entities per cron run
Any other ideas or thoughts?
- ๐ฎ๐ฑIsrael jsacksick
Maybe the drush command makes sense here as the cron job isn't going to be necessary anymore once this is fixed as there shouldn't be oprhaned log entities anymore once this is fixed right?
- ๐บ๐ธUnited States rszrama
If we wanted a future proof solution here, I think we should at least delete logs for entities that get deleted, but it would also be nice to add the following:
- Provide a UI for disabling certain logs (e.g. add to cart logs) if they aren't important to a site.
- Provide a way to delete logs based on category after X time (e.g. maybe I want add to cart logs but can happily delete them 1 week after an order is placed).
- Provide a way to "archive" but not delete logs - essentially, denormalize them and save them in a JSON blob after X time.
The challenge with archiving, of course, is we then need an alternate view of the activity log on orders once we've denormalized and archived the log entries. We also have to decide if admin comments on an order should still be permitted after that point ... perhaps it's fine to continue accruing new logs even after a batch has been archived, and we'd just need to be able to move the new ones to the archive after a period of time?
- ๐ฌ๐งUnited Kingdom steven jones
How about we make the scope of this issue really narrow, so that we stop commerce_log messages being completely orphaned, as they are today.
I'll remove the update hook that retrospectively removes orphans, because you can't easily solve that problem generally, and if you can make some assumptions you can do it all with a single (albeit long running) SQL query all by yourself.
- last update
almost 2 years ago 783 pass - @steven-jones opened merge request.
Jumping in with a ping so that this might float to the top. I think the default setup should be to choose what entities you want to log.
I agree, certainly log history for deleted entities should be deleted.
- ๐บ๐ธUnited States rszrama
Encountered this on another client project who thought they'd sanitized a data set but didn't realize their log table was full of information. I agree, we can at least get the entity delete hook in place, but I think we could also write a more intelligent update hook that executes by default unless disabled by a setting (like we've done for expensive updates in the past), leaving it to sites that suppress the update to clean up their log table on their own time.
A single
DELETE
query could use aWHERE NOT EXISTS
clause vs. trying to construct an array of unknowable large numbers of order IDs. Something like:DELETE FROM {commerce_log} WHERE NOT EXISTS( SELECT * FROM {commerce_order} co WHERE co.order_id = {commerce_log}.source_entity_id );
- ๐ฌ๐งUnited Kingdom steven jones
The MR has this already: https://git.drupalcode.org/project/commerce/-/merge_requests/163/diffs?c... I reverted it, but I suppose it could be kept and wrapped in a flag as you suggest.
- ๐บ๐ธUnited States TomTech
The reverted database update selects all order ids, then uses it in the delete statement.
That would work with a nominal number of orders, but on sites with hundreds of thousands, or millions of orders, this would not be performant, presuming it completed at all.
The proposed singular not exists delete statement in #3363849-13: commerce_log entries are not removed on source entity delete โ is much more performant, and could handle millions of rows.
For some sites that do have a larger number of orders, they might prefer to handle this manually in a separate task than during database update execution. This is where the flag that skips the execution is useful.
The preferred approach would have the not exists statement wrapped in the execution flag.
- ๐ฌ๐งUnited Kingdom steven jones
The version in the MR builds the same sort of single SQL query as your proposed NOT EXISTS query, does it not? Itโs a single, nested query, not two queries.
Iโve not put both statements through a query planner but would be surprised if they donโt basically boil down to doing the same thing.