Batchify and optimize field scan (dangerous tags in content)

Issue created by @eelkeblok
Comment over 1 year ago →
🇳🇱Netherlands eelkeblok Netherlands 🇳🇱
Pushed some work in progress, not functional ATM.
Merge request !62 Implement field scan in a batch → (Closed) created by eelkeblok
Pipeline finished with Success
over 1 year ago
Total: 238s
#116357
Status changed to Needs review over 1 year ago, 10:39am 11 March 2024
Comment over 1 year ago →
🇳🇱Netherlands eelkeblok Netherlands 🇳🇱
This refactors the field scan into a batch process, handling the fields we want to scan 1000 rows at a time.
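To illustrate the idea of working through rows in fixed-size chunks, here is a minimal sketch in Python. It is not the module's code (the module is a Drupal project); `run_in_batches` and `process` are hypothetical names, and only the chunk size of 1000 comes from the comment above.

```python
from typing import Callable, Sequence

BATCH_SIZE = 1000  # rows handled per batch step, as mentioned in the comment


def run_in_batches(
    rows: Sequence[dict],
    process: Callable[[Sequence[dict]], None],
    batch_size: int = BATCH_SIZE,
) -> int:
    """Hand rows to `process` in slices of at most `batch_size` rows.

    Returns the number of batch steps that were executed.
    """
    steps = 0
    for offset in range(0, len(rows), batch_size):
        # Each slice corresponds to one step of the batch process.
        process(rows[offset:offset + batch_size])
        steps += 1
    return steps
```

In a real batch process each step would typically run in its own request, with the offset stored between steps; the loop here only shows how the rows are divided.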

I've combined the database querying so that all columns are fetched at once. This does mean the field processing method is now also asked whether it would like to scan the ID column, but that seems a small price to pay for the efficiency gain, as the check returns quickly because the ID is not a text column.
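A rough sketch of that trade-off, again in Python and not taken from the module: every column of a fetched row, including the ID, is passed to one check, and the check bails out immediately for non-text values. The tag pattern, `value_is_dangerous`, and `scan_rows` are assumptions made up for this illustration.

```python
import re

# Hypothetical pattern for illustration only; the module's real list of
# dangerous tags is not given in this thread.
DANGEROUS_TAG = re.compile(r"<\s*(script|iframe|object|embed)\b", re.IGNORECASE)


def value_is_dangerous(value) -> bool:
    """Return True if a text value contains one of the example tags."""
    if not isinstance(value, str):
        # Non-text columns (such as a numeric ID) are rejected right away.
        return False
    return DANGEROUS_TAG.search(value) is not None


def scan_rows(rows, id_column="id"):
    """Check every column of every row; return (row id, column) pairs that matched."""
    findings = []
    for row in rows:
        for column, value in row.items():
            if value_is_dangerous(value):
                findings.append((row[id_column], column))
    return findings
```

The ID column costs one cheap type check per row, which is the "small price" compared with issuing a separate query per column.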

The progress reporting is a bit wonky: it counts every entity type equally, as well as every field within each entity, and the progress is calculated as a simple fraction of those totals. This means that an entity without any scannable fields weighs as heavily as an entity with many scannable fields. In practice the progress is quite choppy: it can make huge jumps when it hits a bunch of entities that have nothing of interest, and then seem to get stuck for a while when scanning a text field that has a lot of data (the percentage with a decimal place that we added for the individual scan progress does help there). It would be more accurate to find out up front which fields are scannable and how many rows there are to scan, and then keep a grand total of scanned rows. Still, this is a huge improvement on my "site of interest", which has a lot of user-generated content.
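The difference between the two ways of measuring progress can be shown with a small Python sketch. It is a simplification, not the module's code: each entity type is treated as a single unit (the per-field weighting is left out), and the function names and row counts are invented for the example.

```python
def equal_weight_progress(row_counts: dict, finished: set) -> float:
    """Progress as the fraction of entity types finished, ignoring their size."""
    return len(finished) / len(row_counts) if row_counts else 1.0


def row_weighted_progress(row_counts: dict, rows_scanned: int) -> float:
    """Progress as scanned rows over the grand total of rows to scan."""
    total = sum(row_counts.values())
    return rows_scanned / total if total else 1.0


# Invented numbers: one large entity type and two small or empty ones.
counts = {"node": 9000, "user": 0, "taxonomy_term": 1000}

# After finishing "user" and "taxonomy_term", the equal-weight bar shows
# two thirds done, although only 1000 of 10000 rows have been scanned.
coarse = equal_weight_progress(counts, {"user", "taxonomy_term"})
precise = row_weighted_progress(counts, 1000)
```

With these numbers the equal-weight figure jumps ahead and then appears stuck while the large entity type is scanned, whereas the row-weighted figure advances steadily.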
Comment over 1 year ago →
🇳🇱Netherlands eelkeblok Netherlands 🇳🇱
BTW, I don't think this is a must-have for 3.0, could easily wait for a 3.1.
Comment over 1 year ago →
System Message

smustgrave → committed fc6cd236 on 3.0.x
Issue #3422990 by eelkeblok: Batchify and optimize field scan (dangerous...
Comment over 1 year ago →
🇺🇸United States smustgrave
Tested locally and still appears to be functional. Thanks!
Status changed to Fixed over 1 year ago, 7:39pm 29 May 2024
Comment over 1 year ago →
System Message
smustgrave → closed merge request !62
Comment over 1 year ago →
System Message
Automatically closed - issue fixed for 2 weeks with no activity.
