- Issue created by @berdir
- 🇨🇭Switzerland berdir Switzerland
Implemented an initial version and seems to be working reasonably well.
TODO:
* Optimize cache invalidation. Right now I just used redirect_list, but on systems that add a lot of redirects, this would be frequently invalidated. I think it should be possible to check the source path against the prefix list on save and invalidate only if a relevant change is detected.
* Still want to do some checks against access logs on real sites to get an idea on how many
* Probably an opt-out settings flag for sites using the alias patch or where request patterns cause a huge amount of variations. - 🇨🇭Switzerland berdir Switzerland
I analyzed the last 100k/400k entries in php.access.log in a single fairly large production project.
For the last 100k, I found 116 unique prefixes after resolving aliases and ignoring those that are only one part. In our case, 80k of those are node and this project does have node/... prefixed redirects, so won't be a huge help. About 59 of those prefixes don't have prefixes, but because most of the frequent prefixes do, it would only save around 1800 out of those 100'000 queries.
For the last 400k, the list grows to 159. still quite manageable, but it would be easy enough to add a hard cut off and not store more patterns, seems like a fair assumption that after 500 unique patterns or something, we're not going to find many frequently requested ones anymore.
Also checked a few different projects, but most actually have at least one node/something redirect for various reasons. So this is for many projects actually not as beneficial as I thought.
I found one projects where I got a success rate (no redirects on that prefix) of 83% out of 100k redirects, with a small tweak (checking for redirects that start with "$prefix/", because that actually has a redirect on /node). On that project I also had access to ~1M log entries, parsing that resulted in 400 prefixes (and 11'000 unique single-component urls). The requests per prefix quickly go below 10 (out of 1M), so I think it's a good idea to cut off this prefix list after 100 or so to save on cache size.
So, quite interesting for *some* projects, but many likely won't benefit (much) from this.
(I may have spent quite a bit more time on this analysis than I expected)
- 🇬🇧United Kingdom catch
Right now I just used redirect_list, but on systems that add a lot of redirects, this would be frequently invalidated. I think it should be possible to check the source path against the prefix list on save and invalidate only if a relevant change is detected.
Could implement ::set() on the cache collector class itself, and merge the prefix from the new redirect into the existing cache item then save it (if it's not already in there). Probably easiest to ::get() first and only change anything if that returns FALSE.
- 🇨🇭Switzerland berdir Switzerland
Yes, set() is a good idea, much better than just invalidate.
This also needs tests, our existing tests pass, but they aren't really good examples, mostly just use URL's that aren't really a path. An actual performance test that lets us inspect if there were redirect queries would be neat.
- 🇬🇧United Kingdom catch
Was going to suggest a performance test, to my knowledge it would be the first ever performance test for a contrib module (outside core or Drupal CMS) - could hit one page where the redirect query is expected, then another page where it's not?
- 🇨🇭Switzerland berdir Switzerland
Yes, that's exactly the idea. My plan is to filter and only assert on redirect queries. Anything else seems way too much pain to keep up with and support various different minor and major core versions.
Also considering to add one to redis, but similar problems there. I could for example assert that there are no semaphore queries, but we're already testing that in the existing full web test that simply asserts that no semaphore, queue, cache, .. tables exist at the end.
- 🇬🇧United Kingdom catch
Yeah should be able to do array_diff() or array_intersect() on a specific list of redirect queries, so it only fails if they're there or missing.My plan is to filter and only assert on redirect queries.