Add static caching to scheduled_publish_get_node_fields() to improve performance when displaying multiple nodes

Created on 6 October 2022, about 2 years ago
Updated 31 August 2024, 3 months ago

Problem/Motivation

When profiling a views data export view which exports 1000 nodes, I found this function scheduled_publish_get_node_fields() which ranked #15 in exclusive wall time, and the functions that it calls take up a non-trivial amount of time as well.

See the below screenshot for the top-level xhprof report, sorted by exclusive wall time descending.

Proposed resolution

Add static caching to this function, since this data shouldn't be changing partway through a request while rendering nodes, and if the data does change, we should probably decide to keep the original/first data.

See the below screenshot for the top-level xhprof diff report, sorted by exclusive wall time descending showing the potential improvement.

With the profiler off I also timed some requests via the web inspector in my browser. For this testing, I changed the export to 5000 nodes to provide a bigger scale to work with. With those numbers I see about a 1.5% improvement, with 4 runs each before/after the patch averaging out to 10.76s before, and 10.6s after. A marginal improvement in real world numbers for sure, but an improvement.

Remaining tasks

User interface changes

Should be none, there may be rare cases/race conditions that would previously produce different results if node fields were changed partway through a request, but that should probably be considered a bug anyway. (A bug that this patch would fix.)

API changes

None

Data model changes

None

📌 Task
Status

Needs work

Version

3.0

Component

Code

Created by

🇨🇦Canada star-szr

Live updates comments and jobs are added and updated live.
  • Performance

    It affects performance. It is often combined with the Needs profiling tag.

Sign in to follow issues

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • 🇮🇪Ireland lostcarpark

    Sorry this issue has been left hanging a long time. It dates from before I took over maintainership of the module.

    This is an interesting topic, and worthy of further investigation. However, I wonder is static variable caching the best approach to take?

    The patch needs to be moved into a merge request.

    Static variables cannot persist across PHP instances (i.e. a single HTTP request), so the caching is only effective where the same piece of data is used multiple times within the same instance of PHP. I wonder would it be more effective to use one of Drupal caching mechanisms to speed up access across all PHP instances, and rely on Drupal Cache invalidation to discard stale cache values.

    I know there are several mechanisms for keeping a persistant instance of PHP running. I have a slight concern that there might be cases where static variables could be persistent across multiple requests. If that were the case, there would be no mechanism for invalidating the cached values, so old data would be displayed. I'd want to confident there aren't any such mechanisms in use for Drupal hosting before merging this.

    Finally, it would be nice to have test coverage for this. I'm not sure how you would test in this case, however.

    Setting back to "Needs work" for now.

Production build 0.71.5 2024