Service worker caches assets (and the site itself) on excluded pages

Created on 12 September 2023, about 1 year ago
Updated 13 September 2023, about 1 year ago

Problem/Motivation

The service worker caches assets (and the page itself) on excluded pages. This means the exclude list has NO effect. The default values

admin/.*
user/.*

do nothing: the entire admin "backend" is still being cached.
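For reference, here is a minimal sketch of how exclude patterns like the defaults above would be expected to behave. It is not the module's actual code: the helper names "compileExcludePatterns" and "isExcludedPath" are hypothetical, and it assumes the patterns are matched against the full path without a language prefix or base path.

```typescript
// Hypothetical helpers for illustration, not the module's implementation.

// Compiles exclude patterns such as "admin/.*" into anchored regular expressions.
function compileExcludePatterns(patterns: string[]): RegExp[] {
  return patterns
    .map((p) => p.trim())
    .filter((p) => p.length > 0)
    // Strip a leading slash so "admin/.*" and "/admin/.*" behave the same,
    // then anchor the pattern to the whole path.
    .map((p) => new RegExp(`^/${p.replace(/^\//, "")}$`));
}

// Returns true when the pathname of `url` matches any exclude pattern.
function isExcludedPath(url: string, excludes: RegExp[]): boolean {
  const { pathname } = new URL(url);
  return excludes.some((re) => re.test(pathname));
}

// The default values from this issue.
const excludes = compileExcludePatterns(["admin/.*", "user/.*"]);
```

With these patterns, "/admin/config" and "/user/1/edit" count as excluded, while "/node/1" and asset paths such as "/core/misc/drupal.js" do not. That matches the observation that asset URLs never hit the exclude list on their own.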

Steps to reproduce

  • Leave "urls_to_cache" empty or add "/".
  • Exclude a node URL.
  • Go to the excluded node.
  • Make sure to update / activate the new service worker in the "Application => Service Workers" tab in Chrome DevTools (you can also unregister and re-register the service worker to be 100% sure).
  • Go offline.
  • The page, together with all its assets, is still served from the cache.

This also affects the "admin" and "user" pages, if we keep the default installation config.

Proposed resolution

Fix the issue and make the exclusion work as expected. We might reach a point where the page itself is no longer cached but its assets still are. If so, we should consider returning early when the current page is excluded.
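One way the early return could look, as a rough sketch only: decide per request whether the service worker should step aside, using the request's referrer to find out which page an asset belongs to. The names "RequestLike", "excludedPaths" and "shouldBypassServiceWorker" are hypothetical, and the exclude list is hard-coded here as an assumption.

```typescript
// Hypothetical shape: the subset of a service worker Request used here.
interface RequestLike {
  url: string;
  mode: string; // "navigate" for page loads
  referrer: string; // URL of the requesting page, may be empty
}

// Assumed exclude list, anchored to the full path.
const excludedPaths: RegExp[] = [/^\/admin\/.*$/, /^\/user\/.*$/];

function pathIsExcluded(url: string): boolean {
  if (url === "") return false;
  try {
    const { pathname } = new URL(url);
    return excludedPaths.some((re) => re.test(pathname));
  } catch {
    return false; // unparsable URL: treat as not excluded
  }
}

// True when the request is for an excluded page, or for an asset
// that was requested by an excluded page.
function shouldBypassServiceWorker(req: RequestLike): boolean {
  if (pathIsExcluded(req.url)) return true;
  return req.mode !== "navigate" && pathIsExcluded(req.referrer);
}

// Possible wiring inside the service worker (not executed here):
// self.addEventListener("fetch", (event) => {
//   if (shouldBypassServiceWorker(event.request)) return; // browser handles it
//   ...existing caching logic...
// });
```

Returning from the "fetch" listener without calling "event.respondWith()" leaves the request to the browser's default handling. Note that an asset shared with a non-excluded page can still end up in the cache when that other page requests it, and that a restrictive referrer policy may leave the referrer empty.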

Remaining tasks

User interface changes

API changes

Data model changes

🐛 Bug report
Status

Active

Version

2.0

Component

Code

Created by

🇩🇪Germany Grevil


Comments & Activities

  • Issue created by @Grevil
  • 🇩🇪Germany Grevil

    Basically, "notExcludedPath" always returns true:

    Console output:

    Most of these are asset URLs, and I understand why they are not excluded: they point to "/core/..." rather than to the node URL. But the page itself is obviously cached as well.

    Furthermore, most code inside the "fetch" method seems to only target assets. Should we instead exclude these URLs on service worker "install"?

  • 🇩🇪Germany Anybody Porta Westfalica

    Indeed, this is an issue!

    • The assets from blacklisted URLs should not be handled at all (not even be crawled)
    • The admin pages should not be crawled with this default pattern present

    Let's try to find out why and how... and how to fix it! Perhaps there are service worker / fetch best practices we can use... and possibly there is a regex issue as well.

  • 🇩🇪Germany Grevil

    Inside the PWAController's "fetchOfflinePageResources()" method, we should simply return an empty array early if the current page is on the "exclude" list. That would fix the problem of assets being cached on excluded URLs.

    BUT the page itself is also getting cached. This is usually done inside the "install" call of the service worker script through "addResourcesToCache()", see here. It looks like our script does it another way, though.
    This means that once we have fixed the asset caching problem, we still need to find out how the pages themselves are cached.
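If the pages do turn out to be added during "install", the list of URLs could be filtered against the exclude patterns before it is handed to the cache. The following is only a sketch under that assumption: "filterPrecacheUrls" is a hypothetical helper, and the origin and URL list are made up for illustration.

```typescript
// Hypothetical helper: drops every URL whose path matches an exclude pattern.
function filterPrecacheUrls(
  urls: string[],
  excludePatterns: string[],
  origin: string,
): string[] {
  // Anchor each pattern to the whole path, tolerating a leading slash.
  const excludes = excludePatterns
    .filter((p) => p.trim().length > 0)
    .map((p) => new RegExp(`^/${p.trim().replace(/^\//, "")}$`));

  return urls.filter((url) => {
    // Resolve relative URLs against the site origin before matching.
    const { pathname } = new URL(url, origin);
    return !excludes.some((re) => re.test(pathname));
  });
}

// Possible use during "install" (not executed here):
// const cache = await caches.open("pwa-cache");
// await cache.addAll(filterPrecacheUrls(urlsToCache, excludePatterns, self.location.origin));
```

Filtering at this point keeps excluded pages out of the cache from the start, but it does not cover pages that are cached later at runtime by the "fetch" handler.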

  • 🇩🇪Germany Grevil

    Here is another good example of how this could be handled:
    https://wordpress.org/support/topic/prefetch-a-list-of-urls/
