Plan for Statistics 1.x

Created on 28 March 2024, 8 months ago
Updated 16 July 2024, 4 months ago

Problem/Motivation

This module is the Statistics module that shipped in core through Drupal 10.x. Since the core module is deprecated in 10.3 and removed in 11.0, the contrib module starts at version 1.0 as a direct equivalent of the core version, so that users of the existing core module can upgrade without any hassle.

Later versions will evolve following these directions:

  • Basic goals:
    • Track any "view" of any entity "page", possibly also any page, and some sub-totals/grand total, including Views "page" displays (and others)
    • Configure which of the countable view kinds are actually tracked
    • Track "transitions", i.e. navigation from one site page to another
    • Operate as an extensible statistics API, to support developers wishing to create additional value on top of this module's basic features
    • Views integration exposing those data
    • Basic graphical user flow view per "page": origin distribution, destination distribution
    • Per-language/locale statistics
    • Updated documentation, possibly using Gitlab pages instead of d.o. nodes
  • Stretch goals, meant to be implemented by extensions, not this module itself:
    • Longitudinal (over time) statistics, with configurable granularity
    • Integration with entity events for longitudinal statistics, starting with revisions creation
    • Basic graphical UI for longitudinal statistics, not depending on any third-party services
    • Acquisition statistics
    • Bot detection / filtering
    • Tracking average page view duration (JS onfocus/onblur) as [sum(time), view count]. See Stack Overflow 7389328
    • Statistics for non-rendered pages, like Queue workers, CLI commands, API responses (JSON controllers, JSON API, GraphQL, gRPC...) or SPAs
  • Non-goals:
    • User tracking. The module is meant to store zero PII

    Proposed resolution

    • Version 1.0 of this module is essentially identical to the core version, except for some bug fixes and Drupal 11 / PHP 8.3 support.
    • Versions 1.x will preserve compatibility with earlier versions, and for that reason are not expected to receive significant changes
    • Versions 2.x will be the ones being actively evolved, and this plan details what these changes are expected to be

    Timeline

    • 2024-05-07 - release 1.0.0
    • 2024-04-08 - release 1.0.0-beta1
    • 2024-04-01 - working contrib version, all tests passing
    • 2024-03-31 - core issue triage, mostly thanks to @vanessakovalsky and @hellosct1
    • 2024-03-28 - we brainstormed at DrupalCamp Rennes to define goals and non-goals
    • 2024-01-28 - project created, initial plan at 🌱 "Plan for statistics in contrib" (Active)

    User interface changes

    • 1.x: add a default view at some point after 1.0.0
    • Many for 2.x including configuration pages, default Views, user flow, longitudinal stats

    API changes

    • 1.x: none
    • TBD

    Data model changes

    • 1.x: none
    • TBD
🌱 Plan
Status

Active

Version

1.0

Component

Miscellaneous

Created by

🇫🇷 France fgm Paris, France


Comments & Activities

  • Issue created by @fgm
  • 🇫🇷 France opi
  • 🇫🇷 France fgm Paris, France
  • 🇫🇷 France nod_ Lille
  • Track any "view" of any entity "page", possibly also any page

    Would this include visits to a view's "page" display? I'd like to see that.

  • 🇫🇷 France fgm Paris, France

    A View "page" display is technically not an "entity view" for Drupal: it does not so much display the View config entity as execute it within some context, which runs a plugin that happens to be called a "page". In terms of mechanism, it is really a different thing altogether.

    It also has the issue that in most cases it will actually display other (content) entities, sometimes possibly in "page" mode themselves, multiplying the view count, preventing the association of a statistics "view" to the total number of pages seen.

    But that's still an interesting question indeed. Part of the "possibly any page" first item. Editing the IS accordingly.

  • 🇫🇷 France fgm Paris, France
  • Certainly, if it's not feasible or desirable later on, it's possible to convert the views to use blocks and put the blocks on nodes instead.

  • 🇺🇸 United States cosmicdreams Minneapolis/St. Paul

    Is the goal of the statistics module to provide useful tools to augment a Google Analytics-type data analysis or is it to provide telemetry of use of the site to augment a Splunk-like data analysis?

  • I don't use GA. I use statistics on its own for internal standalone metrics with views integration.

  • If you want more detailed feedback, you could try creating a survey of some kind.

  • 🇨🇦 Canada xmacinfo Canada

    I use statistics on its own for internal standalone metrics with views integration.

    That's my use case as well.

  • 🇫🇷 France fgm Paris, France

    The goal is pretty much summarized by the bit you quoted: anonymous stats. So it is definitely not an extension to GA, Matomo, and the like, as it will not rely on anything outside the site itself (no cookies, local storage, service workers, etc.).

    The basic idea is to focus on stats about what has been put on the screen: currently, the core version only counts "nodes in full page mode", but if you look at the plan, the general idea is to count any entity in the modes chosen for counting, and possibly more (e.g. routes, routes+params), as well as click tracking, since it can actually be performed entirely anonymously.

    Also, part of the idea is that the module should be a basis for third-party extensions, not trying to do everything on its own, but offering an integration point on which other services may build. A typical case for this would be stats over time, as the current core (now 1.0.0 in contrib too) only provides snapshot counts. An external service can get data from there by periodic sampling and build a history DB.
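
The periodic-sampling idea can be sketched as follows. This is only an illustration under stated assumptions, not the module's API: the function name `sample_deltas` and the page keys such as `node/1` are hypothetical, and counter resets (a total going down between two samples) are simply ignored.

```python
def sample_deltas(previous: dict[str, int], current: dict[str, int]) -> dict[str, int]:
    """Return the views gained per page between two snapshots of total counters.

    Both arguments map a (hypothetical) page key to its snapshot view count.
    Pages that gained no views are omitted, so an idle page costs no storage
    in the history database.
    """
    deltas: dict[str, int] = {}
    for page, total in current.items():
        gained = total - previous.get(page, 0)
        if gained > 0:
            deltas[page] = gained
    return deltas
```

An external service would call something like this once per sampling interval and append the returned deltas, with a timestamp, to its own history store.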

    One point to keep in mind is that keeping stats, especially over time, is costly in terms of DB writes, to an extent some users do not realize. Currently, a very coarse approximation of the cost of stats is basically O(n), where n is the number of nodes. Now if you add click tracking, it jumps to O(n^2). If you add all entity types, the multiplier for O(n^2) jumps by the number of entity types. If you add all view modes, the multiplier jumps again by the average number of bundles per entity type. And if you add history, suddenly you get to O(n^2 * s), where s is the sampling rate.

    Just a typical sample on a small site with a lot of traffic: 10k nodes -> 10k*10k = 100M possible transitions in click tracking. Keep sampling at one per minute: 1440*100M = 144G data points per day. Now, obviously most of these data points will be empty and will not have to be stored (assume blank = 0), but that's still a lot, with a very high maximum theoretical limit.
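
The worst-case arithmetic can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and it counts every ordered (origin, destination) pair as a possible transition, which is the theoretical ceiling rather than what a real site would store.

```python
def max_transition_points_per_day(pages: int, samples_per_day: int) -> int:
    """Upper bound on click-tracking data points per day.

    Every ordered (origin, destination) pair of pages is one possible
    transition, and each transition can yield one data point per sample.
    """
    possible_transitions = pages * pages
    return possible_transitions * samples_per_day


if __name__ == "__main__":
    pages = 10_000
    samples_per_day = 24 * 60  # one sample per minute
    print(pages * pages)  # 100000000 possible transitions
    print(max_transition_points_per_day(pages, samples_per_day))  # 144000000000
```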

    Most small and medium sites are not ready for this kind of DB storage. If you look at the historical issues on the core module, you'll find some users finding that just the extra load of the request rate doubling due to statistics was enough to overload their server. Hence the idea to provide an extension point so that these big data sinks can be plugged into the system instead of remaining within it.

  • 🇫🇷 France fgm Paris, France

    @xmacinfo that's how I do it currently. FWIW, here is a View I'm using which you can adapt to your own needs. Beware the renaming that d.o. applied to it, and customize as you see fit.

  • 🇫🇷 France fgm Paris, France
  • 🇫🇷 France fgm Paris, France

    Added suggestion from 💬 "can it count the total counts of all pages?" (Active)

  • We could also look at the Visitors module and see if there are any features it has that are desirable. I've never used that module, but I see that it has

    • View pages count per month.
    • View pages count per day of the month.
    • View pages count per day of the week.
    • View pages count per hour.

    I think some of these would be pretty useful. Though, if you're going to do this, it's certainly best if you don't track each individual view as a page+timestamp. It would be better to only track at the desired level of granularity, such as daily views, and then only keep the count for each day for the last month's worth of days. Older counts could be kept at a month granularity, by adding the daily view counts together and saving it for that month, then starting at zero for the new month.
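
A minimal sketch of that rollup scheme, for one page, might look like the following. The class and method names are hypothetical, the 30-day retention is an assumed default, and a real implementation would live in database tables rather than in-memory dictionaries.

```python
from collections import defaultdict
from datetime import date


class RollupCounter:
    """Hypothetical per-page counter: daily buckets folded into monthly totals."""

    def __init__(self) -> None:
        # Recent view counts, keyed by day.
        self.daily: dict[date, int] = defaultdict(int)
        # Older view counts, keyed by (year, month).
        self.monthly: dict[tuple[int, int], int] = defaultdict(int)

    def record_view(self, day: date) -> None:
        """Increment the bucket for the given day; no per-view row is stored."""
        self.daily[day] += 1

    def roll_up(self, today: date, keep_days: int = 30) -> None:
        """Fold daily buckets older than keep_days into their month's total."""
        for day in list(self.daily):  # copy the keys, since buckets are removed
            if (today - day).days > keep_days:
                self.monthly[(day.year, day.month)] += self.daily.pop(day)
```

Calling `roll_up()` from a periodic job (cron, for instance) keeps the number of stored rows per page bounded by the retention window plus one row per past month.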
