Add tracking for when recipes from Drupal.org are applied

Created on 21 November 2024, 4 months ago

Problem/Motivation

Drupal.org will have recipe browsing and a recipe browsing API at some point. For the results to be good, we need to know what recipes are popular.

Right now, we have download numbers for recipes, like https://packagist.org/packages/drupal/events_recurring/stats. Those numbers aren’t necessarily great, since they measure when Composer is run with the recipe as a dependency, and the key action for a recipe is its application, not its download.

It would be better to have an API, similar to update status, so that we have metrics for when recipes are applied.

Proposed resolution

The server side is ready: https://updates.drupal.org/recipe-applied. Our CDN returns a static synthetic response immediately. As with update status, the data will be sent in query parameters, and we will then analyze the logs to get useful data.

Conditions for sending data:

  • Respect opt in/out options
  • When the recipe is applied
  • Only if the recipe is from Drupal.org, in the drupal/ namespace on Packagist.org, like https://packagist.org/packages/drupal/events_recurring
  • If there is a way to know if GitLab CI is being used, not sending for CI would be ideal
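
The conditions above could be combined into a single check along these lines. This is only a sketch: should_report and its parameters are hypothetical names, not an existing Drupal API, and it assumes the caller invokes it at the moment a recipe is applied. The CI check relies on GitLab CI setting the GITLAB_CI environment variable to "true" in its jobs.

```python
import os
from typing import Mapping, Optional


def should_report(package_name: Optional[str], opted_in: bool,
                  env: Mapping[str, str] = os.environ) -> bool:
    """Return True only if every sending condition holds (hypothetical helper)."""
    # Respect the site's opt in/out setting.
    if not opted_in:
        return False
    # Only recipes in the drupal/ namespace on Packagist.org are reported.
    if not package_name or not package_name.startswith("drupal/"):
        return False
    # GitLab CI sets GITLAB_CI=true in its job environment; skip CI runs.
    if env.get("GITLAB_CI", "").lower() == "true":
        return False
    return True
```

Passing the environment in as a parameter keeps the check easy to exercise without touching the real process environment.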

Data to send:

  • name: recipe name, like events_recurring
  • site_key: same arbitrary site key used by update status module
  • anything else we need?

The final request will be like https://updates.drupal.org/recipe-applied?name=events_recurring&site_key...

Drupal does not need to wait for a response.
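
As a rough illustration of the request described above, the URL could be assembled like this. The function name build_recipe_applied_url is hypothetical, and only the name and site_key parameters from the list above are included, since the final parameter set is still undecided.

```python
from urllib.parse import urlencode

# Endpoint named in this issue summary.
ENDPOINT = "https://updates.drupal.org/recipe-applied"


def build_recipe_applied_url(name: str, site_key: str) -> str:
    """Build the reporting URL with URL-encoded query parameters (sketch)."""
    query = urlencode({"name": name, "site_key": site_key})
    return f"{ENDPOINT}?{query}"
```

Because the response is a static synthetic one, a client could send this with a short timeout and ignore the result.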

Remaining tasks

Decide on any opt in/out options. This is the same privacy policy as update status data. We don’t collect the site URL and don’t share logs, only aggregate, anonymous summaries.

Finalize any other data collected.

Once the final query parameters are set and in core, we can start on the server-side log analysis. The math will be a bit different since recipes are applied once, not installed.

User interface changes

Only if a new opt in/out UI is added.

Introduced terminology

None

API changes

Not for Drupal as a client.

Data model changes

n/a

Release notes snippet

To be determined

Feature request
Status

Active

Version

11.0 🔥

Component

recipe system

Created by

🇺🇸United States drumm NY, US


Comments & Activities

  • Issue created by @drumm
  • 🇺🇸United States drumm NY, US

    Add version data to send

  • 🇺🇸United States chrisfromredfin Portland, Maine

    I'm not sure how the telemetry will work for the reporting, but core dispatches an event when a recipe is applied, which is where we can know in code that it has happened and report it. Project Browser listens to that event here:

    https://git.drupalcode.org/project/project_browser/-/blob/2.0.x/src/Reci...

  • 🇬🇧United Kingdom catch

    Tagging as a Drupal CMS release target. If we were to implement this in update module, would that save an additional opt-in? It's almost exactly the same data that we already send from update module so might be fine. Then update module needs to do something with the event - would suggest adding it to a queue item, and then having the queue runner actually send the data so it's as light as possible when applying a recipe.

  • 🇺🇸United States phenaproxima Massachusetts

    I would suggest that we simply add this to the Update Status module as an event subscriber, respecting its opt-in/out.

  • 🇺🇸United States joshuami Portland, OR

    +1 to recipes calling home for update status.

    I was thinking about this a bit when the site templates concept was described. D.o gets directionally accurate stats about module and theme installs based on when the available updates call home. That's a little more accurate than composer downloads, and it would align from an accuracy standpoint with our other metrics.

    We kinda need this sort of telemetry for Drupal CMS install metrics as well. If recipes called home about updates, we could use drupal_cms_starter as an indicator that at least part of the site was based on Drupal CMS. As it stands now, we can only guess at Drupal CMS installs based on one of the dependent feature modules with drupal_cms_ in the name.

    One drawback to recipes calling home is that it kinda assumes that recipes will be added to a site and not removed. That might not be a best practice as many recipes will not continue to apply as a site develops over time.

  • 🇦🇺Australia jannakha Brisbane!
  • 🇬🇧United Kingdom catch

    One drawback to recipes calling home is that it kinda assumes that recipes will be added to a site and not removed. That might not be a best practice as many recipes will not continue to apply as a site develops over time.

    I think it would only ever be sent to d.o once, unlike modules, which are reported on every update status check. So it would be possible to track the cumulative times a recipe has been applied, and month by month comparisons, but it would be different from current project usage stats.

  • 🇳🇴Norway zaporylie

    I wonder about a scenario in which a recipe is applied multiple times because it might be a dependency for many other recipes. That would be a common case for starter/base recipes. Said recipes can be deduplicated, which is an approach the recipe installer kit is promoting, or simply applied multiple times. The main issue I see here is inconsistency, which results in slightly off telemetry data.

    Re #5: while I agree that the proposed approach is clean, I wonder how this could be respected if the recipe is applied via the installer (no option to opt in/out) or the site is installed from the recipe (drush si ../recipes/drupal_cms_starter).

    Re #7: Recipe Tracker (thanks for mentioning it here) is meant to track the application of recipes locally, within your Drupal instance, and never sends any data outside the Drupal instance context.

  • 🇺🇸United States drumm NY, US

    The issue summary mentions including the update status site key, which is used to de-dupe update status data per-site.

    We will have to decide on a new algorithm for translating the recipe application numbers into popularity for ranking. Something along the lines of including the last N weeks, potentially with some decay function so the last week contributes more to the rank, and each earlier week contributes less. Until we have some real data, it isn’t worth much speculation about the specific implementation. The earlier we have data, the better.

    Additional metadata that might help is welcome, as long as it doesn’t send identifying information about the site, or delay initial implementation in core. Adding method of installation to the query string with the request would be useful - installed via installer, project browser, drush, as a dependency, etc.
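
To make the two ideas in this comment concrete, here is a minimal sketch of per-site de-duplication followed by a decayed weekly score. Everything in it is an assumption for illustration: the function names, the (name, site_key) record shape, and the exponential decay with a factor of 0.5 are placeholders, not the algorithm Drupal.org will use.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Set, Tuple


def unique_applications(records: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct site keys per recipe from (name, site_key) pairs."""
    sites: Dict[str, Set[str]] = defaultdict(set)
    for name, site_key in records:
        sites[name].add(site_key)
    return {name: len(keys) for name, keys in sites.items()}


def popularity_score(weekly_counts: Sequence[int], decay: float = 0.5) -> float:
    """Weight weekly counts so that recent weeks contribute more.

    weekly_counts[0] is the most recent week; each earlier week is
    multiplied by a further factor of `decay` (an assumed value).
    """
    return sum(count * decay ** age for age, count in enumerate(weekly_counts))
```

For example, with the assumed decay of 0.5, three weeks of 10 applications each score 10 + 5 + 2.5 = 17.5.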

  • 🇬🇧United Kingdom catch

    @zaporylie it might be possible to add recipes to a queue in the installer, and then run through the queue on first cron run when update status is either applied or not.

    I don't think we want recipe application itself to trigger an HTTP request to Drupal.org, so going via a queue might be a good idea anyway. We can make it clear in the documentation for the hook/event that it's triggered an indeterminate amount of time after the recipe is applied.
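
The queue idea could look roughly like the following sketch. ReportQueue and its methods are hypothetical names, not Drupal's queue API, and the choice to drain and discard items when the site has not opted in is an assumption about how the opt-out would be honored.

```python
from collections import deque
from typing import Callable, Deque


class ReportQueue:
    """In-memory stand-in for a queue of recipe names awaiting reporting."""

    def __init__(self) -> None:
        self._items: Deque[str] = deque()

    def on_recipe_applied(self, name: str) -> None:
        # Applying a recipe only enqueues; no HTTP request happens here.
        self._items.append(name)

    def run(self, send: Callable[[str], None], opted_in: bool) -> int:
        """Drain the queue (for example on cron) and return how many were sent."""
        sent = 0
        while self._items:
            name = self._items.popleft()
            # Assumption: items are dropped, not kept, when reporting is off.
            if opted_in:
                send(name)
                sent += 1
        return sent
```

Deferring the send this way keeps recipe application light and lets the opt in/out decision be made when the queue runs, not when the recipe is applied.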

  • 🇳🇴Norway zaporylie

    I appreciate the feedback in #10 and #11. Routing recipe application d.org update requests to the queue, with the update module acting as the queue consumer, sounds like a clean approach that respects global tracking settings.

    I’m curious about one more thing — the issue summary mentioned the name as one of the properties featured in the request. I’d like to clarify two points:
    - The recipe directory name—which is essentially what the recipe name comes down to—has only a loose connection with the project on d.org. I believe we should only send an application update if the recipe includes a composer.json file, so we can verify that it belongs to the drupal/ namespace as outlined in one of the conditions in the issue summary. This would filter out all custom recipes added at the project level (i.e., not available on d.org even if the recipe name matches a general project on d.org), as well as recipes under other vendor namespaces.
    - This also means that all core recipes would never be tracked. Are we okay with that, or should we support core recipes too? If so, we’d need to collect version information from the drupal/core package.
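
A sketch of the composer.json check described in the first point might look like this. The helper name drupal_package_name is hypothetical. It only reads the standard "name" field of composer.json and does not cover core recipes, which have no such file of their own in this scheme.

```python
import json
from pathlib import Path
from typing import Optional


def drupal_package_name(recipe_dir: str) -> Optional[str]:
    """Return the Composer package name if it is in the drupal/ namespace."""
    composer_file = Path(recipe_dir) / "composer.json"
    if not composer_file.is_file():
        # Custom recipes without a composer.json are never reported.
        return None
    try:
        data = json.loads(composer_file.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Unreadable or malformed JSON is treated the same as a missing file.
        return None
    name = data.get("name") if isinstance(data, dict) else None
    if isinstance(name, str) and name.startswith("drupal/"):
        return name
    return None
```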

  • 🇺🇸United States drumm NY, US

    - The recipe directory name—which is essentially what the recipe name comes down to—has only a loose connection with the project on d.org. I believe we should only send an application update if the recipe includes a composer.json file, so we can verify that it belongs to the drupal/ namespace as outlined in one of the conditions in the issue summary. This would filter out all custom recipes added at the project level (i.e., not available on d.org even if the recipe name matches a general project on d.org), as well as recipes under other vendor namespaces.

    That sounds correct to me.

    - This also means that all core recipes would never be tracked. Are we okay with that, or should we support core recipes too? If so, we’d need to collect version information from the drupal/core package.

    Yes, we do want core recipe information, so that will need to be a special case.
