Provide metadata/changes.json endpoint to track updates

Created on 7 November 2023, about 1 year ago
Updated 10 January 2024, 11 months ago

Problem/Motivation

The Composer repository interface allows for a changes.json endpoint to be advertised in packages.json as on packagist.org:

"metadata-changes-url": "https://packagist.org/metadata/changes.json",

You can see how it works at https://packagist.org/apidoc#track-package-updates

The purpose of this endpoint is to fetch a list of all packages that were added, deleted, or modified since the passed-in timestamp. This allows mirroring repositories such as https://packagist.com, or any other private Composer repository, to fetch metadata only for modified packages, rather than regularly polling the metadata for every package stored on drupal.org just to find out which ones may have changed. For packagist.org, this allows mirrors to provide new versions essentially immediately. Users of mirrors/proxies currently get data on new versions or new packages from drupal.org with a significant delay.

To limit the amount of data that needs to be stored for this purpose, the endpoint may return a resync response (see packagist.org docs) if the timestamp is too far in the past, e.g. more than 24 hours on packagist.org.
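
As a rough sketch of how a mirror might act on one response from such an endpoint: the Python below assumes the response shape documented for packagist.org (an `actions` list whose entries carry `type`, `package`, and `time`, plus a top-level `timestamp` to pass as `since` on the next poll). `SyncPlan`, `plan_sync`, and the package names are hypothetical and for illustration only; the drupal.org endpoint may differ in detail.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class SyncPlan:
    """What a mirror should do after one changes.json poll (hypothetical type)."""
    full_resync: bool = False
    to_fetch: Set[str] = field(default_factory=set)
    to_delete: Set[str] = field(default_factory=set)
    next_since: Optional[int] = None


def plan_sync(response: dict) -> SyncPlan:
    """Turn an already-parsed changes.json response into a sync plan.

    Assumes packagist.org-style fields: actions[].type is "update",
    "delete", or "resync"; actions[].package is the package name;
    actions[].time is a Unix timestamp in seconds.
    """
    next_since = response.get("timestamp")
    latest = {}  # package name -> (time, type) of its most recent action

    for action in response.get("actions", []):
        kind = action.get("type")
        if kind == "resync":
            # The "since" value was too old: incremental data is unavailable,
            # so the mirror has to refresh everything it holds.
            return SyncPlan(full_resync=True, next_since=next_since)
        name = action["package"]
        when = action.get("time", 0)
        if name not in latest or when >= latest[name][0]:
            latest[name] = (when, kind)

    plan = SyncPlan(next_since=next_since)
    for name, (_, kind) in latest.items():
        if kind == "update":
            plan.to_fetch.add(name)
        elif kind == "delete":
            plan.to_delete.add(name)
    return plan


# Hypothetical sample data, not real output from any repository.
sample = {
    "actions": [
        {"type": "update", "package": "drupal/example", "time": 100},
        {"type": "update", "package": "drupal/example~dev", "time": 101},
        {"type": "delete", "package": "drupal/gone", "time": 102},
    ],
    "timestamp": 1030000,
}
sample_plan = plan_sync(sample)
```

Keeping only the most recent action per package means a package that was updated and then deleted within one polling window ends up in `to_delete` only.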

Proposed resolution

Implement a changes.json endpoint for packages.drupal.org.

This endpoint just returns a list of names and timestamps. The proxy/mirror should then still retrieve data from the corresponding real metadata endpoints for the packages, rather than getting any metadata from this endpoint directly. Composer itself doesn't access this endpoint in its regular use, so I don't see a point in signing the contents delivered by this endpoint. This should make implementation easier.
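
To illustrate the "names only" design: a mirror would map each changed package name onto the repository's regular metadata URL and fetch from there. The sketch below assumes a Composer v2 style `metadata-url` template containing the `%package%` placeholder; the function name `metadata_urls`, the example.org repository URL, and the package names are hypothetical.

```python
from urllib.parse import urljoin


def metadata_urls(repo_url: str, metadata_url_template: str, package_names):
    """Map changed package names to the metadata URLs a mirror would fetch.

    repo_url is the location of packages.json; metadata_url_template is the
    "metadata-url" value advertised there (it may be a relative path).
    """
    urls = {}
    for name in package_names:
        # Substitute the package name into the template, then resolve it
        # against the repository URL in case the template is relative.
        path = metadata_url_template.replace("%package%", name)
        urls[name] = urljoin(repo_url, path)
    return urls


# Hypothetical repository and package names, for illustration only.
example_urls = metadata_urls(
    "https://repo.example.org/packages.json",
    "/p2/%package%.json",
    ["acme/widget", "acme/widget~dev"],
)
```

Names with the `~dev` suffix are passed through unchanged here, on the assumption that the change list reports dev metadata under its own name.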

Feature request
Status

Fixed

Version

1.0

Component

Code

Created by

🇩🇪Germany naderman


Comments & Activities

  • Issue created by @naderman
  • 🇺🇸United States drumm NY, US

    Our rare deletions are a manual process, so the issue "Handle release/project deletion" (currently Active) should either be done first, or be updated to include recording the deletion so it can show up in this API later.

    We shouldn’t try to figure out when Composer metadata was updated from release update time or anything else, since that would be error-prone and there is some lag after that for metadata to be written. project_composer_write_json() should record what’s updated when, so the API implementation can pretty much just return data from that table.

    • drumm committed c98b8fb4 on 7.x-1.x
      Issue #3399867: Start tracking composer metadata file updates
      
  • Assigned to drumm
  • 🇺🇸United States drumm NY, US

    We’re now tracking these updates, so the API can be available some time after 24h from now.

    • drumm committed bf122afe on 7.x-1.x
      Issue #3399867: Provide metadata/changes.json endpoint to track updates
      
  • Status changed to Needs review about 1 year ago
  • 🇺🇸United States drumm NY, US

    https://packages.drupal.org/8/metadata/changes.json is now available.

    naderman - review for this behaving as expected would be appreciated.

  • 🇩🇪Germany naderman

    The implementation looks fine to me, as far as I can tell without knowing all the surrounding code, and the API appears to work as expected. Thanks very much for the speedy implementation here as well!

    The only question I have is to what extent you have strong ordering guarantees there, e.g. when and how does the last-updated timestamp for a package get set? If I get a response with a specific timestamp from changes.json, how likely is it that a change will be written for that same second or the previous one right afterwards, either because multiple machines process this with slightly different clocks, or because a process takes a timestamp at the beginning but runs for a second or more before saving the data?

    • drumm committed d93994ba on 7.x-1.x
      Issue #3399867: More-accurate timestamp, track ~dev separately
      
  • 🇺🇸United States drumm NY, US

    I spotted 2 issues:

    • Dev metadata was not being logged.
    • The timestamps reported here were later than the last-modified time that would be reported for the files. That could leave clients thinking there was an update that was impossible to get. Now the actual file mtime is reported.

    This was just deployed, so it will take 24h for the metadata to fully flush through.

  • Status changed to Fixed 12 months ago
  • 🇺🇸United States drumm NY, US

    This was deployed well over 24h ago, and there are no known issues. The timestamps should be reported exactly as the files’ mtime on disk and the Last-Modified HTTP header.
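
A client could check this relationship for itself. The sketch below is a minimal example under two assumptions: that the change list reports `time` as a Unix timestamp in seconds, and that the metadata file is served with a standard HTTP `Last-Modified` header. The function names are hypothetical.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def last_modified_epoch(header_value: str) -> int:
    """Convert an HTTP Last-Modified header value to Unix seconds."""
    parsed = parsedate_to_datetime(header_value)
    if parsed.tzinfo is None:
        # HTTP dates are expressed in GMT; treat a naive result as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def change_is_fetchable(change_time: int, last_modified_header: str) -> bool:
    """True if the served file is at least as new as the reported change.

    If this returns False, the change list is announcing an update that the
    metadata endpoint is not serving yet.
    """
    return last_modified_epoch(last_modified_header) >= change_time
```

With matching values, such as a reported time of 1699444800 and a header of `Wed, 08 Nov 2023 12:00:00 GMT`, the check passes; a header older than the reported time would fail it.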

  • Automatically closed - issue fixed for 2 weeks with no activity.
