HtmlLink tracker does not identify File entities from link urls

Created on 23 July 2025, 13 days ago

Problem/Motivation

If a text field contains a link to a managed file, e.g. href='/sites/default/files/foo.pdf', the HtmlLink tracker does not identify the target as a file entity, so no usage record is created for the file. If you rely on usage info to remove every reference to a file before deleting it, this link will be missed and will end up as a 404 or other error.

I believe the problem is that the UrlToEntity code depends on converting URLs to routes. File URLs are direct links that do not correspond to routes, so they are never identified.

Steps to reproduce

  • Make sure Files and Content are allowed as targets and sources
  • Upload a managed file or use an existing one
  • Get the direct URL to the file and its fid
  • Add a link to the file as a manually entered URL, e.g. <a href='/sites/default/files/foo.pdf'>Test link</a>
  • Note the node id
  • Save the page
  • Query the usage database table for target_id = the fid and source_id = the node id
  • There will be no entry

Proposed resolution

The UrlToEntity class should check whether:

- file entities are allowed as targets, and
- the URL starts with the public file system path.

If both are true, convert the URL to a public:// URI (public:// followed by the path with the public file path prefix removed). That URI can then be matched against the uri field in the file_managed table to find the fid.

Return this file entity info.
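
The proposed check can be sketched roughly as follows. This is only a language-neutral illustration in Python, not module code: the function names, the `file_targets_enabled` flag, and the dictionary standing in for the file table's uri column are all hypothetical, and the base path assumes the default public files directory.

```python
from urllib.parse import unquote, urlsplit

# Assumption: the public files base path; the default location is shown.
PUBLIC_BASE_PATH = "/sites/default/files"


def url_to_public_uri(url, base_path=PUBLIC_BASE_PATH):
    """Return a public:// URI for a public file URL, or None if it is not one."""
    # Keep only the path part, dropping any host, query string and fragment.
    path = unquote(urlsplit(url).path)
    prefix = base_path.rstrip("/") + "/"
    if not path.startswith(prefix):
        return None
    relative = path[len(prefix):]
    if not relative:
        return None
    return "public://" + relative


def find_fid(url, managed_files, file_targets_enabled=True):
    """Look up the file ID for a URL.

    managed_files is a hypothetical stand-in for the uri column of the
    file table: a dict mapping uri -> fid.
    """
    if not file_targets_enabled:
        return None
    uri = url_to_public_uri(url)
    if uri is None:
        return None
    return managed_files.get(uri)
```

With this sketch, `find_fid("/sites/default/files/foo.pdf", {"public://foo.pdf": 42})` resolves to the stored fid, while a URL outside the public files path yields None and would fall through to the existing route-based lookup.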

Remaining tasks

Proposed plan discussion

Write the code.

User interface changes

API changes

Data model changes

🐛 Bug report
Status

Active

Component

Code

Created by

🇺🇸 United States cgmonroe


Comments & Activities

  • Issue created by @cgmonroe
  • 🇧🇪 Belgium chewi3

    I had a similar issue after updating: direct file links in body text were no longer tracked for usage through HtmlLink. After debugging, it turned out that the public file regex pattern in PublicFileIntegration was changed as part of https://www.drupal.org/project/entity_usage/issues/3514883 and no longer matches file paths like "/sites/default/files/imagename.jpeg".

    I made a patch that re-adds the optional leading slash that was present in earlier versions. I also made sure we never get a double trailing slash, which I got at one point during testing.

    This new pattern should now handle both external and local file systems.
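
    For illustration only, the idea behind such a pattern can be sketched like this. The actual pattern in PublicFileIntegration is not reproduced here; the function name and the example base values are hypothetical, and the sketch is written in Python rather than in the module's own code.

```python
import re


def build_public_file_pattern(base):
    """Build a regex matching file URLs under `base`.

    `base` may be a local path ("sites/default/files", with or without
    surrounding slashes) or an absolute URL ("https://cdn.example.com/files/").
    """
    # Drop trailing slashes so exactly one separator is appended below.
    trimmed = base.strip().rstrip("/")
    if re.match(r"^https?://", trimmed):
        # External file system: match the full base URL literally.
        prefix = re.escape(trimmed)
    else:
        # Local file system: the leading slash is optional.
        prefix = "/?" + re.escape(trimmed.lstrip("/"))
    return re.compile("^" + prefix + "/(?P<path>.+)$")
```

    Stripping the trailing slash before appending a single "/" is what avoids the double slash, and the "/?" prefix accepts local paths both with and without the leading slash.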
