Letting Drupal know about new/updated/deleted external entities

Created on 30 June 2025, about 2 months ago

I was wondering how everyone deals with letting Drupal know about new, updated and deleted external entities.

Out of the box (AFAICT?), quite common hook implementations such as hook_entity_presave(), hook_entity_insert(), hook_entity_update() and hook_entity_delete() are not automatically invoked. A lot of core functionality and modules listen to these hooks (cache invalidations, Search API, ...).

This issue could serve as a place to share experiences, and perhaps provide at least some code that assists with this.

💬 Support request
Status

Active

Version

3.0

Component

Code

Created by

🇧🇪 Belgium rp7


Comments & Activities

  • Issue created by @rp7
  • 🇧🇪 Belgium rp7

    Here's my experience on a project we're working on.

    We have an external system that notifies us (through a custom API endpoint) about changes on our external content. Once we receive a message (operation, type, ID), after some trial and error we ended up with the following code to invoke entity lifecycle hooks:

    if ($action === 'delete') {
      // Mirror the delete sequence without touching the external source.
      $entity::preDelete($entity_storage, [$entity]);
      $this->moduleHandler->invokeAll($entity->getEntityTypeId() . '_predelete', [$entity]);
      $this->moduleHandler->invokeAll('entity_predelete', [$entity]);
      $entity::postDelete($entity_storage, [$entity]);
    }
    else {
      // No previous state is available here, so a clone stands in for it.
      $entity->original = clone $entity;
      $entity->preSave($entity_storage);
      $this->moduleHandler->invokeAll($entity->getEntityTypeId() . '_presave', [$entity]);
      $this->moduleHandler->invokeAll('entity_presave', [$entity]);
      // postSave() defaults to $update = TRUE, so pass FALSE for new entities.
      $entity->postSave($entity_storage, $action !== 'create');
    }

    // Map the incoming action to the hook suffix: insert, update or delete.
    $hook = $action === 'create' ? 'insert' : $action;
    $this->moduleHandler->invokeAll($entity->getEntityTypeId() . '_' . $hook, [$entity]);
    $this->moduleHandler->invokeAll('entity_' . $hook, [$entity]);
    

    This has worked more or less OK for quite a while. It does have some gaps, however: the entity translation-specific hooks, for example, are not invoked.

  • 🇧🇪 Belgium rp7

    After some thought, I think the most proper way is to be able to call $external_entity->save() or $external_entity->delete(). This comes very close (if not completely) to Drupal's default way of working with entities. However, unless your external entities are marked as read-only, doing this could send a write to the external API, which is undesired in this scenario: we only want Drupal to do its thing on the new/updated/deleted entity.

    I was wondering if it's a good idea to introduce something that lets us (temporarily) mark external entities so that save/delete operations performed on them are not pushed through to the external API. See patch attached. It's based on similar functionality in Drupal core (setSyncing and isSyncing).

    With this patch, the code above could be changed to:

    $entity->skipExternalStorageMutation();
    $entity->save();
    $entity->skipExternalStorageMutation(FALSE);
    

    and

    $entity->skipExternalStorageMutation();
    $entity->delete();
    $entity->skipExternalStorageMutation(FALSE);
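
    The idea behind the flag can be sketched in plain PHP. This is a minimal illustration and not the attached patch: RecordingClient, SketchExternalEntity, SketchStorage and shouldSkipExternalStorageMutation() are hypothetical names, and the hook list only stands in for Drupal's real hook invocations.

```php
<?php

// Hypothetical stand-in for the external API client; it only records writes.
final class RecordingClient {
  /** @var string[] */
  public array $calls = [];

  public function write(string $id): void {
    $this->calls[] = 'write:' . $id;
  }
}

// Hypothetical entity carrying the flag, modelled on setSyncing()/isSyncing().
final class SketchExternalEntity {
  public string $id;
  private bool $skipMutation = FALSE;

  public function __construct(string $id) {
    $this->id = $id;
  }

  public function skipExternalStorageMutation(bool $skip = TRUE): self {
    $this->skipMutation = $skip;
    return $this;
  }

  public function shouldSkipExternalStorageMutation(): bool {
    return $this->skipMutation;
  }
}

// Hypothetical storage: lifecycle steps always run, the remote write is optional.
final class SketchStorage {
  /** @var string[] */
  public array $hooks = [];
  private RecordingClient $client;

  public function __construct(RecordingClient $client) {
    $this->client = $client;
  }

  public function save(SketchExternalEntity $entity): void {
    $this->hooks[] = 'presave';
    // Only push to the external API when the flag is not set.
    if (!$entity->shouldSkipExternalStorageMutation()) {
      $this->client->write($entity->id);
    }
    $this->hooks[] = 'update';
  }
}
```

    With the flag set, save() still records both lifecycle steps while the client sees no write; clearing the flag restores the normal write-through behaviour.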
    

    Any thoughts? How is everyone else tackling this problem space?

  • 🇫🇷 France guignonv Montpellier

    I'm not there yet but I'll keep an eye on this issue. ;)

  • 🇺🇸 United States mortona2k Seattle

    Can anyone shine some light on how/why external entities are bypassing those hooks?

    Do we need some way to determine what to do in different scenarios? I.e., if an API call for data with an ID returns nothing, do we assume it's deleted, or something else?

  • 🇫🇷 France guignonv Montpellier

    if an API call for data with an ID returns nothing, do we assume it's deleted, or something else?

    You can't "assume" things in my opinion, since you may have a service temporarily unavailable for instance, or a bug in a remote source. Maybe, it could be a setting on the storage client config, to tell how to behave...
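
    One way such a setting could behave is sketched below. This is an assumption-laden illustration, not module code: classifyLookup() and the $treatNotFoundAsDeleted option are hypothetical, and it assumes the remote source speaks HTTP and answers with standard status codes.

```php
<?php

/**
 * Classifies the result of looking up one remote record.
 *
 * @param int|null $status HTTP status code, or NULL when no response arrived.
 * @param bool $treatNotFoundAsDeleted Hypothetical per-client config option.
 *
 * @return string One of 'exists', 'deleted' or 'unknown'.
 */
function classifyLookup(?int $status, bool $treatNotFoundAsDeleted): string {
  // No response or a server error says nothing about the record itself.
  if ($status === NULL || $status >= 500) {
    return 'unknown';
  }
  // "Not Found" and "Gone" count as deletions only when configured to.
  if ($status === 404 || $status === 410) {
    return $treatNotFoundAsDeleted ? 'deleted' : 'unknown';
  }
  if ($status >= 200 && $status < 300) {
    return 'exists';
  }
  return 'unknown';
}
```

    Only the 'deleted' outcome would justify running the delete hooks; 'unknown' leaves the cached entity untouched until the source can be reached again.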

  • 🇫🇷 France guignonv Montpellier

    OK, I'm quite there now.

    @rp7, I had a deeper look into your code. I think what you are trying to implement is already there (but not interfaced yet). Look at the code of the current GroupAggregator: there is a set of constants STORAGE_CLIENT_MODE_* (and related STORAGE_CLIENT_FLAG_*). These are used to control how storage clients are used on a per-storage-client basis. You can find this setting in the storage client aggregation settings.

    So, you can have an external entity that is not read-only (ie. triggers save/update/delete events), but does not perform anything on its storage clients (if their modes are set to STORAGE_CLIENT_MODE_READONLY). It is more powerful to manage that for each client rather than globally. For instance, I have an external entity that uses 2 storage clients, one "TSV file" storage client, and one "SQL database" storage client. It would be possible to have the TSV file storage client in STORAGE_CLIENT_MODE_READONLY while the SQL database storage client would be in STORAGE_CLIENT_MODE_WRITEONLY. If you have a data aggregator that supports field mapping by storage client, then you can convert a TSV file into a database record just by saving an external entity!

    That's just a very simple example of how useful it would be to manage the "read/write mode" on a per-storage-client basis. I'm currently implementing a more powerful data aggregator (data model aggregator), based on the group aggregator, that would allow very powerful and useful things in my business. It would allow aggregating data from several sources as well as converting/transferring data from one source to others (sometimes using custom data processors to rework the source data).
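
    The TSV-to-SQL example can be sketched as a per-client mode lookup. The constants below are hypothetical stand-ins: the real STORAGE_CLIENT_MODE_* constants live in GroupAggregator and their values may differ, and MODE_READWRITE, readableClients() and writableClients() are assumed names used only for illustration.

```php
<?php

// Hypothetical mode values, not the module's actual constants.
const MODE_READONLY = 'ro';
const MODE_WRITEONLY = 'wo';
const MODE_READWRITE = 'rw';

/**
 * Returns the ids of the clients that data may be loaded from.
 *
 * @param array<string, string> $clientModes Client id => mode.
 *
 * @return string[]
 */
function readableClients(array $clientModes): array {
  return array_keys(array_filter(
    $clientModes,
    static fn (string $mode): bool => $mode !== MODE_WRITEONLY
  ));
}

/**
 * Returns the ids of the clients that should receive a write on save.
 *
 * @param array<string, string> $clientModes Client id => mode.
 *
 * @return string[]
 */
function writableClients(array $clientModes): array {
  return array_keys(array_filter(
    $clientModes,
    static fn (string $mode): bool => $mode !== MODE_READONLY
  ));
}
```

    For ['tsv_file' => MODE_READONLY, 'sql_database' => MODE_WRITEONLY], loading reads only from the TSV client and saving writes only to the SQL client, which is the conversion described above.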

    Now, I would like to go a bit further in this discussion and consider the other side! :-) From the Drupal side, we can manage what's going on. What about having a Drupal REST service provided by external entities, that could get triggered when external data is modified? For instance, I have an external source that updates its data and I would like to let my Drupal site know that its corresponding external entity cache needs to be invalidated/updated. Then, it would mean that I could set my local Drupal cache "permanent" and it will only be updated when the data actually changes! That would be efficient. But maybe, it could be created in a separate module.
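
    A rough sketch of the core of such a service follows: it turns an incoming change notification into the cache tags to invalidate. The payload shape (operation, type, id) and the function name cacheTagsForNotification() are assumptions; the tag formats follow Drupal's entity cache tag convention (`{entity_type_id}:{id}` and `{entity_type_id}_list`).

```php
<?php

/**
 * Maps a JSON change notification to the cache tags that need invalidating.
 *
 * @param string $json Hypothetical payload with "operation", "type" and "id".
 *
 * @return string[]
 */
function cacheTagsForNotification(string $json): array {
  $message = json_decode($json, TRUE);
  if (!is_array($message) || !isset($message['type'], $message['id'])) {
    throw new InvalidArgumentException('Malformed notification.');
  }

  $type = (string) $message['type'];
  // Listings may change on every create, update or delete.
  $tags = [$type . '_list'];
  // A newly created entity has no per-entity cache entries yet.
  if (($message['operation'] ?? 'update') !== 'create') {
    $tags[] = $type . ':' . $message['id'];
  }
  return $tags;
}
```

    In a real controller the returned tags would be passed to \Drupal\Core\Cache\Cache::invalidateTags(), which is what would make a "permanent" local cache safe to use.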
