Include dependencies from processed text fields

Created on 23 March 2023
Updated 12 February 2024

Problem/Motivation

When exporting content using the "Content of the entity type with reference" mode, it would be nice to include dependencies from processed text fields. For example, if the body field of a node edited with CKEditor contains a link (created with Linkit) to another node, that other node should be included in the export.

I would like to support the following:

  • Linkit - link to content using UUID
  • Media - export the media entity and associated file entity
  • Entity embed - embedded node

Proposed resolution

Add a method to Exporter.php that gathers the dependencies referenced in processed text fields. Call this method from getEntityReferencesRecursive() so that dependencies are gathered recursively.
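The recursive part of this proposal can be pictured as a worklist with a visited set, so that two pieces of content linking to each other do not cause an endless loop. This is a minimal sketch under stated assumptions: the UUIDs found in each entity's processed text are assumed to be already extracted into a plain array, and collectDependenciesRecursive() is a hypothetical name, not the module's getEntityReferencesRecursive().

```php
<?php
/**
 * Hypothetical sketch: collect every entity reachable from $startUuid.
 *
 * $textDeps maps an entity UUID to the UUIDs found in its processed text
 * fields (assumed to be extracted beforehand). The $visited set stops
 * cycles such as node 1 -> node 3 -> node 1 from looping forever.
 */
function collectDependenciesRecursive(string $startUuid, array $textDeps): array {
  $visited = [];
  $stack = [$startUuid];

  while ($stack) {
    $uuid = array_pop($stack);
    if (isset($visited[$uuid])) {
      continue;
    }
    $visited[$uuid] = TRUE;

    // Queue every not-yet-seen entity referenced from this one's text.
    foreach ($textDeps[$uuid] ?? [] as $dep) {
      if (!isset($visited[$dep])) {
        $stack[] = $dep;
      }
    }
  }

  // Return only the dependencies, not the starting entity itself.
  unset($visited[$startUuid]);
  return array_keys($visited);
}

// Example mirroring the issue: node 1 links to node 2, embeds a media
// item and node 3; the media item points at a file; node 3 links back.
$textDeps = [
  'node-1' => ['node-2', 'media-1', 'node-3'],
  'media-1' => ['file-1'],
  'node-3' => ['node-1'],
];
$deps = collectDependenciesRecursive('node-1', $textDeps);
```

A worklist is used instead of literal recursion only to keep the sketch short; the visited-set idea is the same either way.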

Feature request
Status: Fixed
Version: 2.0
Component: Code
Created by: 🇨🇦Canada smulvih2 Canada 🍁


Comments & Activities

  • Issue created by @smulvih2
  • Status changed to Needs review over 1 year ago
  • 🇨🇦Canada smulvih2 Canada 🍁

    This patch implements a new method in Exporter.php to collect dependencies from processed text fields.

    In my test, I exported one node (nid 1), which has a Linkit link to node 2, a media image, and an entity_embed of node 3. All three nodes were exported, along with the media entity and its file entity.

  • 🇨🇦Canada smulvih2 Canada 🍁

    I'm using this patch (#2) to export migrated content and it's working great. I noticed some cases where referenced entities were not being treated as dependencies, mainly those with the data-entity-type="file" attribute. For example, I have a few pages with links to PDF files, where the <a> tag references the UUID of the file entity. I have updated the patch to take this into account.

  • 🇨🇦Canada smulvih2 Canada 🍁
  • Status changed to Needs work 5 months ago
  • 🇩🇪Germany mkalkbrenner 🇩🇪
    1. +++ b/src/Exporter.php
      @@ -673,6 +674,83 @@ class Exporter {
      +        foreach ($xpath->query('//a[@data-entity-type="node"]') as $node) {
      

      I suggest putting the XPath expressions into an array and looping over it instead of having four redundant code blocks.

      Ideally, we would fire an event so that other modules can add or remove expressions.

    2. +++ b/src/Exporter.php
      @@ -692,7 +770,8 @@ class Exporter {
      +    $entity_processed_text_dependencies = $this->getEntityProcessedTextDependencies($entity);
      

      The feature should be configurable.
      In a full export, these dependencies are included anyway, and this parsing will slow down the export significantly.
      So the user should be able to turn it off, and it should be disabled automatically for a full export.

    But in general, this is a great addition.
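Point 1 of this review (one array of XPath expressions, one loop) might look roughly like the sketch below. It is hypothetical: extractTextDependencies() is an illustrative name, and the four expressions are assumptions about the markup produced by Linkit (a elements with data-entity-type), file links, Media (drupal-media) and Entity Embed (drupal-entity), not lines copied from the patch. Taking the list as a parameter marks the spot where an event subscriber could add or remove expressions.

```php
<?php
/**
 * Hypothetical sketch: run a list of XPath expressions over one field
 * value and collect (entity type, UUID) pairs, without duplicates.
 */
function extractTextDependencies(string $html, array $expressions): array {
  $found = [];
  if (trim($html) === '') {
    return $found;
  }

  // Silence libxml warnings for custom tags such as <drupal-media>.
  $previous = libxml_use_internal_errors(TRUE);
  $dom = new DOMDocument();
  // The XML declaration prefix makes libxml treat the input as UTF-8.
  $dom->loadHTML('<?xml encoding="utf-8"?>' . $html);
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  $xpath = new DOMXPath($dom);
  // One loop replaces one hand-written block per expression.
  foreach ($expressions as $expression) {
    $nodes = $xpath->query($expression);
    if ($nodes === FALSE) {
      continue; // Skip malformed expressions.
    }
    foreach ($nodes as $element) {
      $type = $element->getAttribute('data-entity-type');
      $uuid = $element->getAttribute('data-entity-uuid');
      if ($type !== '' && $uuid !== '') {
        // Keying by "type:uuid" collapses repeated references.
        $found["$type:$uuid"] = ['type' => $type, 'uuid' => $uuid];
      }
    }
  }
  return array_values($found);
}

// Assumed expressions; other modules could extend this list.
$expressions = [
  '//a[@data-entity-type="node"]',
  '//a[@data-entity-type="file"]',
  '//drupal-media[@data-entity-type="media"]',
  '//drupal-entity[@data-entity-type="node"]',
];
```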

  • Status changed to Needs review 5 months ago
  • 🇨🇦Canada smulvih2 Canada 🍁

    @mkalkbrenner thanks for the feedback! I have attached a new patch that does the following:

    • Made the XPath logic generic so it works for all embedded entities. I needed this in my project because we embed several entity types beyond those in the original list in the issue description.
    • Made exporting processed text dependencies configurable, via a checkbox on the settings page.
    • Added support for translations, so an entity embedded only in a non-English translation is also exported. I found this issue with media items: instead of translating the English images, users added a separate French image and embedded it.

    As for your point 2, when exporting the entire site using prepareToExportAllContent(), the getEntityReferencesRecursive() method is not called, so there is no performance impact there.
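The "generic" XPath logic described in the first bullet can be approximated with a single expression that matches any element carrying both data-entity-type and data-entity-uuid, whatever its tag name. This is a hypothetical sketch, not the committed code: extractAnyEmbeddedEntities() is an illustrative name, and grouping the UUIDs by entity type is a design choice made here for convenience.

```php
<?php
/**
 * Hypothetical sketch: find every element with both data-entity-type and
 * data-entity-uuid, regardless of tag, and group UUIDs by entity type.
 * Duplicates are removed; document order is kept.
 */
function extractAnyEmbeddedEntities(string $html): array {
  $grouped = [];
  if (trim($html) === '') {
    return $grouped;
  }

  $previous = libxml_use_internal_errors(TRUE);
  $dom = new DOMDocument();
  $dom->loadHTML('<?xml encoding="utf-8"?>' . $html);
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  $xpath = new DOMXPath($dom);
  // One expression covers <a>, <img>, <drupal-media>, <drupal-entity>, ...
  $nodes = $xpath->query('//*[@data-entity-type and @data-entity-uuid]');
  foreach ($nodes ?: [] as $element) {
    $type = $element->getAttribute('data-entity-type');
    $uuid = $element->getAttribute('data-entity-uuid');
    if ($type !== '' && $uuid !== '') {
      $grouped[$type][$uuid] = $uuid;
    }
  }

  // Drop the UUID keys so each entity type maps to a plain list.
  return array_map('array_values', $grouped);
}
```

Each (type, UUID) pair could then be loaded with EntityRepositoryInterface::loadEntityByUuid() and passed back into the recursive export; that wiring is not shown here.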

  • Status changed to Needs work 5 months ago
  • 🇩🇪Germany mkalkbrenner 🇩🇪
    1. +++ b/src/Exporter.php
      @@ -681,6 +715,85 @@ class Exporter {
      +      $entity = $this->entityRepository->getTranslationFromContext($entity, $langcode);
      

      I think this is wrong.
      At the very least, you should check hasTranslation() and skip missing translations.

      Ideally, you would iterate over the entity's translations rather than the system's languages.

    2. +++ b/src/Exporter.php
      @@ -681,6 +715,85 @@ class Exporter {
      +        if (!$field->isTranslatable() && $langcode != $default_language) {
      

      This is wrong. An untranslatable entity could be saved in a language different from the default language.

    3. +++ b/src/Exporter.php
      @@ -698,9 +811,18 @@ class Exporter {
      +    $text_dependencies = $config->get('text_dependencies');
      

      We also need a command-line option for Drush.

      This option could override the config.

  • Status changed to Needs review 5 months ago
  • 🇨🇦Canada smulvih2 Canada 🍁

    @mkalkbrenner thanks for the feedback, these are valid points! I have included a new patch here to address your feedback.

    1. I have changed this to iterate over the entity translations:
    $entity->getTranslationLanguages();

    2. I have removed this logic; as you pointed out, it is not needed.

    3. I have added an option to the Drush dcder command. You can now pass --text_dependencies=TRUE/FALSE to override the configuration option set in the UI.

  • 🇨🇦Canada smulvih2 Canada 🍁

    Added filter_var() to parse the user input of the Drush command, so the --text_dependencies option accepts any of these values: [1, TRUE, true, 0, FALSE, false]
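A minimal sketch of how a raw option string could be normalized with filter_var() and fall back to the saved configuration. resolveTextDependencies() is a hypothetical helper, and the fallback on unrecognized input is an assumption rather than the committed behavior. FILTER_VALIDATE_BOOLEAN accepts "1", "true", "on" and "yes" (case-insensitive) as true; with FILTER_NULL_ON_FAILURE, any value it does not recognize returns null instead of false.

```php
<?php
/**
 * Hypothetical sketch: turn the raw --text_dependencies value into a bool.
 * A missing option or an unrecognized value keeps the configured default.
 */
function resolveTextDependencies(?string $option, bool $configDefault): bool {
  if ($option === NULL) {
    return $configDefault;
  }
  // Returns true, false, or null (for values such as "maybe").
  $parsed = filter_var($option, FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
  return $parsed ?? $configDefault;
}
```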

  • Status changed to RTBC 5 months ago
  • 🇩🇪Germany mkalkbrenner 🇩🇪
    +++ b/src/Exporter.php
    @@ -681,6 +753,79 @@ class Exporter {
    +          $xpath = new \DOMXPath($dom);
    

    The requirement for ext-dom has to be added to composer.json.

    But I'll do that on commit.

  • Status changed to Fixed 5 months ago
  • Automatically closed - issue fixed for 2 weeks with no activity.
