Created on 28 June 2025, 29 days ago

Problem/Motivation

This is a high-priority follow-up to ✨ Add a command-line utility to export content in YAML format (Active), which lays the foundation for exporting default content in core.

In that issue, we added the ability to export a single entity. Its dependency information is collected, but dependencies are not automatically exported. In this issue, we should add the ability to export a content entity to disk, along with all of its dependencies (and any attachments, like files), in a coherent folder structure, similar to what the Default Content module does in contrib.

Proposed resolution

Add a --with-dependencies|-W option to the content:export command. If this option is passed, the content entity and its dependencies are recursively exported to a directory on disk, which defaults to public://content (but can be specified with a --dir|-d option). File entities are exported along with the physical file they refer to, which is dumped next to the exported file entity (exactly the way Default Content does it).

API changes

Probably.

Data model changes

None expected.

Release notes snippet

TBD

✨ Feature request
Status

Postponed

Version

11.0 🔥

Component

default content system

Created by

🇺🇸 United States phenaproxima Massachusetts


Comments & Activities

  • Issue created by @phenaproxima
  • 🇺🇸 United States phenaproxima Massachusetts
  • 🇦🇲 Armenia murz Yerevan, Armenia

    To make it work well, we also need to allow configuring the depth of the dependencies and the list of reference fields to skip when exporting, because without this the export can produce a huge number of unnecessary entities and even circular references. As an example of how to resolve this issue, we can look at this implementation: ✨ Allow to partially export content Active

  • 🇺🇸 United States phenaproxima Massachusetts

    Good point about the circular dependencies, we should ensure we specifically test that.

    I think allowing fields to opt out of being exported should be admin configurable and happen in a follow-up, as it might have some additional complexity best handled separately.

    As for dependency depth control: why? What would the use case be for such a thing?

  • 🇦🇲 Armenia murz Yerevan, Armenia

    About the depth - let's imagine a "page" node which has a reference field "related_articles" with several articles, and each article has a field "related_pages" linked to some pages - by exporting this, we can get a pretty long nested structure, and even loops.

  • 🇺🇸 United States phenaproxima Massachusetts

    Was thinking about the point you raised, @murz.

    I don't love the idea of thinking in terms of "depth", since someone using this command might not be entirely familiar with how that maps out in the data structure.

    I would instead suggest we add the following command-line options: --with-dependencies and --exclude-fields.

    --with-dependencies would cause the exporter to recurse. It'll recurse as far down the tree as it can. So how do you prevent very deep recursion? By combining it with --exclude-fields.

    So, to imagine this with your example in #5:

    php core/scripts/drupal content:export node ID_OF_A_PAGE --with-dependencies --exclude-fields=field_related_articles
    

    This will export that page, with all of its dependencies, except for the related articles. This makes sense; it doesn't make sense to export the page with dangling references.

    As for the possibility of circular loops, the exporter will guard against that when handling dependencies, and refuse to re-export something it already exported. But that is definitely something we should have a test case for!
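The guard described above (skip anything already exported, and do not follow excluded fields) can be sketched roughly as follows. This is an illustration only, not the code in the merge request: the `Entity` class, the `collect_for_export` function, and the field names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical stand-in for a content entity; not a Drupal class."""
    uuid: str
    # Maps a reference field name to the entities that field points at.
    references: dict[str, list["Entity"]] = field(default_factory=dict)


def collect_for_export(root, exclude_fields=frozenset()):
    """Return the UUIDs reachable from root, each listed exactly once."""
    exported = []
    seen = set()
    stack = [root]
    while stack:
        entity = stack.pop()
        if entity.uuid in seen:
            # Already exported once; skipping it is what breaks cycles.
            continue
        seen.add(entity.uuid)
        exported.append(entity.uuid)
        for field_name, targets in entity.references.items():
            if field_name in exclude_fields:
                # Excluded fields are never followed.
                continue
            stack.extend(targets)
    return exported


page = Entity("page-1")
article = Entity("article-1")
tag = Entity("tag-1")
page.references = {"field_related_articles": [article], "field_tags": [tag]}
article.references = {"field_related_pages": [page]}  # circular reference

print(collect_for_export(page))
print(collect_for_export(page, exclude_fields={"field_related_articles"}))
```

The first call terminates despite the page/article cycle because the `seen` set stops the page from being visited twice; the second call never reaches the article at all.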

  • 🇦🇲 Armenia murz Yerevan, Armenia

    @phenaproxima, yeah, agreed: the loops can be resolved with the --exclude-fields parameter, which is better than having to understand what the "depth" term means.
    Also, implementing ✨ [PP-1] Allow fields to be marked as non-exportable Active should automate preventing exporting unwanted dependencies.

  • @phenaproxima opened merge request.
  • 🇺🇸 United States phenaproxima Massachusetts

    Although postponed, I've written the MR on top of the current (or near-current) iteration of ✨ Add a command-line utility to export content in YAML format Active so that we can land this quickly after that one goes in. It's reviewable.

  • 🇨🇭 Switzerland berdir Switzerland

    Had a quick look with a manual git diff against the other branch to see what you plan for this.

    * The command is still limited to a single entity. With a directory, it's easy to also support all entities of a given type, which would be very useful to avoid having to go through every single node and term on the umami task for example. There are also several issues in default_content about adding conditions, like exporting all articles. A separate command as opposed to an argument would work better for that to avoid ending up with too many combinations.
    * Not quite sure on the directory behavior. I'd consider also supporting dir if you want to export an entity on the other side but it doesn't really hurt to do references if there are none, at best it could be slightly faster. Also not sure on it having a default value of public://content. With a separate argument, I'd consider making the directory required or even an argument. If we keep the default dir, then I'd recommend displaying the location as a resolved real path.
    * I don't think recursion is a real issue, you already keep track of entities that were exported and skip them. Typically you don't do this on sites with thousands or even just hundreds of entities, so I wouldn't worry too much about execution time either.
    * for the regular command, we moved IO out of the exporter now, but for this, it's back. One option would be to use a generator to yield completed ExportMetadata objects and add the data to them as well.
    * what's the reason for moving the image/file callbacks back into their respective module? (I still think we should consider doing that on the property level, then there is no difference between ER/file/image.)
    * Unclear how the hash would work on import. For the importer, the filename needs to match the filename on the file entity (note: there is a possible conflict here if two files are in separate folders but have the same name). I'd make the argument required. Alternatively, extend the metadata structure and explicitly store uri => filename there, then we could also deal with duplicates (the import has a hash-based check to identify if an existing file is the same or not, we could do the same here).
    * Should we do a follow-up to move the file entity stuff to the same file event subscriber on the preImport event? Optionally introduce a corresponding ImportMetaData object?
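The generator idea from the review (the exporter yields completed records and the caller does the file IO) might look something like the sketch below. `ExportRecord`, `export_all`, and `write_records` are hypothetical names, the one-line-per-key serialization is a toy stand-in for real YAML encoding, and the `<type>/<uuid>.yml` layout is an assumption.

```python
from dataclasses import dataclass
from pathlib import Path
import tempfile


@dataclass
class ExportRecord:
    """Hypothetical analogue of an ExportMetadata object that carries its data."""
    entity_type: str
    uuid: str
    data: str  # already-serialized content


def export_all(entities):
    """Yield one completed record per entity; performs no file IO itself."""
    for entity_type, uuid, values in entities:
        # Toy serialization for flat string values only, not a YAML encoder.
        body = "".join(f"{key}: {value}\n" for key, value in values.items())
        yield ExportRecord(entity_type, uuid, body)


def write_records(records, directory):
    """The caller owns IO: write each record to <directory>/<type>/<uuid>.yml."""
    written = []
    for record in records:
        target = Path(directory) / record.entity_type / f"{record.uuid}.yml"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(record.data)
        written.append(target)
    return written


entities = [
    ("node", "aaa", {"title": "Home"}),
    ("taxonomy_term", "bbb", {"name": "Tags"}),
]
with tempfile.TemporaryDirectory() as tmp:
    for path in write_records(export_all(entities), tmp):
        print(path.relative_to(tmp))
```

Because `export_all` is lazy, the same generator could also feed a caller that prints to stdout instead of writing files.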

  • 🇺🇸 United States phenaproxima Massachusetts

    The command is still limited to a single entity.

    To me, adding support for more combinations (all entities of a type, or of a bundle) is follow-up material. This is about getting dependencies exported at all. Tagging for that.

    what's the reason for moving the image/file callbacks back into their respective module

    The file callback was moved into its respective module because now it has to specifically deal with the idiosyncrasies of FileInterface entities. That doesn't belong in the subsystem. Since Image depends on File and requires similar treatment, its callback also needs to be moved into the Image module so that it can re-use the File callback after the File module has set it up.

    Should we do a follow-up to move the file entity stuff to the same file event subscriber on the preImport event

    No, because PreImportEvent runs once per import, not once per entity, and has a different purpose.

  • 🇺🇸 United States phenaproxima Massachusetts

    Fixed these two points of review:

    If we keep the default dir, then I'd recommend displaying the location as a resolved real path

    I think that's a great idea. I'd like to keep a default export location; it makes using the command easier.

    Unclear how the hash would work on import

    D'oh, that's a great point. I changed it to use the base name of the URI instead. I don't think we can make the parameter required; \Drupal\file\FileInterface::getFilename() says that the return value might be NULL, for reasons it does not explain. We should probably account for that.
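A rough sketch of the naming approach described above, together with the same-name conflict raised in review: derive the exported file's name from the base name of the URI, and tolerate a missing filename on the entity. The function names and the fallback order are assumptions for illustration, not the merge request's behavior.

```python
from collections import defaultdict
from posixpath import basename


def export_filename(uri, filename=None):
    """Name for the exported physical file.

    Assumption: use the base name of the URI, and fall back to the
    entity's own filename (which may be None) only when the URI has
    no base name.
    """
    path = uri.split("://", 1)[-1]
    return basename(path) or filename


def find_collisions(uris):
    """Map each export filename to its URIs, keeping only names used more than once."""
    by_name = defaultdict(list)
    for uri in uris:
        by_name[export_filename(uri)].append(uri)
    return {name: group for name, group in by_name.items() if len(group) > 1}


uris = ["public://2025-06/logo.png", "public://archive/logo.png", "public://notes.txt"]
print(export_filename(uris[0]))
print(find_collisions(uris))
```

Two files in different folders with the same base name show up in `find_collisions`, which is the case an explicit uri => filename map in the metadata would need to resolve.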

  • 🇨🇭 Switzerland berdir Switzerland

    To me, adding support for more combinations (all entities of a type, or of a bundle) is follow-up material. This is about getting dependencies exported at all. Tagging for that.

    That's fair, but my main point is that exporting with references should be a separate command and not just controlled by an option. It has very different output and inputs and will be confusing to use otherwise (--dir does nothing without --with-references, entity ids become optional only if you also specify --with-references, additional arguments in the future will also likely only work with either mode). IMHO, it's much easier to handle, explain and use if they are separate.

    The file callback was moved into its respective module because now it has to specifically deal with the idiosyncrasies of FileInterface entities. That doesn't belong in the subsystem. Since Image depends on File and requires similar treatment, its callback also needs to be moved into the Image module so that it can re-use the File callback after the File module has set it up.

    Yes and no. Nothing has changed about the handling of field_item:file and field_item:image. The only change is in regards to file entities, which image doesn't need to know anything about and is not related to the field types. That belongs there yes, but there is no requirement to change the other bits. You can export a file directly and you could have a file referenced through other means, for example embedded in text, which is currently not something that either this or default_content can handle as a dependency (which we can think about in follow-ups if we care enough. I *think* Drupal CMS does currently use that, not sure if it's in default content).

    TLDR, Moving is fine, just the reason for why it's OK in the parent issue and not here doesn't really hold up and it's not really related to this change. IMHO it should either be already like this in the initial issue with the two subscribers or we shouldn't touch it here :)

  • 🇺🇸 United States phenaproxima Massachusetts

    That's fair, but my main point is that exporting with references should be a separate command and not just controlled by an option. It has very different output and inputs and will be confusing to use otherwise (--dir does nothing without --with-references, entity ids become optional only if you also specify --with-references, additional arguments in the future will also likely only work with either mode). IMHO, it's much easier to handle, explain and use if they are separate.

    I don't think I agree with this.

    The command-line world is replete with examples of commands that have options that don't apply in all contexts, as many man pages can confirm. It's not hard to just say something like (for, say, --dir): "This option has no effect if --with-dependencies is not also set". I wouldn't be remotely worried about a user of this command failing to understand that; it's an extremely common pattern that will be well-understood by the command-line jockeys who will use the exporter.

    Besides, most modes of this command will need an output directory. The first invocation we're adding (export a single entity) is the unusual case. In all other cases I can think of, both --dir and --with-dependencies make more sense:

    # Export all nodes to a directory
    drupal content:export node --dir=...
    
    # Export all nodes and their dependencies to a directory
    drupal content:export node --dir=... --with-dependencies
    
    # Export all tags terms to a directory
    drupal content:export taxonomy_term tags --dir=...
    
    # Export all tags terms and their dependencies to a directory
    drupal content:export taxonomy_term tags --dir=... --with-dependencies
    

    Since --dir has a default value that works, none of these seem particularly difficult or confusing to me.

    I think having a second command would be more confusing, because then you have to remember which command you want, as opposed to just quickly running drupal help content:export and easily learning which options you're looking for. (Although you could also use drupal list to see the list of commands, so that's admittedly a bit of a straw man argument.)

  • 🇨🇭 Switzerland berdir Switzerland
    # Export all nodes to a directory
    drupal content:export node --dir=...
    

    In this example you made the ID optional. But now you need a conditional case on whether or not either --dir OR --export-references was provided. drupal content:export node without at least one of those arguments is not valid, because we can't dump multiple nodes IMHO.

    # Export all tags terms to a directory
    drupal content:export taxonomy_term tags --dir=...
    

    And now you've also introduced a bundle argument. Are we going to be guessing if something is an ID or a bundle? I get it, it's just an example, we could also do --bundle tags instead, but either way, now that's another case that won't work without --dir/export-references.

    It's going to require a lot of conditions to deal with that and also you'll need to document this somehow.

    IMHO, conditionally different semantics around arguments (required or not) is a good reason to split the command.

    It's possible that in the future we'll invent more use cases that do call for a second command, but why jump the gun now? We can cross that bridge when we get there.
    

    Not sure if you mean to split up the command then or add a separate command specifically for the new use case. If you split the existing one: it's one thing to mark the code as internal and change that but IMHO, if you break commands that people are used to and might have written scripts for and what not, they will be annoyed.

    A different direction would be to change the command to always export to a directory, and maybe add a separate command that prints and doesn't have those extra options for testing (because you're basically always going to need it in files, except when you're testing and running it repeatedly?). Or even require something like --dir=- if you want it to print. I think there are a bunch of bash commands that require you to explicitly specify to print their stuff to stdout?

    Either way, I've made my point, I'll let others comment on what they prefer.

  • 🇺🇸 United States phenaproxima Massachusetts

    My overall feeling is this:

    • If we're exporting a single entity, and only a single entity, print it by default. Same as how Default Content works.
    • If we're exporting multiple entities, regardless of why we're exporting multiple entities, they go into a directory.

    I don't think these semantics are so head-scratchingly complicated that we need two commands to do it.

    That said, I agree in theory about seeing what the wider community would like, because I ultimately am not going to die on the hill of "how many commands do we want?". It's not that important to me.

    What is important is that I land this in a timely manner, since I (and the wider community) need it to start building site templates.

  • 🇺🇸 United States phenaproxima Massachusetts

    Upstream issue is in, so this is no longer postponed.

  • 🇺🇸 United States phenaproxima Massachusetts

    Self-assigning to get this one shipshape.

  • 🇺🇸 United States phenaproxima Massachusetts

    Nobody's jumping in here, so these are the semantics I think I'll go with.

    If you export only one item (no recursion), and don't specify a directory (with --dir), it gets printed out:

    drupal content:export node 42
    
    # YAML DUMP HERE
    

    If you export only one item, no recursion, but you do specify --dir, it goes into that directory:

    drupal content:export node 42 --dir=foo
    
    1 item exported to foo.
    

    If you export with recursion, and specify --dir, it'll go into that directory:

    drupal content:export node 42 --with-references --dir=foo
    
    10 items exported to foo.
    

    If you export with recursion and forget to specify --dir, the command chooses one for you:

    drupal content:export node 42 --with-references
    
    10 items exported to /real/path/to/public/content.
    

    To me, this feels very reasonable: if you provide --dir, the export always goes there, regardless of how many items get exported. If you don't give --dir, the command does something sensible by default.

  • 🇺🇸 United States phenaproxima Massachusetts
  • 🇺🇸 United States thejimbirch Cape Cod, Massachusetts

    I assume multiple commands will add a layer of complexity and make it harder to maintain the code in the long run. Like if/when we add export all entities of a bundle, would you have to add that option to both commands? IMHO, I feel like the single command with options would be the easiest to grok for recipe authors.

    The examples should be like this since it currently requires that you give it a single entity ID.

    # Export a node to a directory
    drupal content:export node 4 --dir=...
    
    # Export a node and its dependencies to a directory
    drupal content:export node 4 --dir=... --with-dependencies
    
    # Possible future state: Export all tags terms to a directory
    drupal content:export taxonomy_term --bundle=tags --dir=...
    
    # Possible future state: Export all tags terms and their dependencies to a directory
    drupal content:export taxonomy_term --bundle=tags --dir=... --with-dependencies
  • 🇨🇭 Switzerland berdir Switzerland

    Re #21:

    I know I said I won't comment on this further, but couldn't resist.

    I don't think separate commands would increase code complexity. The implementation is already separate, just the actual command code isn't. It might be a bit more code, but the two methods would IMHO be less complex with fewer conditions.

    # Export a node to a directory
    drupal content:export node 4 --dir=...
    

    Right now, you can't do this, that's part of my argument. --dir only works with --with-dependencies. In the current implementation, this will just ignore --dir and still print node 4 to stdout.

    # Possible future state: Export all tags terms to a directory
    drupal content:export taxonomy_term --bundle=tags --dir=...
    

    This doesn't exist yet, so it's just guessing, but it will IMHO again be a bit confusing which option does what and works when. If it follows the current behavior, then using --bundle would also switch the output mode from stdout to files, and then --dir optionally allows changing the default export directory. But it will still write to a directory if you don't provide --dir.

  • 🇺🇸 United States phenaproxima Massachusetts

    For exporting "all nodes of a type" or just "all nodes", I think I also favor a second command. But exporting "one thing" or "one thing with its dependencies" is its own command, which we're doing here.

    I think it makes sense to support --dir for a single entity as well. That seems harmless. I'll implement that.

  • 🇦🇲 Armenia murz Yerevan, Armenia

    So, about exporting a single entity with dependencies and without - will there be a difference in the main entity location?

    For example, we export a single node with id=123 and bundle=article.
    This node has a relationship field "pages" linked to two nodes with bundle=page, ids=234,345, and a "tags" field linked to 3 tags.

    So, in the exported directory, will the node with id=123 be at the root location, or inside the "node" subdirectory?

    Option one:

    - 123.yml
    - node/234.yml
    - node/345.yml
    - taxonomy_term/1.yml
    - taxonomy_term/2.yml
    - taxonomy_term/3.yml

    Option two:

    - export_dir
    - node/123.yml
    - node/234.yml
    - node/345.yml
    - taxonomy_term/1.yml
    - taxonomy_term/2.yml
    - taxonomy_term/3.yml

    (I know that the file name will contain uuid, not the id, just used ids for readability).

    If it will be option two, then we can go with the same command name, if option one - seems it's better to use different command names, because the output structure will be different when exporting one and multiple entities.

    And about printing the entity content to stdout - seems it's better to handle by a separate option like --print or --dump, or --stdout.

    By the way, we can export multiple entities to stdout as a single stream (and to files too) using a delimiter --- - see https://yaml.org/spec/1.2.2/

    YAML uses three dashes (“---”) to separate directives from document content.
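To illustrate the single-stream idea: in YAML, a line containing only `---` also marks the start of each document in a multi-document stream. The sketch below joins and splits already-serialized documents with plain string handling. The function names are hypothetical, and the splitter is deliberately naive (it does not handle a `---` line inside a block scalar, for example), so a real YAML parser would be the right tool in practice.

```python
def to_stream(documents):
    """Join already-serialized YAML documents into one multi-document stream."""
    return "".join("---\n" + doc.rstrip("\n") + "\n" for doc in documents)


def split_stream(stream):
    """Naively split a stream back into documents at lines equal to '---'."""
    docs = []
    current = None
    for line in stream.splitlines():
        if line == "---":
            if current is not None:
                docs.append("\n".join(current) + "\n")
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:
        docs.append("\n".join(current) + "\n")
    return docs


documents = ["uuid: aaa\ntitle: Home\n", "uuid: bbb\nname: Tags\n"]
stream = to_stream(documents)
print(stream, end="")
```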

  • 🇺🇸 United States phenaproxima Massachusetts

    Here's what I'm implementing with regard to command-line semantics:

    $ drupal content:export node 42
    # Dumps YAML, same as now.
    
    $ drupal content:export node 42 --dir=my-content
    # Creates my-content/node/SOME_UUID.yml.
    
    [success] The content item "Title of Record" was exported to /full/path/to/my-content.
    
    $ drupal content:export node 42 --with-dependencies
    # Exports recursively to public://content, which is the default of --dir.
    
    [success] 20 items were exported to /full/path/to/public/files/content.
    
    $ drupal content:export node 42 --with-dependencies --dir=my-content
    # Does the same thing as above, except with a set destination.
    
    [success] 20 items were exported to /full/path/to/my-content.
    

    I think this is pretty clear.
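The four cases above reduce to a small decision rule: an explicit --dir always wins, --with-dependencies without --dir falls back to the default directory, and otherwise the YAML is printed. Below is a minimal sketch of that rule using Python's argparse. It only models the semantics and is not the actual command: the real one resolves the destination to a real path, and `build_parser` and `resolve_destination` are hypothetical names.

```python
import argparse

DEFAULT_DIR = "public://content"  # default of --dir, per the issue summary


def build_parser():
    """Model the command's arguments; short flags follow the proposed resolution."""
    parser = argparse.ArgumentParser(prog="content:export")
    parser.add_argument("entity_type")
    parser.add_argument("entity_id")
    parser.add_argument("-W", "--with-dependencies", action="store_true")
    parser.add_argument("-d", "--dir", default=None)
    return parser


def resolve_destination(args):
    """Return the export directory, or None when the YAML should go to stdout."""
    if args.dir is not None:
        return args.dir
    if args.with_dependencies:
        return DEFAULT_DIR
    return None


for argv in (
    ["node", "42"],
    ["node", "42", "--dir=my-content"],
    ["node", "42", "--with-dependencies"],
    ["node", "42", "--with-dependencies", "--dir=my-content"],
):
    destination = resolve_destination(build_parser().parse_args(argv))
    print(" ".join(argv), "->", destination or "stdout")
```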

  • 🇺🇸 United States phenaproxima Massachusetts

    Ready for another look - test coverage is complete.

  • 🇺🇸 United States phenaproxima Massachusetts
  • 🇺🇸 United States thejimbirch Cape Cod, Massachusetts

    I added the followup.

    ✨ Exporting Content should allow for excluding fields Active

  • 🇺🇸 United States phenaproxima Massachusetts

    Change record updated.

  • 🇺🇸 United States phenaproxima Massachusetts
  • 🇺🇸 United States thejimbirch Cape Cod, Massachusetts

    We got @berdir's blessing in this Slack thread.

    The original command change record has been updated to reflect these additions.

    Marking as RTBC. Go team default content!
