Problem/Motivation
When using gatsby_instantpreview to do builds, or when using gatsby_fastbuilds with log_published enabled, the buildRelationshipJson function does a recursive crawl through all related entities that the module is configured to send to Preview/Build. This can become problematic in sites which make heavy use of entity references. In the worst case, where content is highly connected, it's possible for a single-node update to include hundreds or even thousands of entities alongside the entity which is actually being updated. This is harmful to both Drupal performance and build performance.
This may sound like a very theoretical issue, but just recently we saw an incremental build take 5-10 times longer than normal. On looking into it, we noticed that over 100 entities were included in the update when, in practice, the node alone would have been sufficient. But the node has a taxonomy reference, and the taxonomy term has other references, and before long we're sending a large amount of data which hasn't changed since Gatsby last heard from Drupal.
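To make the fan-out concrete, here is a toy model of a recursive crawl over entity references. It is not the module's PHP code: the reference graph, the entity identifiers and the collect_related helper are all made up for illustration, and the cycle guard is an assumption of the sketch rather than a claim about how buildRelationshipJson behaves.

```python
# Toy reference graph: each key is an entity, each value lists the
# entities it references. All identifiers are hypothetical.
REFERENCES = {
    "node:1": ["taxonomy_term:10"],
    "taxonomy_term:10": ["media:20", "taxonomy_term:11"],
    "taxonomy_term:11": ["media:21"],
    "media:20": ["file:30"],
    "media:21": ["file:31"],
    "file:30": [],
    "file:31": [],
}


def collect_related(entity_id, references, seen=None):
    """Recursively collect every entity reachable from entity_id."""
    if seen is None:
        seen = set()
    if entity_id in seen:
        # Already visited; this guard is an assumption of the sketch.
        return seen
    seen.add(entity_id)
    for ref in references.get(entity_id, []):
        collect_related(ref, references, seen)
    return seen


# Updating a single node drags in everything reachable from it.
payload = collect_related("node:1", REFERENCES)
print(len(payload))  # 7
```

Only node:1 was edited, yet the payload holds seven entities. With denser references the same pattern can plausibly reach the hundreds described above.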
From what I can tell, the inclusion of related entities is just a way to get certain entities to Gatsby without each one triggering a build of its own. If you create 5 taxonomy terms and upload 3 images in preparation for creating a node, there's probably no benefit in sending those to Gatsby yet. Holding them back until they're needed can substantially cut down on the number of superfluous builds, which is why they are included when sending over a node that uses them.
Steps to reproduce
- Enable gatsby_instantpreview and gatsby_fastbuilds
- Create a content type with an entity reference field to other nodes
- Create a node A which references another node B
- Check /gatsby-fastbuilds/sync/[recent timestamp] and confirm that the insert log for node A also includes the data for node B
Proposed resolution
Avoid sending entities which Gatsby is already aware of unless they've actually changed. We don't want to actually track the entities which have been sent (several Gatsby builds could be pulling data from the same Drupal instance), but if it's possible to know which entities Gatsby should already be aware of, we can avoid sending those.
As it's currently set up, in order to reach the recursive buildRelationshipJson function, the original entity being created/updated/deleted must be a node. While we can't easily be certain that an arbitrary entity has been sent to Gatsby, it is safe to assume that all published nodes have been sent. So when traversing entity relationships to include things Gatsby isn't yet aware of, if the referenced entity is a node, just skip it.
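A rough sketch of the proposed rule, again as a toy model rather than the module's actual PHP: the root node is always included, but any node reached through a reference is skipped along with everything only reachable through it. The graph, the "type:id" naming scheme and the skip_referenced_nodes flag are hypothetical names chosen for this example.

```python
# Hypothetical reference graph; identifiers use a "type:id" convention
# invented for this sketch.
REFERENCES = {
    "node:1": ["node:2", "taxonomy_term:10"],
    "node:2": ["media:20", "node:3"],
    "node:3": [],
    "taxonomy_term:10": ["media:21"],
    "media:20": [],
    "media:21": [],
}


def collect_related(root_id, references, skip_referenced_nodes=True):
    """Collect entities to send for root_id, optionally skipping referenced nodes."""
    seen = set()

    def visit(entity_id, is_root):
        if entity_id in seen:
            return
        # Referenced nodes are assumed to be known to Gatsby already,
        # so neither they nor their own references are traversed.
        if not is_root and skip_referenced_nodes and entity_id.startswith("node:"):
            return
        seen.add(entity_id)
        for ref in references.get(entity_id, []):
            visit(ref, False)

    visit(root_id, True)
    return seen


before = collect_related("node:1", REFERENCES, skip_referenced_nodes=False)
after = collect_related("node:1", REFERENCES, skip_referenced_nodes=True)
print(len(before), len(after))  # 6 3
```

In this example media:20 drops out as well, because it is only reachable through node:2. Non-node entities such as taxonomy_term:10 and media:21 are still crawled, which matches the limitation that chains between other entity types are not addressed.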
To be clear, this is not a comprehensive fix. If other entity types are extensively interconnected, it's still possible to bring in a lot of unnecessary data. But this feels like a straightforward improvement which can substantially cut down on sending unnecessary entities.
Remaining tasks
Create patch and/or discuss.