External Entities

Created on 27 October 2021, over 2 years ago
Updated 6 March 2024, 4 months ago

The purpose of this idea is to help Drupal become more impactful within a larger ecosystem of digital tools through the concept of an External Entity (aka: remote entity). I believe the Drupal community should offer adopters a clear, well-documented reference architecture and framework for this pattern. Many Drupal features and integrations can apply to content that is not strictly owned or managed by Drupal. How can we unlock Drupal’s potential to pull content from an external source and allow that content to seamlessly leverage Drupal’s strengths? A solution could help Drupal be used with a broader set of tools and in a larger ecosystem of capabilities, ranging from headless to simplified third-party integrations.

Where is the limitation today? The content migration pattern assumes Drupal is the source of truth. It brings content into Drupal and assumes Drupal subsequently owns that data. What if it does not? What if content changes in the system Drupal migrated from? Of course you can re-run migrations, but this is manual, and it can cause drift between Drupal and the other sources of truth that own the content. Drupal does not distinguish between external content and content it owns. What happens if a user edits content that isn’t owned by Drupal? A better pattern is needed to establish Drupal inside of a composable enterprise (as mentioned in DriesNote), where other systems can continue to be the source of truth for their own data and Drupal can leverage this data and apply its capabilities to it. In this sense, softening the constraint that Drupal is the source of truth could allow it to pull data from any system and still offer Drupal features. And, if this external entity capability allowed for intelligent mapping of Drupal’s existing data structures to external sources, this could be a powerful feature for existing Drupal applications as well.

JavaScript frameworks, like ReactJS, VueJS, and GatsbyJS, handle the pattern of external content natively because these tools do not store their own content (JAMStack/static site generation/SPA). Content is external to the framework, which differs from how Drupal is architected (a combined content store and dynamic rendering). There are only external content services, often API-based, from which the content is pulled. Drupal, on the other hand, maintains content and configuration through its database and configuration system, assuming Drupal natively owns the content. Defining external content standards and capabilities would allow Drupal to address this gap more naturally and interface with other systems much like the JS communities do today.

What are common use cases for which this pattern already happens (in a Drupal context)?

  • Feeds-like use cases (consume content from RSS)
  • Third party integrations (Digital Asset Management, commerce backends, PIMs, content syndication, social media, CRM, etc)
  • Content migrations (continuous migration pattern)

Why would you want to pull external content into Drupal? Drupal already does a lot! Users want to leverage Drupal’s native features, which are appealing for any content, not just content it manages. Things like theming, caching, Views, migrate, View Modes, web services, and more are all complementary for external content. And allowing external content to work in concert with Drupal-owned content, backed by Drupal’s powerful, customizable framework, is incredibly compelling. If Drupal can establish a pattern that brings together content from a number of sources in an elegant way, users would not be forced to duplicate content using Drupal’s migrate system, and Drupal would be properly enabled to interface with other content in an enterprise.

This is no simple problem to solve. External systems may have structured or unstructured content. Interoperability varies platform by platform through different standards or even proprietary methods. Drupal would need to manage this new type of pattern by maintaining retention policies, mapping its content models to their external equivalents, managing change both structurally and within the content itself, and much more.

What needs to be considered to solve this?

  • Awareness and parsing of external data into Drupal entities (ideally through the UI and through extensible parts of the framework)
  • Extensible logic to be able to support different types of interoperability and standards (GraphQL, REST API, JSON:API)
  • Mapping of external data structures to Drupal native data structures (discretional/selective, not forced)
  • Ability to inspect external content and create a new data structure and mapping, if desired.
  • Configuration options of content selection, API keys/access constructs, and retention policies, some specific to the data source
  • Optional behaviors (one-time copy/editable, synchronized/non-editable, direct pull) 
  • Awareness and interoperability for event-driven data operations (CRUD, webhooks)
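
To make the selective mapping and the optional behaviors listed above more concrete, here is a minimal, language-neutral sketch in Python. It is not Drupal code and does not reflect any existing module's API: `SyncBehavior`, `FieldMapping`, `map_record`, the dotted-path convention, and the sample record are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class SyncBehavior(Enum):
    """The three optional behaviors from the list above (names are assumptions)."""
    ONE_TIME_COPY = "one_time_copy"  # copied once, then editable locally
    SYNCHRONIZED = "synchronized"    # kept in sync with the source, not editable
    DIRECT_PULL = "direct_pull"      # fetched on demand from the source


@dataclass(frozen=True)
class FieldMapping:
    local_field: str     # hypothetical local field name
    external_path: str   # dotted path into the external record


def resolve(record: dict, path: str):
    """Walk a dotted path through nested dicts; return None if a key is missing."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def map_record(record: dict, mappings: list[FieldMapping]) -> dict:
    """Build local field values from only the mapped (selected) external fields."""
    return {m.local_field: resolve(record, m.external_path) for m in mappings}


def is_locally_editable(behavior: SyncBehavior) -> bool:
    """Only a one-time copy is treated as owned, and therefore editable, locally."""
    return behavior is SyncBehavior.ONE_TIME_COPY


# Made-up external record; "internal_code" is deliberately left unmapped.
external = {
    "id": 42,
    "attributes": {"name": "Blue widget", "price": {"amount": "9.99"}},
    "internal_code": "X1",
}
mappings = [
    FieldMapping("title", "attributes.name"),
    FieldMapping("field_price", "attributes.price.amount"),
]
mapped = map_record(external, mappings)
```

In this sketch the mapping is selective: unmapped external fields are simply ignored, and the behavior flag alone decides whether a local edit would be allowed.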

What exists today?

Drupal itself has a lot of subsystems that can and should be considered for this solution. The entity system already models and stores data. The caching system already maintains cached content based on retention policies. The migration system already maintains mappings from different content sources and has tooling to perform content migration activities. The web service capabilities of Drupal can and should be considered for standards and interoperability. Contrib maintains webhook capabilities (which could become a candidate for core). And there is much more. It’s not out of the realm of possibility that these systems can be brought together through a new user interface that offers a low-code or no-code experience for site builders to manage external entities.

Currently there is a contributed module that appears to implement this capability. As proposed, this external entity idea appears to be more comprehensive. From some limited testing, the module appears to cover a limited set of the aforementioned scope and has some known limitations, like a lack of Views support. A key design difference appears to be only a direct connection to the remote APIs with a caching footprint. A direct connection could be an optional behavior of this proposed solution, but a more native integration with the entity system likely affords more subsystem support. Detailed comments are offered in the appendix below. But, to materialize this idea further, I think both this idea and the contributed module need more perspective from the community to realize the full breadth of this problem space.

What could happen?

I would like to see a small group established to define an approach from this idea and build a proof of concept that solves popular and varied use cases (a DAM, a CRM, a PIM, a feed, etc.). An out-of-the-box example, much like Umami, could showcase these features and may be useful in increasing adoption among those outside of Drupal.

Appendix: Feedback from testing the External Entities module

  • In testing, it installed on Drupal 9 without issue
  • Two basic “types” of connections exist out of the box: a REST endpoint and/or WIKI endpoint with a plugin system for other connections (framework)
  • REST assumes a fair amount of things about the API being consumed that many may not adhere to, such as specific paging parameters
  • There is an overall lack of documentation
  • The module is not very fault-tolerant: one API request made on a listing page caused OOM errors almost immediately
  • Lists did not display as entities with display modes on the front-end (although REST queries appeared to return data while debugging).
  • Individual entities did display on the front-end. The queries happen dynamically on page load with optional caching.
  • There does appear to be a way to create custom annotations to “inject” new field-level data with external entities. As I was unable to really get the individual ones displayed correctly and since there was no documentation, I did not attempt to figure out how that would work.
  • Module does not create Drupal structures automatically from external content
  • Open questions:
    • Should the entities be stored within Drupal outside of caching? As of now, external entities basically create routes that can display data from another endpoint, but they are not natively visible within the Drupal admin.
    • How do we make it friendlier to map objects to Drupal fields? The existing mapping is very rudimentary: it relies on plain text fields that require the exact text. It would be a far better user journey to get select dropdowns with a limited set of options to map incoming REST responses to specific Drupal fields for the particular entity type (akin to Feeds behavior).
    • How do we make it easier immediately upon install? The customer journey of the module could benefit from some UX/DX perspective tied to ease-of-use and time-to-value. For example:
      • The feeds module has a UI for determining the data mapping of the incoming data to fields within Drupal. It would really speed up the process to visually be able to map the transformation of data.
      • The administrative forms could benefit from UX analysis and significantly improved documentation.
      • A validation step after adding new endpoints to confirm the data looks right and is modeled correctly would be very useful.
      • Improved ability to create local data for an entity within the Drupal UI without writing custom annotations.
      • Views support feels critical to me so the incoming data can be easily listed on the front-end.
      • There’s currently a pathauto submodule in external_entities. Without parsing through code, it would be great to know what it does and how it works.
Feature request
Status

Active

Component

Idea

Created by

🇺🇸United States nerdstein

Comments & Activities

  • 🇫🇷France guignonv

    It's a two-year-old topic, but I'd like to add some comments and updates for other people who may see this topic later.

    First a couple of updates regarding External Entities module:

    1. More than two "types" of connections exist by now:
      • REST (native) with its derivative Wiki endpoint (native) and a new specialized endpoint for BrAPI (breeding API) that shows the REST base is generic enough to build other clients
      • Databases: supports both local and external PostgreSQL and MySQL database types so far. Works with custom "raw" SQL queries. Supports CRUD and filtering. There is a derived client for a specialized database schema (for biological research) called Chado Light with a more user-friendly interface, which shows it's possible to build on the base "database" client to bring user-friendly interfaces to handle specific database schemas.
      • Files: supports all records in one file, each record in a separate file, a mix of both, and files as data records themselves (e.g. an image file). There is a set of derived clients to support CSV/TSV, JSON, XML and YAML.
      • a "type mixer", xnttmulti, that enables the mix of several sources (any combination of the above) into one. Sources can be filtered and merged by groups or accumulated (i.e. a given external entity id can gather data from 2 sources, adding or overriding fields, or 2 sources can also provide the same type of entities that are listed one after the other). It also supports join fields between sources (i.e. merging data from one source into another using a field of the first source as a key to fetch the other source's item, rather than using only one identifier key for all sources).
    2. There is also an extension module called xnttmanager that enables automatic "annotation entities" creation as well as data synchronization and/or caching. It's also able to list available external fields (mapped or not) as long as the ID field is mapped (but this limitation may be removed soon), and it highlights valid and invalid mappings. The ability to automatically generate a corresponding Drupal field structure for an external entity is a feature request on its way.
    3. For data mapping, there are now field mapper plugins. Two are built into the External Entities module: simple and JSON Path. Another plugin, xnttstrjp, handles expressions that allow some PHP string functions.
    4. There is a (currently alpha) plugin to make external entities work with Views natively: xnttviews. It's more relevant to use it with local sources such as databases or files rather than REST sources, for obvious performance reasons.
    5. Another issue that was not mentioned is that it was not possible to map file/image Drupal fields to external content. It was a pity because there are many plugins that work with 'file' or 'image' field types that could not be used with external entities. This problem is now solved with the latest (currently 'dev') version of the xnttfiles module. It's now possible to map file or image fields to external content without falling back to a link field with the external image cache module, as was previously necessary.
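
The two ways of combining sources described for the "type mixer" above (merging records that share an id, versus listing the records of each source one after the other) can be sketched as follows. This is only an illustration in Python under simplifying assumptions, not xnttmulti's actual implementation: the function names and the sample data are hypothetical, and each source is reduced to a plain dictionary keyed by entity id.

```python
def merge_by_id(sources: list[dict[str, dict]]) -> dict[str, dict]:
    """Combine records that share an id; later sources add or override fields."""
    merged: dict[str, dict] = {}
    for source in sources:
        for entity_id, fields in source.items():
            merged.setdefault(entity_id, {}).update(fields)
    return merged


def accumulate(sources: list[dict[str, dict]]) -> list[dict]:
    """List the records of each source one after the other, without merging."""
    return [
        {"id": entity_id, **fields}
        for source in sources
        for entity_id, fields in source.items()
    ]


# Made-up sample data standing in for a database, a TSV file and a partner site.
database = {"g1": {"name": "Sample A", "status": "new"}}
tsv_file = {"g1": {"status": "processed"}, "g2": {"status": "pending"}}
partner = {"p9": {"name": "Partner sample"}}

merged = merge_by_id([database, tsv_file])
listed = accumulate([database, partner])
```

Here `merged["g1"]` keeps the name from the first source while its status is overridden by the second, and `listed` contains one entry per record, in source order.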

    Regarding performance, it depends on what you want to achieve, but there are several solutions already available. You can use the local Drupal cache or data replication to local Drupal entities (not external) with automated synchronization (using the xnttmanager module), and you can also index external entity content with modules of the Search API ecosystem.
    Note: in the case of database or file sources, the performance should be there without all that stuff since we're working on local data.

    So, I see here above a list of features that should be kept in case of a "redefinition" of what Drupal external entities should be.

    A last note: most of the modules mentioned above are in the development stage, but they already fulfill many needs and are promising. It is, for instance, possible to have an (old) commerce site with its own database and build on the side a Drupal site that accesses that database (even while it's live) and that can (later) take over the database. It's also possible to use the source mixing or the xnttmanager to load data from an external source and convert/store it into another source (i.e. read data from a file and save it into a database or vice versa with xnttmulti, or duplicate external data into Drupal as Drupal "regular" content entities through xnttmanager). There are so many possibilities and use cases that can be fulfilled that it's hard to list them all here!

  • 🇫🇷France guignonv

    Additional comment: I forgot to emphasize one important thing that makes a big difference between External Entities and other modules like External Data Source (or Tripal for biological data): it is entity-based and not field-based.

    Why is it important to me? Because I need to create hybrid entities made from several sources. For instance, I have some parts of my data stored in a database and some other parts in files, and I can also aggregate some other information from REST services (for example, I have "germplasm" data, a kind of specific organism/plant genetic profile, in a database, but I aggregate some processing status from a flat TSV file used by another external application, and I also need to aggregate data from another partner site to know if that germplasm has been used in some experiments). If my data were loaded on a field basis, it would mean each data source would be queried separately for each field. So, if I have 20 database fields, 4 file fields and 10 REST fields, I would have 20 database queries, 4 file system requests and 10 (web) REST queries! It would not be efficient. With the External Entity approach and the xnttmulti module, I can query each source just once and then map the data to my Drupal fields, which I feel is more efficient. That's a key point: the fewer queries made to external sources, the better.
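
The arithmetic in the example above can be checked with a short sketch. It is written in Python purely for illustration and only counts requests under the stated assumption that a field-based approach issues one request per field, while an entity-based approach issues one request per source; the field names are made up.

```python
from collections import Counter

# (local field, source) pairs: 20 database, 4 file and 10 REST fields,
# matching the numbers in the comment above. Field names are hypothetical.
field_sources = (
    [(f"db_field_{i}", "database") for i in range(20)]
    + [(f"file_field_{i}", "file") for i in range(4)]
    + [(f"rest_field_{i}", "rest") for i in range(10)]
)


def field_based_queries(pairs):
    """One request per field: every field hits its source separately."""
    return Counter(source for _, source in pairs)


def entity_based_queries(pairs):
    """One request per source: fetch the record once, then map all its fields."""
    return Counter({source for _, source in pairs})


field_based = field_based_queries(field_sources)
entity_based = entity_based_queries(field_sources)

print(sum(field_based.values()))   # 34 requests in total
print(sum(entity_based.values()))  # 3 requests in total
```

Under these assumptions, loading one hybrid entity costs 34 requests field by field, against 3 when each source is queried once.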
