OK, I'm almost there now.
@rp7, I had a deeper look into your code. I think what you are trying to implement is already there (but not interfaced yet). Look at the code of the current GroupAggregator: there is a set of constants STORAGE_CLIENT_MODE_* (and related STORAGE_CLIENT_FLAG_*). These are used to control how storage clients are used on a per-storage-client basis. You can find this setting in the storage client aggregation settings:
So, you can have an external entity that is not read-only (ie. triggers save/update/delete events), but does not perform anything on its storage clients (if their modes are set to STORAGE_CLIENT_MODE_READONLY). It is more powerful to manage that for each client rather than globally. For instance, I have an external entity that uses 2 storage clients, one "TSV file" storage client, and one "SQL database" storage client. It would be possible to have the TSV file storage client in STORAGE_CLIENT_MODE_READONLY while the SQL database storage client would be in STORAGE_CLIENT_MODE_WRITEONLY. If you have a data aggregator that supports field mapping by storage client, then you can convert a TSV file into a database record just by saving an external entity!
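As a sketch of the idea (the constant values and the function below are illustrative assumptions based on the STORAGE_CLIENT_MODE_* constants mentioned above, not the module's actual API), per-client modes would let a save operation target only writable clients:

```php
<?php

// Sketch only: illustrative constants and helper, not the real module API.
const STORAGE_CLIENT_MODE_READONLY = 'readonly';
const STORAGE_CLIENT_MODE_WRITEONLY = 'writeonly';
const STORAGE_CLIENT_MODE_READWRITE = 'readwrite';

/**
 * Returns the storage clients that should receive a save/update/delete.
 *
 * @param array $clients
 *   Client settings keyed by client name, each with a 'mode' key.
 */
function get_writable_clients(array $clients): array {
  return array_filter(
    $clients,
    fn (array $client): bool => $client['mode'] !== STORAGE_CLIENT_MODE_READONLY
  );
}

// The TSV source is read-only while the SQL target is write-only: saving the
// external entity would effectively copy a TSV record into the database.
$clients = [
  'tsv_file' => ['mode' => STORAGE_CLIENT_MODE_READONLY],
  'sql_db' => ['mode' => STORAGE_CLIENT_MODE_WRITEONLY],
];
$writable = get_writable_clients($clients);
```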
That's just a very simple example of how it would be useful to manage the "read/write mode" on a per-storage-client basis. I'm currently implementing a more powerful data aggregator ( data model aggregator → ), based on the group aggregator, that would allow very powerful and useful things in my business. It would allow aggregating data from several sources as well as converting/transferring data from one source to others (sometimes using custom data processors to re-work the source data).
Now, I would like to go a bit further in this discussion and consider the other side! :-) From the Drupal side, we can manage what's going on. What about having a Drupal REST service, provided by external entities, that could get triggered when external data is modified? For instance, I have an external source that updates its data and I would like to let my Drupal site know that its corresponding external entity cache needs to be invalidated/updated. It would mean that I could set my local Drupal cache to "permanent" and it would only be updated when the data actually changes! That would be efficient. But maybe it could be created in a separate module.
Any update on this?
After rethinking your use case and looking at the code, I don't believe you're trying to achieve things the right way.
I think the current REST client implementation already provides what is needed to do what you want, and the xntt_views plugin could also be an alternative for dynamic filtering (though it still has issues to solve).
I will close this issue as "won't fix" but you may reopen it if you are not satisfied. You may also consider chatting with me on the #external-entities channel on Drupal Slack!
No news. Reopen if not satisfied.
guignonv → created an issue.
1) Was already solved in v3 at the very beginning.
2) Should be solved by commit #8df83e41.
Feel free to reopen the issue if not satisfied. :-p
Commit #60bbb626 provides a major improvement: it is now possible to manage conflicting routes and to choose custom routes different from the external entity type identifier.
@mortona2k: Could you check that commit #84895eda (3.0.x branch) does the job for REST clients that do not support client-side filtering?
It should now be possible to uninstall the external entity file field sub-module. I tested with an xntt using it and it was not complaining anymore.
The merge also provides improvements on the UI: there's no need for an extra field to map the file URIs anymore. An external file URIs property is provided by the file field mapper itself for the mapping. An update is also provided to upgrade existing configs using an extra field for the URI mapping, to transfer that mapping to the new external file URI mapping. Just run the update and you should be fine.
That makes me think: I was wondering if the way we assign routes could be problematic. I forgot what I did, but I think if you use the Drupal contact form and then create an external entity called "contact", you don't have access to the contact page anymore. To be checked (I postponed that task and forgot about it). Maybe we should handle XNTT routes differently? For instance, provide a default route based on the type, like now, but allow user override on the XNTT type form? Or should we forbid the use of existing routes as XNTT type machine names? There is also the recurrent problem of XNTT with underscores in their machine names being translated to dashes in URLs (while underscores are legal in URLs, they do not follow standards).
As we chatted on slack, I think I'll implement a way to support the 'IN' operator on the identifier by default for any storage client. That would be a generic solution that would avoid enabling Drupal-side filtering for REST clients, which can be problematic.
Fixed by last commit.
I had a look and I have good news and "bad" news. The good news is that it IS possible to reference external entities using a regular entity reference field, without any modification or trick. It already works. That was the good news... :-)
The "bad" news is that the storage client(s) used to fetch external entities must support the 'IN' operator in query filtering. If they do not, there's no way for the default Drupal entity reference validator to fetch all the referenced external entities and validate their IDs. That's why @mortona2k had issues referencing external entities.
When you use REST services, you don't have a "standard" query that returns all the valid IDs given a list of IDs. For the built-in REST client, there is a way to get around that, but at a real cost. In the storage client settings, in the "filtering" section, you may check the box "Enable Drupal-side filtering". This adds support for Drupal filters such as "IN" as well as others. The problem is that the REST client will need to fetch ALL the external entities from the REST service to filter them on the client side, which can be a real problem if there are many and/or if the service is slow. You may use the cache (and pre-populate it using the XNTT manager) but it may not be an ideal solution.
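To illustrate the cost (the function names below are illustrative stand-ins, not the actual REST client code), Drupal-side filtering of an 'IN' condition boils down to fetching everything and filtering in PHP:

```php
<?php

// Sketch of what "Enable Drupal-side filtering" implies for a storage client
// that cannot translate the 'IN' operator to its back-end.
function filter_client_side(callable $fetch_all, array $ids): array {
  // One or more requests retrieving the WHOLE remote data set...
  $all = $fetch_all();
  // ...then the 'IN' condition is evaluated locally.
  return array_values(array_filter(
    $all,
    fn (array $row): bool => in_array($row['id'], $ids, TRUE)
  ));
}

// Stand-in for a (possibly slow) REST service returning every record.
$fetch_all = fn (): array => [
  ['id' => 1, 'title' => 'a'],
  ['id' => 2, 'title' => 'b'],
  ['id' => 3, 'title' => 'c'],
];
$valid = filter_client_side($fetch_all, [1, 3]);
```

With thousands of remote records, every reference validation pays the full fetch, which is why a cache (or native 'IN' support) matters here.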
Regarding tests, the external entity reference field is already tested appropriately (SimpleExternalEntityTest::testSimpleExternalEntity()). No need to add new tests.
So, I'll close this issue as "Working as designed". For the very specific case of REST services and the "IN" filter operator, one of you may create a new issue as a "feature request" for the REST client. So far I don't see an easy way to add that feature, so propose solutions if you create that feature request! :)
Fixed and tested in commit #be6254123de042317a9a2973374445ec66dcb84e.
Looks good to me. Merging.
It should be fixed by latest commit #fbdd08f4
UPDATE: It needs more than a simple fix. An "external file storage" class needs to be created to replace all the default file storage class methods that rely on database entries. I've started some work and it will be fixed for v3. I hesitated to change the priority to "Major": the module should be uninstallable without any issue, but this problem does not prevent the system from working.
Noted. I'll look into that soon. It seems not too complicated to fix, but it'll need tests.
I'll have a look next week.
Quick note: I have several sites using external entities with references to other external entities (same xntt type as well as different xntt type). So it should work unless there is a regression somewhere I missed, that does not affect my sites.
If an API call for data with an ID returns nothing, do we assume it's deleted, or something else?
You can't "assume" things in my opinion, since you may have a service temporarily unavailable for instance, or a bug in a remote source. Maybe, it could be a setting on the storage client config, to tell how to behave...
I'm on vacation and will have a look when I'm back in a week.
First, I'd like to clarify one thing: from my perspective, you cannot use External Entities 3.0.0 with the xnttfiles module as they are not compatible. The "xnttfiles" module is compatible with External Entities 2.0.x and has now been integrated into 3.0.0 core.
The problem you are referring to, however, is reproducible in 3.0.0 using the "External entities file and image fields support" sub-module.
I can reproduce the problem and I'll fix it for v3. If you really have the xnttfiles module installed, it is probably because you updated from 2.0.0 to 3.0.0. I have a dev site with both xntt 3.0.0 and xnttfiles, so it is not impossible, but it should not happen on clean installs.
Could you confirm which version of the External Entities module you're running (2.0.x or 3.0.x) and whether you are really using the xnttfiles → module (a 3.0.0 update should have replaced the use of the xnttfiles module with the xntt 3.0.0 core file features)?
UPDATE: Drupal\xntt_views\Entity\Render\ExternalEntityFieldRenderer needs to be modified at lines 36-39:
$dynamic_renderers = [
'***LANGUAGE_entity_translation***' => 'TranslationLanguageRenderer',
'***LANGUAGE_entity_default***' => 'DefaultLanguageRenderer',
];
We should provide a new implementation for these classes.
Also, @kalpanajaiswal is right that ExternalEntityField also needs to be modified to not work with SQL tables. Instead of table names, field machine names should be used.
Quick translation support setup when a single request returns all the translations, each under a sub-key. Ex.:
[
'id' => 42,
'en' => [
'title' => 'some title',
'body' => 'some content',
],
'fr' => [
'title' => 'un titre',
'body' => 'du contenu',
],
];
- I assume your site has multi-lingual support enabled
- Edit your external entity type and go to the new "Language settings" tab below the "Storage" tab
- Check "Enable translations"
- For each supported language, check "Override field mapping"
- Now, for each mapped field, prefix your raw field name (or adapt your field path) with the corresponding language prefix.
For instance, a title field initially mapped to "en.title" should be mapped to "fr.title" for the French language field mapping override. If you know some external fields are not translated, you may keep the original mapping (for instance, "id" remains "id"). You may also remove the field mapping (ie. "Not mapped") of some fields if needed, as well as map language-specific fields that are not mapped by default.
- If you don't always have a translation for a given language, you may map its ID field property using the "Conditional mapping" property mapper and check if another translated field is set. For instance, you can check that the raw field "fr.title" is not empty to map the French ID field to the raw field "id". When the French title is not set or empty, the French ID field will not be mapped, resulting in a non-existent translation instance.
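The conditional mapping rule above can be sketched as follows (the function is purely illustrative; in practice this behavior is configured through the "Conditional mapping" property mapper UI, not written in code):

```php
<?php

// Illustrative sketch: the French ID is mapped from the raw "id" field only
// when "fr.title" is set and non-empty, so incomplete translations yield no
// translation instance.
function map_french_id(array $raw): mixed {
  return !empty($raw['fr']['title']) ? $raw['id'] : NULL;
}

$translated = ['id' => 42, 'fr' => ['title' => 'un titre']];
$untranslated = ['id' => 43, 'fr' => []];
```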
Fixed by the new multi-lingual and translation support for external entities added since commit #a0d7c58e503c597242e5493971d44ee5375bd22e.
I will not implement an upgrade as it is too much work for not so many users, while it is not so complicated to update manually.
Here is the manual update procedure after updating to beta 6:
- I assume your site has multi-lingual support enabled
- Edit your external entity type and go to the new "Language settings" tab below the "Storage" tab
- Check "Enable translations"
- For each supported language, check "Override field mapping"
- Now, for each mapped field, prefix your raw field name (or adapt your field path) with the corresponding language prefix
For instance, a "title" field initially mapped to "remote_title" should be mapped to "fr.remote_title" for the French language field mapping override. If you know some remote fields are not translated, you may keep the original mapping (for instance for numbers or identifiers).
Update: I will soon merge the translation support work. Once it is done, it might impact this issue and we will see what still needs fixing, since the problem was related to the language module (from my previous investigations).
Thanks @kalpanajaiswal, I checked what you pointed out and the changes make sense (and are not a big deal). Merged.
@Colan, if you can test and confirm the fix, I'll let you close the issue.
Updated and improved the documentation to better use native features and go farther.
I just committed (143a9d5e236ae1b6f9442b6e6711867427b2871b) a new working implementation for multi-lingual support on branch "3.0.x-translation-support". I'm planning to try to write an update to turn vertical aggregators using the 'language' mode into language field mapping overrides and deprecate 'translation' vertical aggregation.
I just committed (143a9d5e236ae1b6f9442b6e6711867427b2871b) a new working implementation for multi-lingual support on branch "3.0.x-translation-support".
Now, when you have external sources that provide multiple languages, you can override storage settings and field mapping accordingly. No need to have a specific raw structure keyed by Drupal language codes, and no need to use vertical aggregator with "language" type aggregation!
I've removed the "language" field mapping as it is automatically set according to the language overrides. However, I wonder if there is a use case where people would get external data without knowing its language and would need to force-map that field to an external value providing the language. I'll think more about it and may change things a little.
I'll need to try to write an update to turn vertical aggregators using the 'language' mode into language field mapping overrides.
I'll also need to add translation tests before asking for community tests and merging to core.
I'll check that once I'm done with the translation support issue.
It's weird though. I don't see how the system would know about external files since their virtual file identifier is not in the database.
I've added branch "3.0.x-translation-support" to main repos. Work in progress...
I'm not there yet but I'll keep an eye on this issue. ;)
Merged. Thanks! :)
Investigation update:
Drupal ContentEntityBase expects to have translations stored in the member ->translations[$langcode]['entity']. Furthermore, translated field values need to be stored in ->values[$field_name][$langcode] (provided by the mapped raw data array to the constructor).
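A simplified illustration of those two internal shapes (plain stand-ins, not the real ContentEntityBase members):

```php
<?php

// Stand-in object showing the two shapes described above.
$entity = new stdClass();

// Translation objects are expected under ->translations[$langcode]['entity'].
$entity->translations = [
  'fr' => ['entity' => new stdClass(), 'status' => 1],
];

// Translated field values are expected under ->values[$field_name][$langcode].
$entity->values = [
  'title' => [
    'en' => 'some title',
    'fr' => 'un titre',
  ],
];
```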
I want to support those use cases:
- data aggregator providing all the translated data at once in the raw data array (ie. not possible or not efficient to separate translations)
- data aggregator only returns one translation (or a reduced set) but can fetch the requested translated data on demand
- fetched translated data may not be keyed using Drupal language codes
New approach, closer to @rp7 initial one:
- A new "Language settings" tab on external entity type configuration (ExternalEntityTypeForm) where:
- translations can be enabled/disabled
- for each non-default language, a checkbox to enable field mapping override and another one for storage (data aggregator) override
- for both overrides, the same sub-forms used for the default field mapping and storage settings. When an override is checked, it duplicates the original settings and the user will be able to completely change them (ie. remove a field mapping or even change the data aggregator, for instance).
- on xntt loading, ExternalEntityStorage::extractEntityValuesFromRawData() will populate ->values[$field_name][$langcode] with the data it has
- to manage on-demand translation, ExternalEntity::getTranslation() will be overridden to fetch additional translations when needed
- to know if a translation is available for a given langcode, there is either a corresponding storage config override (meaning new data needs to be fetched) or a corresponding field mapping override (meaning data is already there but translated fields need to be mapped)
(This post may be updated later according to new investigations.)
UPDATE: I'll also include field mapping override by language to support the use case of this issue: data in multiple languages.
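A minimal sketch of the on-demand translation idea above (the class below is an illustrative stand-in; only the ::getTranslation() method name comes from the plan):

```php
<?php

// Stand-in sketch: translated raw data is fetched lazily, once per langcode.
class SketchExternalEntity {

  /** Translated raw data already fetched, keyed by langcode. */
  private array $loaded = [];

  public function __construct(private \Closure $fetchTranslation) {}

  public function getTranslation(string $langcode): array {
    // Fetch the translated raw data only on first request.
    if (!isset($this->loaded[$langcode])) {
      $this->loaded[$langcode] = ($this->fetchTranslation)($langcode);
    }
    return $this->loaded[$langcode];
  }

}

// Stand-in fetcher simulating a language-specific storage config override.
$entity = new SketchExternalEntity(
  fn (string $langcode): array => [
    'title' => $langcode === 'fr' ? 'un titre' : 'some title',
  ]
);
```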
I'll create a branch.
guignonv → made their first commit to this issue’s fork.
Fixed code style and added ordering of options in annotation selection part to find items in alphabetical order.
Merged.
Updating the issue title.
What is more relevant, SPARQL or GraphQL?
I would go for GraphQL because it seems more promising but in my research domain, SPARQL seems more appropriate. Hard to decide...
No update here for a while. I think current code allows external entities to be created both programmatically, and by config (as well as through the UI of course). Marking the issue fixed. Please reopen the issue if needed.
Next efforts regarding this feature should be ported to the issue "Current xntt lock system is using State API and should be using Config API instead".
OK thanks!
The code looks good to me but I didn't have time to test it. No problems with the current automated tests though. Is it ready to be merged, or is further review needed?
I had a deeper look into the code and (re)thought about the logic.
First, your example works for me with no error (PHP 8.3.21) for a regular external entity type created through the UI.
$storage = \Drupal::entityTypeManager()->getStorage('some_regular_xntt');
$query = $storage->getQuery();
$storage_client = $query->getExternalEntityType()->getDataAggregator()->getStorageClient(0);
// It works, I got a storage client instance.
// Note: $query->getExternalEntityType() returns a \Drupal\external_entities\Entity\ExternalEntityType instance, as expected.
So, I believe you are using a custom external entity type created programmatically to get such an error?
Anyway, in terms of logic, I see no issue in current code.
Your line:
$storage_client = $query->getExternalEntityType()->getDataAggregator()->getStorageClient(0);
assumes $query->getExternalEntityType() returns a ConfigurableExternalEntityTypeInterface object, while that might not be the case. In fact, before calling ->getDataAggregator()->getStorageClient(0), you should check that the returned object is actually an instance of ConfigurableExternalEntityTypeInterface.
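A self-contained sketch of the suggested guard (the interface and method names come from the discussion above; the stub classes stand in for the real module code):

```php
<?php

// Stand-in interface mirroring the one discussed above.
interface ConfigurableExternalEntityTypeInterface {
  public function getDataAggregator(): object;
}

// Stub type exposing a minimal aggregator with one storage client.
class StubConfigurableType implements ConfigurableExternalEntityTypeInterface {
  public function getDataAggregator(): object {
    return new class {
      public function getStorageClient(int $delta): string {
        return "client-$delta";
      }
    };
  }
}

function get_first_storage_client(object $type): mixed {
  // Only configurable external entity types are guaranteed to expose a data
  // aggregator; custom implementations may not.
  if ($type instanceof ConfigurableExternalEntityTypeInterface) {
    return $type->getDataAggregator()->getStorageClient(0);
  }
  return NULL;
}
```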
I think there is either a problem of logic in your code (like the missing check above) or I am missing something. I did not find any inconsistency in the current sources (...which does not mean there are none, but so far, so good...).
Maybe I should explain here the reason for the existence of ConfigurableExternalEntityTypeInterface: if you look at v2, there was only ExternalEntityTypeInterface. The idea of adding a layer is not just to add complexity and problems; it is because developers may want to create their own external entity type without having to use the current logic that works with data aggregator + storage clients + field mappers + property mappers + data processors. The ExternalEntityTypeInterface just defines what makes an entity "external" (methods to manage raw data or annotations) while the ConfigurableExternalEntityTypeInterface adds what is necessary to manage Drupal fields, meaning that fields are configurable by external entity type. When you have Drupal "fields", you need to map them to external data fields: get external data (data aggregator + storage clients) and map it (field mapper + property mapper + data processor).
With the current design, I believe one can define and implement an external entity type that does not use Drupal fields at all and uses its own logic (its own ExternalEntityStorageInterface implementation, etc.).
To conclude, for me, it works as designed. Please provide a test class that demonstrates the problem otherwise... or provide more of your code that raises the problem, with the context.
No news on this issue in a month, so I'll set it to "Closed (works as designed)".
You were right, the original schema was not updated! Only people with an old version that ran the update hook got the schema fixed. Thanks for the fix. I added the unique key constraint that was also missing in your patch (add it to your instance as well ;) ).
Please confirm it works for you now with this fix.
guignonv → changed the visibility of the branch 3.0.x to active.
guignonv → changed the visibility of the branch 3.0.x to hidden.
The fix seems correct so far, but I need to double-check why the column was not there already. It was modified a long time ago (2-3 years I think) and I wonder if the fix is to add the column or to remove the instructions using it.
Here is my new proposal to manage languages:
It will be managed by ExternalEntityType class. On the external entity type edit form, there will be a new horizontal tab "Language settings", just below the "Storage" tab. On that tab, the user will be able to select a language and it will display current storage configuration form with language specific overrides. In other words, we would have a base storage config (data aggregator config) with possible overrides by languages. For instance, for a REST client, the REST service URLs could be overridden according to the language; for TSV clients, the source file could be changed or a filter could be added; for SQL clients, the queries could be adapted. Since it's the aggregator config that is overridden, it would also be possible to even change the storage client (ie. provide translations from another storage client). The idea would be to provide a checkbox to allow a "language" storage config override and highlight what is overridden (on client side through Javascript). Unchecking+re-checking the checkbox would reset "language" storage config for update. It will be possible to improve this user interface later.
On the run time, the config to load according to current language would be selected by the external entity type when ExternalEntityType::getDataAggregatorConfig() is called.
No change is required to storage clients (including base). No change is required to ExternalEntityStorage.
Modified: ExternalEntityType, ExternalEntityTypeForm and the external entity type config schema (...and ExternalEntity for ->setTranslatable(TRUE) ;-) ).
That could be enough, but we could go a step further to simplify user config management: we can take advantage of the Token module, recently added as a dependency, and use it in storage clients where pertinent. For instance, a REST service URL could use the token [language:langcode] and may not need to override the storage config for any language.
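For illustration (a simplified stand-in for the Token module's replacement service), one endpoint pattern could serve all languages without per-language storage config overrides:

```php
<?php

// One REST endpoint pattern for all languages; the [language:langcode] token
// follows Token module conventions, but the replacement below is a
// simplified stand-in for the real token service.
$endpoint = 'https://api.example.com/[language:langcode]/items';

$url = str_replace('[language:langcode]', 'fr', $endpoint);
```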
What do you think?
While this approach works, after thinking about it, I'm not sure it's the best way to handle multiple languages. See #3476326 for discussion. I think we should not aggregate multiple languages but rather discriminate which client (config?) to use according to the requested language.
With #3506455, we can (I did not succeed yet though...) merge several language sources into a single external entity with sub-arrays keyed by __external_entity_translation__<LANG_CODE>.
For instance, example 1:
[
'id' => 42,
'title' => 'English title',
'somefield' => 'other text',
'someotherfield' => 501,
'__external_entity_translation__fr' => [
'id' => 42,
'title' => 'Titre français',
'somefield' => 'autre texte',
'someotherfield' => 501,
],
]
@rp7, in your current approach, you expect raw data to be an array keyed by languages.
For instance, example 2:
[
'en' => [
'id' => 42,
'title' => 'English title',
'somefield' => 'other text',
'someotherfield' => 501,
],
'fr' => [
'id' => 42,
'title' => 'Titre français',
'somefield' => 'autre texte',
'someotherfield' => 501,
],
]
What I don't like with example 1 is that untranslated fields can be duplicated (and it could cost a lot of memory, depending on what is stored), and what I don't like in example 2 is that some objects may be untranslated but have a key corresponding to a language code, which may lead to incorrect "language availability guessing". If you systematically change the raw structure of all external entities to add a language layer, then there would be no "incorrect guessing", but it would add a layer to raw data and make things more complicated.
I agree I don't see better/ideal solutions at the moment, but we should discuss how things could be handled here. From my side, my main concern is not how the final translatable raw structure will look like but rather how that structure will be generated.
For instance, you could consider consuming a REST API where the language is selected in the URL: you would use a vertical aggregator with as many sources as languages you support. Each source provides values for a given language.
Now, consider a TSV file (Excel-like file) where there is a "lang_code" column telling the language of the record. You could find the same ID multiple times, each with a different value in its "lang_code" column. How to aggregate that? Or should we consider each translated record as independent, but resolve the record to use from both its identifier and its lang_code? It can be interesting to do so in order to only load the appropriate translation.
So far, ideally, I would prefer the system to be able to only load the appropriate translation, without querying multiple sources (or the same source multiple times). For REST clients, it would mean selecting the appropriate URL according to the selected language; for TSV clients or an SQL client, filtering on a language column; for a file system client, maybe selecting the directory scheme according to a language code, etc. I think language should be managed by the storage clients because they know how to filter by language. Otherwise, they would always return all the translations and it would not be efficient. Now, how could that be achieved? I believe there should be a language parameter in the storage client ::load()/::loadMultiple() methods, but it's an API change (fortunately, we're still in beta); or could a service be used to get the language instead? What if we want to edit and save another language while the current language is different? I need to think more about it, but if you have ideas, please share! :)
What I would prefer to get is example 3:
- Query storage client to get entity 42 in "en":
[ 'id' => 42, 'title' => 'English title', 'somefield' => 'other text', 'someotherfield' => 501, ]
- Query storage client to get entity 42 in "fr":
[ 'id' => 42, 'title' => 'Titre français', 'somefield' => 'autre texte', 'someotherfield' => 501, ]
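Example 3 could be sketched with a language-aware load signature (a proposal under discussion, not the current StorageClientInterface; the TSV client below is a stand-in):

```php
<?php

// Proposed signature: an optional language code on ::load().
interface LanguageAwareStorageClientInterface {
  public function load(string|int $id, ?string $langcode = NULL): ?array;
}

// Stand-in for a TSV client filtering on a "lang_code" column, so only the
// requested translation is loaded.
class SketchTsvClient implements LanguageAwareStorageClientInterface {

  public function __construct(private array $rows) {}

  public function load(string|int $id, ?string $langcode = NULL): ?array {
    foreach ($this->rows as $row) {
      if ($row['id'] === $id
          && ($langcode === NULL || $row['lang_code'] === $langcode)) {
        return $row;
      }
    }
    return NULL;
  }

}

$client = new SketchTsvClient([
  ['id' => 42, 'lang_code' => 'en', 'title' => 'English title'],
  ['id' => 42, 'lang_code' => 'fr', 'title' => 'Titre français'],
]);
```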
guignonv → created an issue.
I added translation and I can confirm this problem is related to translation: I can reproduce the issue.
I cannot reproduce the problem. I tried with an XnttTSV storage client and the preview works with the rendered entity and its title. I suspect there is something else. What storage client did you use?
Regarding the xntt_views module code, it needs many improvements and many things are still to be implemented properly. In this case, we use the default filter class provided by the views module except for string because I wanted to add a 'FUZZY' filter to test a custom filter behavior.
As @notkataoka noticed, the default string filter provided by views works with database strings and uses the 'LIKE' operator and modifies the filter value which is not suitable for other storage back-ends to efficiently manage filtering.
Filters are passed to the storage clients using StorageClientInterface::transliterateDrupalFilters(). That is where filters can be translated to what is supported by the client when possible.
There is no official/standard list of supported operators, but a basic list is provided in the Drupal doc and in XNTT StorageClientBase::testDrupalFilter():
'=', '!=', '<>', '>', '>=', '<', '<=', 'STARTS_WITH', 'CONTAINS', 'ENDS_WITH', 'IN', 'NOT IN', 'IS NULL', 'IS NOT NULL', 'EXISTS', 'NOT EXISTS', 'BETWEEN' and 'NOT BETWEEN'
We should list what is used by views to add to this list (eg. "NOT STARTS_WITH").
StorageClientBase::testDrupalFilter() is used when "standard" filters are not supported by the storage client, to provide a "degraded" filtering support on results. The current implementation does not properly support negated filters (ie. NOT ...). Some are supported, some aren't. It should be fixed because we can encounter "NOT CONTAINS" for instance. The code needs to be refactored to handle "NOT" + any other filter.
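The refactoring could be sketched like this (an assumption of how testDrupalFilter() might handle a leading "NOT" generically, not the current implementation; only a few base operators are shown):

```php
<?php

// Sketch: strip a leading "NOT ", evaluate the base operator, then negate
// the result, so every negated form ("NOT IN", "NOT CONTAINS", ...) is
// derived from its positive counterpart.
function test_drupal_filter(mixed $value, string $operator, mixed $expected): bool {
  if (str_starts_with($operator, 'NOT ')) {
    return !test_drupal_filter($value, substr($operator, 4), $expected);
  }
  return match ($operator) {
    '=' => $value == $expected,
    'CONTAINS' => str_contains((string) $value, (string) $expected),
    'STARTS_WITH' => str_starts_with((string) $value, (string) $expected),
    'IN' => in_array($value, (array) $expected),
    default => throw new \InvalidArgumentException("Unsupported operator: $operator"),
  };
}
```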
For investigations: the method ::getTableMapping() comes from the SqlContentEntityStorage class, while XNTT storage extends ContentEntityStorageBase (which is correct).
I'll continue investigating...
While the code does not look bad, I could not reproduce the initial problem. However, unsurprisingly, I tried with an SQL storage client. I need time to try with another client. I'm also wondering if the problem would still be there with "::opNotContains()", "::opNotStartsWith()" and "::opNotEndsWith()". My ideas are not all clear as it's late, but I think the "final" operator is supposed to be provided by the storage clients themselves, so there should be no need for this patch. I need to review that more seriously...
guignonv → made their first commit to this issue’s fork.
@guignonv: Maybe this is why you're not running into it?
Yes, that might be why, because I rarely use the full rendered entity. I'll try to reproduce the problem later to fix it.
Thanks! :)
What I can say is that I had a look at the Search API views code when creating the views plugin. I'm quite sure their code solves this problem in one way or another. So, my advice would be to see what they did... :p
I think it has been fixed in the current dev version (commit #397e51aa1227ddf52271bc26d7584b280275238d), not the last beta one.
Either try the dev version or patch until we release a new beta... :-S
Hi!
Thanks for the patch. For this feature, I had in mind to use features of the Token module. When I implemented that part, the token service was not part of the storage client services; it has only recently been added.
Wouldn't it be more powerful to just support token replacement? It would go beyond static values provided by the query, as it could include dynamic values (such as the current user's login, or a user token, etc.).
So I would prefer to keep using $this->entityTypeManager->getStorage('my_external_entity')->load("1234"); (to only keep a real ID as parameter and keep the existing logic) and to provide "color=blue" using another, more generic mechanism in the single query parameters part (ie. ::getParametersFormDefaultValue('single')). I would see something either using tokens (which could return nothing if not available, for instance) or using an event.
@arwillame, could you add a test in RestClientFunctionalTest.php to test your use case and ensure the fix remains stable in the future?
Your use case could be documented in the test to avoid regressions.