I'm not there yet but I'll keep an eye on this issue. ;)
Merged. Thanks! :)
Investigation update:
Drupal ContentEntityBase expects to have translations stored in member ->translations[$langcode]['entity']
.
Furthermore, translated field values need to be stored in ->values[$field_name][$langcode]
(provided by the mapped raw data array to the constructor).
I want to support those use cases:
- data aggregator providing all the translated data at once in the raw data array (ie. not possible or not efficient to separate translations)
- data aggregator only return one (or a reduced set of) translated data but can fetch the requested translated data on demand
- fetched translated data may not be keyed using Drupal language codes
New approach, closer to @rp7 initial one:
- A new "Language settings" tab on external entity type configuration (ExternalEntityTypeForm
) where:
- translations can be enabled/disabled
- for each non-default language, a checkbox to enable field mapping override and another one for storage (data aggregator) override
- for both overrides, the same sub-forms used for the default field mapping and storage settings. When override becomes checked, it will duplicate original settings and user will be able to completely change the settings (ie. remove a field mapping or even change the data aggregator for instance).
- on xntt loading, ExternalEntityStorage::extractEntityValuesFromRawData()
will populate ->values[$field_name][$langcode]
with the data it has
- to manage on-demand translation, ExternalEntity::getTranslation() will be overriden to fetch additional translations when needed
- to know if a translation is available for a given langcode, either their is a corresponding storage config override (meaning new data needs to be fetched) or a corresponding field mapping override (meaning data is already there but translated fields need to be mapped)
-This post may be updated later according to new investigations-
UPDATE: I'll also include filed mapping override by language to support the use case of this issue: data in multiple languages.
I'll create a branch.
guignonv → made their first commit to this issue’s fork.
Fixed code style and added ordering of options in annotation selection part to find items in alphabetical order.
Merged.
Updating the issue title.
What is more relevant, SPARQL or GraphQL?
I would go for GraphQL because it seems more promising but in my research domain, SPARQL seems more appropriate. Hard to decide...
No update here for a while. I think current code allows external entities to be created both programmatically, and by config (as well as through the UI of course). Marking the issue fixed. Please reopen the issue if needed.
Next efforts regarding this feature should be ported on
✨
Current xntt lock system is using State API and should be using Config API instead
Active
.
OK thanks!
The code looks good to me but I didn't have time to test it. No problems with current automated tests though. Is it ready to be merged or further review is needed?
I had a deeper look into the code and (re)thought about the logic.
First, your example works for me with no error (PHP 8.3.21) for a regular external entity type created through the UI.
$storage = \Drupal::entityTypeManager()->getStorage('some_regular_xntt');
$query = $storage->getQuery();
$storage_client = $query->getExternalEntityType()->getDataAggregator()->getStorageClient(0);
// It works, I got a storage client instance.
// Note: $query->getExternalEntityType() returns a \Drupal\external_entities\Entity\ExternalEntityType instance, as expected.
So, I believe you are using a custom external entity type created programmatically to get such an error?
Anyway, in terms of logic, I see no issue in current code.
Your line:
$storage_client = $query->getExternalEntityType()->getDataAggregator()->getStorageClient(0);
assumes $query->getExternalEntityType()
returns a ConfigurableExternalEntityTypeInterface
object while it might not be the case. In fact, before calling ->getDataAggregator()->getStorageClient(0)
, you should check that the returned object is actually an instance of ConfigurableExternalEntityTypeInterface
.
I think there is either a problem of logic in your code (like the missing test above) or I am missing something. I did not find any inconsistency in current sources (...which does not mean there are none but so far, so good...).
Maybe I should explain here the reason of the existence of ConfigurableExternalEntityTypeInterface:
if you look in v2, there were only ExternalEntityTypeInterface. The idea of adding a layer is not just to add complexity and problems, it is because developers may want to create their own external entity type without having to use the same current logic that works with data aggregator + storage clients + field mappers + property mappers + data processors. The ExternalEntityInterface
just defines what makes an entity "external" (methods to manage raw data or annotations) while the ConfigurableExternalEntityTypeInterface
adds what is necessary to manage Drupal fields, meaning that fields are configurable by external entity types. When you have Drupal "fields", you need to map them to external data fields: get external data (data aggregator + storage clients) and map (field mapper + property mapper + data processor).
With current design, I believe one can define and implement a external entity type that may not use Drupal fields at all and use its own logic (its own ExternalEntityStorageInterface implementation, etc.).
To conclude, for me, it works as designed. Please provide me a test class that demonstrates the problem otherwise. ...or provide more of your code that raise a problem, with the context.
No news on this issue in a month and I'll set to to "Closed (works as designed)".
You were right, the original schema was not updated! Only people with an old version that run update hook got the schema fixed. Thanks for the fix. I added the unique key constraint that was also missing in your patch (add it to your instance as well ;) ).
Please confirm it works for you now with this fix.
guignonv → changed the visibility of the branch 3.0.x to active.
guignonv → changed the visibility of the branch 3.0.x to hidden.
The fix seems correct so far, but I need to double check why the column was not there already. It was modified a long time ago (2-3 years I think) and I wonder if the fix is to add the column or remove the instructions using it.
Here is my new proposal to manage languages:
It will be managed by ExternalEntityType class. On the external entity type edit form, there will be a new horizontal tab "Language settings", just below the "Storage" tab. On that tab, the user will be able to select a language and it will display current storage configuration form with language specific overrides. In other words, we would have a base storage config (data aggregator config) with possible overrides by languages. For instance, for a REST client, the REST service URLs could be overridden according to the language; for TSV clients, the source file could be changed or a filter could be added; for SQL clients, the queries could be adapted. Since it's the aggregator config that is overridden, it would also be possible to even change the storage client (ie. provide translations from another storage client). The idea would be to provide a checkbox to allow a "language" storage config override and highlight what is overridden (on client side through Javascript). Unchecking+re-checking the checkbox would reset "language" storage config for update. It will be possible to improve this user interface later.
On the run time, the config to load according to current language would be selected by the external entity type when ExternalEntityType::getDataAggregatorConfig() is called.
No change is required to storage clients (including base). No change is required to ExternalEntityStorage.
Modified: ExternalEntityType, ExternalEntityTypeForm and external entity type config schema (...and ExternalEntity for ->setTranslatable(TRUE)
;-) ).
That could be enough but we could get a step further to simplify user config management: we can take profit of the Token module recently added as a dependency to use it in storage clients where it could be pertinent. For instance, a REST service URL could use the token [language:langcode]
and may not need to override storage config for any language.
What do you think?
While this approach works, after thinking, I'm not sure it's the best approach to handle multiple languages. See #3476326 for discussion. I think we should not aggregate multiple languages but rather discriminate which client (config?) to use according to a requested language.
With #3506455, we can (I did not succeed yet though...) merge several language sources into a single external entity with sub-arrays keyed by __external_entity_translation__<LANG_CODE>
.
For instance, example 1:
[
'id' => 42,
'title' => 'English title',
'somefield' => 'other text',
'someotherfield' => 501,
'__external_entity_translation__fr' => [
'id' => 42,
'title' => 'Titre français',
'somefield' => 'autre texte',
'someotherfield' => 501,
],
]
@rp7, in your current approach, you expect raw data to be an array keyed by languages.
For instance, example 2:
[
'en' => [
'id' => 42,
'title' => 'English title',
'somefield' => 'other text',
'someotherfield' => 501,
],
'fr' => [
'id' => 42,
'title' => 'Titre français',
'somefield' => 'autre texte',
'someotherfield' => 501,
],
]
What I don't like with example 1 is that untranslated fields can be duplicated (and it could cost a lot of memory, depending what is stored) and what I don't like in example 2 is that some object may be untranslated but have a key corresponding to a language code which may lead to an incorrect "language availability guessing". If you systematically change the raw structure of all external entities to add a language layer, then there would be no "incorrect guessing" but it would add a layer to raw data and make things more complicated.
I agree I don't see better/ideal solutions at the moment, but we should discuss how things could be handled here. From my side, my main concern is not how the final translatable raw structure will look like but rather how that structure will be generated.
For instance, you could consider consuming a REST API where the language is selected in the URL: you would use a vertical aggregator with as many source as languages you support. Each source provides values for a given language.
Now, if you consider a TSV file (Excel-like file) where there is a "lang_code" column telling the language of the record. You could find multiple time the same ID but with a different value for its "lang_code" column. How to aggregate that? Or should we consider each translated record as independent, but resolve the record to use using both its identifier and its lang_code? It can be interesting to do so to only load the appropriate translation.
So far, ideally, I would prefer the system to be able to only load the appropriate translation, without querying multiple sources (or multiple time the same source). For REST clients, it would mean to select the appropriate URL according to the selected language, for TSV clients or an SQL client to filter on a language column, for a file system client maybe to select the directory scheme according to a language code, etc. I think language should be managed by the storage clients because they know how to filter by language. Otherwise, they would always return all the translations and it would not be efficient. Now, how could that be achieved? I believe there should be a language parameter in the storage client ::load()/::loadMultiple() methods but it's an API change (fortunately, we're still in beta) or a service could be used to get the language instead? What if we want to edit and save another language while the current language is different? I need to think more about it but if you have ideas, please share! :)
What I would prefer to get is example 3:
- Query storage client to get entity 42 in "en":
[ 'id' => 42, 'title' => 'English title', 'somefield' => 'other text', 'someotherfield' => 501, ]
- Query storage client to get entity 42 in "fr":
[ 'id' => 42, 'title' => 'Titre français', 'somefield' => 'autre texte', 'someotherfield' => 501, ]
guignonv → created an issue.
I added translation and I can confirm this problem is related to translation: I can reproduce the issue.
I cannot reproduce the problem. I tried with a XnttTSV storage client and the preview works with rendered entity and its title. I suspect there is something else. What storage client did you use?
Regarding the xntt_views module code, it needs many improvements and many things are still to be implemented properly. In this case, we use the default filter class provided by the views module except for string because I wanted to add a 'FUZZY' filter to test a custom filter behavior.
As @notkataoka noticed, the default string filter provided by views works with database strings and uses the 'LIKE' operator and modifies the filter value which is not suitable for other storage back-ends to efficiently manage filtering.
Filters are passed to the storage clients using StorageClientInterface::transliterateDrupalFilters(). That is where filters can be translated to what is supported by the client when possible.
There is not an official/standard list of supported operators, but a basic list is provided in the Drupal doc and in XNTT StorageClientBase::testDrupalFilter():
'=', '!=', '<>', '>', '>=', '<', '<=', 'STARTS_WITH', 'CONTAINS', 'ENDS_WITH', 'IN', 'NOT IN', 'IS NULL', 'IS NOT NULL', 'EXISTS', 'NOT EXISTS', 'BETWEEN' and 'NOT BETWEEN'
We should list what is used by views to add to this list (eg. "NOT STARTS_WITH").
StorageClientBase::testDrupalFilter() is used when "standard" filters are not supported by the storage client, to provide a "degraded" filtering support on results. The current implementation does not support properly negated filters (ie. NOT ...). Some are, some aren't. It should be fixed because we can meet "NOT CONTAINS" for instance. The code needs to be refactored to handle "NOT" + any other filter.
For investigations, the method ::getTableMapping() comes from SqlContentEntityStorage class while XNTT storage implements ContentEntityStorageBase (which is correct).
I'll continue investigating...
While the code does not look bad, I could not reproduce the initial problem. However, with no surprise, I tried with a SQL storage client. I need time to try with another client. I'm also wondering if the problem would still be there with "::opNotContains()" "::opNotStartsWith()" and "::opNotEndsWith". I don't have all my ideas clear as it's late but I think the "final" operator is supposed to be provided by the storage clients themselves, so there should be no need for this patch. I need to review that more seriously...
guignonv → made their first commit to this issue’s fork.
@guignonv: Maybe this is why you're not running into it?
Yes that might be why because I rarely use full renderded entity. But now, I could try to reproduce the problem later to fix it.
Thanks! :)
What I can say is that I had a look to the views code of Search API to create the views plugin. I'm quite sure their code solves this problem in a way or another. So, my advice would be to see what they did... :p
I think it has been fixed in the current dev version (commit #397e51aa1227ddf52271bc26d7584b280275238d), not the last beta one.
Either try the dev version or patch until we release a new beta... :-S
Hi!
Thanks for the patch. For this feature, I had in mind to use features of the token module. But when I implemented that part, the token service was not part of the storage client services and it has recently been added.
Wouldn't it be more powerful to just support token replacement as it would go beyond static values provided by the query as it could include dynamic values (such as current used login for instance, or a user token, etc.)?
So I would prefer to keep using $this->entityTypeManager->getStorage('my_external_entity')->load("1234");
(to only keep a real id as parameter and keep the existing logic) and provide "color=blue" using another more generic mechanism in the single query parameters part (ie. ::getParametersFormDefaultValue('single')). I would see something either using tokens (which could return nothing if not available for instance) or using an event.
@arwillame, could you add a test in RestClientFunctionalTest.php to test your use case and ensure the fix remains stable in the future?
Your use case could be documented in the test to avoid regressions.
OK. I think I understand the problem but I need to think if your approach is the best. There might be a deeper design issue (-probably my bad-). I'm working other projects in parallel so I'll do my best to think about it as soon as I can and come back to you here. No hurry?
Looks good. Just to make sure I understand everything, it means you would get a single entity with no text keys and you will have to map numeric "keys" to fields then, right?
guignonv → made their first commit to this issue’s fork.
@colan, I don't see any recent commit on v2. Are you sure it has been fixed on that version as well? Or was this issue made on the wrong version maybe and the problem did not exist on v2?
Looks good to me. We can merge and see if we got back a similar problem in another use case one day. If it ever happens, we will advise at that time.
guignonv → created an issue.
Merging this will require to release a new beta as the API changed since storage clients now need to have a token service to be compatible.
It means existing plugins need to be updated and have a minimum requirement for this new beta (to create).
I'll update the plugins I manage and check for the one I don't. Meanwhile, you can merge.
Reviewed and merged.
I think I met that problem once. It rings a bell to me. Well, right now, I'm busy on another project but I'll be able to get back on this in 1 or 2 weeks I think/hope.
Merged! Thanks for the implementation. :)
(...) why is it necessary to map to a field that corresponds to a JSON property? Why not allow the file field to directly map to a property that can hold either a single URI or an array of URIs? Is there a specific use case where mapping to a label is beneficial? Additionally, is it possible to modify the approach so that the file field has its own property for direct mapping?
I'm quite sure we can get rid of an extra text or URI field to map a file field. Like you said, I think it is possible to have the file field has its own file URI property. I didn't investigate though, so I can't guarantee it possible. That's why I added to the roadmap for v3 what I put in #2 here. It needs to be investigated, and it is quite complex I think. I was planning to do it in the coming months but not in the coming weeks. Feel free to investigate! :-) But I may not be able to provide extensive support due to my other priority tasks.
Furthermore, I have this parent::addFieldValuesToRawData($field_values,$raw_data,$context); commented out, as I was pretty sure it does nothing for the file field. Is that fine? or should I include it?
I'm not sure. I think I left it in case, one day, the parent needs to do something special that should be applied to every field mapper, for instance triggering an event. But if it is commented because it does nothing right now, it's ok. If one day, the parent does something, then (maybe after a new issue is raised... :) ) we would put back the call to the parent method. Not a big deal.
Code looks good. I also enjoy the use of custom response decoder as it gives an example of using custom response decoders in external entities. I need to test it though... I'll do my best.
I'm wondering though if it could have been achieved using data processors. But I need to test to really understand the problem when facing data and not theory! ;-p
Also, I need to make sure it will work the way "reverse", when you need to save external data. I'm not sure it does... :-s
Looks good so far. Let me know when it's ready to be merged! :)
That was not an easy one. :)
In Drupal, when you use $entity->save() on a new entity, it updates the entity identifier to the new generated value and returns the global constant SAVED_NEW.
With external entities, an entity can come from one or multiple storage clients (sources). When you want to create a new record, a new identifier may be required as input (for example for the "File" storage client) or it can also be optional and generated by the storage client source. To make it more complicated, when you use multiple sources with "foreign keys", you may have to create records in "secondary" sources that may use different identifiers than the one you may provide.
The only way to have it working is to update the new entity identifier after it has been saved on the external sources. This action must be performed by the storage client because it is the one that got the entity, turns it into a raw array and send that array to the remote source. But the remote source must provide back the new identifier! Otherwise there is no generic way to get that new id. For the REST client, we will expect the query used to save the new entry to return the newly saved entity with its new identifier. I've updated the code to have it grab that new identifier and update the corresponding entity identifier. I tested and it appears to work.
What I didn't test yet is if you want to specify an identifier value by yourself. External entities does not allow to provide an identifier on the entity creation form as it follows the default behavior of ContentEntityForm. A work around would be to have another text field "field_specified_id" using the same mapping as the id field and use that field to specify a new id. Not tested though. I wonder what will be the behavior when the raw entity array will be generated: will the "null" value of the id field override the provided value of field_specified_id? I need to check that. There are no rules (or events) at the moment to tell in which order fields should be used to populate a raw entity array. (something to fix one day...)
At the moment, I did not create a corresponding automated test yet. Also, I need to check the other base clients: File storage client and SQL database storage client. And I did not check if the group aggregator handles new id properly when using "foreign keys" (it supposed to as it tricks the entity id for each client).
Meanwhile, I let you test the changes on the issue fork.
Set to major as saving is a core feature and it should work.
It make sense. I'll add a note for a future implementation of that feature from 📌 [META] Roadmap for v3.0 Active (currently under "For v3 beta 5"):
See if xntt_file_field module could rely on an added file field property (computed) rather than on another xntt field for the URI mapping
As those are tightly related.
Merged.
OK, I think it might be just a very stupid thing: I prepared a file data structure that I did not pass to the file create() method...
Could you try this fix and let me know if it works?
By the way, I never tried the file field plugin with data alteration operations! I'm not sure it will work in that case but who knows... :-p
From my perspective, you can not alter an external file you got from a URI but you can alter the URI you got. So, the code that might be missing is when saving data, the file field plugin should get the new file URI and report it to the text field used to map the URI. But I'm not sure it will be so easy. What if the URI field as also altered manually? Who wins? The value entered by the user or the one provided by the new file? Maybe, the behavior should be set on the file field mapper plugin settings? --> a new feature request? :-)
And for information (to people wondering how that works and did not find where it is explained), most of it is explained in this README.md. It could be improved though. Let me know what could be clarified.
guignonv → made their first commit to this issue’s fork.
Same issue on 3.0.x. Tested patch and merged on 3.0.x. Thanks! I let v2 maintainers apply it to v2.
- Could you provide a way to reproduce your problem on a fresh install?
- What storage client(s) do you use? Is it set to "read only"?
- Did you system use to work before or has it been like that all the time? (When did that change?)
- May your storage source be unavailable from times to times? (leading to external entities not being loaded)
- Do you use translation? (Content index: "entity:discourse_post/8:und" makes me think you do not. I would not see a direct reason explaining your issue but language management in xntt is a thing that recently changed)
It appears to be related to the way Search API works. A deep understanding of how it works is needed. For instance, I don't know if, when you access and indexed item, Search API tries to update it from external content and if that content is not available (even momentarily), it would clear the item. It's a lot of investigation work and I don't have much time so you'll need to provide me as much information as you can on how your system works.
If it is too complicated to solve, I would consider using the Xntt Manager → instead of search API in this case. It can be used to synchronize (ie. similar to Search API indexing) local Drupal content entity with external entities content. You can do it manually or using a cron to do it periodically. Then, you can use those "synchronized" content entity as you wish. If you synchronized to a node type content, you can use all features available on nodes. It could be a simple alternative (which can be deployed in parallel for testing). Keep me tuned.
Thanks! :) Updated.
Looks good to me. Yes, we can merge. I agree it should be a core requirement as it is very useful everywhere. A new beta would be needed indeed. The roadmap should be updated accordingly.
I'll have to check the storage plugins I manage to have them updated when needed (if they override the base constructor) but I'll need that new xntt beta release before.
@colan feel free to merge when you thinks it is ok.
Looks good to me. But it means we have to add the (core) token module as a dependency. I fixed tests accordingly. They pass on my environment but not on GitLab pipeline, probably because of the new dependency? Weird.
The fix is quite simple. Thanks! Merged.
I hope it is not a high priority for you nor urgent as I don't have time currently to review this feature request... :s
I'm not against the idea of splitting Rest client functions into pieces though. I would need time to read carefully what you put and understand what is your final goal and then I could have an opinion.
Merged.
Note: the second query should be corrected as it uses "de" instead of "fr". ;)
https://www.dummy.com/fr/jsonapi/node/publication?filter[id]=d4133e27-92a8-40c9-9b57-122494e444c5
I'm not sure as I am not currently dealing with multiple languages so I can' tell (it's far in my stack but it's there).
I'll let @das-peter answer if he passes by, but to give more info in case he does not, this patch allows to aggregate multiple language version of a content into one external entity instance.
So, let's say you use a REST client that gather content from a website and that content site provides 2 endpoints, one for English and one for French. Each content item has the same id on both endpoint but its content is in different language for each endpoint. You use the vertical aggregator with 2 REST clients, one for each endpoint, and you aggregate them using the new option "as a translation" for the second client. Then you are supposed to have a drop-down letting you choose the language corresponding to the client endpoint and it will add the translated content as a sub-field of the external entity instance. Your xntt array would look something like that, more a less:
[
'id' => 1234,
'title' => 'Some title',
'content' => 'Some content in English.',
'__external_entity_translation__fr' => [
'id' => 1234,
'title' => 'Un titre',
'content' => 'Du contenu en français.',
],
];
...that's what I understood so far.
Merged.
PHP Unit tests fixed now.
C.I. has been enabled. However, there are 2 errors in the "testHorizontalAggregation" test. I checked the logic of test and it appears ok to me so there is a bug in the horizontal aggregator client I think. :-( I think it used to work so it might be a real regression.
Also, there is a couple of validation warnings and many of them could be easily fixed. If somebody volunteers, go ahead for a PR! :-D
I mark it fixed to cleanup the issue queue. Feel free to reopen if not.
I support your approach. I don't have time to test such a thing right now but I'll try to next week. I reviewed the code though and it looks very good!
Fair enough! :) Merged! (I got time to update the repo now)
I support the idea of using tokens and there may be other places where it could be useful as well.
But it means we will need the token module as a requirement right? (I didn't play with the token module in my modules since Drupal 7!) Not a big deal though...
Merged. Thanks!
Reviewed code. Look good and fair. Could you update the repository so I can merge it thanks? :)
Weird because I'm convinced I had increased URL length 2048 a long time ago! But it appears not. :) Thanks for pointing that out and I prefer to use the largest limit 2048 to be safe. I doubt people would have thousands of external entity types in their databases and would need a shorted field length to save database space! ,:-)
I also added the URL for counts for consistency.
guignonv → made their first commit to this issue’s fork.
Hi!
I reviewed the changes. Looks quite good so far and I agree on the change to get mappable fields from the xntt. It makes sens.
However, I less confident in this part you removed:
$derived_entity_type = $this
->getExternalEntityType()
->getDerivedEntityType();
if (empty($derived_entity_type)) {
$this->logger->warning('FieldMapperBase::getFieldDefinition(): Cannot get external entity derived type.');
return NULL;
}
This is used by annotation entities. I did not make automated tests on these so I'm not sure it will work properly. I believe you removed that part also because it was not working with your use case right?
What about keeping that part but commented then?
/*
@todo The following code as been disabled and impacts on annotation nodes should be checked before removing it permanently.
@code
$derived_entity_type = $this
->getExternalEntityType()
->getDerivedEntityType();
if (empty($derived_entity_type)) {
$this->logger->warning('FieldMapperBase::getFieldDefinition(): Cannot get external entity derived type.');
return NULL;
}
@endcode
*/
...or something similar.
Thanks!
Looks fine to me. I just reviewed the code and did not test it yet but I believe it would work.
I'll try to contact him as well and keep you tuned.