This is a spinoff of 1960684 - Allow filtering by "Language (with fallback)" (see from comment #68 onward).
Here's a use case: I have a content type "Picture" that is never translated, and is not even translatable, because it contains nothing that could be meaningfully translated: just the image field, some data extracted from the EXIF, an entity reference to the Tags vocabulary and an entity reference to the City content type. Even though a Picture itself has no translations, the tags and cities it references do, and I would like a search for, say, "Wien Gemälde" to return the same pictures as a search for "Vienna painting".
Out of the box, the words "Wien" and "Gemälde" will never make it into the index because there is no German translation of the master entity (Picture), so there simply is no index item for the German language. The approach implemented in 1960684 does not help here either: it pretends that a [non-existent] German index item exists, but that item is a carbon copy of the English one (the only one actually stored).
So the objective was to create an index item for every missing translation and to make sure the fields of referenced entities are fetched and stored in the item's language.
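To make "stored in the item's language" concrete, here is a small plain-PHP model of the keywords each item should end up with. It uses no Search API or Drupal API: referenced entities are represented as hypothetical langcode => label arrays, `referenced_labels()` is a made-up helper, and falling back to the source language for an untranslated referenced entity is an assumption, not something the module does.

```php
<?php

/**
 * Hypothetical model: each referenced entity (a Tags term, a City node) is a
 * map of langcode => label. Returns the labels that an index item of language
 * $langcode should contain, falling back to $fallback when a referenced
 * entity has no translation in $langcode.
 */
function referenced_labels(array $referenced, string $langcode, string $fallback): array {
  $labels = [];
  foreach ($referenced as $translations) {
    if (isset($translations[$langcode])) {
      $labels[] = $translations[$langcode];
    }
    elseif (isset($translations[$fallback])) {
      $labels[] = $translations[$fallback];
    }
  }
  return $labels;
}

// One untranslated Picture referencing a City and a Tags term.
$picture_refs = [
  ['en' => 'Vienna', 'de' => 'Wien'],
  ['en' => 'painting', 'de' => 'Gemälde'],
];

// The English item and the (so far missing) German item get different keywords.
print implode(' ', referenced_labels($picture_refs, 'en', 'en')) . "\n"; // Vienna painting
print implode(' ', referenced_labels($picture_refs, 'de', 'en')) . "\n"; // Wien Gemälde
```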
The first step was to implement hook_search_api_index_items_alter() in my own module to add index items for missing translations (very crude, site-specific, proof-of-concept-grade code):
```php
use Drupal\search_api\IndexInterface;
use Drupal\search_api\Item\Item;

function HOOK_search_api_index_items_alter(IndexInterface $index, array &$items) {
  // Only act on the pictures index.
  if ($index->id() !== 'search_pictures') {
    return;
  }
  foreach ($items as $item_id => $item) {
    if ($item->getLanguage() === 'en') {
      // Swap only the trailing langcode ("entity:node/42:en" -> "entity:node/42:ru").
      $ru_id = preg_replace('/:en$/', ':ru', $item_id);
      // Never overwrite an item for a translation that really exists.
      if (!isset($items[$ru_id])) {
        $items[$ru_id] = new Item($index, $ru_id);
        $items[$ru_id]->setLanguage('ru');
      }
    }
  }
}
```
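The hook above hard-codes the English to Russian pair. As a sketch, the ID arithmetic can be generalized to any set of languages. This assumes item IDs of the form `datasource/entity_id:langcode` (e.g. `entity:node/42:en`); `missing_translation_ids()` is a hypothetical helper, and each ID it returns would still have to be turned into an `Item` with the matching language, as the hook does.

```php
<?php

/**
 * Returns the IDs of items that are missing for each language in $langcodes,
 * given the IDs of the items currently being indexed. Assumes the
 * "datasource/entity_id:langcode" ID format.
 */
function missing_translation_ids(array $item_ids, array $langcodes): array {
  $existing = array_flip($item_ids);
  $missing = [];
  foreach ($item_ids as $item_id) {
    // Split on the *last* colon only: the datasource ID ("entity:node")
    // contains a colon too.
    $base = substr($item_id, 0, strrpos($item_id, ':'));
    foreach ($langcodes as $langcode) {
      $candidate = $base . ':' . $langcode;
      if (!isset($existing[$candidate]) && !in_array($candidate, $missing, true)) {
        $missing[] = $candidate;
      }
    }
  }
  return $missing;
}

$ids = ['entity:node/1:en', 'entity:node/2:en', 'entity:node/2:de'];
print implode("\n", missing_translation_ids($ids, ['en', 'de', 'ru'])) . "\n";
// entity:node/1:de
// entity:node/1:ru
// entity:node/2:ru
```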
I hoped that would be enough, but it isn't: the ContentEntity datasource plugin won't populate an item that points to a non-existent translation. I had to tweak the plugin code a bit (patch follows).
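The patch itself is not reproduced here. Purely to illustrate the kind of change involved, the following plain-PHP model splits a raw ID (`entity_id:langcode`) on its last colon and, when the requested translation does not exist, either drops the item (the behavior described above) or serves it from the entity's default translation. The `$entities` array shape and `load_items()` are hypothetical; this is not the actual plugin code or the patch.

```php
<?php

/**
 * Model of loading items by raw ID. $entities maps entity ID =>
 * ['default' => langcode, 'translations' => [langcode => data]].
 * $fallback = false drops IDs that point to a missing translation;
 * $fallback = true serves them from the default translation instead.
 */
function load_items(array $raw_ids, array $entities, bool $fallback): array {
  $items = [];
  foreach ($raw_ids as $raw_id) {
    $pos = strrpos($raw_id, ':');
    $entity_id = substr($raw_id, 0, $pos);
    $langcode = substr($raw_id, $pos + 1);
    if (!isset($entities[$entity_id])) {
      continue;
    }
    $entity = $entities[$entity_id];
    if (isset($entity['translations'][$langcode])) {
      $items[$raw_id] = $entity['translations'][$langcode];
    }
    elseif ($fallback) {
      $items[$raw_id] = $entity['translations'][$entity['default']];
    }
  }
  return $items;
}

$entities = ['42' => ['default' => 'en', 'translations' => ['en' => ['title' => 'IMG_0042']]]];
$ids = ['42:en', '42:ru'];
print count(load_items($ids, $entities, false)) . "\n"; // 1
print count(load_items($ids, $entities, true)) . "\n";  // 2
```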
The result works fine for me, with one minor glitch: the index rebuilder on the search index page behaves abnormally. Because there are up to twice as many index items as tracker items, it only indexes half of the items on the first run, then half of what is left, and so on. I can live with that for now, and I am sure I can fix it easily if needed.
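One possible reading of the "half of what is left" pattern, modelled here as an assumption rather than a diagnosis of the actual batch code: if each run stops once the number of processed index items equals the number of pending tracker items, and every tracker item now yields two index items, each run clears only half of the backlog. `rebuild_runs()` is a hypothetical helper returning the number of pending tracker items after each run.

```php
<?php

/**
 * Hypothetical model of the rebuild glitch: each run has a budget of
 * $remaining index items, but each tracker item produces $items_per_tracked
 * index items, so only about $remaining / $items_per_tracked tracker items
 * are cleared per run.
 */
function rebuild_runs(int $tracked, int $items_per_tracked = 2): array {
  $remaining = $tracked;
  $history = [];
  while ($remaining > 0) {
    // At least one tracker item per run, so the loop always terminates.
    $done = max(1, intdiv($remaining, $items_per_tracked));
    $remaining -= $done;
    $history[] = $remaining;
  }
  return $history;
}

print implode(' ', rebuild_runs(1000)) . "\n"; // 500 250 125 63 32 16 8 4 2 1 0
```

With one index item per tracker item (`rebuild_runs(1000, 1)`), the model finishes in a single run, which matches the normal behavior.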
Now, why am I describing this here in such detail, even though I was clearly told this functionality is not going to make it into the module (and I agree with that!)? My solution has two parts: one is implemented in my own custom module and the other alters the Search API codebase. Extra index items for missing translations are NOT created unless BOTH parts are in place. So why don't we consider committing the "enabling" part to the module (it won't change any existing behavior unless someone manipulates index items in an alter hook) and leave it to site developers to actually switch it on in their own custom modules? The hook/event implementation gives the developer all the configurability they need: in their own code they can easily fine-tune it per index or by any property of the entity being indexed, without touching the Search API code at all.
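To illustrate the kind of per-index, per-bundle fine-tuning that could live entirely in a custom module, here is a hypothetical opt-in policy check. The `$policy` array layout and `extra_langcodes()` are invented for this sketch; a real implementation would read such settings from wherever the site keeps its configuration.

```php
<?php

/**
 * Hypothetical opt-in policy: for a given index and bundle, which extra
 * languages should get index items even though no translation exists.
 */
function extra_langcodes(array $policy, string $index_id, string $bundle, array $existing_langcodes): array {
  // Indexes or bundles not listed in the policy get no extra items.
  $langcodes = $policy[$index_id][$bundle] ?? [];
  // Never add an item for a language the entity already has.
  return array_values(array_diff($langcodes, $existing_langcodes));
}

$policy = [
  'search_pictures' => ['picture' => ['de', 'ru']],
];

print json_encode(extra_langcodes($policy, 'search_pictures', 'picture', ['en'])) . "\n"; // ["de","ru"]
print json_encode(extra_langcodes($policy, 'search_pictures', 'article', ['en'])) . "\n"; // []
print json_encode(extra_langcodes($policy, 'other_index', 'picture', ['en'])) . "\n";     // []
```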
What do you think?