External entities disappearing from Search API indices

Created on 27 February 2025

I'm using External Entities to gather a list of topics from a Discourse forum. I'm not doing anything with annotations (I didn't see a reason to, unless it ends up helping with this problem); I'm just using the external entities in two Search API indices: one that only has Discourse topics, and a second that includes them alongside Drupal content for the sitewide search.

I'm finding that over time, the Discourse entities I have indexed become unavailable in the Search API index. The index will show that I've indexed 0/0 Discourse entities, and the only way for me to get them loaded back in is to rebuild the index's tracking information and then reindex everything. The logs from Search API will also show that it can't load them: "Could not load the following items on index Content index: "entity:discourse_post/8:und"."
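For reference, the rebuild-and-reindex workaround described above can be scripted instead of clicked through. This is a minimal sketch, assuming Drush is available wherever it runs and that Search API's Drush commands `search-api:rebuild-tracker` and `search-api:index` are installed. The index machine name `content_index` is hypothetical, because the log only shows the label "Content index". By default it only prints the commands (a dry run).

```python
import subprocess
from typing import List

# Hypothetical machine name; the log message only shows the label "Content index".
INDEX_ID = "content_index"


def rebuild_commands(index_id: str) -> List[List[str]]:
    """Return the Drush commands for the rebuild-tracking-then-reindex workaround."""
    return [
        # Rebuild the tracker so every item the datasources report is tracked again.
        ["drush", "search-api:rebuild-tracker", index_id],
        # Index all items that are now pending.
        ["drush", "search-api:index", index_id],
    ]


def rebuild_and_reindex(index_id: str = INDEX_ID, dry_run: bool = True) -> List[List[str]]:
    """Run the workaround commands in order or, with dry_run=True, only print them."""
    commands = rebuild_commands(index_id)
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError and stops at the first failing command.
            subprocess.run(command, check=True)
    return commands
```

Calling `rebuild_and_reindex(dry_run=False)` would execute the commands. On Pantheon, Drush is normally reached through Terminus, so the `drush` prefix would need adjusting there. This only automates the workaround; it does not explain why the items drop out of the tracker.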

If it's helpful: I rebuilt and reindexed earlier this morning, and less than three hours later they had gone missing again. This is on a Pantheon-hosted sandbox, not a live site, if that's worth mentioning.

I'm not sure if this is a problem with the module or just with how I have things set up. I'm open to rearchitecting how I'm retrieving that data if that's necessary—the only consideration I can think of is that I use a custom Search API processor to better prepare some data to go into the search index, so I'd want to keep that in one form or another. If there's any more information I can provide that'd help pinpoint what's going on, please let me know. Thank you!

💬 Support request
Status: Active
Version: 3.0
Component: Code
Created by: kurttrowbridge (🇺🇸 United States)


Comments & Activities

  • Issue created by @kurttrowbridge
  • guignonv (🇫🇷 France, Montpellier)
    1. Could you provide a way to reproduce your problem on a fresh install?
    2. What storage client(s) do you use? Is it set to "read only"?
    3. Did your system work before, or has it always been like this? (If it changed, when?)
    4. Could your storage source be unavailable from time to time, leading to external entities not being loaded?
    5. Do you use translation? ("entity:discourse_post/8:und" in the Content index log message makes me think you do not. I don't see a direct reason it would explain your issue, but language management in xntt has changed recently.)
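One way to gather data for question 4 is to poll the source directly and record when topics stop answering. This is a minimal sketch, assuming the storage client reads Discourse's public topic JSON endpoint (`/t/{id}.json`); `BASE_URL` is a hypothetical placeholder, and the helper names `topic_is_available`, `find_unavailable` and `poll` are illustrative, not part of External Entities.

```python
import time
import urllib.request
from typing import Callable, Iterable, List, Tuple

# Hypothetical forum URL: replace with the Discourse site the storage client reads from.
BASE_URL = "https://forum.example.com"


def topic_is_available(topic_id: int, timeout: float = 10.0) -> bool:
    """Return True if the /t/{id}.json endpoint answers with HTTP 200."""
    url = f"{BASE_URL}/t/{topic_id}.json"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError (404, 429, 5xx, ...) and socket timeouts all subclass OSError.
        return False


def find_unavailable(topic_ids: Iterable[int],
                     check: Callable[[int], bool] = topic_is_available) -> List[int]:
    """Return the topic IDs whose availability check fails."""
    return [topic_id for topic_id in topic_ids if not check(topic_id)]


def poll(topic_ids: List[int], rounds: int, interval_s: float,
         check: Callable[[int], bool] = topic_is_available) -> List[Tuple[str, List[int]]]:
    """Check the topics `rounds` times, `interval_s` seconds apart, and log the gaps."""
    history = []
    for i in range(rounds):
        missing = find_unavailable(topic_ids, check)
        history.append((time.strftime("%Y-%m-%d %H:%M:%S"), missing))
        if i < rounds - 1:
            time.sleep(interval_s)
    return history
```

For example, `poll([8], rounds=36, interval_s=300)` checks topic 8 every five minutes for three hours, which roughly matches the window in which the items disappeared. Timestamps with non-empty lists could then be compared against the Search API log entries.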

    It appears to be related to the way Search API works, and pinning it down requires a deep understanding of its internals. For instance, I don't know whether, when you access an indexed item, Search API tries to reload it from the external source, and whether it would clear the item if that source is unavailable (even momentarily). That is a lot of investigation work and I don't have much time, so please give me as much information as you can about how your system works.

    If it is too complicated to solve, I would consider using the Xntt Manager instead of Search API in this case. It can synchronize local Drupal content entities with external entity content (similar to Search API indexing). You can run the synchronization manually or periodically with cron. You can then use those synchronized content entities however you wish: if you synchronize to a node content type, all the features available on nodes are available to you. It could be a simple alternative, and it can be deployed in parallel for testing. Keep me posted.
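To illustrate the synchronization idea, here is a minimal Python sketch of a one-way sync pass. It is not Xntt Manager's actual implementation, which lives in PHP inside Drupal. It assumes external records and local copies can each be represented as dictionaries keyed by external ID; `synchronize` and `SyncReport` are hypothetical names. The property relevant to this issue is that, in this sketch, a failed read (`external is None`) leaves the local copies untouched, so anything indexed from them does not vanish during a brief outage.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SyncReport:
    """IDs of local copies created, updated or deleted in one sync pass."""
    created: List[str] = field(default_factory=list)
    updated: List[str] = field(default_factory=list)
    deleted: List[str] = field(default_factory=list)


def synchronize(external: Optional[Dict[str, dict]], local: Dict[str, dict],
                delete_missing: bool = False) -> SyncReport:
    """Copy external records into `local` (keyed by external ID), in place.

    `external` is None when the remote source could not be read; in that case
    nothing is touched, so an outage never empties the local copies.
    """
    report = SyncReport()
    if external is None:
        return report
    for ext_id, data in external.items():
        if ext_id not in local:
            local[ext_id] = dict(data)
            report.created.append(ext_id)
        elif local[ext_id] != data:
            local[ext_id] = dict(data)
            report.updated.append(ext_id)
    if delete_missing:
        # Only remove copies whose record is absent from a successful read.
        for ext_id in [k for k in local if k not in external]:
            del local[ext_id]
            report.deleted.append(ext_id)
    return report
```

Running a pass like this from cron, and pointing search indexing at the local copies rather than at the remote source, is the general pattern the suggestion relies on.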

  • kurttrowbridge (🇺🇸 United States)

    Hi! Thanks for the detailed response. I kept trying to figure out why the entities were becoming unavailable in the Search API index, but ended up taking your advice to use External Entities Manager instead, and that seems to be working well. I can still do what I need with indexing and my additional Search API processor; it's just based on the nodes in the synced content type rather than on the external entities. Thanks again!
