- Issue created by @smokris
- Status changed to Needs review
7:16pm, 16 October 2023
The last submitted patch, 3: search-api-rendered-page-3394384-3.patch, failed testing (540 pass, 2 fail).
- Status changed to Needs work
8:39am, 23 October 2023 - 🇦🇹Austria drunken monkey Vienna, Austria
Great feature request, thanks!
There was already an issue for this (indexing arbitrary pages) a long time ago, but I can no longer find it. I think the main problem we were thinking about back then was security: it’s pretty much impossible to index reliable access information for arbitrary pages, so access checking doesn’t really work. (Or, at best, it could only work as post-processing of the search results.)
However, if the admin enters the paths to index themselves, and we include a big warning to only include publicly available pages there (or, more accurately, pages that can be accessed by everyone that will be able to access the search results pages), I guess that should be fine.
It would still be interesting to see what we discussed there, but I guess that can’t be helped if I can’t find the issue anymore.

What I’m wondering about is whether sending an HTTP request is really the best way to obtain the page contents. At the very least, the result would depend on the site’s theme. Most themes should, at this point, be “nice” and put all main content into `<main>`, but I cannot believe this is universal. We might at least need to make that XPath query configurable.
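To make the configurable-XPath idea concrete, here is a minimal plain-PHP sketch using the standard DOM extension. It assumes the page HTML has already been fetched, and extracts the text of whatever the configured query matches, defaulting to `//main`. The function name `extractPageText()` is hypothetical; it is not part of Search API or the patch.

```php
<?php
// Sketch only: extract indexable text from already-fetched HTML using a
// configurable XPath query. extractPageText() is a hypothetical helper,
// not part of the Search API module.
function extractPageText(string $html, string $xpathQuery = '//main'): ?string {
  $dom = new DOMDocument();
  // libxml's HTML parser may warn about HTML5 elements such as <main>;
  // collect those warnings internally instead of printing them.
  $previous = libxml_use_internal_errors(true);
  $dom->loadHTML($html);
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  $xpath = new DOMXPath($dom);
  $matches = $xpath->query($xpathQuery);
  if ($matches === false || $matches->length === 0) {
    // The theme doesn't use the expected markup: nothing to index.
    return null;
  }

  // Gather every text node below the matched elements.
  $parts = [];
  foreach ($matches as $element) {
    foreach ($xpath->query('.//text()', $element) as $textNode) {
      $parts[] = $textNode->nodeValue;
    }
  }
  // Collapse runs of whitespace so the indexed text is compact.
  return trim(preg_replace('/\s+/', ' ', implode(' ', $parts)));
}

$html = '<html><body><header>Site name</header>'
  . '<main><h1>Running shoes</h1><p>All our shoes.</p></main>'
  . '<footer>Contact</footer></body></html>';

echo extractPageText($html), "\n";         // Prints: Running shoes All our shoes.
echo extractPageText($html, '//h1'), "\n"; // Prints: Running shoes
```

A theme that doesn’t use `<main>` could then be handled with something like `extractPageText($html, '//div[@id="content"]')`. Returning `null` when nothing matches leaves it to the caller to decide whether to skip the page or fall back to the whole `<body>`.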
Executing the request internally would have seemed the more natural choice to me, but the Rendered Item processor shows what a myriad of edge cases you run into when trying to render something in an unexpected context, so maybe the HTTP request really is the way to go.

Another sticky point is that, in its current form, this would also need to send those HTTP requests every time one of these pages is displayed in search results, which is of course unacceptable. (You could get around this using Solr, or some other backend that returns the indexed fields, but if we want to add this to the module, the behavior must also be acceptable when using the database backend.) So we might need to cache the indexed values (probably in a new cache bin) and clear that cache every time the pages are reindexed. (That way, we could also skip reindexing when the values haven’t changed, e.g., by comparing a hash of the contents.) Alternatively, we could take the `title` value from the menu item and only use the HTTP request for `viewItem()`, so users would need to use the “Rendered item” processor if they want the HTML contents.

Anyways, yes, I do think this is, in principle, fit to be included in the Search API. Please just tell me when it’s ready for review. I might also post about it somewhere to attract other testers, to make sure this works well for as many sites as possible.
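The hash-based change detection mentioned above (store a hash of each page’s contents, skip reindexing when it hasn’t changed) could look roughly like this. It is a plain-PHP sketch under stated assumptions: the previously stored hashes are passed in as an array, whereas a real implementation would presumably keep them in the proposed cache bin. `findChangedPages()` is a hypothetical name.

```php
<?php
// Sketch: find pages whose fetched contents differ from what was last indexed.
// $previousHashes maps path => SHA-256 of the contents indexed last time
// (assumed to come from a cache bin in a real module; here just an array).
// Returns path => new hash, for new or changed pages only.
function findChangedPages(array $previousHashes, array $currentContents): array {
  $changed = [];
  foreach ($currentContents as $path => $content) {
    $hash = hash('sha256', $content);
    // New page, or contents differ from the stored hash: needs reindexing.
    if (($previousHashes[$path] ?? null) !== $hash) {
      $changed[$path] = $hash;
    }
  }
  return $changed;
}

$previous = [
  '/about' => hash('sha256', 'About us'),
  '/contact' => hash('sha256', 'Call us'),
];
$current = [
  '/about' => 'About us',   // unchanged
  '/contact' => 'Email us', // changed
  '/faq' => 'Questions',    // new
];
// Only /contact and /faq need reindexing.
print_r(array_keys(findChangedPages($previous, $current)));
```

Only the returned paths would then need to be marked for reindexing (for instance via the index’s `trackItemsUpdated()`), with their new hashes written back to storage.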
When the site builder updates the list of page paths, search_api should call the datasource’s `getItemIds()` method to get an updated list of items that need to be indexed. It doesn’t do this automatically, and I haven’t yet figured out how to convince it to do so. In the `submitConfigurationForm()` method, I tried adding `$this->index->rebuildTracker();`, but that prevents the new form values from being saved.

Maybe compute the difference yourself in the form submit method and then manually call `$index->trackItemsInserted()`/`$index->trackItemsDeleted()` as appropriate?

In any case, thanks again for working on this!
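The suggestion to compute the difference in the form submit method boils down to a set difference between the old and new path lists. A minimal plain-PHP sketch, assuming the paths themselves serve as the datasource’s item IDs; `diffTrackedPaths()` is a hypothetical helper:

```php
<?php
// Sketch: work out which item IDs were added to or removed from the
// configured path list. Assumes each path doubles as the item ID.
// diffTrackedPaths() is a hypothetical helper, not a Search API method.
function diffTrackedPaths(array $oldPaths, array $newPaths): array {
  // Trim whitespace, drop empty lines and duplicates, then reindex keys.
  $normalize = fn (array $paths): array => array_values(array_unique(
    array_filter(array_map('trim', $paths), fn (string $p): bool => $p !== '')
  ));
  $old = $normalize($oldPaths);
  $new = $normalize($newPaths);

  return [
    // Paths only in the new list: start tracking them.
    'inserted' => array_values(array_diff($new, $old)),
    // Paths only in the old list: stop tracking them.
    'deleted' => array_values(array_diff($old, $new)),
  ];
}

// Example: the admin removed /contact and added /faq (with stray spaces).
$changes = diffTrackedPaths(['/about', '/contact'], ['/about', ' /faq ', '']);
print_r($changes); // inserted: ['/faq'], deleted: ['/contact']
```

Inside `submitConfigurationForm()`, the old list would have to be read from the configuration before it is overwritten with the submitted values. The two results could then be passed to something like `$this->getIndex()->trackItemsInserted($this->getPluginId(), $changes['inserted'])` and the matching `trackItemsDeleted()` call, instead of rebuilding the whole tracker (untested).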
- 🇬🇧United Kingdom aesuk
Would this eventually cover indexing the content in a view's header or footer?
Global Text Area is a field that most people use to describe or summarise the contents of the view.
So there are many use cases where that content needs to be indexed; I can see people searching for those keywords in our analytics. But, for various reasons, that content is not on the nodes themselves, only in the header of the view.
For example, if I searched loosely for "running shoes", I would of course want to see all the shoes in the results, but I would actually want the main category page to show up in first place.
- 🇦🇹Austria drunken monkey Vienna, Austria
@aesuk: If the view is included in the `<main>` section of the page, then yes, that would be included. (As would the view’s contents.)
- 🇬🇧United Kingdom aesuk
OK, thanks. We can make sure our templates have that in place.