After a document's contents have been extracted, it would be useful to have a hook for further sanitizing or optimizing the extracted text. We sometimes have very large GIS maps, and the extracted contents can contain a lot of junk Unicode characters from map symbols and the like, which is not helpful for Solr to index. For background see https://www.drupal.org/project/search_api_attachments/issues/3259455#com...
Add a hook that lets a custom module process the extracted document content before the 'limitBytes' processing happens.
In local testing this also reduced our 'key_value' table from 361 MiB to 157 MiB for our particular site, and it gives us the chance to improve the look of some search results.
- A hook allowing a custom module to process the extracted data.
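
To illustrate the ordering proposed here (sanitize first, then apply the byte limit), the following is a rough, language-neutral sketch in Python. It is not the module's API: an actual implementation would be a PHP hook in the module, and the names `sanitize_extracted_text`, `limit_bytes`, and `process_extracted`, as well as the choice of Unicode categories to strip, are assumptions made up for this example.

```python
import re
import unicodedata

# Assumption: these Unicode general categories are treated as noise.
# Co = private use, Cn = unassigned, Cc = control, So = "other symbol"
# (where many map glyphs live). A real site would tune this list.
NOISE_CATEGORIES = {"Co", "Cn", "Cc", "So"}


def sanitize_extracted_text(text: str) -> str:
    """Hypothetical hook body: blank out noise characters, collapse whitespace."""
    cleaned = "".join(
        " " if unicodedata.category(ch) in NOISE_CATEGORIES else ch
        for ch in text
    )
    return re.sub(r"\s+", " ", cleaned).strip()


def limit_bytes(text: str, max_bytes: int) -> str:
    """Stand-in for the 'limitBytes' step: truncate to max_bytes of UTF-8."""
    encoded = text.encode("utf-8")[:max_bytes]
    # errors="ignore" drops a multi-byte character cut in half at the boundary.
    return encoded.decode("utf-8", errors="ignore")


def process_extracted(text: str, max_bytes: int) -> str:
    """Run the hypothetical sanitizing hook before the byte limit is applied."""
    return limit_bytes(sanitize_extracted_text(text), max_bytes)
```

Running the cleanup before the limit matters because junk symbols would otherwise use up the byte budget: with a tight limit, text that starts with a run of map symbols could be truncated before any useful words are reached.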
Closed: works as designed
9.0
Code