Hook for Processing Extracted Content

Created on 21 January 2022, almost 3 years ago
Updated 30 November 2024, about 1 month ago

Problem/Motivation

After a document's contents are extracted, it would be useful to have a hook to further sanitize or optimize the extracted text. We sometimes have very large GIS maps, and the extracted contents can contain many junk Unicode characters from map symbols and the like, which is not helpful for Solr to index. For background see https://www.drupal.org/project/search_api_attachments/issues/3259455#com...

Allow a hook to process the extracted document content before the 'limitBytes' processing happens, so that the byte limit applies to the cleaned text rather than to the raw extraction.

In local testing, this also reduced our 'key_value' table from 361 MiB to 157 MiB for our particular site, and it gives us the chance to improve how some search results look.

Proposed resolution

- A hook allowing a custom module to process the extracted data.
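
To illustrate why the ordering matters, here is a minimal, language-neutral sketch in Python. It is not Drupal code and not the module's API: `register_processor`, `strip_symbol_noise`, `limit_bytes`, and `process_extracted` are hypothetical names invented for this example, and the choice of Unicode categories to drop is an assumption about what counts as map-symbol noise. It only shows custom processors running on the extracted text before a byte limit is applied.

```python
import unicodedata
from typing import Callable, List

# Hypothetical stand-in for a hook registry; not part of any real module.
Processor = Callable[[str], str]
_processors: List[Processor] = []


def register_processor(func: Processor) -> Processor:
    """Register a callable that cleans extracted text (the 'hook')."""
    _processors.append(func)
    return func


@register_processor
def strip_symbol_noise(text: str) -> str:
    """Replace symbol, private-use, and unassigned characters with spaces,
    then collapse runs of whitespace into single spaces."""
    # Assumption: categories So, Co and Cn cover most map-glyph debris.
    noisy = {"So", "Co", "Cn"}
    cleaned = "".join(
        " " if unicodedata.category(ch) in noisy else ch for ch in text
    )
    return " ".join(cleaned.split())


def limit_bytes(text: str, max_bytes: int) -> str:
    """Truncate to at most max_bytes of UTF-8 without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops a trailing partial multi-byte sequence, if any.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


def process_extracted(text: str, max_bytes: int) -> str:
    """Run every registered processor first, then apply the byte limit."""
    for func in _processors:
        text = func(text)
    return limit_bytes(text, max_bytes)
```

Because the cleanup runs first, the byte budget is spent on indexable words instead of multi-byte symbol characters, which is the effect the proposed hook is meant to allow.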

✨ Feature request
Status

Closed: works as designed

Version

9.0

Component

Code

Created by

🇺🇸 United States NicholasS
