Hook for Processing Extracted Content

Created on 21 January 2022, over 3 years ago

Updated 30 November 2024, 7 months ago

Problem/Motivation

After documents have had their contents extracted, it would be useful to have a hook to further sanitize or optimize the extracted content. We sometimes have very large GIS maps, and the extracted content can contain lots of junk Unicode characters from map symbols and the like, which is content that is not helpful for Solr to index. For background, see https://www.drupal.org/project/search_api_attachments/issues/3259455#com...
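As a rough illustration of the kind of sanitizing meant here, the sketch below filters characters by Unicode general category. It is written in Python rather than Drupal PHP and is not the module's code. The function name `clean_extracted_text` and the set of "junk" categories are assumptions. Note that dropping `So` (other symbols) also removes legitimate symbols such as © or °, so a real site would tune the set.

```python
import re
import unicodedata

# Assumed "junk" categories for indexing purposes (hypothetical choice):
# So = other symbols (e.g. map pictograms), Co = private use (icon fonts),
# Cn = unassigned, Cc/Cf = control and format characters.
JUNK_CATEGORIES = {"So", "Co", "Cn", "Cc", "Cf"}


def clean_extracted_text(text: str) -> str:
    """Drop junk characters from extracted text and collapse whitespace."""
    kept = []
    for ch in text:
        if ch.isspace():
            # Whitespace such as \n or \t is category Cc, so handle it first
            # and keep it as a plain space separator.
            kept.append(" ")
        elif unicodedata.category(ch) not in JUNK_CATEGORIES:
            kept.append(ch)
    # Collapse runs of whitespace left behind by removed symbols.
    return re.sub(r"\s+", " ", "".join(kept)).strip()


print(clean_extracted_text("Route 66 \u2691\u2691\ue000  Main St"))
```

Here U+2691 (a flag symbol, `So`) and U+E000 (private use, `Co`) are removed, leaving `Route 66 Main St`.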

Allow a hook to process the extracted document content before the 'limitBytes' processing happens, so that the byte limit is applied to already-cleaned text.
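Why the order matters can be sketched in Python. All names here are hypothetical: `limit_bytes` is only a stand-in for the module's 'limitBytes' setting, whose actual implementation may differ, and `strip_private_use` is a deliberately minimal cleaner. When junk is removed first, the byte budget is spent on useful text. When truncation runs first, the budget can be used up entirely by junk that is then thrown away.

```python
import unicodedata


def limit_bytes(text: str, max_bytes: int) -> str:
    """Truncate text to at most max_bytes of UTF-8 without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Cutting can only break the last character; errors="ignore" drops
    # that trailing partial multi-byte sequence.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


def strip_private_use(text: str) -> str:
    """Minimal stand-in cleaner: drop private-use (Co) characters."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Co")


def process(text: str, max_bytes: int, clean_first: bool = True) -> str:
    """Apply cleaning and byte limiting in the chosen order."""
    if clean_first:
        return limit_bytes(strip_private_use(text), max_bytes)
    return strip_private_use(limit_bytes(text, max_bytes))


# 10 private-use icon glyphs (3 UTF-8 bytes each) followed by real text.
sample = "\ue000" * 10 + "legend"
print(repr(process(sample, 12)))                     # clean, then limit
print(repr(process(sample, 12, clean_first=False)))  # limit, then clean
```

With a 12-byte budget, cleaning first keeps `legend`. Limiting first keeps only four icon glyphs, which the cleaner then removes, leaving an empty string.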

In local testing on our site, this has also reduced our 'key_value' table from 361 MiB to 157 MiB, and it gives us a chance to improve how some search results look.

Proposed resolution

- A hook allowing a custom module to process the extracted data.

✨ Feature request

Status

Closed: works as designed

Version

9.0

Component

Code

Created by

🇺🇸United States NicholasS


Incomplete comments

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

Comment 7 months ago
🇫🇷France izus

