- Issue created by @malcomio
- Merge request !41 "Issue #3511949: prevent memory errors from extractBody" (Open) created by malcomio
- First commit to issue fork.
- 🇬🇧United Kingdom malcomio
Possibly related to the following issues on older branches:
#2474849: Ignore attachments based on uri, filesize or file extension
🐛 Request Entity Too Large (Closed: cannot reproduce)
Perhaps the way forward would be to limit extraction by file size.
There is already a config option for this in the schema, and it can be set via the Search API processors form.
We saw the error with the following config:
file_attachments:
  excluded_extensions: 'aif art avi bmp gif ico mov oga ogv png psd ra ram rgb flv'
  number_indexed: 0
  number_first_bytes: '1 MB'
  max_filesize: '0'
  excluded_private: 1
  excluded_mimes: 'audio/x-aiff image/x-jg video/x-msvideo image/x-ms-bmp image/gif image/vnd.microsoft.icon video/quicktime audio/ogg video/ogg image/png image/x-photoshop audio/x-realaudio audio/x-pn-realaudio image/x-rgb video/x-flv'
Perhaps we need to try changing these settings?
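To make the file size idea concrete, here is a rough sketch of the kind of check I have in mind. It is written in Python purely for illustration and is not the module's code: parse_size and should_extract are hypothetical names, and treating a max_filesize of '0' as "no limit" is an assumption based on the config above.

```python
import re

# Multipliers for the size suffixes this sketch understands.
_UNITS = {"": 1, "B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def parse_size(value: str) -> int:
    """Convert a string such as '1 MB' or '0' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*", value)
    if not match:
        raise ValueError(f"Unrecognised size: {value!r}")
    number, unit = match.groups()
    unit = unit.upper()
    if unit not in _UNITS:
        raise ValueError(f"Unrecognised unit in size: {value!r}")
    return int(float(number) * _UNITS[unit])


def should_extract(file_size: int, max_filesize: str) -> bool:
    """Decide whether a file is small enough to send to the extractor."""
    limit = parse_size(max_filesize)
    # Assumption: a limit of 0 means "no limit", so every file is extracted.
    return limit == 0 or file_size <= limit
```

With max_filesize set to '0', as in our config, a check like this would let every file through, which might explain why large files still reach the extractor.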
- 🇬🇧United Kingdom malcomio
It may also be worth adding extra logging, similar to 📌 Introduce debug mode - do not pollute indexing output with Tika warnings (Active).
For example, if debug mode is on, the extractor could:
1. log details of the file before it tries to do the extraction
2. report success or failure
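The two steps above could look roughly like this. Again this is only an illustrative Python sketch, not a patch: extract_with_logging, the debug flag and the extract callable are all hypothetical. Logging before the attempt seems the more important half, since a fatal memory error may leave no chance to log anything afterwards.

```python
import logging

logger = logging.getLogger("attachments_sketch")


def extract_with_logging(uri, size, extract, debug=False):
    """Run extract(uri), logging before and after when debug is on."""
    if debug:
        # Step 1: record which file is about to be processed.
        logger.info("Extracting %s (%d bytes)", uri, size)
    try:
        text = extract(uri)
    except Exception as exc:
        if debug:
            # Step 2 (failure): report the error and carry on indexing.
            logger.warning("Extraction failed for %s: %s", uri, exc)
        return None
    if debug:
        # Step 2 (success): report how much text came back.
        logger.info("Extracted %d characters from %s", len(text), uri)
    return text
```

If indexing dies part way through, the last "Extracting ..." line in the log would then point at the file that caused it.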