Add "jp2 jpc j2k sqlite3 sqlite db db3" to excluded defaults?

Created on 13 September 2019
Updated 21 February 2023

In issue #2913510 (Newest version of Tika giving warnings/messages), there was a discussion about Tika warnings such as:

  • JBIG2ImageReader not loaded. jbig2 files will be ignored
  • J2KImageReader not loaded. JPEG2000 files will not be processed.
  • org.xerial's sqlite-jdbc is not loaded.
  • Using fallback font 'LiberationSans' for 'XYZ'

As mentioned in that issue, I got rid of all these warnings (sqlite-jdbc, JBIG2ImageReader and fallback font) by adding "jp2 jpc j2k sqlite3 sqlite db db3" to the "Excluded file extensions" setting.

Maybe we could add "jp2 jpc j2k sqlite3 sqlite db db3" to the default "Excluded file extensions" setting?
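
For illustration only, here is a minimal Python sketch of how a space-separated exclusion list like the proposed one might be checked against a file name. The helper name `is_excluded` and the matching rules (only the last extension counts, case-insensitive comparison) are assumptions made for this sketch, not the module's actual code.

```python
import os

# Proposed additions to the "Excluded file extensions" setting (from this issue).
PROPOSED = "jp2 jpc j2k sqlite3 sqlite db db3"


def is_excluded(filename: str, excluded: str) -> bool:
    """Hypothetical helper: True if the file's last extension is in the
    space-separated exclusion list. Case-insensitive matching is an assumption."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    # Files without an extension yield "" and are never excluded.
    return ext != "" and ext in excluded.lower().split()


if __name__ == "__main__":
    for name in ["scan.jp2", "archive.DB", "report.pdf", "README"]:
        print(name, is_excluded(name, PROPOSED))
```

With such a list, a JPEG2000 file or an SQLite database would be skipped before it ever reaches Tika, which is why the warnings disappear.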

An old but similar issue (#1083824 - Add jpg to excluded defaults) stated:

jpg images can contain meta data that can be indexed as well, and you want to index as much data as possible, so this works as designed.

I think the file types mentioned above also contain metadata that could be indexed.
But since indexing them does not work with the default configuration (the required libraries are not loaded), I think we should exclude them by default.

Any thoughts?

Feature request

Status: Closed (works as designed)
Version: 1.0
Component: Code
Created by: gngn (🇩🇪 Germany)


Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.