Don't store full extracted file content data in the database

Created on 1 October 2019, about 5 years ago
Updated 11 July 2024, 5 months ago

After running a full migration on a project, I noticed my database had grown from 200 MB to almost 3 GB.

I ran a query to find the largest tables, and the growth was almost entirely in the key_value table, at 2.6 GB. For every content item whose PDF attachment Solr indexes, the entire extracted text dump is stored in a key_value record, which is why the table keeps growing.
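As a follow-up check, the key_value table can be grouped by collection to confirm which entries hold the bulk of the data. The sketch below is not the query used in this report. It runs against an in-memory SQLite stand-in so that it is self-contained, the sample rows are made up, and the collection name `search_api_attachments` is an assumption about where the module keeps extracted text.

```python
import sqlite3

# In-memory stand-in for the Drupal database, using the same three
# columns as the key_value table: collection, name, value.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE key_value ("
    "collection TEXT NOT NULL, name TEXT NOT NULL, value BLOB NOT NULL, "
    "PRIMARY KEY (collection, name))"
)

# Hypothetical sample rows. The 'search_api_attachments' collection name
# and its key format are assumptions, not taken from the report.
rows = [
    ("search_api_attachments", "search_api_attachments:1", b"x" * 5000),
    ("search_api_attachments", "search_api_attachments:2", b"y" * 7000),
    ("state", "system.cron_last", b"1569888000"),
]
conn.executemany("INSERT INTO key_value VALUES (?, ?, ?)", rows)


def collection_sizes(connection):
    """Return (collection, row_count, total_value_bytes), largest first."""
    query = (
        "SELECT collection, COUNT(*), SUM(LENGTH(value)) AS total_bytes "
        "FROM key_value GROUP BY collection ORDER BY total_bytes DESC"
    )
    return connection.execute(query).fetchall()


for collection, count, total_bytes in collection_sizes(conn):
    print(f"{collection}: {count} rows, {total_bytes} bytes")
```

The same GROUP BY query should also run on MySQL or MariaDB, where `LENGTH()` likewise returns the byte length of the value.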

This will not scale well: just 25,000 items with one PDF attachment each were enough to cause this increase.
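To put the scaling concern in numbers, here is a rough back-of-the-envelope calculation. It assumes decimal units, that the whole 2.6 GB is extracted text, and that growth is linear in the number of items. The 250,000-item figure is a hypothetical site size, not something from this report.

```python
# Figures from the report; decimal units assumed (1 GB = 1e9 bytes).
table_bytes = 2_600_000_000  # key_value table size: 2.6 GB
items = 25_000               # content items, one PDF attachment each

# Average extracted text stored per item.
bytes_per_item = table_bytes / items


def projected_gb(item_count):
    """Linear projection of the key_value table size in GB."""
    return bytes_per_item * item_count / 1e9


print(f"{bytes_per_item / 1e3:.0f} KB per item")
# 250,000 items is a hypothetical larger site, used only for illustration.
print(f"{projected_gb(250_000):.0f} GB at 250,000 items")
```

Under these assumptions each item adds roughly 104 KB to the table, so a site ten times larger would carry about 26 GB of extracted text in key_value alone.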

I am using the built-in Solr Extractor with this module.

🐛 Bug report
Status

Needs review

Version

9.0

Component

Code

Created by

🇺🇸 United States kevinquillen


