Ability to remove metadata from uploaded files

Created on 2 December 2019, about 5 years ago
Updated 1 February 2023, almost 2 years ago

Hi,

Requirement:

I am using Drupal Media along with the s3fs module to store documents in, and read them from, Amazon S3.
Is there a way to remove metadata (personal information such as author, company, etc.) when uploading documents (PDF, DOC) to S3, so that users of the website never see personal information about the person who uploaded the document?

Versions used:

drupal: 8.6.17
s3fs: 8.x-3.0-alpha13
drupal/dropzonejs: 2.0.0-alpha4

Any suggestions or recommendations are highly appreciated.

Feature request

Status: Active
Version: 10.1
Component: Media

Last updated 4 days ago

Created by

🇦🇺Australia amit.sharma.aust

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • 🇦🇺Australia stephenrodrigo@yahoo.com Melbourne

    I agree with @solideogloria.

    Here is a good article on how to remove the metadata from PDFs:
    "Removing metadata from PDF files using Exiftool and qpdf"

    I am actually using this in a hook_file_presave() implementation:

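    function xxx_file_presave($file) {
      // Only proceed with PDFs.
      if (!isset($file->filemime) || !preg_match('/pdf/', $file->filemime)) {
        return;
      }

      // drupal_realpath() returns FALSE for remote stream wrappers such as
      // s3://, so this only works while the file is on local storage.
      $file_realpath = drupal_realpath($file->uri);
      $tmp = drupal_realpath('private://' . drupal_random_key());
      if (!$file_realpath || !$tmp) {
        return;
      }
      $linear = $tmp . '_linear';
      $cleaned = $tmp . '_linear_cleaned_linear';
      $args = array_map('escapeshellarg', array($tmp, $linear, $cleaned));

      try {
        if (!copy($file_realpath, $tmp)) {
          throw new \Exception('Could not copy the PDF to a temporary file.');
        }
        // Strip the metadata and rewrite the file with qpdf, twice.
        // Shell commands do not throw, so the result is checked below.
        shell_exec("exiftool -all:all= -overwrite_original $args[0]");
        shell_exec("qpdf --linearize $args[0] $args[1]");
        shell_exec("exiftool -all:all= -overwrite_original $args[1]");
        shell_exec("qpdf --linearize $args[1] $args[2]");
        if (!file_exists($cleaned) || !copy($cleaned, $file_realpath)) {
          throw new \Exception('exiftool/qpdf did not produce a cleaned PDF.');
        }
        // The file changed on disk, so refresh the size recorded on the file object.
        clearstatcache(TRUE, $file_realpath);
        $file->filesize = filesize($file_realpath);
        drupal_set_message(t('PDF metadata cleaned for @filename.', array('@filename' => $file->filename)));
      }
      catch (\Exception $e) {
        watchdog('xxx', 'PDF metadata cleaning failed for @filename: @error', array('@filename' => $file->filename, '@error' => $e->getMessage()), WATCHDOG_ERROR);
      }

      // Remove the temporary files.
      array_map('unlink', array_filter(array($tmp, $linear, $cleaned), 'file_exists'));
    }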
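
For anyone who wants to try the same exiftool/qpdf sequence outside Drupal first, here is a minimal shell sketch of it. It assumes both tools are installed and on the PATH; `clean_pdf` and the scratch file names are hypothetical names chosen for illustration, not part of either tool. As far as I understand it, exiftool blanks PDF tags with an incremental update, so the old values are only really gone once qpdf has rewritten the file, which is why each exiftool call is followed by a qpdf call.

```shell
#!/bin/sh
# Sketch: strip metadata from a PDF with exiftool and qpdf.
# Assumes both tools are on PATH; clean_pdf is a hypothetical helper name.

clean_pdf() {
  src=$1
  dest=$2

  [ -f "$src" ] || { echo "clean_pdf: no such file: $src" >&2; return 1; }

  # Work in a private scratch directory so partial results never touch $src.
  work=$(mktemp -d) || return 1

  cp -- "$src" "$work/in.pdf" &&
    # Pass 1: blank every writable tag, without leaving an _original backup.
    exiftool -q -all:all= -overwrite_original "$work/in.pdf" &&
    # Rewrite the file so the blanked values are not kept as orphaned objects.
    qpdf --linearize "$work/in.pdf" "$work/pass1.pdf" &&
    # Pass 2: the same two steps again, as in the hook above.
    exiftool -q -all:all= -overwrite_original "$work/pass1.pdf" &&
    qpdf --linearize "$work/pass1.pdf" "$work/pass2.pdf" &&
    cp -- "$work/pass2.pdf" "$dest"
  status=$?

  # Always remove the scratch directory, whether or not the chain succeeded.
  rm -rf -- "$work"
  return "$status"
}
```

A call such as `clean_pdf report.pdf report.clean.pdf` followed by `exiftool -a -G1 report.clean.pdf` should show which tags are left. Note that the `&&` chain treats any non-zero exit status as a failure, including the status qpdf uses when it finishes with warnings, so a stricter or looser check may suit your files better.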
    