ImageMagick module hangs file upload process when convert of PDF file results in more than 64K chars of warnings

Created on 10 January 2017, over 8 years ago
Updated 20 January 2025, 5 months ago

Problem/Motivation

Our Drupal 7.44 site uses the pdfpreview and imagemagick modules to generate thumbnails of PDF files that our users upload. The pdfpreview module is at version 7.x-2.1, the imagemagick module is at 7.x-1.0, the actual imagemagick convert application is at 6.8.6-3 2014-04-08 Q16, while ghostscript is at 9.16.

Our website has about 3,800 PDF files in its Drupal file repository. These PDFs are created by many different applications across the organization.

The site is designed so that when a user uploads a PDF through the media browser from a CKEditor panel, pdfpreview / imagemagick generates a thumbnail. On success, the media browser closes and the user is back in CKEditor with the new file inserted.

For some months, users have reported that a few PDFs fail to upload, and they get a "service unavailable" error (which comes from Pantheon) inside the media browser window. In early December there were one or two PDFs that appeared to crash the website during the upload process. However, these failed uploads left no errors in the error log, even with the debug switch turned on in the imagemagick module admin settings.

We've tried upgrading ImageMagick and ghostscript to newer versions on clones of the website, and that helps reduce the number of timeouts processing complex PDF files, but it doesn't end the upload hanging in Drupal.

After analyzing the 3,800 PDF files, it appears that the PDFs that lead to Drupal hangs have one thing in common: the warnings that convert outputs exceed 65536 characters (on web servers with a pipe buffer limit of 64KB). I've posted two example PDF files to demonstrate the problem:

#1 http://library.oregonmetro.gov/files/River_Island_project_map.pdf results in 56240 characters of convert warning output (+/- a few)
#2 http://library.oregonmetro.gov/files/GreshamRouteReview.pdf results in 1845851 characters of convert warning output (+/- a few)

PDF #1 converts fine both on our live site and in a clone of it with a newer version of convert/ghostscript, while PDF #2 fails on both. I would guess #2 will fail even on sites with a larger 1MB pipe buffer, which is why I included it. I've tried a number of files whose warning output exceeds 65536 characters, and they all fail. All files below that threshold succeed.
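The threshold can be reproduced outside Drupal. The following is a minimal Python sketch, not the module's PHP code. It assumes the parent reads stdout to EOF before it reads stderr, and it uses a hypothetical stand-in child (`CHILD`) that writes only to stderr in place of convert. The helper name `sequential_read_hangs` and the 3 second wait are illustrative choices.

```python
import subprocess
import sys
import threading

# Hypothetical stand-in for convert: writes n characters of warnings to stderr.
CHILD = "import sys; sys.stderr.write('W' * int(sys.argv[1])); sys.stderr.flush()"


def sequential_read_hangs(n_bytes, wait=3.0):
    """Return True if reading stdout to EOF before stderr stalls."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD, str(n_bytes)],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )

    def drain_in_order():
        # Assumed read order: all of stdout first, stderr only afterwards.
        proc.stdout.read()
        proc.stderr.read()

    reader = threading.Thread(target=drain_in_order, daemon=True)
    reader.start()
    reader.join(wait)
    hung = reader.is_alive()
    if hung:
        # The child is blocked writing to a full stderr pipe while the
        # parent waits for EOF on stdout. Killing the child breaks the tie.
        proc.kill()
        reader.join()
    proc.wait()
    proc.stdout.close()
    proc.stderr.close()
    return hung
```

On a system with a 64KB pipe buffer, a call such as `sequential_read_hangs(1_000)` should return promptly, while `sequential_read_hangs(1_000_000)` should stall until the wait expires, which matches the pattern seen with the two example PDFs.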

The warnings are all of the type:

WARNING in tgt_create tree->numnodes == 0, no tree created.
WARNING: No imsbtree created.
WARNING in tgt_create tree->numnodes == 0, no tree created.
WARNING: No incltree created.

repeated over and over again. Clearly ImageMagick's convert is having problems with the structure of the PDF, but like Acrobat Reader or Pro, the latest version of convert is able to open and rasterize them. In case you are curious how the PDFs that produce thousands of warnings were generated: many of them came from ESRI ArcMap, with feature attributes included. In a couple of cases GIS specialists opened those in Illustrator and saved them out as new PDFs. Both example PDFs I posted can be opened in Acrobat Pro and other PDF tools like OSX Preview.

I ran a script that calls convert from the command line to save thumbnails of all 3,800 PDF files using the same parameters the Drupal site uses. It succeeds in saving all the thumbnails, although about 50 PDFs produce warnings like the above in varying quantities.
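A survey like this could be scripted along the following lines. This is a hypothetical Python sketch, not the script used for this report. The helper names `warning_chars` and `convert_command` are made up for illustration. The convert path and flags are copied from the command captured on our site and would need adjusting elsewhere.

```python
import subprocess


def warning_chars(command):
    """Run command; return (exit code, characters written to stderr)."""
    # subprocess.run() drains the stderr pipe while the child is running,
    # so large warning output cannot stall this measurement.
    done = subprocess.run(
        command,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
        errors="replace",
    )
    return done.returncode, len(done.stderr)


def convert_command(pdf_path, jpg_path):
    """Build the thumbnail command; path and flags taken from the report."""
    return [
        "/usr/local/bin/convert", f"{pdf_path}[0]",
        "-quiet", "-colorspace", "sRGB", "-alpha", "remove",
        "-quality", "51", "-resize", "1024x1024",
        "-density", "72", "-units", "PixelsPerInch",
        jpg_path,
    ]
```

Calling `warning_chars(convert_command(pdf, jpg))` for each file and flagging every count above 65536 would give the list of PDFs expected to hang the upload.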

In the file imagemagick.module I inserted a bunch of debug statements in the _imagemagick_convert_exec() function, and the hang happens right after line 480, specifically inside the while statement:

    while (!feof($pipes[1])) {
      $output .= fgets($pipes[1]); 

I dumped $command, $descriptors, and $pipes, and they all seem fine.
When I took the exact convert command that was being spawned via proc_open() to a terminal window, it generated the thumbnail successfully. That command looks like:
/usr/local/bin/convert '/my-path-to-drupal-root/sites/default/files/GreshamRouteReview-orig-test.pdf[0]' -quiet -colorspace 'sRGB' -alpha remove -quality '51' -resize '1024x1024' -density 72 -units PixelsPerInch '/my-path-to-drupal-root/sites/default/files/pdfpreview/da43810fe03e829ed723d96bc85627a7.jpg'

For the PDFs with > 64KB of warnings, Drupal / PHP is failing to do something that works fine as a standalone shell process.
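One way to avoid this kind of stall is to read stdout and stderr as data arrives on either pipe, so that neither can fill up while the parent waits on the other. The sketch below shows the idea in Python on a POSIX system. It is an illustration under that assumption, not a patch for the module, and `run_and_capture` is a hypothetical name. In PHP the comparable building block would be `stream_select()` on the pipes returned by `proc_open()`.

```python
import os
import selectors
import subprocess


def run_and_capture(command):
    """Run command; return (stdout bytes, stderr bytes, exit code)."""
    proc = subprocess.Popen(
        command,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    captured = {proc.stdout: bytearray(), proc.stderr: bytearray()}
    with selectors.DefaultSelector() as selector:
        for stream in captured:
            selector.register(stream, selectors.EVENT_READ)
        # Loop until both pipes have reported end-of-file.
        while selector.get_map():
            for key, _events in selector.select():
                chunk = os.read(key.fd, 65536)
                if chunk:
                    captured[key.fileobj] += chunk
                else:
                    # An empty read means the child closed this pipe.
                    selector.unregister(key.fileobj)
                    key.fileobj.close()
    exit_code = proc.wait()
    return bytes(captured[proc.stdout]), bytes(captured[proc.stderr]), exit_code
```

Because whichever pipe has data is read first, the amount of warning output no longer matters. The standard library's `Popen.communicate()` provides the same behavior without the explicit loop.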

Users would often either click save again on the upload or retry the upload, resulting in multiple convert processes being started for a single upload attempt.

It would appear that various processes started by the imagemagick module never finish, and over time enough failed uploads contribute to exhausting resources on the server. The fact that we had site crashes as a result of imagemagick warnings seemed like a foundation for choosing a non-trivial issue priority.

Of course, the period when the site was crashing from this caused a lot of alarm, and we had to ask users not to upload PDF files. Our users' confidence in Drupal went down a bit as a result.

Proposed changes

Every year, user-generated PDFs get bigger and/or more complex. Perhaps the convert process should use an option like -quiet to suppress warnings? I looked through the entire site codebase and quiet does not seem to be used anywhere.

We've also seen the same type of behavior when imagemagick times out on the conversion. Drupal does not seem very well insulated from failures when calling imagemagick. Would it be possible to have a timeout on the proc_open() with some error handling?
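As an illustration of what a timeout with error handling might look like, here is a short Python sketch. It is not PHP and not a proposal for the module's actual API. The function name `convert_with_timeout` and the shape of its return value are assumptions made for the example.

```python
import subprocess


def convert_with_timeout(command, seconds):
    """Run command with a time limit; return (ok, message)."""
    try:
        done = subprocess.run(
            command,
            stdin=subprocess.DEVNULL,
            capture_output=True,
            timeout=seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising this exception.
        return False, f"timed out after {seconds}s"
    except OSError as exc:
        # For example, the binary is missing or not executable.
        return False, f"could not start: {exc}"
    if done.returncode != 0:
        # Keep only the tail of the warnings so a log entry stays small.
        tail = done.stderr.decode(errors="replace")[-500:]
        return False, f"exit code {done.returncode}: {tail}"
    return True, ""
```

A caller could log the message and show the user an upload error instead of leaving the request hanging. One caveat: the time limit applies to the process that was started directly, so a helper that convert itself launches (such as ghostscript) may need separate cleanup.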

Sidenote: by default the site does not have the imagemagick advanced module activated. During testing I turned it on, but it seemed to make no difference in the issue behavior. The convert command that I captured above was from a time when I had the advanced settings on.

🐛 Bug report
Status

Closed: outdated

Version

1.0

Component

Code

Created by

🇺🇸United States e3g
