Failed to validate remote image with no file extension

Created on 24 August 2019, over 5 years ago
Updated 28 September 2023, over 1 year ago

Problem/Motivation

If a file to import from a URL doesn't have a file extension, Feeds fails to import it.

Sample photo without file extension

https://s4.reutersmedia.net/resources/r/?m=02&d=20160330&t=2&i=1128905435

Error messages

1) failed to validate with the following errors:
field_photo.0: Only files with the following extensions are allowed: png gif jpg jpeg.

2) failed to save because the extension AAuE7mC3_09BzkSYxsRtNA=c0x00000000-cc-rp-mo-ba5 is invalid

A sample successful user photo with file extension

https://lh3.googleusercontent.com/a-/AAuE7mC3_09BzkSYxsRtNA=c0x00000000-cc-rp-mo-ba5.jpg

Steps to reproduce

  1. Find a resource URL where a file can be downloaded but the extension isn't specified in the URL
  2. Create a feed type, map to title and to a file target.
  3. Import data like the following (CSV example):
    title,file
    Lorem Ipsum,https://s4.reutersmedia.net/resources/r/?m=02&d=20160330&t=2&i=1128905435
    

Proposed resolution

If an extension isn't specified in the url, try to determine the extension by inspecting the mime type of the file.

Remaining tasks

  1. Write tests covering the following cases:
    • Import a file with extension that is not an image.
    • Import a file without extension that is not an image.
    • Try to import a file that does not exist.
  2. Write the fix in such a way that at least test cases 1 to 3 passes. Fixing test case 4 may be handled in an other issue such as #3023198: Display an understandable error message when source file contains links to non-existing files .
💬 Support request
Status

Needs work

Version

3.0

Component

Code

Created by

🇵🇭Philippines madelyncruz

Live updates comments and jobs are added and updated live.
  • Needs tests

    The change is currently missing an automated test that fails when run with the original code, and succeeds when the bug has been fixed.

Sign in to follow issues

Merge Requests

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • First commit to issue fork.
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.5 + Environment: PHP 7.4 & MySQL 5.7
    last update over 1 year ago
    703 pass
  • @elaman opened merge request.
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.5 + Environment: PHP 7.4 & MySQL 5.7
    last update over 1 year ago
    703 pass
  • Pipeline finished with Canceled
    over 1 year ago
    Total: 176s
    #24633
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.5 + Environment: PHP 7.4 & MySQL 5.7
    last update over 1 year ago
    703 pass
  • Pipeline finished with Canceled
    over 1 year ago
    #24635
  • Pipeline finished with Failed
    over 1 year ago
    Total: 1084s
    #24638
  • Had to import XLSX files that similarly did not have a filename in the path. Used content-disposition / content-type headers. If someone finds this useful, this worked for me, up to now.

  • 🇳🇱Netherlands megachriz

    Because tests are failing, I added a task to the list that says "Try to adjust the code or the tests so that they pass.".

  • 🇳🇱Netherlands arccoss Utrecht
  • 🇳🇱Netherlands megachriz

    @arccoss and I looked at this issue yesterday at the DICTU Drupal Developers Day in Assen.
    @arccoss confirmed that the MR !89 fixes the issue for images. We also found out why the test is currently failing. It is because the call to Drupal\feeds\Feeds\Target\File::getMimeSubtypeFromUrl() doesn't work in a Kernel test. The method is using cURL and since a Kernel test only has PHP and a database, but not an accessible site via HTTP, trying to retrieve something from a url like "http://localhost/modules/custom/feeds/tests/resources/assets/attersee_no..." will result into a 404 HTML-document, hence why the test said the extension is html, because that is what Drupal\feeds\Feeds\Target\File::getMimeSubtypeFromUrl() will get from a 404.

    A way to deal with this and keep the Kernel test, is to somehow override the curl part of Drupal\feeds\Feeds\Target\File::getMimeSubtypeFromUrl() in the test. So here are some thoughts about what we could do:

    • The method could be split up into two parts, one that does the curl thing and one that does the "finfo" thing.
    • In FileTest.php and in ImageTest.php we add a class that extends Drupal\feeds\Feeds\Target\File and Drupal\feeds\Feeds\Target\Image respectively and change FileTest::getTargetPluginClass() and ImageTest::getTargetPluginClass() to return the newly created classes. In these classes a method would need to be overridden:
      • getMimeSubtypeFromUrl() could be completely overridden, returning just an extension string.
      • Or just the curl part would get overridden, but that method override must be capable of returning something similar as what curl can do and I don't know enough about curl to know if that's doable.
    • An other thought, instead of using curl directly here, replace the logic with using a service. I'm not sure if the same result could be achieved using Guzzle?
    • I believe that there's also a possibility that curl is not available and perhaps the code should also account for that situation. At least in the D7 version of Feeds there was a check for this in the function http_request_use_curl(). Maybe we can avoid doing a check like this if we can replace the usage of curl with a service, as said in the point above.

    I also wondered why what there is another MR here. It seems like an other approach, but there's no clear explanation of why it exists.
    @imash Can you explain why you opened another MR? Did the other MR not work or does your MR solves something that is not covered by the other one?

Production build 0.71.5 2024