I had invalid URL values in a link field, stored in the database due to a flawed import from
contentimport β
. The module managed to save "n/a" in some URL field values. That issue about the module is here:
#2953064: Invalid URLs cause cron & views to stop working β
Later I found that both views and cron content indexing died fatally from any attempt to work with the nodes containing this URL. Fortunately the node editing page let me delete the bad links easily and everything returns to normal.
A step to reproduce, could be making a view that displays a node with "n/a" set in the URL field (or any other invalid value).
I think this is too severe a failure for Drupal for this type of content. If invalid content is in a field it is better to help the end user locate and remove this, than it is to simply stop working without any pointer to the source of the problem. Bad links in particular are something more likely to happen, perhaps imported from a feed automatically or similarly.
IMHO there should be an error placed on Watchdog logs with helpful information about which node is offending. Currently the error does not indicate what node is the problem because I think that is not available to the function.
I'm not sure what parts of the software are counting on an Exception to happen so what needs to be accounted for in a change, isn't clear to me.
Also I have put this under cron but it applies equally to views. If a particular node is jamming cron's search indexing process then that would be best to make it possible to trace which node is bad.
The exceptions are like:
Uncaught PHP Exception InvalidArgumentException: "The URI 'n/a' is invalid. You must use a valid URI scheme." at /web/core/lib/Drupal/Core/Url.php line 281
The bit of the public static function fromUri is:
elseif (empty($uri_parts['scheme'])) {
throw new \InvalidArgumentException("The URI '$uri' is invalid. You must use a valid URI scheme.");
}
I wonder if it would be better to simply return an empty string instead of an exception in this case. Certainly, I think if it is killing the cron indexing it would be better to not have an exception or work around it somehow.
Also maybe another optional element on the $options parameter array, perhaps like, "context" or "origin" which might be a node URL or some other reference to include in the error. Then whatever process fired the function fromUri could include this info. That alone would improve the situation a lot.
Related issues: This concerns generally changing the ways that URLs are stored and read out of the database in Drupal. Some of these are kind of loosely related but illustrate different aspects of how URLs are handled.