Backend processing of solr_date_range can lead to Solr Exception

Created on 22 October 2024, 3 months ago

Setup

  • Solr version: 9.2
  • Drupal Core version: 10.3.x
  • Search API version: 1.35
  • Search API Solr version: 4.3.5
  • Configured Solr Connector:

Issue

The way Solr Date Ranges are processed and added as a field to be sent to Solr can end with a whole document not being indexed if start/end or both fail and return empty.

In specific here:

https://git.drupalcode.org/project/search_api_solr/-/blob/4.x/src/Plugin...

Both $start = $this->formatDate($value->getStart()) and $end = $this->formatDate($value->getEnd()); might return NULL if the value (e.g a double date) can not be parsed by DateTimePlus (side note/ PHP/valid 64bit date) like
18371839-01-01T00:00:00-05:00 (yes I know, still valid though).

That value will trigger a warning (correct) but end in specific for date ranges being sent to Solr as `[ TO ]` which is invalid as a field value for a Date Range Field.

Which will trigger an exception like this

  "responseHeader":{
>     "status":400,
>     "QTime":0},
>   "error":{
>     "metadata":[
>       "error-class","org.apache.solr.common.SolrException",
>       "root-error-class","java.lang.IllegalArgumentException"],
>     "msg":"ERROR: [doc=zxc5xb-drupal_ead-entity:node/337941:en] Error adding field 'drm_ead_edtf_all_dates_ranges'='[[1768-01-01T04:56:00Z TO 1769-01-01T04:55:59Z], [1795-01-01T04:56:00Z TO 1796-01-01T04:55:59Z], [1797-01-01T04:56:00Z TO 1798-01-01T04:55:59Z], [ TO ], [1811-01-01T04:56:00Z TO 1812-01-01T04:55:59Z], [1813-01-01T04:56:00Z TO 1814-01-01T04:55:59Z], [1825-01-01T04:56:00Z TO 1826-01-01T04:55:59Z], [1834-01-01T04:56:00Z TO 1835-01-01T04:55:59Z], [ TO ], [1844-01-01T04:56:00Z TO 1845-01-01T04:55:59Z], [1847-01-01T04:56:00Z TO 1848-01-01T04:55:59Z]]' msg=str is null or blank",
>     "code":400}}
> ). in Drupal\search_api_solr\SolrConnector\SolrConnectorPluginBase->handleHttpException() (line 1152 of /var/www/html/web/modules/contrib/search_api_solr/src/SolrConnector/SolrConnectorPluginBase.php). 

The simplest solution would be to check if both $start and $end are not empty, and if any of those are, then continue; iterating but don't set the value (a NULL as value also not a good idea)

Please let me know if you need more info. Happy to fork/make a patch if the solution sounds correct.

πŸ› Bug report
Status

Active

Version

4.3

Component

Code

Created by

πŸ‡ΊπŸ‡ΈUnited States diegopino

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

  • Issue created by @diegopino
  • πŸ‡©πŸ‡ͺGermany mkalkbrenner πŸ‡©πŸ‡ͺ

    Isn't that exception the correct behaviour if the value could not be parsed?
    Simply silently skipping the value seems wrong to me.

    Can you provide a list of examples that are valid drupal field values, but result in a parse error?

  • πŸ‡ΊπŸ‡ΈUnited States diegopino

    Thanks for your reply @mkalkbrenner.

    The exception thrown at the backend/index level is not in question, at least as a last resort given that on a date out of range date time range we are sending a "[ TO ]" to solr, which is wrong . From a perspective of an user indexing data, it is not something and enduser can act on to correct the issue (or conform to date/time limitations of Drupal, which are not Solr limitations for date range values). Also it is not "silent", the date time parser will already log that the date could not be parsed.

    But I think my point is valid because the code is already doing what I believe is the right behavior (and the proposed solution) for dates here

    https://git.drupalcode.org/project/search_api_solr/-/blob/4.x/src/Plugin...

    case 'date':
                $value = $this->formatDate($value);
                if ($value === FALSE) {
                  continue 2;
                }
                break;
    

    So it would not hurt to do

    
              case 'solr_date_range':
                $start = $this->formatDate($value->getStart());
                $end = $this->formatDate($value->getEnd());
                if ($start === FALSE || $end === FALSE) {
                     continue 2;
                }
                $value = '[' . $start . ' TO ' . $end . ']';
                break;
    

    Thanks.

  • πŸ‡©πŸ‡ͺGermany mkalkbrenner πŸ‡©πŸ‡ͺ

    You're right. We should be consistent here an provide a corresponding patch.

    Nevertheless, some examples would be good. Or even better, a test.

  • πŸ‡ΊπŸ‡ΈUnited States diegopino

    I will make a fork/patch this week. I want to better understand what are the date time class limits of Drupal for a test (and will provide a test too once I do so).

    We work with Cultural heritage( so our dates can span thousands if not million years into the past when dealing with archeological/paleontological artifacts) and for that we use custom fields/proper Drupal typed data sub properties based https://github.com/ProfessionalWiki/EDTF parsing, so not Drupal UI normal input values you might find in standard Drupal websites.

    The values are correct 64bit PHP dates though. Just to be clear, not asking for support of those dates (would require a complete overhaul of Drupal's time logic), just to handle them as other dates are being handled

    Thanks a lot for your feedback and time to review my issue

Production build 0.71.5 2024