All fields being indexed as arrays

Created on 12 April 2022, over 3 years ago
Updated 20 March 2023, over 2 years ago

This module seems to index every field as an array, regardless of the type specified in the mappings.

So a string becomes a text array, an integer becomes an integer array, and so on.

Not sure if this was inherited from ES connector, but it creates a weird situation as the values conveyed in the mappings don't correspond to reality.

Here is an example:

{
  "_source" : {
    "industry_sector_name" : [
      "Education & Training"
    ],
    "nid" : [
      45574
    ],
    "opportunity_type_name" : [
      "Virtual Experience"
    ],
    "overview" : [
      "Lorem ipsum"
    ],
    "parent_employer_advertiser_name" : [
      "Example"
    ],
    "study_field_name" : [
      "Engineering & Mathematics"
    ],
    "title" : [
      "Example virtual experience (REM)"
    ],
    "search_api_id" : [
      "entity:node/45574:en"
    ],
    "search_api_datasource" : [
      "entity:node"
    ],
    "search_api_language" : [
      "en"
    ]
  }
}

I'm not sure whether there is a good reason for this, but it feels strange, especially coming from Solr Search API, where those mappings are respected.

Thoughts?

🐛 Bug report
Status

Active

Version

1.0

Component

Code

Created by

🇦🇺 Australia kyuubi


Comments & Activities


  • 🇦🇺 Australia kim.pepper 🏄‍♂️ 🇦🇺 Sydney, Australia

    My understanding is all fields are multi-value fields, even if there is just one value. There is no array data type.

  • Status changed to Postponed: needs info over 2 years ago
  • 🇦🇺 Australia kim.pepper 🏄‍♂️ 🇦🇺 Sydney, Australia

    Postponing unless someone has more input on my previous comment.

  • Status changed to Active over 2 years ago
  • 🇦🇺 Australia tallytarik

    I've just run into this trying to use the Neural Search plugin with a custom knn_vector field and ingest pipeline. That field type only supports a single value, and throws an error when it's passed an array. This is what happens at the moment because the input text field (title in my case) is indexed as an array:

    [error] failed to parse field [title_embedding] of type [knn_vector] in document with id 'entity:node/12345:en'. Preview of field's value: '{knn=[...]}'. Current token (START_OBJECT) not numeric, can not use numeric value accessors

    I've hacked together a change to IndexParamBuilder::buildFieldValues() to return the value as a string instead of an array, and can confirm it now works. Something like the patch in the linked issue "Source Fields in Elasticsearch Index are arrays" (RTBC) might be the way to go: check the cardinality of each field, and if it's 1, process and return the first (and only) value rather than an array. I'm pretty new to OpenSearch, so I'm not 100% sure whether that change could have other impacts, though.
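The cardinality check described above could look roughly like the following. This is only a minimal sketch in plain PHP, not the module's code or the linked patch: `flattenSingleValueFields()` and the `$cardinalities` map are hypothetical names, and the real `IndexParamBuilder::buildFieldValues()` works with Search API field objects rather than plain arrays.

```php
<?php

declare(strict_types=1);

/**
 * Collapses single-cardinality fields to scalars (hypothetical helper).
 *
 * @param array<string, array> $document
 *   Field values keyed by field name, each wrapped in a list.
 * @param array<string, int> $cardinalities
 *   Assumed map of field name to cardinality; -1 means unlimited.
 */
function flattenSingleValueFields(array $document, array $cardinalities): array {
  $result = [];
  foreach ($document as $name => $values) {
    // Fields without a known cardinality are treated as multi-value.
    $isSingle = ($cardinalities[$name] ?? -1) === 1;
    if ($isSingle && is_array($values)) {
      // Keep the first (and only) value; an empty list becomes NULL.
      $result[$name] = $values === [] ? NULL : reset($values);
    }
    else {
      $result[$name] = $values;
    }
  }
  return $result;
}

$document = [
  'title' => ['Example virtual experience (REM)'],
  'nid' => [45574],
  'study_field_name' => ['Engineering & Mathematics'],
];
$flat = flattenSingleValueFields($document, ['title' => 1, 'nid' => 1]);
```

Here `title` and `nid` come back as scalars, while `study_field_name`, which has no entry in the map, stays a list.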

  • achap 🇦🇺

    Just wanted to say I had the exact same issue as tallytarik. For the most part everything being an array did not affect anything apart from when implementing a knn_vector field. It throws the same error for a multi value field. I used the IndexParamsEvent to alter the field to be single value and it works.
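As a rough illustration of that kind of alteration, the sketch below unwraps one field in a bulk request body. It assumes the body contains only index actions, so that action rows and document rows alternate as in the Elasticsearch/OpenSearch bulk API. `unwrapFieldInBulkBody()` and `example_index` are hypothetical names, and how the params are read from and written back to the IndexParamsEvent is deliberately not shown.

```php
<?php

declare(strict_types=1);

/**
 * Unwraps one field in every document of a bulk request body.
 *
 * Assumes a zero-indexed list that alternates action rows
 * (['index' => [...]]) and document rows.
 */
function unwrapFieldInBulkBody(array $body, string $field): array {
  foreach ($body as $i => $row) {
    // Even offsets hold action metadata; odd offsets hold the documents.
    if ($i % 2 === 0) {
      continue;
    }
    if (isset($row[$field]) && is_array($row[$field])) {
      // Replace the list with its first value, or NULL if it is empty.
      $body[$i][$field] = $row[$field] === [] ? NULL : reset($row[$field]);
    }
  }
  return $body;
}

$params = [
  'body' => [
    ['index' => ['_id' => 'entity:node/12345:en', '_index' => 'example_index']],
    ['title' => ['Example'], 'nid' => [12345]],
  ],
];
$params['body'] = unwrapFieldInBulkBody($params['body'], 'title');
```

Only the named field is touched, so every other field keeps its array form.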

  • 🇦🇺 Australia kim.pepper 🏄‍♂️ 🇦🇺 Sydney, Australia

    I'm happy to consider this in the next major release branch. I think it would be a BC break and not sure if there would be an upgrade path needed.

  • Status changed to Needs work over 1 year ago
  • 🇦🇺 Australia kim.pepper 🏄‍♂️ 🇦🇺 Sydney, Australia

    Spent some time looking at the patch and how this could be implemented here. The code that checks whether a field is a list is quite complex, and seems to indicate a lack of trust in the TypedData definition's isList() method.

    If we were to proceed, I would expect we would need pretty decent Kernel test coverage to ensure indexing and querying work as expected.

  • 🇧🇪 Belgium kristiaanvandeneynde Antwerp, Belgium

    I just ran into this as my view was breaking.

    What happens is that, if you configure a view for an index named Foo, to use the field "Link to InsertEntityTypeName" under the category "Index Foo", the view crashes.

    This is because said field will eventually end up calling for the "langcode" property of the row in Drupal\views\Entity\Render\TranslationLanguageRenderer::getLangcode() with a $this->langcodeAlias as "langcode"

      /**
       * {@inheritdoc}
       */
      public function getLangcode(ResultRow $row) {
        return $row->{$this->langcodeAlias} ?? $this->languageManager->getDefaultLanguage()->getId();
      }
    

    Because $row->langcode is an array containing a single langcode rather than a string, the parent call from \Drupal\views\Entity\Render\EntityTranslationRendererBase::getLangcodeByRelationship() crashes because it has a return value type of string.

      public function getLangcodeByRelationship(ResultRow $row, string $relationship): string {
    
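The failure mode can be reproduced in isolation. The following is a minimal sketch with stand-in functions, not the Views classes themselves: `untypedLangcode()` and `typedLangcode()` are hypothetical names that mimic an untyped getter feeding a method declared to return `string`.

```php
<?php

declare(strict_types=1);

// Stand-in for a getter without a return type: it passes through
// whatever the row holds.
function untypedLangcode(object $row) {
  return $row->langcode ?? 'en';
}

// Stand-in for a caller with a string return type.
function typedLangcode(object $row): string {
  return untypedLangcode($row);
}

$scalarRow = (object) ['langcode' => 'en'];
$arrayRow = (object) ['langcode' => ['en']];

// A scalar langcode passes straight through.
$ok = typedLangcode($scalarRow);

// An array langcode violates the declared return type.
$error = NULL;
try {
  typedLangcode($arrayRow);
}
catch (TypeError $e) {
  $error = $e->getMessage();
}
```

The second call throws a `TypeError` because an array cannot be coerced to the declared `string` return type, whether or not strict types are enabled.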

    Fields from the category "Foo datasource" do not seem to suffer from this as they use search_api's EntityTranslationRenderer instead. But that's from my limited testing.

    Long story short:

    1. Views expects strings in some places
    2. This module returns arrays of strings
    3. Weird crashes and painful debugging ensue

    So I'd bump this to major given how it can completely break Views :)

  • 🇧🇪 Belgium kristiaanvandeneynde Antwerp, Belgium

    I just did some more digging and this cannot be solved in only this module. I implemented IndexParamsEvent to make sure my index had actual scalars in it and verified that using the opensearch dashboard.

    But Search API still uses arrays as ResultRow properties, because SearchApiQuery::addResults() relies on a set of \Drupal\search_api\Item\ItemInterface objects, which in turn contain a set of \Drupal\search_api\Item\FieldInterface objects. The latter can only contain array values, and this module's QueryResultParser::parseResult() correctly feeds it that.

    So even if we store scalar data in the index, Search API will still pass it around as arrays once retrieved from the index. Which means the problem from my previous comment persists.

  • 🇧🇪 Belgium kristiaanvandeneynde Antwerp, Belgium

    Moving back to normal as I found out Search API will always convert whatever data that came from the index into arrays. Which raises the question whether we need to even bother storing the data as scalars.

    The field we were struggling with was being added by a views data alter gone rogue.

  • 🇦🇺 Australia kim.pepper 🏄‍♂️ 🇦🇺 Sydney, Australia

    Thanks for your investigation in #12.

    Given that, I think we should close this won't fix. Please re-open if there is an alternative approach.
