Allow analyzers to be specified for Elasticsearch pipelines

Created on 21 November 2023
Updated 29 November 2023

Problem/Motivation

Use case: we need to store HTML in a field, and filter and sort on that same field while ignoring the HTML markup.

This can be achieved in Elasticsearch by defining a custom analyzer in the index settings and applying it to a sub-field of a multi-field mapping, for example:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "keyword",
          "char_filter": ["html_strip"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          },
          "plain_text": {
            "type": "text",
            "analyzer": "my_analyzer"
          }
        }
      }
    }
  }
}
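As a rough illustration of what my_analyzer does to a value at index time, the sketch below approximates it in Python. The html_strip char filter removes tags and decodes entities, and the keyword tokenizer then emits the whole remaining string as a single token. This is only an approximation built on the standard library's HTMLParser, not Elasticsearch's implementation. The function name approximate_my_analyzer is hypothetical, and whitespace around block-level tags such as <p> may differ from what Elasticsearch produces. The _analyze API on a real index is the authoritative way to check the output.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text content and drops tags; convert_charrefs decodes entities."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def approximate_my_analyzer(value: str) -> list[str]:
    """Hypothetical stand-in for my_analyzer: strip HTML (like the html_strip
    char filter), then return the whole result as one token (like the keyword
    tokenizer, which does not split its input)."""
    collector = _TextCollector()
    collector.feed(value)
    collector.close()  # flush any buffered text
    stripped = "".join(collector.parts)
    return [stripped]


print(approximate_my_analyzer("<p>Fish &amp; <b>chips</b></p>"))  # ['Fish & chips']
```

Because the whole stripped value is kept as a single token, exact-match filters on name.plain_text compare against the text without markup rather than against individual words.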

Steps to reproduce

N/A

Proposed resolution

Allow the analysis settings to be provided in the pipeline YAML file, next to the mappings:

my_pipeline:
  label: 'My Pipeline'
  class: '\Drupal\data_pipelines_elasticsearch\ElasticSearchDatasetPipeline'
  analysis:
    analyzer:
      my_analyzer:
        tokenizer: keyword
        char_filter:
          - html_strip
  mappings:
    properties:
      name:
        type: text
        fields:
          keyword:
            type: keyword
          plain_text:
            type: text
            analyzer: my_analyzer
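The proposal amounts to copying the pipeline's analysis key into the index settings and its mappings key into the index mappings, which yields the same create-index body as the JSON in the problem statement. The Python sketch below shows that mapping under stated assumptions. The module itself is PHP, and build_index_body is a hypothetical helper, not part of data_pipelines_elasticsearch. It also assumes that pipeline metadata such as label and class is not sent to Elasticsearch. Analysis settings generally have to be supplied when the index is created (or applied to a closed index), so the sketch places them in the creation body.

```python
import json
from typing import Any


def build_index_body(pipeline: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: map a parsed pipeline definition onto the body
    of an Elasticsearch create-index request."""
    body: dict[str, Any] = {}
    analysis = pipeline.get("analysis")
    if analysis:
        # Analyzers belong under the index settings, not under the mappings.
        body["settings"] = {"analysis": analysis}
    mappings = pipeline.get("mappings")
    if mappings:
        body["mappings"] = mappings
    # Pipeline metadata (label, class) is deliberately not copied.
    return body


# The YAML pipeline definition above, as a YAML parser would return it.
my_pipeline = {
    "label": "My Pipeline",
    "class": "\\Drupal\\data_pipelines_elasticsearch\\ElasticSearchDatasetPipeline",
    "analysis": {
        "analyzer": {
            "my_analyzer": {"tokenizer": "keyword", "char_filter": ["html_strip"]}
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "fields": {
                    "keyword": {"type": "keyword"},
                    "plain_text": {"type": "text", "analyzer": "my_analyzer"},
                },
            }
        }
    },
}

print(json.dumps(build_index_body(my_pipeline), indent=2))
```

Pipelines that omit analysis produce a body without a settings key, so existing pipeline definitions would be unaffected under this approach.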

Remaining tasks

Test coverage, reviews, etc.

User interface changes

API changes

Data model changes

Feature request
Status

Needs work

Version

1.0

Component

Code

Created by

🇦🇺Australia mstrelan
