Change bulk indexing on Elasticsearch endpoint to aim at a single index

Created on 4 September 2023, about 1 year ago
Updated 25 September 2023, about 1 year ago

Problem/Motivation

We use this module to process CSVs into Elasticsearch, but our ES instance sits behind an nginx proxy that applies logic based on the index name in the URI. In past versions this was fine, because indexing was done document by document. The latest version correctly moved to the bulk endpoint, but by default a bulk request names the target index for each action in the request body rather than in the URI, the motivation being that a single request can write to multiple indices.

This implementation, however, currently only pushes data into a single index, set via its pipeline config.

The bulk endpoint also allows a single index to be specified in the URI, which is the more appropriate way to use bulk when you only intend to write to one index rather than many.
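To illustrate the difference between the two request shapes, here is a minimal Python sketch. It is not the module's code (the module is PHP); the function name `bulk_request` and the index name used in the example are hypothetical. It only builds the path and the newline-delimited JSON body, and sends nothing.

```python
import json


def bulk_request(docs, index, index_in_uri=True):
    """Return (path, ndjson_body) for an Elasticsearch bulk index request.

    Hypothetical helper for illustration only.
    """
    lines = []
    for doc in docs:
        if index_in_uri:
            # Index comes from the URI, so the action metadata can be empty.
            action = {"index": {}}
        else:
            # Default form: every action line names its target index.
            action = {"index": {"_index": index}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    path = f"/{index}/_bulk" if index_in_uri else "/_bulk"
    # The bulk body is newline-delimited JSON and must end with a newline.
    return path, "\n".join(lines) + "\n"
```

With `index_in_uri=True`, a proxy that routes on the URI can see the index name without parsing the request body.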

Ref: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bul...

Steps to reproduce

There isn't a reliable method to reproduce this specific use case; this is more a correction of the implementation so that it is true to its single-index requirements.

Proposed resolution

I propose moving the specified index out of the request body and into the URI: remove it from the body array in the class implementing DatasetDestination and add it to the params array.
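The proposed change can be sketched roughly as follows. This is a language-neutral illustration in Python, not the actual patch: the function name `move_index_to_params` is hypothetical, and it assumes the params carry a `body` list that alternates action lines and documents (so only `index`/`create`-style actions) plus an optional top-level `index` entry.

```python
def move_index_to_params(params):
    """Lift a shared `_index` out of the bulk action lines into a top-level
    `index` entry. Returns a new dict; the input is left untouched.

    Hypothetical helper for illustration only.
    """
    body = params["body"]
    actions = body[0::2]  # action lines sit at even positions, documents at odd
    indices = {meta.get("_index") for action in actions for meta in action.values()}
    if len(indices) != 1 or None in indices:
        # Mixed or missing indices: nothing safe to lift, leave as-is.
        return params
    (index,) = indices
    new_body = []
    for position, entry in enumerate(body):
        if position % 2 == 0:
            # Drop `_index` from the action metadata, keep any other keys.
            entry = {
                op: {k: v for k, v in meta.items() if k != "_index"}
                for op, meta in entry.items()
            }
        new_body.append(entry)
    return {**params, "index": index, "body": new_body}
```

If the action lines name more than one index, the sketch returns the params unchanged, since a single URI index would no longer be correct.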

Remaining tasks

Check whether tests are required for this extension.

User interface changes

N/A

API changes

N/A

Data model changes

N/A

πŸ› Bug report
Status

Fixed

Version

1.0

Component

Code

Created by

🇦🇺 Australia matthew.hallsworth.dpc
