XPath query does not work if the source XML has a default namespace

Created on 29 December 2017, over 6 years ago
Updated 6 June 2024, 12 days ago

If the XML source declares a default namespace, e.g.:


<?xml version="1.0" encoding="UTF-8"?>
<root xmlns="http://www.a-domain.com">
  <contenitor>
    <article-node>
      <article-node__title>Article title</article-node__title>
      <article-node__body>Article body</article-node__body>
    </article-node>
  </contenitor>
  <another-different-contenitor>
    <article-node>
      <article-node__title>Article 2 title</article-node__title>
      <article-node__body>Article 2 body</article-node__body>
    </article-node>
    <article-node>
      <article-node__title>Article 3 title</article-node__title>
      <article-node__body>Article 3 body</article-node__body>
    </article-node>
  </another-different-contenitor>
</root>

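The item_selector: or selector: XPath query then matches nothing. In XPath 1.0 an unprefixed name only matches elements that are in no namespace, so the default namespace has to be registered under a prefix before the query runs (e.g. using $xpath->registerNamespace('root', 'http://www.a-domain.com');) and the query has to use that prefix.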
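
To make the cause concrete, here is a minimal sketch in plain PHP with SimpleXML, run outside Drupal. It is not the module's own code: the prefix "ns" is an arbitrary choice and the inline XML is a trimmed copy of the sample above.

```php
<?php

// Trimmed copy of the sample XML above, with the same default namespace.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<root xmlns="http://www.a-domain.com">
  <contenitor>
    <article-node>
      <article-node__title>Article title</article-node__title>
    </article-node>
  </contenitor>
  <another-different-contenitor>
    <article-node>
      <article-node__title>Article 2 title</article-node__title>
    </article-node>
  </another-different-contenitor>
</root>
XML;

$doc = new SimpleXMLElement($xml);

// Unprefixed query: every element is in the default namespace,
// so this selects no nodes.
$unprefixed = $doc->xpath('//article-node');

// Bind the namespace URI to a prefix ("ns" is arbitrary) and use it.
$doc->registerXPathNamespace('ns', 'http://www.a-domain.com');
$prefixed = $doc->xpath('//ns:article-node');

$titles = [];
foreach ($prefixed as $item) {
  // Register again on each returned element before querying it.
  $item->registerXPathNamespace('ns', 'http://www.a-domain.com');
  $titles[] = (string) $item->xpath('ns:article-node__title')[0];
}

printf("unprefixed matches: %d\n", count($unprefixed));
printf("prefixed titles: %s\n", implode(', ', $titles));
```

The same URI can be bound to any prefix; what matters is that the query uses the prefix that was registered.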

Request:
1) A code improvement that handles this case automatically

or

2) A way to declare/register the default namespace in the yml (e.g. default_namespace: <value>)

Current workarounds:
1) Manually remove the default namespace from the XML source (doing this programmatically is not simple, because namespace declarations are not "normal" attributes)

2) If the previous workaround is not feasible, change the XPath queries in the yml as described in https://www.palantir.net/blog/migrating-xml-drupal-8

Example (based on the XML above).
Change from this:

...
source:
  plugin: url
  data_fetcher_plugin: file
  # simple_xml used here instead of xml because it supports xpath better:
  data_parser_plugin: simple_xml
  urls: ./modules/custom/your-module/data/source.xml
  item_selector: //article-node
  fields:
    -
      name: article_title
      label: 'Title'
      selector: article-node__title
...

To this:

...
source:
  plugin: url
  data_fetcher_plugin: file
  # simple_xml used here instead of xml because it supports xpath better:
  data_parser_plugin: simple_xml
  urls: ./modules/custom/your-module/data/source.xml
  item_selector: '//*[local-name()="article-node"]'
  fields:
    -
      name: article_title
      label: 'Title'
      selector: '*[local-name()="article-node__title"]'
...
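
The local-name() selectors can be checked outside the migration with a few lines of plain PHP. This is only a sketch under the same assumptions as the sample XML (trimmed here to one article); it is not how the simple_xml parser is implemented.

```php
<?php

// One article from the sample XML, still inside the default namespace.
$xml = <<<XML
<root xmlns="http://www.a-domain.com">
  <contenitor>
    <article-node>
      <article-node__title>Article title</article-node__title>
      <article-node__body>Article body</article-node__body>
    </article-node>
  </contenitor>
</root>
XML;

$doc = new SimpleXMLElement($xml);

$rows = [];
// Same expression as item_selector: match on the local name only,
// so no namespace prefix has to be registered.
foreach ($doc->xpath('//*[local-name()="article-node"]') as $item) {
  // Same expression as the field selector, evaluated relative to the item.
  $title = $item->xpath('*[local-name()="article-node__title"]');
  $rows[] = (string) $title[0];
}

print_r($rows);
```

Because local-name() ignores the namespace entirely, these selectors would also match elements with the same name from any other namespace in the document.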

✨ Feature request
Status: Needs review
Version: 6.0
Component: Miscellaneous
Created by: 🇮🇹 Italy MXT Milan

Issue tags: Needs tests (the change is currently missing an automated test that fails when run with the original code and succeeds when the bug has been fixed)
