HTML entities not importing in XML file

Created on 21 August 2024, 3 months ago
Updated 1 September 2024, 3 months ago

hi,

I exported nodes from Drupal 7 to XML files.
I am now importing them into Drupal 10.

The issue is: HTML entities are not converted correctly.

So this in the XML:
&lt;p&gt; text

... is imported as:
p text

... and it should be
<p> text

The weird thing is that some XML files import correctly, while others show this issue. I import them as "full html without ckeditor", and no filters are enabled.

Please urgently help with this issue, I am willing to donate to your cause,
Thanks!

πŸ› Bug report
Status

Closed: works as designed

Version

1.0

Component

Code

Created by

🇧🇪 Belgium Ananda


Comments & Activities

  • Issue created by @Ananda
  • 🇺🇦 Ukraine andriy khomych

    Hey Ananda, I don't think this module should convert them; it only covers specific field configurations in the importer.
    This sounds more like a misconfiguration or a missing plugin. I think the export itself should already contain correctly decoded values.
    Example of working XML source:

    <description>
    <![CDATA[ <p>
    
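To see why some export files might behave differently from others, it can help to check what a parser actually hands over for each encoding style. The following is a minimal sketch using PHP's SimpleXML, which is not necessarily the parser the importer uses; the sample strings and the `$samples`/`$parsed` names are hypothetical and not taken from the reporter's files. It assumes the failing files may be encoded twice, which the thread does not confirm.

```php
<?php

// Hypothetical sample values; not taken from the reporter's export files.
$samples = [
  // Markup wrapped in CDATA, as in the working example above.
  'cdata' => '<description><![CDATA[<p> text</p>]]></description>',
  // Markup escaped once with entities.
  'encoded_once' => '<description>&lt;p&gt; text&lt;/p&gt;</description>',
  // Markup escaped twice (assumption: one possible cause of bad imports).
  'encoded_twice' => '<description>&amp;lt;p&amp;gt; text&amp;lt;/p&amp;gt;</description>',
];

$parsed = [];
foreach ($samples as $label => $xml) {
  // LIBXML_NOCDATA merges CDATA sections into plain text nodes.
  $element = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);
  $parsed[$label] = (string) $element;
}

foreach ($parsed as $label => $value) {
  echo $label . ': ' . $value . PHP_EOL;
}

// Only the twice-encoded value still contains entities after parsing and
// needs one extra decoding pass to become markup.
echo html_entity_decode($parsed['encoded_twice'], ENT_QUOTES, 'UTF-8') . PHP_EOL;
```

If a failing file turns out to look like the twice-encoded sample, one extra decoding pass during import (as in the event subscriber approach) would be one way to handle it.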

    However, if you need to support your case, I think it can be handled with a custom Feeds plugin (in your own project). You can also try more plugins from https://www.drupal.org/project/feeds_ex, or modify the data using Feeds events:

    services:
      your_module.importer.event_subscriber:
        class: Drupal\your_module\EventSubscriber\ImporterEventSubscriber
        tags:
          - { name: event_subscriber }
    
      /**
       * Event triggered during the parsing of the feed.
       *
       * This is called after actual parsing is done. See the priority in
       * getSubscribedEvents().
       *
       * @param \Drupal\feeds\Event\ParseEvent $event
       *   The parse feed event.
       */
      public function onParse(ParseEvent $event): void {
        foreach ($event->getParserResult() as $item) {
          /** @var \Drupal\feeds\Feeds\Item\DynamicItem $item */
          // Decode all HTML entities in all the text to avoid problems with data
          // that is run through check_plain() when being displayed.
          $item_array = $item->toArray();
          foreach ($item_array as &$result) {
            if (is_array($result)) {
              foreach ($result as &$value) {
                // Skip non-string values; html_entity_decode() expects a string.
                if (is_string($value)) {
                  $value = html_entity_decode($value, ENT_QUOTES, 'UTF-8');
                  // Remove Unicode control characters and the Unicode Private Use
                  // Area which is undefined.
                  $value = preg_replace('/[\x{007F}-\x{009F}\x{E000}-\x{F8FF}]/u', '', $value);
                }
              }
              // Break the reference so it cannot be overwritten later.
              unset($value);
            }
            elseif (is_string($result)) {
              $result = html_entity_decode($result, ENT_QUOTES, 'UTF-8');
              // Remove Unicode control characters and the Unicode Private Use Area
              // which is undefined.
              $result = preg_replace('/[\x{007F}-\x{009F}\x{E000}-\x{F8FF}]/u', '', $result);
            }
          }
          // Break the reference before writing the values back to the item.
          unset($result);
          $item->fromArray($item_array);
        }
      }
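The decoding step from the subscriber can also be tried outside Drupal before wiring it into an import. This is a rough sketch only: `decode_item_values()` and the sample `$item` array are hypothetical names made up for illustration, and the helper recurses into nested arrays, which goes slightly beyond the single level handled in the subscriber code.

```php
<?php

/**
 * Decodes HTML entities and strips control and private-use characters.
 *
 * Hypothetical helper for illustration; not part of Feeds or Drupal.
 */
function decode_item_values(array $values): array {
  foreach ($values as $key => $value) {
    if (is_array($value)) {
      // Assumption: nested arrays are handled recursively.
      $values[$key] = decode_item_values($value);
    }
    elseif (is_string($value)) {
      $decoded = html_entity_decode($value, ENT_QUOTES, 'UTF-8');
      // Same pattern as in the subscriber: drop DEL, C1 control characters
      // and the Unicode Private Use Area.
      $values[$key] = preg_replace('/[\x{007F}-\x{009F}\x{E000}-\x{F8FF}]/u', '', $decoded);
    }
    // Other types (integers, NULL, ...) are left untouched.
  }
  return $values;
}

// Made-up sample data standing in for a parsed feed item.
$item = [
  'title' => 'Tom &amp; Jerry',
  'body' => '&lt;p&gt; text',
  'tags' => ['caf&eacute;', "private\u{E000}use"],
  'nid' => 42,
];

$decoded = decode_item_values($item);
var_dump($decoded);
```

Running this against a value copied from one of the failing XML files should show whether a single decoding pass is enough to recover the markup.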
    

    Also, the https://www.drupal.org/project/feeds_tamper module provides a way to modify data, so you can try to decode the values there first.

    Could you ping me on the Drupal Slack?

  • 🇺🇦 Ukraine andriy khomych

    My slack nickname - @andriy.khomych
    Slack channel - drupal.slack.com

  • 🇧🇪 Belgium Ananda

    hi,

    Thanks, however I can't join the Drupal Slack. It says: "You can use any account with the domain:
    association.drupal.org
    socialenterprises.net"

    I already tried the Tamper module before, and it did not work.

    The bizarre thing is: half of the XML files import with correctly converted HTML entities, and the other half show this weird issue.

    Can I give you access so you can check it on this sandbox-dev site?

  • Status changed to Postponed: needs info 3 months ago
  • 🇺🇦 Ukraine andriy khomych

    Hi, you need to use one of the methods from my comment #2.
    This is not an issue with this module.

  • Status changed to Closed: works as designed 3 months ago