Unsupported Unicode character in XML

Created on 4 January 2022
Updated 15 January 2025


Setup

  • Solr version: 8.11.0
  • Drupal Core version: 9.2.10
  • Search API version: 8.x-1.21
  • Search API Solr version: 4.2.2
  • Configured Solr Connector:

Issue

I ran into a problem when indexing entities into Solr on both Drupal 7 and Drupal 9.
I was indexing a Drupal node whose title contained the U+FFFE character.
As a result, the indexing request failed with this message:

org.apache.solr.common.SolrException: Invalid UTF-8 character 0xfffe at char #148122, byte #147231)

This happens because U+FFFE is not a legal character in XML. It therefore has to be removed from the XML before it is sent to Solr, together with the other characters that XML disallows.

Any Unicode character is allowed, excluding the surrogate blocks, FFFE, and FFFF (not even as character reference).
https://www.w3.org/TR/xml/#charsets
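The Char production behind that sentence can be checked one code point at a time. The sketch below is a minimal Python illustration that assumes XML 1.0 rules. The names `is_xml_char` and `strip_invalid_xml_chars` are hypothetical and are not part of Search API Solr or Solarium.

Two details are easy to miss. First, the production also excludes most C0 control characters: everything below U+0020 except tab, LF and CR. Second, U+FFFE is well-formed UTF-8 (bytes EF BF BE). Despite the wording of the Solr message, it is rejected by the XML character rules, not by the encoding.

```python
# Ranges of the XML 1.0 "Char" production (https://www.w3.org/TR/xml/#charsets).
XML_CHAR_RANGES = (
    (0x09, 0x09),
    (0x0A, 0x0A),
    (0x0D, 0x0D),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
)


def is_xml_char(ch: str) -> bool:
    """Return True if the single character `ch` may appear in an XML 1.0 document."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in XML_CHAR_RANGES)


def strip_invalid_xml_chars(text: str) -> str:
    """Drop every character that the XML 1.0 Char production excludes."""
    return "".join(ch for ch in text if is_xml_char(ch))


if __name__ == "__main__":
    title = "Node\ufffe title"
    print("\ufffe".encode("utf-8"))        # valid UTF-8 bytes: b'\xef\xbf\xbe'
    print(is_xml_char("\ufffe"))           # False: excluded by the XML Char rules
    print(strip_invalid_xml_chars(title))  # Node title
```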

This looks like a very old issue, and it also happens in the Apache Solr module.
I can confirm that I have the same issue on D7.

In Drupal 9, filterControlCharacters was moved to the Solarium library, so I also created an issue in the Solarium repository.
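As its name suggests, a control-character filter would let U+FFFE through. The Python sketch below illustrates that gap. It assumes the filter replaces only the C0 control characters that XML forbids (U+0000 to U+0008, U+000B, U+000C, U+000E to U+001F) with a space. That pattern is an assumption made for illustration; it is not a verified copy of Solarium's `filterControlCharacters`, and the Python function name here is hypothetical.

```python
import re

# Assumed pattern: C0 control characters not allowed in XML 1.0
# (tab U+0009, LF U+000A and CR U+000D are left alone).
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def filter_control_characters(text: str) -> str:
    """Replace forbidden C0 control characters with a space (hypothetical port)."""
    return CONTROL_CHARS.sub(" ", text)


if __name__ == "__main__":
    title = "Bad\x07bell and bad\ufffe noncharacter"
    cleaned = filter_control_characters(title)
    # The BEL character (U+0007) is replaced, but U+FFFE is outside the
    # pattern, survives the filter and still makes the XML illegal.
    print(repr(cleaned))
    print("\ufffe" in cleaned)  # True
```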

Possible solution:

Add an option to the filter so that it also removes characters that XML does not allow, such as U+FFFE and U+FFFF.
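Here is one possible shape for such an option, sketched in Python. The flag `remove_xml_invalid` is hypothetical. When it is enabled, the filter widens from control characters to everything outside the XML 1.0 Char production, which includes surrogates, U+FFFE and U+FFFF. The flag name, its default, and the choice to replace characters with a space are all illustrative. None of them is an existing Solarium or Search API Solr setting.

```python
import re

# C0 control characters that XML 1.0 forbids (tab, LF and CR are allowed).
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")

# Anything outside the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
NON_XML_CHARS = re.compile(
    r"[^\x09\x0a\x0d\x20-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]"
)


def filter_characters(text: str, remove_xml_invalid: bool = False) -> str:
    """Replace disallowed characters with a space.

    With remove_xml_invalid=False only the C0 control characters are replaced.
    With True, every character that is illegal in XML 1.0 is replaced, which is
    a superset of the control characters.
    """
    pattern = NON_XML_CHARS if remove_xml_invalid else CONTROL_CHARS
    return pattern.sub(" ", text)


if __name__ == "__main__":
    title = "Node\ufffe title"
    print(repr(filter_characters(title)))                           # U+FFFE kept
    print(repr(filter_characters(title, remove_xml_invalid=True)))  # 'Node  title'
```

Keeping the flag off by default leaves existing behaviour untouched for sites that do not need it. To drop the characters instead of replacing them, only the replacement argument of `sub` would change.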


Category: Bug report
Status: Closed (won't fix)
Version: 1.0
Component: Code
Created by: zniki.ru (Russia)


