Problem with tables without tbody and & in content

Created on 28 November 2018, over 6 years ago
Updated 18 April 2024, 12 months ago

I have a problem getting data from a table that has no tbody tag when some of the values contain an & symbol (e.g. &nbsp;). In that case I get no results.

Here is a simple example of my problem:

<table>
  <tbody>
  <tr>
    <td>0</td>
    <td>1</td>
    <td>2</td>
  </tr>
  <tr>
    <td>&nbsp;3</td>
    <td>4</td>
    <td>5</td>
  </tr>
  </tbody>
</table>

For example, the XPath query //tr[2]/td returns 3, 4, 5.

But without tbody (and all my sources are tables without a tbody tag):

<table>
  <tr>
    <td>0</td>
    <td>1</td>
    <td>2</td>
  </tr>
  <tr>
    <td>&nbsp;3</td>
    <td>4</td>
    <td>5</td>
  </tr>
</table>

The same query, //tr[2]/td, returns no result.
In this case (without tbody) I only get a result if I delete the & symbol. It is a strange combination of two problems.

My request: would it be possible to add a step to this module that wraps the rows of every table in a tbody before processing, and then runs the XPath query?
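
To make the request concrete, here is a rough sketch of the preprocessing step I have in mind. It is not the module's code: it is written in Python using only the standard library, and the names LenientTreeBuilder, ensure_tbody and row_cells are made up for illustration. It assumes the HTML is parsed leniently (so &nbsp; is decoded rather than rejected), then wraps any tr that sits directly under a table in a new tbody, and only then runs the query. Note that ElementTree only supports a subset of XPath and needs the relative form .//tr[2]/td.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class LenientTreeBuilder(HTMLParser):
    """Hypothetical helper: builds an ElementTree from lenient HTML.

    With convert_charrefs=True, entities such as &nbsp; are decoded to
    characters instead of causing a parse error.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        elem = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID_TAGS:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        current = self.stack[-1]
        if len(current):
            last = current[-1]
            last.tail = (last.tail or "") + data
        else:
            current.text = (current.text or "") + data


def ensure_tbody(root):
    """Wrap tr elements that are direct children of a table in a new tbody."""
    for table in list(root.iter("table")):
        rows = [child for child in table if child.tag == "tr"]
        if not rows:
            continue  # rows are already inside tbody/thead/tfoot
        position = list(table).index(rows[0])
        tbody = ET.Element("tbody")
        for row in rows:
            table.remove(row)
            tbody.append(row)
        table.insert(position, tbody)


def row_cells(html, path=".//tr[2]/td"):
    """Parse, normalise tables, then return the stripped text of matching cells."""
    builder = LenientTreeBuilder()
    builder.feed(html)
    builder.close()
    ensure_tbody(builder.root)
    # str.strip() also removes the non-breaking space decoded from &nbsp;.
    return ["".join(td.itertext()).strip() for td in builder.root.findall(path)]


SAMPLE_WITHOUT_TBODY = """<table>
  <tr><td>0</td><td>1</td><td>2</td></tr>
  <tr><td>&nbsp;3</td><td>4</td><td>5</td></tr>
</table>"""

print(row_cells(SAMPLE_WITHOUT_TBODY))  # prints ['3', '4', '5']
```

Tables that already have a tbody are left untouched, so the same query gives the same result for both of the examples above.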

Feature request
Status

Closed: outdated

Version

1.0

Component

HTML parser

Created by

🇨🇿Czech Republic tommer

Comments & Activities

  • 🇯🇵Japan ptmkenny

    I'm closing this issue because there have been no updates in five years. If you have a similar issue, please open a new issue.
