We have several nodes with HTML code samples in them, using &lt; and &gt; for the tags so that the HTML tags are displayed as a code sample and not rendered. For example:
<pre class="code">
&lt;div data-role="page"&gt;
&lt;div data-role="header" data-position="fixed"&gt;
&lt;h1&gt;Stairs game&lt;/h1&gt;
&lt;/div&gt;
&lt;div data-role="content"&gt;
&lt;canvas id="c" &gt;&lt; /canvas&gt;
&lt;audio id="soundEfx" src="gameover.mp3" style="display: none;"&gt;&lt;/audio &gt;
&lt;audio id="game_id" src="Game.mp3" style="display: none;"&gt;&lt; /audio&gt;
&lt;audio id="jump_id" src="jumping.mp3" style="display: none;"&gt;&lt;/audio &gt;
&lt;/div&gt;
&lt;div data-role="footer" data-position="fixed"&gt;
&lt;/div&gt;
</pre>
When the document is uploaded to Lingotek and downloaded through the API, this code sample remains intact and unchanged; I added logging in various places in the Lingotek module to confirm this. The problem happens right when the value is saved to the database: in lingotek_process_entity_xml(), decode_entities() is called on the text before passing it to lingotek_unfilter_placeholders(). This converts &gt; and &lt; back to > and < and saves those to the DB, like this:
<pre class="code">
<div data-role="page">
<div data-role="header" data-position="fixed">
<h1>Stairs game</h1>
</div>
<div data-role="content">
<canvas id="c" >< /canvas>
<audio id="soundEfx" src="gameover.mp3" style="display: none;"></audio >
<audio id="game_id" src="Game.mp3" style="display: none;">< /audio>
<audio id="jump_id" src="jumping.mp3" style="display: none;"></audio >
</div>
<div data-role="footer" data-position="fixed">
</div>
</pre>
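The effect of that decode step can be reproduced outside Drupal. The sketch below is only an illustration: it uses Python's html.unescape as a stand-in for decode_entities() (an assumption, since the module itself is PHP and its code is not shown here) and applies it to one line of the sample.

```python
import html

# One line of the code sample as stored in the source node: the tag is
# written with entities, so a browser displays it as text.
source_line = '&lt;div data-role="page"&gt;'

# Stand-in for the decode step. Assumption: html.unescape treats &lt; and
# &gt; the same way decode_entities() does.
decoded_line = html.unescape(source_line)

# After decoding, the line is live markup that a browser would render.
print(decoded_line)

# Re-escaping gets the original back, which shows that the decode is the
# only thing that changed the text.
reescaped_line = html.escape(decoded_line, quote=False)
print(reescaped_line == source_line)
```

If the stand-in is a fair approximation, this matches what the logging showed: the text is fine until the decode runs, and the decoded form is what ends up in the DB.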
This becomes a problem when viewing the node. When viewing the original node, those entities are displayed as < and > by the browser, so the HTML code sample is displayed as desired. But, when viewing the translation of the node, these HTML tags are rendered, which makes the node look pretty funky and defeats the purpose.
I've tracked that decode_entities() call back to the origin of lingotek.api.inc, in a Sept 2011 code restructure. I wonder if anyone even remembers at this point: does decode_entities() serve a purpose here? I removed it locally and didn't see any problems, but I haven't tested it thoroughly.
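For testing the removal more thoroughly, one option is to scan saved translation bodies for pre blocks that contain unescaped tags. The helper below is a hypothetical sketch in Python, not part of the Lingotek module; the function name and the regular expressions are my own assumptions, and it would also flag a pre block that intentionally contains real markup.

```python
import re

# Matches a <pre ...> block and captures its contents (hypothetical helper,
# regex-based, so it does not handle nested <pre> elements).
PRE_BLOCK = re.compile(r'<pre\b[^>]*>(.*?)</pre>', re.DOTALL | re.IGNORECASE)

# Matches something that looks like a raw opening or closing tag.
RAW_TAG = re.compile(r'<\s*/?\s*[a-zA-Z][^<>]*>')


def pre_blocks_with_raw_tags(body):
    """Return the contents of every <pre> block that contains a raw tag."""
    return [
        match.group(1)
        for match in PRE_BLOCK.finditer(body)
        if RAW_TAG.search(match.group(1))
    ]


# Example bodies: the first keeps its entities, the second was decoded.
escaped_body = '<pre class="code">\n&lt;h1&gt;Stairs game&lt;/h1&gt;\n</pre>'
decoded_body = '<pre class="code">\n<h1>Stairs game</h1>\n</pre>'

print(pre_blocks_with_raw_tags(escaped_body))
print(len(pre_blocks_with_raw_tags(decoded_body)))
```

Running something like this over the translated field values before and after removing the decode_entities() call would show whether the code samples survive the save.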
It's worth noting that we have pre tags set to ignore in the secondary configuration under Advanced Content Parsing; I'm not sure if this is relevant, but it sure seems like it could be.
pre:
  ruleTypes: [EXCLUDE]
  idAttributes: [id]