I have an HTML document that might have &lt; and &gt; in some of its attributes. I am trying to extract it and run it through an XSLT, but the XSLT engine fails with an error saying that < is not valid inside an attribute.

I did some digging and found that the value is properly escaped in the source document. However, when it is loaded into the DOM via innerHTML, the attribute values come back unencoded. Strangely, this happens for &lt; and &gt;, but not for some others, such as &amp;.

Here is a simple example:

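var div = document.createElement('DIV');
// Parse markup whose attribute values contain escaped characters...
div.innerHTML = '<div asdf="&lt;50" fdsa="&amp;50"></div>';
// ...then serialize it back out of the DOM.
console.log(div.innerHTML);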
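A simplified model may help explain the asymmetry. In this sketch, decodeAttr stands in for the HTML parser, which turns &lt; into a real < before the value is stored in the DOM. serializeHtmlAttr follows the attribute-escaping rule of the HTML serialization algorithm as it was specified around the time of this question, which escapes only &, U+00A0 and the double quote. Both names are hypothetical, and only four named references are handled. Newer revisions of the HTML standard may also escape < and > in attribute values, so current browsers can behave differently.

```javascript
// Simplified model of: div.innerHTML = '...'; then reading div.innerHTML back.
// decodeAttr and serializeHtmlAttr are hypothetical helpers, not browser APIs.

// Parsing: character references are replaced by the characters they stand for.
// Only four named references are handled in this sketch.
const NAMED_REFS = { lt: "<", gt: ">", amp: "&", quot: '"' };

function decodeAttr(source) {
  // One pass, so "&amp;lt;" becomes the text "&lt;", not "<".
  return source.replace(/&(lt|gt|amp|quot);/g, (_, name) => NAMED_REFS[name]);
}

// Serializing: the attribute rule (as specified around 2015) escapes only
// "&", U+00A0 and '"'; "<" and ">" are written out as-is.
function serializeHtmlAttr(value) {
  return value
    .replace(/&/g, "&amp;") // must run first so later escapes are not re-escaped
    .replace(/\u00A0/g, "&nbsp;")
    .replace(/"/g, "&quot;");
}

function roundTrip(source) {
  return serializeHtmlAttr(decodeAttr(source));
}

console.log(roundTrip("&lt;50"));  // prints <50 (the reference is gone)
console.log(roundTrip("&amp;50")); // prints &amp;50 (re-escaped, so it looks untouched)
```

Under this model, &amp; only appears to survive because the serializer escapes & again on the way out.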

I'm assuming that the DOM implementation decided that HTML attributes can be less strict than XML attributes, and that this is "working as intended". My question is, can I work around this without writing some horrible regex replacement?

asked Oct 6, 2015 at 15:31 by murrayju
  • @Abel I am using jQuery's .html(), I just attempted to reduce down to where I think the "problem" is occurring. The source document is XML, which I run through a browser XSLT before inserting with .html(). Later I take it through the inverse process to get the XML back out. I just find it strange that the DOM is unescaping this character (and not others). – murrayju Commented Oct 6, 2015 at 16:13
  • I can't modify the source XML, and need to preserve the same content in the output at the end. I could run whatever transforms are necessary in the middle, but am looking for a way to do it better than some regex replace. Especially considering the character is <, which the document is full of. – murrayju Commented Oct 6, 2015 at 16:16
  • @Abel my only goal is to get it back out of the DOM the same way it went in (as &lt;). I'm putting it in with .text(string) and getting it out with .text(). The problem I have with this round-trip is that the input doesn't equal the output (only in this case). – murrayju Commented Oct 6, 2015 at 16:40
  • Ah, sorry. Well, that is probably only possible with other DOM methods, not with innerHTML. I.e., this works: div.firstChild.attributes['title']. But this requires a whole lot extra machinery to "mimic" innerHTML. – Abel Commented Oct 6, 2015 at 16:45

4 Answers


Try XMLSerializer:

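// Assumes the page contains the <div id="d1"> element shown below.
var div = document.getElementById('d1');

// outerHTML uses the HTML serializer, which (at the time) left < and > in attribute values unescaped.
var pre = document.createElement('pre');
pre.textContent = div.outerHTML;
document.body.appendChild(pre);

// XMLSerializer has to produce well-formed XML, so < in attribute values is escaped.
pre = document.createElement('pre');
pre.textContent = new XMLSerializer().serializeToString(div);
document.body.appendChild(pre);

<!-- HTML -->
<div id="d1" data-foo="a &lt; b &amp;&amp; b &gt; c">This is a test</div>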

You might need to adapt the XSLT to account for the XHTML namespace that XMLSerializer inserts (at least it did in my test with Firefox).
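XMLSerializer helps because an XML serializer must produce a well-formed attribute value, and a raw < is not allowed there. The sketch below approximates that escaping step. serializeXmlAttr is a hypothetical name, and escaping exactly &, ", < and > is an assumption based on my reading of the DOM Parsing and Serialization rules; real browsers may escape additional characters such as tabs or newlines.

```javascript
// Rough model of how an XML serializer writes an attribute value.
// serializeXmlAttr is a hypothetical helper, not the XMLSerializer API itself.
function serializeXmlAttr(value) {
  return value
    .replace(/&/g, "&amp;") // first, so the entities added below stay intact
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")  // "<" is not allowed in an XML attribute value
    .replace(/>/g, "&gt;");
}

// After parsing, the DOM holds the decoded value of data-foo from the example above.
const domValue = "a < b && b > c";
console.log(`data-foo="${serializeXmlAttr(domValue)}"`);
// prints data-foo="a &lt; b &amp;&amp; b &gt; c"
```

In this model, the XML output matches the escaped form of the original markup, which is what the XSLT engine expects.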

I am not sure if this is what you are looking for, but do have a look.

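// setAttribute stores the text literally, so '&lt;50' is not decoded to '<50'.
var div1 = document.createElement('DIV');
var div2 = document.createElement('DIV');
div1.setAttribute('asdf', '&lt;50');
div1.setAttribute('fdsa', '&amp;50');
div2.appendChild(div1);
// innerHTML escapes each & as &amp;; strip that one level back off.
console.log(div2.innerHTML.replace(/&amp;/g, '&'));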
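As far as I can tell, the snippet works in three steps. setAttribute stores the text &lt;50 verbatim, with no entity decoding. innerHTML then escapes its & as &amp;. The final replace undoes that escaping. The sketch below models the same steps without a DOM; serializeHtmlAttr and viaSetAttribute are hypothetical names.

```javascript
// Model of: el.setAttribute(name, text); parent.innerHTML.replace(/&amp;/g, '&')
// serializeHtmlAttr is a hypothetical stand-in for the browser's HTML serializer.
function serializeHtmlAttr(value) {
  return value.replace(/&/g, "&amp;").replace(/\u00A0/g, "&nbsp;").replace(/"/g, "&quot;");
}

function viaSetAttribute(text) {
  // setAttribute does not decode entities, so the DOM stores the text verbatim.
  const stored = text;
  // Serialize, then strip one level of &amp; as the answer does.
  return serializeHtmlAttr(stored).replace(/&amp;/g, "&");
}

console.log(viaSetAttribute("&lt;50"));  // prints &lt;50
console.log(viaSetAttribute("&amp;50")); // prints &amp;50
console.log(viaSetAttribute("a&b"));     // prints a&b (a bare "&", which is not valid XML)
```

The last line illustrates a caveat. The replace applies to the whole serialized string, so any genuine & elsewhere in the markup comes out unescaped.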

What ended up working best for me was to double-escape these using an XSLT on the incoming document (and reverse this on the outgoing document).

So &lt; in an attribute becomes &amp;lt;. Thanks to @Abel for the suggestion.

Here is the XSLT I added, in case others find it helpful:

First is a template for doing string replacements in XSLT 1.0. If you can use XSLT 2.0, you can use the built-in replace() function instead.

<!-- Recursively replaces every occurrence of $replace in $text with $by. $replace must not be empty. -->
<xsl:template name="string-replace-all">
    <xsl:param name="text"/>
    <xsl:param name="replace"/>
    <xsl:param name="by"/>
    <xsl:choose>
        <xsl:when test="contains($text, $replace)">
            <!-- emit the part before the match, then the replacement... -->
            <xsl:value-of select="substring-before($text,$replace)"/>
            <xsl:value-of select="$by"/>
            <!-- ...and recurse on the remainder -->
            <xsl:call-template name="string-replace-all">
                <xsl:with-param name="text" select="substring-after($text,$replace)"/>
                <xsl:with-param name="replace" select="$replace"/>
                <xsl:with-param name="by" select="$by"/>
            </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
            <!-- no (more) matches: emit the rest unchanged -->
            <xsl:value-of select="$text"/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>
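For readers more at home in JavaScript, here is a rough port of the recursive string-replace-all template. stringReplaceAll and the two helpers that mirror XPath's substring-before() and substring-after() are hypothetical names. In XPath, contains($text, '') is always true, so the template would recurse forever on an empty replace string; the port guards against that.

```javascript
// Hypothetical JavaScript port of the XSLT 1.0 "string-replace-all" template.
// substringBefore/substringAfter follow XPath 1.0 semantics: both return ""
// when the search string is not found.
function substringBefore(text, search) {
  const i = text.indexOf(search);
  return i === -1 ? "" : text.slice(0, i);
}

function substringAfter(text, search) {
  const i = text.indexOf(search);
  return i === -1 ? "" : text.slice(i + search.length);
}

function stringReplaceAll(text, replace, by) {
  // Guard: contains(text, "") is always true, so the XSLT would never stop.
  if (replace === "") return text;
  if (text.includes(replace)) {
    // Same shape as the template: prefix, replacement, then recurse on the rest.
    return substringBefore(text, replace) + by +
      stringReplaceAll(substringAfter(text, replace), replace, by);
  }
  return text;
}

console.log(stringReplaceAll("a < b < c", "<", "&lt;")); // prints a &lt; b &lt; c
```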

Next are the templates that do the specific replacements I need:

<!-- xml -> html -->
<xsl:template name="replace-html-codes">
    <xsl:param name="text"/>
    <xsl:variable name="lt">
        <xsl:call-template name="string-replace-all">
            <xsl:with-param name="text" select="$text"/>
            <xsl:with-param name="replace" select="'&lt;'"/>
            <xsl:with-param name="by" select="'&amp;lt;'"/>
        </xsl:call-template>
    </xsl:variable>
    <xsl:variable name="gt">
        <xsl:call-template name="string-replace-all">
            <xsl:with-param name="text" select="$lt"/>
            <xsl:with-param name="replace" select="'&gt;'"/>
            <xsl:with-param name="by" select="'&amp;gt;'"/>
        </xsl:call-template>
    </xsl:variable>
    <xsl:value-of select="$gt"/>
</xsl:template>

<!-- html -> xml -->
<xsl:template name="restore-html-codes">
    <xsl:param name="text"/>
    <xsl:variable name="lt">
        <xsl:call-template name="string-replace-all">
            <xsl:with-param name="text" select="$text"/>
            <xsl:with-param name="replace" select="'&amp;lt;'"/>
            <xsl:with-param name="by" select="'&lt;'"/>
        </xsl:call-template>
    </xsl:variable>
    <xsl:variable name="gt">
        <xsl:call-template name="string-replace-all">
            <xsl:with-param name="text" select="$lt"/>
            <xsl:with-param name="replace" select="'&amp;gt;'"/>
            <xsl:with-param name="by" select="'&gt;'"/>
        </xsl:call-template>
    </xsl:variable>
    <xsl:value-of select="$gt"/>
</xsl:template>
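If I read the templates correctly, the incoming step turns every < and > in an attribute value into the four-character text &lt; or &gt;. The HTML serializer then writes that text as &amp;lt; or &amp;gt;, which is legal in XML, and the outgoing step reverses the change. The sketch below models this at the string level with String.prototype.replaceAll (ES2021). replaceHtmlCodes, restoreHtmlCodes and serializeHtmlAttr are hypothetical names, and the serializer is the same simplified model used earlier. Like the templates, the round trip is only lossless if the original values never contain the literal text &lt; or &gt;.

```javascript
// String-level model of the two XSLT templates plus the innerHTML step.
// replaceHtmlCodes, restoreHtmlCodes and serializeHtmlAttr are hypothetical names.

// xml -> html: "<" becomes the text "&lt;", ">" becomes "&gt;".
function replaceHtmlCodes(value) {
  return value.replaceAll("<", "&lt;").replaceAll(">", "&gt;");
}

// html -> xml: undo the above.
function restoreHtmlCodes(value) {
  return value.replaceAll("&lt;", "<").replaceAll("&gt;", ">");
}

// Rough model of innerHTML's attribute escaping (only &, U+00A0 and " escaped).
function serializeHtmlAttr(value) {
  return value.replace(/&/g, "&amp;").replace(/\u00A0/g, "&nbsp;").replace(/"/g, "&quot;");
}

const original = "<50";
const inDom = replaceHtmlCodes(original);    // the text &lt;50
const serialized = serializeHtmlAttr(inDom); // &amp;lt;50, safe for the XSLT engine
// Parsing `serialized` as XML yields the text &lt;50 again (equal to inDom).
const restored = restoreHtmlCodes(inDom);    // <50 again
console.log(serialized, restored);           // prints &amp;lt;50 <50
```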

The XSLT is mostly a pass-through. I just call the appropriate template when copying attributes:

<!-- copy each attribute as data-{local name}, double-escaping < and > in its value -->
<xsl:template match="@*">
    <xsl:attribute name="data-{local-name()}">
        <xsl:call-template name="replace-html-codes">
            <xsl:with-param name="text" select="."/>
        </xsl:call-template>
    </xsl:attribute>
</xsl:template>

<!-- copy all nodes -->
<xsl:template match="node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

Several things worth mentioning that might help someone:

  • Make sure that your HTML is truly valid, e.g. I was accidentally using \ when I should have had / and it caused this problem.
  • As the OP pointed out in the question, you can use &amp;, so you might try e.g. &amp;lt; and &amp;gt;.
  • There are other characters that look similar to < and >, such as ‹ and › (U+2039, U+203A) or the fullwidth ＜ and ＞ (U+FF1C, U+FF1E).
  • There is an alternate way to express < and >: &#60; and &#62;.
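&#60; and &#62; are decimal numeric character references for code points 60 and 62, which are < and > themselves. An HTML parser therefore decodes them exactly as it decodes &lt; and &gt;. Look-alike characters, by contrast, are different code points altogether. The sketch below illustrates both points. decodeNumericRefs is a hypothetical helper, and the four look-alikes are only examples.

```javascript
// decodeNumericRefs is a hypothetical helper that expands decimal (&#60;)
// and hexadecimal (&#x3C;) character references, as an HTML parser would.
function decodeNumericRefs(text) {
  return text.replace(/&#(?:x([0-9a-fA-F]+)|([0-9]+));/g, (_, hex, dec) =>
    String.fromCodePoint(hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10))
  );
}

console.log(decodeNumericRefs("&#60;50 &#62;10")); // prints <50 >10, same as &lt; / &gt;

// Look-alikes are different code points, so they are never decoded to < or >.
const lookAlikes = ["\u2039", "\u203A", "\uFF1C", "\uFF1E"]; // ‹ › ＜ ＞
for (const ch of lookAlikes) {
  console.log(ch, "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));
}
```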

Tags: javascript, innerHTML unencodes &lt; in attributes, Stack Overflow