Question
· Apr 9, 2019

Escape non-standard characters to the Windows-1252 character set

I have an XML document that contains non-standard characters, and I would like to transform it with XSLT so that those characters are rendered as numeric references of the form &#nnn;. Here is what I have so far. Any help is appreciated.

XSLT

<?xml version="1.0"?>
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output encoding="Windows-1252" indent="yes" omit-xml-declaration="yes" method="xml"/>
  <xsl:template match="/Recordset">
    <recordset><xsl:apply-templates/></recordset>
  </xsl:template>
  <xsl:template match="*|@*">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:transform>
 

XML

128 € euro sign
129    NOT USED
130 ‚ single low-9 quotation mark
131 ƒ Latin small letter f with hook
132 „ double low-9 quotation mark
133 … horizontal ellipsis
134 † dagger
135 ‡ double dagger
136 ˆ modifier letter circumflex accent
137 ‰ per mille sign
138 Š Latin capital letter S with caron
139 ‹ single left-pointing angle quotation mark
140 Œ Latin capital ligature OE
141    NOT USED
142 Ž Latin capital letter Z with caron
143    NOT USED
144    NOT USED
145 ‘ left single quotation mark
146 ’ right single quotation mark
147 “ left double quotation mark
148 ” right double quotation mark
149 • bullet
150 – en dash
151 — em dash
152 ˜ small tilde
153 ™ trade mark sign
154 š Latin small letter s with caron
155 › single right-pointing angle quotation mark
156 œ Latin small ligature oe
157    NOT USED
158 ž Latin small letter z with caron
159 Ÿ Latin capital letter Y with diaeresis
160 no-break space 
Discussion (6)

So the input is Windows-1252, and the output is also Windows-1252, but with certain characters replaced by their numeric escape sequences? You could do this with XSLT 2.0 using character maps.

Given this input (presented here as UTF-8 for visibility on the forum):

<?xml version="1.0"?>
<Recordset>
• coffee €5,• tea €4
</Recordset>

This stylesheet will escape the bullets and euro signs:

<?xml version="1.0"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:character-map name="a">
    <xsl:output-character character="€" string="&amp;#128;"/>
    <xsl:output-character character="•" string="&amp;#149;"/>
  </xsl:character-map>
  <xsl:output encoding="Windows-1252" indent="yes" use-character-maps="a"/>
  <xsl:template match="/">
    <Recordset>
      <xsl:value-of select="/Recordset"/>
    </Recordset>
  </xsl:template>
</xsl:stylesheet>

Output:

<?xml version="1.0" encoding="Windows-1252"?>
<Recordset>
&#149; coffee &#128;5,&#149; tea &#128;4
</Recordset>
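
For anyone without an XSLT 2.0 processor, the same mapping can be sketched outside the stylesheet. The Python below is only an illustration, not part of the original answer: `escape_cp1252` and its `use_unicode` flag are hypothetical names. It replaces each character that Windows-1252 stores in bytes 128-159 with `&#nnn;`, using the byte value as the table in the question does. One caveat worth knowing: in XML a numeric character reference names a Unicode code point, so a conforming parser reads `&#128;` as the control character U+0080 rather than the euro sign (U+20AC, `&#8364;`). Passing `use_unicode=True` emits the Unicode form instead.

```python
def escape_cp1252(text: str, use_unicode: bool = False) -> str:
    """Escape characters that Windows-1252 stores in bytes 128-159.

    Hypothetical helper for illustration only. By default it emits the
    Windows-1252 byte value (e.g. 128 for the euro sign); with
    use_unicode=True it emits the Unicode code point (8364) instead.
    It does not escape '&' or '<', so it is not a full XML serializer.
    """
    out = []
    for ch in text:
        try:
            byte = ch.encode("cp1252")[0]
        except UnicodeEncodeError:
            # Not representable in Windows-1252: fall back to the code point.
            out.append(f"&#{ord(ch)};")
            continue
        if 0x80 <= byte <= 0x9F:
            out.append(f"&#{ord(ch) if use_unicode else byte};")
        else:
            out.append(ch)
    return "".join(out)


print(escape_cp1252("• coffee €5,• tea €4"))
print(escape_cp1252("• coffee €5,• tea €4", use_unicode=True))
```

The first call reproduces the body of the output shown above; the second gives the form that an XML parser will decode back to the original bullet and euro characters.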

@Eduard Lebedyuk No, it is not a known node. What I am trying to do is pull those characters and pass them to $TRANSLATE($SYSTEM.Encryption.Base64Encode(streamString),$C(10,13)) as part of a document I need to convert to a PDF. I have tried every encoding I could, from UTF-8 to Windows-1252, and I get an error like this:

ERROR <Ens>ErrException: <ILLEGAL VALUE>zEncodeStream+18  @' set encString = $TRANSLATE($SYSTEM.Encryption.Base64Encode(streamString),$C(10,13))'

Are there any ways to get around the Base64 encoding?
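
A possible cause, which the thread does not confirm: Base64 works on bytes, so a string that still holds characters such as • or € as wide (code point above 255) characters has to be converted to a byte encoding before it is Base64-encoded. The Python sketch below shows the principle only; `encode_text` is a hypothetical name. In ObjectScript the analogous step would presumably be a `$ZCONVERT(streamString,"O","UTF8")` before `Base64Encode`, but treat that as an assumption to check against the InterSystems documentation.

```python
import base64


def encode_text(text: str, encoding: str = "utf-8") -> str:
    """Encode text to bytes first, then Base64-encode the bytes.

    Hypothetical helper for illustration. base64.b64encode does not insert
    line breaks, so no CR/LF stripping step is needed here.
    """
    raw = text.encode(encoding)  # characters -> bytes in the chosen encoding
    return base64.b64encode(raw).decode("ascii")


encoded = encode_text("• coffee €5")
# Round trip: decoding the Base64 and the bytes recovers the original text.
print(base64.b64decode(encoded).decode("utf-8"))
```

Whichever encoding is chosen for the bytes (UTF-8 or Windows-1252), the consumer that decodes the Base64 has to assume the same one.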