Issue with encoding (special character) - Business service

Hello devs,

I'm facing an issue with one of my business services, which fetches an XML document from a web service (which in turn reads the data from the Caché database) and then processes it. The XML content (built from some of the table's field values) contains a special character: ‘ (left single quote)

Here is the code that reads and processes the XML and its contents. The line $$$TRACE("Status Error "_$$$StatusDisplayString(sc)) logs the following error: Status Error ERROR #6301: SAX XML Parser Error: invalid character 0x18 while processing Anonymous Stream at line 1 offset 4798

    Set xread = ##class(%XML.Reader).%New()
    Set xread.IgnoreNull = 1

    Do pInput.Rewind()

    Set sc = xread.OpenStream(pInput)

    Set save = ""
    If $$$ISOK(sc) {
        // Correlate each XML node name with its target class
        For i=1:1:$Length(..TargetClass,",") {
            Set class = $Piece(..TargetClass,",",i)
            Set node = $Piece(..ClassNode,",",i)
            Do xread.Correlate(node,class)
        }

        While xread.Next(.obj,.sc) {
            If $$$ISERR(sc) Quit
            Set sc = ..SendObject(obj)
            If $$$ISERR(sc) Quit
            // Store the latest timestamp
            If $Length(..SaveProperty)>0,$ZObjProperty(obj,..SaveProperty)>save Set save = $ZObjProperty(obj,..SaveProperty)
        }
    }
    // Save the timestamp up to the point of completion (or failure)
    If $Length(save)>0 Do ##class(RMHCommon.Setting).Set(..SaveSetting,save)

    // Error code during Next or SendObject
    If $$$ISERR(sc) {
        $$$TRACE("Status Error "_$$$StatusDisplayString(sc))
        Quit sc
    }

I am afraid this has something to do with Caché/HealthShare character encoding, but my HealthShare instance uses Unicode as its default encoding. Looking up the definition of this character (0x18) in Unicode leads me to something odd:

http://www.fileformat.info/info/unicode/char/18/index.htm

Cancel character?

I'm a bit confused to be honest.

Can someone give me a helping hand?

Thanks in advance!

Replies

My guess is that your pInput stream contains XML encoded as UnicodeLittle characters.

The left single quote character is Unicode code point 8216 decimal, which is 2018 hex. In UnicodeLittle the two bytes are swapped, so the 0x18 comes first, followed by the 0x20.
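To make the byte-order point concrete, here is a small sketch in Python (used only as an illustration next to the ObjectScript above). It assumes nothing about what pInput really contains; it only shows how code point 8216 is laid out in little-endian UTF-16, big-endian UTF-16 and UTF-8. The helper name `encoded_bytes` is made up for this example.

```python
# U+2018 LEFT SINGLE QUOTATION MARK (decimal 8216)
LEFT_SINGLE_QUOTE = "\u2018"


def encoded_bytes(text, encoding):
    """Return the bytes of `text` in `encoding` as two-digit hex strings."""
    return [f"{b:02x}" for b in text.encode(encoding)]


code_point = ord(LEFT_SINGLE_QUOTE)                        # 8216 == 0x2018
utf16_le = encoded_bytes(LEFT_SINGLE_QUOTE, "utf-16-le")   # low byte first
utf16_be = encoded_bytes(LEFT_SINGLE_QUOTE, "utf-16-be")   # high byte first
utf8 = encoded_bytes(LEFT_SINGLE_QUOTE, "utf-8")           # three-byte sequence

print(hex(code_point))  # 0x2018
print(utf16_le)         # ['18', '20']
print(utf16_be)         # ['20', '18']
print(utf8)             # ['e2', '80', '98']
```

If something in the chain hands the parser little-endian 16-bit data while the declaration says UTF-8, 0x18 would be the first byte seen for this character, which would match the error message. That is still only a guess until the stream type is known.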

Hey John, thanks for your response!

I wrote a bit of debug code that writes the content of pInput to an XML file, and this is what I get:

<?xml version="1.0" encoding="UTF-8" ?> ... <Value><![CDATA[ ... <P>SORRY TO SAY THAT THIS CHARACTER WILL MESS THIS UP  ?</P> ... ]]></Value>

The question mark is exactly where the problematic character is supposed to be.
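Because an editor may show several different characters as a question mark, it can help to list the exact code points that XML 1.0 does not allow. The sketch below is in Python, for illustration only, and follows the Char production of the XML 1.0 specification. The function names and the sample string are made up for this example, and the offsets it reports are 0-based character offsets, which will not necessarily line up with the offset in the parser message.

```python
def is_xml10_char(cp):
    """True if code point `cp` is allowed by the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_illegal_xml_chars(text):
    """Return (offset, code point) pairs for characters XML 1.0 forbids."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if not is_xml10_char(ord(ch))]


# Hypothetical sample: a control character 0x18 where the quote should be
sample = "<P>MESS THIS UP \x18</P>"
for offset, cp in find_illegal_xml_chars(sample):
    print(f"offset {offset}: 0x{cp:02x}")
```

A real left single quote (U+2018) passes this check, so a hit on 0x18 would suggest the character was already damaged before it reached the parser.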

When I open the file in Notepad++ and check the 'Encoding' menu, it shows 'Encode with UTF-8 without BOM'.

Any clues?

Thanks

I think you need to focus on the input stream. What type of stream is pInput? You can get its class name using pInput.%ClassName(1).

If it is a file stream, what wrote it? Does its file contain a BOM at the start?
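If it does turn out to be a file, the BOM question can be answered by looking at its first few bytes. Here is a minimal sketch in Python, for illustration only. The name `detect_bom` is made up, and it only recognises the UTF-8 and UTF-16 marks (UTF-32 is deliberately not handled).

```python
import codecs

# Byte order marks this sketch knows about. UTF-32 is not covered.
BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),         # EF BB BF
    (codecs.BOM_UTF16_LE, "UTF-16LE"),  # FF FE
    (codecs.BOM_UTF16_BE, "UTF-16BE"),  # FE FF
]


def detect_bom(data):
    """Return the encoding named by a leading BOM in `data`, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


# To check a file, read its first bytes in binary mode and pass them in,
# e.g. detect_bom(open(path, "rb").read(4)), where `path` is your dump file.
print(detect_bom(b"\xff\xfe<\x00?\x00"))  # UTF-16LE
print(detect_bom(b"<?xml version"))       # None
```

No BOM does not prove the content is UTF-8; it only means the file carries no explicit marker.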

You might need to open a support case with WRC. I don't work for InterSystems.