Question Maarten Van den Vreken · Nov 10, 2017

SAX Parser error: invalid byte 'n' at position 2 of a 3-byte sequence while processing Anonymous Stream

Hi everyone,

I'm having trouble parsing XML containing Unicode characters which I receive from an external web service. I believe my file is saved properly with UTF-8 encoding, but the SAX parser still throws an error.

I have two class methods: a general one (get) that makes a request to a web service and returns the data, and a specific one (getSportsPerDate) that makes a particular call and then parses the data.

ClassMethod getSportsPerDate(language As %String, date As %String)
{
    #dim cResponse As betradar.uof.api.descriptions.schedule.scheduleEndpoint
    #dim res As %Stream.FileCharacter
    s res=..get("sports/"_language_"/schedules/"_date_"/schedule.xml")
    s reader=##class(%XML.Reader).%New()
    d reader.OpenStream(res)
    d reader.Correlate("schedule", "betradar.uof.api.descriptions.schedule.scheduleEndpoint")
    if reader.Next(.cResponse, .sc) {
        // Some code
    } else {
        d $SYSTEM.OBJ.DisplayError()
    }
}

ClassMethod get(request As %String) As %FileCharacterStream
{
    s response=##class(%Stream.FileCharacter).%New()
    s path=^site.pref("networkPathData")_"uof\"_$zd($h, 3)
    s name="test.xml"
    d ##class(%File).CreateDirectoryChain(path)
    d response.LinkToFile(path_"\"_name)
    s response.TranslateTable="UTF8"
    s response.BOM=$c(239, 187, 191)
    s https=##class(%Net.HttpRequest).%New()
    s https.Https=1
    d https.SetHeader("ContentType","text/xml")
    s https.ContentCharset="utf-8"
    s https.SSLConfiguration="apibetradarcom"
    s https.Server="api.betradar.com"
    d https.SetHeader("x-access-token", ^site.pref("betradar", "uofToken"))
    d https.Get("v1/"_request)
    if https.HttpResponse.StatusCode=200 {
        d response.CopyFrom(https.HttpResponse.Data)
    }
    d response.%Save()
    d response.Rewind()
    //s file=##class(%File).%New(response.Filename)
    //d file.Open("R")
    //q file
    q response
}

I call the above methods from terminal:

d ##class(betradar.uof.api).getSportsPerDate("en","2017-11-10")

When I do this, I get error 6301: "SAX XML Parser-error: invalid byte 'n' at position 2 of a 3-byte sequence while processing Anonymous Stream at line 1 offset 16331"

However, if I uncomment the 3 lines at the bottom of my get() method, the data is parsed just fine. The saved file can also be opened in a browser or editor and is rendered properly, so I do think my file is created correctly and that the data inside is UTF-8.

I'm currently using Caché 2015 on Windows: "Cache for Windows (x86-64) 2015.2.1 (Build 705U) Mon Aug 31 2015 16:45:59 EDT"

Does anyone have any ideas how I can solve this problem without actually reloading the stream from a file on disk? I do want to keep saving the files for logging purposes, but it's not really good practice to save a stream and then immediately open it again.

Regards,
Maarten

Comments

Robert Cemper · Nov 10, 2017

Calling %Save() on a stream doesn't close it, and Rewind() is different from %Reload().

So the intention of the 3 commented-out lines is not obvious.

Contrary to your comment, %Open() does a reload from disk.

Maarten Van den Vreken  Nov 11, 2017 to Herman Slagman

I changed to %Stream.FileBinary as you suggested and changed the response.CopyFrom() to

    while 'https.HttpResponse.Data.AtEnd {
        d response.Write($zcvt(https.HttpResponse.Data.ReadLine(), "O", "UTF8"))
    }

and now it works perfectly. Thanks a lot!

Michael Angeleri  Oct 13, 2022 to Maarten Van den Vreken

This worked for me as well. In my case I was parsing an ANSI-encoded XML file without a proper header. Once the stream hit a special character (in my case "à"), the parser wouldn't recognize it. I presume %XML.XPATH.Document and %XML.TextReader default to UTF-8 in this case. After converting the stream I was able to parse without issues.

set cda2Stream = ##class(%Stream.FileBinary).%New()
$$$ThrowOnError(cda2Stream.LinkToFile(cda2FileName))
// Converting to UTF-8 encoding as the original stream is ANSI encoded without proper header
set convertedStream = ##class(%Stream.GlobalBinary).%New()
while 'cda2Stream.AtEnd {
    $$$ThrowOnError(convertedStream.WriteLine($zconvert(cda2Stream.ReadLine(),"O","UTF8")))
}
// Parsing the converted stream
$$$ThrowOnError(##class(%XML.TextReader).ParseStream(convertedStream,.Textreader))
$$$ThrowOnError(##class(%XML.XPATH.Document).CreateFromStream(convertedStream,.XpathDoc))
set XpathDoc.PrefixMappings = "s urn:hl7-org:v3"
Herman Slagman · Nov 10, 2017

As I recall Character-type streams always translate to Unicode.

You could try a Binary stream, which does not do any translation

HTH
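
For readers wondering what the error message means at the byte level, here is a minimal sketch in Python (used purely for illustration, it is not Caché code). It assumes that the parser receives one byte per character (Latin-1 style) while expecting UTF-8; the sample text "én" is hypothetical. The byte for "é" (0xE9) looks like the start of a 3-byte UTF-8 sequence, so the "n" that follows is rejected as an invalid second byte. Re-encoding to UTF-8 plays roughly the role that $zcvt(..., "O", "UTF8") plays elsewhere in this thread.

```python
# Illustration only: why a UTF-8 parser rejects text stored one byte per character.

def lead_byte_length(byte: int) -> int:
    """Return the sequence length a UTF-8 lead byte announces (0 if not a lead byte)."""
    if byte < 0x80:
        return 1
    if 0xC2 <= byte <= 0xDF:
        return 2
    if 0xE0 <= byte <= 0xEF:
        return 3
    if 0xF0 <= byte <= 0xF4:
        return 4
    return 0

text = "én"                      # hypothetical sample content
raw = text.encode("latin-1")     # one byte per character: 0xE9 followed by 'n'

# 0xE9 announces a 3-byte sequence, but 'n' is not a continuation byte.
print(lead_byte_length(raw[0]))

decode_failed = False
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    decode_failed = True
    print("UTF-8 decoding failed at byte offset", err.start)

# Encoding the characters as real UTF-8 gives the parser what it expects.
fixed = text.encode("utf-8")
print(fixed.decode("utf-8") == text)
```

If this assumption matches what happens inside the character stream, it would explain why writing the data through a binary stream with an explicit UTF-8 output conversion avoids the error.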

Eduard Lebedyuk · Nov 10, 2017

Check that the encoding in your XML declaration corresponds to the actual encoding of the stream.
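
One way to run that check outside Caché is sketched below in Python (illustration only). The helper names declared_encoding and matches_declaration are hypothetical, and the sketch assumes an ASCII-compatible encoding, so it would not handle UTF-16 documents. It reads the encoding named in the XML declaration and tests whether the raw bytes actually decode with it.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the start of the data.
DECL_RE = re.compile(rb'^<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')

def declared_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the XML declaration, or the default if none is given."""
    if data.startswith(b"\xef\xbb\xbf"):   # skip a UTF-8 byte order mark
        data = data[3:]
    match = DECL_RE.match(data)
    return match.group(1).decode("ascii").lower() if match else default

def matches_declaration(data: bytes) -> bool:
    """True if the bytes decode cleanly with the encoding the document declares."""
    try:
        data.decode(declared_encoding(data))
    except (UnicodeDecodeError, LookupError):
        return False
    return True

# Hypothetical sample documents: same declaration, different actual bytes.
xml_text = '<?xml version="1.0" encoding="UTF-8"?><a>én</a>'
good = xml_text.encode("utf-8")
bad = xml_text.encode("latin-1")

print(matches_declaration(good))
print(matches_declaration(bad))
```

A clean decode does not prove the declaration is right (any byte sequence decodes as Latin-1, for instance), so a failure is informative while a success is only a necessary condition.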
