Question
Maarten Van den... · Nov 10, 2017

SAX Parser error: invalid byte 'n' at position 2 of a 3-byte sequence while processing Anonymous Stream

Hi everyone,

I'm having trouble parsing XML containing Unicode characters, which I receive from an external web service. I believe my file is saved properly with UTF-8 encoding, but the SAX parser still throws an error.

I have two class methods: a general one (get) that makes a request to the web service and returns the data, and one (getSportsPerDate) that makes a specific call and then parses the data.

ClassMethod getSportsPerDate(language As %String, date As %String)
{
    #dim cResponse As betradar.uof.api.descriptions.schedule.scheduleEndpoint
    #dim res As %Stream.FileCharacter

    set res=..get("sports/"_language_"/schedules/"_date_"/schedule.xml")

    set reader=##class(%XML.Reader).%New()
    do reader.OpenStream(res)

    do reader.Correlate("schedule", "betradar.uof.api.descriptions.schedule.scheduleEndpoint")
    if reader.Next(.cResponse, .sc) {
        // Some code
    } else {
        do $SYSTEM.OBJ.DisplayError(sc)
    }
}
ClassMethod get(request As %String) As %Stream.FileCharacter
{
    set response=##class(%Stream.FileCharacter).%New()

    set path=^site.pref("networkPathData")_"uof\"_$zd($h, 3)
    set name="test.xml"
    do ##class(%File).CreateDirectoryChain(path)

    do response.LinkToFile(path_"\"_name)
    set response.TranslateTable="UTF8"
    set response.BOM=$c(239, 187, 191)

    set https=##class(%Net.HttpRequest).%New()
    set https.Https=1
    set https.ContentType="text/xml"
    set https.ContentCharset="utf-8"
    set https.SSLConfiguration="apibetradarcom"
    set https.Server="api.betradar.com"
    do https.SetHeader("x-access-token", ^site.pref("betradar", "uofToken"))
    do https.Get("v1/"_request)

    if https.HttpResponse.StatusCode=200 {
        do response.CopyFrom(https.HttpResponse.Data)
    }

    do response.%Save()

    do response.Rewind()

    //s file=##class(%File).%New(response.Filename)
    //d file.Open("R")
    //q file

    quit response
}

I call the above methods from the Terminal:

d ##class(betradar.uof.api).getSportsPerDate("en","2017-11-10")

When I do this, I get error 6301: "SAX XML Parser-error: invalid byte 'n' at position 2 of a 3-byte sequence while processing Anonymous Stream at line 1 offset 16331"

However, if I uncomment the three lines at the bottom of my get() method, the data is parsed just fine. The saved file can also be opened in a browser or editor and renders properly, so I do think the file is created correctly and that its contents are UTF-8.

I'm currently using Caché 2015 on Windows: "Cache for Windows (x86-64) 2015.2.1 (Build 705U) Mon Aug 31 2015 16:45:59 EDT"

Does anyone have any ideas how I can solve this problem without actually reloading the stream from a file on disk? I do want to keep saving the files for logging purposes, but it's not really good practice to save a stream and then immediately open it again.

Regards,
Maarten


Calling %Save() on a stream doesn't close it, and Rewind() is different from %Reload().

So the purpose of the three commented-out lines is not obvious.

Contrary to your comment, %Open() does reload the data from disk.

I changed to %Stream.FileBinary as you suggested and changed the response.CopyFrom() call to

while 'https.HttpResponse.Data.AtEnd {
    do response.Write($zcvt(https.HttpResponse.Data.ReadLine(), "O", "UTF8"))
}

and now it works perfectly. Thanks a lot!
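The loop above works because HttpResponse.Data evidently holds decoded characters here, and $zcvt(..., "O", "UTF8") turns them back into UTF-8 bytes for the binary stream. One caveat: ReadLine() strips the line terminators, and Write() does not add them back, so line breaks from the response are lost. A minimal sketch of a variant that converts fixed-size chunks with Read() instead, assuming the source is a character stream (if it already held raw UTF-8 bytes, this would encode them twice). The class Demo.Utf8Copy and the method CopyAsUTF8 are hypothetical names:

```objectscript
Class Demo.Utf8Copy Extends %RegisteredObject
{

/// Hypothetical helper: copy a character stream into a binary stream as UTF-8 bytes.
/// Assumes "source" returns decoded characters, not raw bytes.
ClassMethod CopyAsUTF8(source As %Stream.Object, target As %Stream.Object) As %Status
{
    set sc = $$$OK
    do source.Rewind()
    while 'source.AtEnd {
        // Read() returns whole characters, so each chunk can be converted on its own;
        // unlike ReadLine(), it keeps the line terminators.
        set chunk = source.Read(32000, .sc)
        quit:$$$ISERR(sc)
        set sc = target.Write($zconvert(chunk, "O", "UTF8"))
        quit:$$$ISERR(sc)
    }
    quit sc
}

}
```

In get() this would replace the loop with something like `set sc = ##class(Demo.Utf8Copy).CopyAsUTF8(https.HttpResponse.Data, response)`, followed by the existing %Save() and Rewind().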

This worked for me as well. In my case I was parsing an ANSI-encoded XML file without a proper encoding declaration. As soon as the stream hit a special character (in my case "à"), it was not recognized. I presume %XML.XPATH.Document and %XML.TextReader default to UTF-8 in this case; after converting the stream I was able to parse it without issues.

set cda2Stream = ##class(%Stream.FileBinary).%New()
$$$ThrowOnError(cda2Stream.LinkToFile(cda2FileName))
// Converting to UTF-8 encoding as the original stream is ANSI encoded without proper header
set convertedStream = ##class(%Stream.GlobalBinary).%New()
while 'cda2Stream.AtEnd {
	$$$ThrowOnError(convertedStream.WriteLine($zconvert(cda2Stream.ReadLine(),"O","UTF8")))
}
// Parsing the converted stream
$$$ThrowOnError(##class(%XML.TextReader).ParseStream(convertedStream,.Textreader))
$$$ThrowOnError(##class(%XML.XPATH.Document).CreateFromStream(convertedStream,.XpathDoc))
set XpathDoc.PrefixMappings = "s urn:hl7-org:v3"

As I recall, character streams always translate their content to Unicode.

You could try a binary stream, which does not do any translation.

HTH
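A minimal sketch of the binary-stream approach, which keeps the response as raw bytes from the HTTP layer to the file to the parser, so the SAX parser can decode them according to the XML declaration. It assumes the ReadRawMode property of %Net.HttpRequest behaves as documented in your version (no charset translation of the response body); check the class reference for 2015.2 before relying on it. The class Demo.ScheduleClient, the method Fetch and its parameters are hypothetical, and directory creation is left out:

```objectscript
Class Demo.ScheduleClient Extends %RegisteredObject
{

/// Hypothetical sketch: fetch a document and keep it as raw bytes in a linked file.
ClassMethod Fetch(server As %String, location As %String, sslConfig As %String, token As %String, fileName As %String, Output sc As %Status) As %Stream.FileBinary
{
    set response = ##class(%Stream.FileBinary).%New()
    set sc = response.LinkToFile(fileName)
    quit:$$$ISERR(sc) response

    set http = ##class(%Net.HttpRequest).%New()
    set http.Https = 1
    set http.Server = server
    set http.SSLConfiguration = sslConfig
    do http.SetHeader("x-access-token", token)
    // Assumption: no charset translation, so HttpResponse.Data keeps the bytes from the wire
    set http.ReadRawMode = 1
    set sc = http.Get(location)
    quit:$$$ISERR(sc) response

    if http.HttpResponse.StatusCode = 200 {
        // Binary to binary: bytes are copied unchanged, so the file doubles as an exact log
        set sc = response.CopyFrom(http.HttpResponse.Data)
        quit:$$$ISERR(sc) response
        set sc = response.%Save()
    }
    do response.Rewind()
    quit response
}

}
```

The returned stream can then go straight to the reader without reopening the file, e.g. `set stream = ##class(Demo.ScheduleClient).Fetch("api.betradar.com", "v1/sports/en/schedules/2017-11-10/schedule.xml", "apibetradarcom", token, fileName, .sc)` followed by `do reader.OpenStream(stream)`.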

Check that the encoding in your XML declaration matches the actual encoding of the stream.
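One plausible way to read the error message, consistent with the character-stream explanation in this thread: the parser trusts encoding="UTF-8" but receives decoded characters as single bytes. The sketch below shows this for the made-up word "Sénégal" (the class Demo.EncodingCheck and the method Show are hypothetical names). Once decoded, "é" is $char(233), i.e. byte 0xE9, which in UTF-8 starts a 3-byte sequence; the next byte must be a continuation byte (0x80 to 0xBF), but it is "n", matching "invalid byte 'n' at position 2 of a 3-byte sequence":

```objectscript
Class Demo.EncodingCheck Extends %RegisteredObject
{

/// Hypothetical demo: compare decoded characters with the UTF-8 bytes on the wire.
ClassMethod Show() As %String
{
    // "Sénégal" built from code points so the source file's own encoding does not matter
    set text = "S"_$char(233)_"n"_$char(233)_"gal"
    // The bytes a UTF-8 file or HTTP body actually contains
    set wire = $zconvert(text, "O", "UTF8")
    write "characters: ", $length(text), !
    write "UTF-8 bytes: ", $length(wire), !
    // A character stream hands back "text", not "wire". Passed on as single bytes,
    // "é" becomes 0xE9 (a 3-byte lead byte) followed by "n" (not a continuation byte).
    write "byte after S: ", $ascii(text, 2), " followed by '", $extract(text, 3), "'", !
    quit wire
}

}
```

It writes 7 characters versus 9 UTF-8 bytes (each "é" is $char(195,169) on the wire), then byte 233 followed by 'n'.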