SAX Parser error: invalid byte 'n' at position 2 of a 3-byte sequence while processing Anonymous Stream
Hi everyone,
I'm having trouble parsing XML containing Unicode characters which I receive from an external webservice. I believe my file is saved properly with UTF-8 encoding, but the SAX Parser still throws an error.
I have 2 class methods: a general one (get) that makes a request to a webservice and returns the data, and a specific one (getSportsPerDate) that makes a particular call and then parses the data.
ClassMethod getSportsPerDate(language As %String, date As %String)
{
    #dim cResponse As betradar.uof.api.descriptions.schedule.scheduleEndpoint
    #dim res As %Stream.FileCharacter
    s res=..get("sports/"_language_"/schedules/"_date_"/schedule.xml")
    s reader=##class(%XML.Reader).%New()
    d reader.OpenStream(res)
    d reader.Correlate("schedule", "betradar.uof.api.descriptions.schedule.scheduleEndpoint")
    if reader.Next(.cResponse, .sc) {
        // Some code
    } else {
        d $SYSTEM.OBJ.DisplayError()
    }
}
ClassMethod get(request As %String) As %FileCharacterStream
{
    s response=##class(%Stream.FileCharacter).%New()
    s path=^site.pref("networkPathData")_"uof\"_$zd($h, 3)
    s name="test.xml"
    d ##class(%File).CreateDirectoryChain(path)
    d response.LinkToFile(path_"\"_name)
    s response.TranslateTable="UTF8"
    s response.BOM=$c(239, 187, 191)
    s https=##class(%Net.HttpRequest).%New()
    s https.Https=1
    d https.SetHeader("ContentType","text/xml")
    s https.ContentCharset="utf-8"
    s https.SSLConfiguration="apibetradarcom"
    s https.Server="api.betradar.com"
    d https.SetHeader("x-access-token", ^site.pref("betradar", "uofToken"))
    d https.Get("v1/"_request)
    if https.HttpResponse.StatusCode=200 {
        d response.CopyFrom(https.HttpResponse.Data)
    }
    d response.%Save()
    d response.Rewind()
    //s file=##class(%File).%New(response.Filename)
    //d file.Open("R")
    //q file
    q response
}
I call the above methods from terminal:
d ##class(betradar.uof.api).getSportsPerDate("en","2017-11-10")
When doing this I get error 6301: "SAX XML Parser-error: invalid byte 'n' at position 2 of a 3-byte sequence while processing Anonymous Stream at line 1 offset 16331"
However, if I uncomment the 3 lines at the bottom of my get() method, the data is parsed just fine. The saved file can also be opened in a browser or editor and is rendered properly, so I do think my file is created correctly and that the data inside is UTF-8.
I'm currently using Caché 2015 on Windows: "Cache for Windows (x86-64) 2015.2.1 (Build 705U) Mon Aug 31 2015 16:45:59 EDT"
Does anyone have any ideas how I can solve this problem without actually reloading the stream from a file on disk? I do want to keep saving the files for logging purposes, but it's not really good practice to save a stream and then immediately open it again.
Regards,
Maarten
Calling %Save() on a stream doesn't close it, and Rewind() is not the same as %Reload().
So the intention of the 3 commented-out lines is not obvious.
Contrary to your comment, %Open() does reload from disk.
I changed to %Stream.FileBinary as you suggested and changed the response.CopyFrom() to
and now it works perfectly. Thanks a lot!
This worked for me as well. In my case I was parsing an ANSI-encoded XML file without a proper header. Once the stream hit a special character (in my case "à"), the parser wouldn't recognize it. I presume %XML.XPATH.Document and %XML.TextReader default to UTF-8 in this case. After converting the stream I was able to parse without issues.
set cda2Stream = ##class(%Stream.FileBinary).%New()
$$$ThrowOnError(cda2Stream.LinkToFile(cda2FileName))

// Converting to UTF-8 encoding as the original stream is ANSI encoded without proper header
set convertedStream = ##class(%Stream.GlobalBinary).%New()
while 'cda2Stream.AtEnd {
    $$$ThrowOnError(convertedStream.WriteLine($zconvert(cda2Stream.ReadLine(),"O","UTF8")))
}

// Parsing the converted stream
$$$ThrowOnError(##class(%XML.TextReader).ParseStream(convertedStream,.Textreader))
$$$ThrowOnError(##class(%XML.XPATH.Document).CreateFromStream(convertedStream,.XpathDoc))
set XpathDoc.PrefixMappings = "s urn:hl7-org:v3"
As I recall Character-type streams always translate to Unicode.
You could try a Binary stream, which does not do any translation
HTH
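To illustrate the binary-stream idea, here is a minimal sketch, not the code anyone in this thread actually used. WriteAsUtf8 is a hypothetical helper name, and it assumes the source stream (for example https.HttpResponse.Data) already holds decoded Unicode characters, so each chunk is encoded to UTF-8 explicitly before it is written as raw bytes.

```objectscript
/// Hypothetical helper (not from the thread): writes a character stream
/// to a file as UTF-8 bytes and returns the file as a binary stream.
ClassMethod WriteAsUtf8(source As %Stream.Object, filename As %String, Output sc As %Status) As %Stream.FileBinary
{
    set target = ##class(%Stream.FileBinary).%New()
    set sc = target.LinkToFile(filename)
    if $$$ISERR(sc) { quit "" }

    do source.Rewind()
    while 'source.AtEnd {
        // Read a modest chunk so the encoded result (up to 3 bytes per
        // character) stays well below the string length limit.
        set chunk = source.Read(8000)
        // Encode explicitly; a binary stream does no translation itself.
        set sc = target.Write($zconvert(chunk, "O", "UTF8"))
        // An argumentless quit only leaves the loop.
        if $$$ISERR(sc) { quit }
    }
    if $$$ISERR(sc) { quit "" }

    set sc = target.%Save()
    if $$$ISERR(sc) { quit "" }
    do target.Rewind()
    quit target
}
```

The returned stream could then be handed to reader.OpenStream() in place of the character stream. If the source already contains raw UTF-8 bytes, the $zconvert call would double-encode them and should be dropped.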
Check that the encoding in your XML declaration corresponds to the actual encoding of the stream.
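One rough way to do that check from code is sketched below. DeclaredEncoding is a hypothetical helper, it assumes the declaration fits in the first 200 characters, and it only handles a double-quoted encoding attribute.

```objectscript
/// Hypothetical helper: returns the encoding named in the XML declaration,
/// or "" if there is no declaration or no encoding attribute.
ClassMethod DeclaredEncoding(stream As %Stream.Object) As %String
{
    do stream.Rewind()
    set head = stream.Read(200)
    do stream.Rewind()

    // Skip a UTF-8 byte order mark if the stream delivers one.
    if $extract(head, 1, 3) = $char(239, 187, 191) {
        set head = $extract(head, 4, *)
    }
    if $extract(head, 1, 5) '= "<?xml" { quit "" }

    // Look only inside the declaration itself.
    set decl = $piece(head, "?>", 1)
    quit $piece($piece(decl, "encoding=""", 2), """", 1)
}
```

If this returns "UTF-8" (or nothing, in which case a parser would normally assume UTF-8) but the bytes were written in an ANSI code page, an error like the one in the original post is the likely result.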