Question
· Apr 15, 2020

Exporting sources as XML to stream

Hello,

Recently I have been required to work with a method called ExportToStream.

I need to export a UTF-8-encoded JSON file as XML so that it can be imported on older releases. Here's how I attempted it:

do $System.OBJ.ExportToStream("path/to/my/json/file.json", .stream,,,"UTF8")

The file is indeed encoded as UTF-8, and the XML header states that it was exported as UTF8:

<?xml version="1.0" encoding="UTF8"?>

However, the body content is garbled:

"text": "Condição de pagamento sujeito a análise de crédito: "

To repeat: the original file is encoded as UTF-8 and displays correctly in any editor.
Both the editor and the file utility identify it as UTF-8 without BOM (which is correct).
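
For reference, the garbled body shown above matches what you get when UTF-8 bytes are interpreted one byte per character as ISO-8859-1. The Python sketch below only illustrates that pattern; that the export performs such a step is an assumption, not something confirmed for ExportToStream.

```python
# Text as stored in the original JSON file.
original = "Condição de pagamento sujeito a análise de crédito: "

# Bytes on disk when the file is saved as UTF-8.
utf8_bytes = original.encode("utf-8")

# Assumed faulty step: every byte is treated as one ISO-8859-1 character.
misread = utf8_bytes.decode("iso-8859-1")

# Writing `misread` out as UTF-8 would then produce the garbled body.
print(misread)

# The damage is reversible as long as no bytes were lost along the way.
repaired = misread.encode("iso-8859-1").decode("utf-8")
print(repaired == original)
```

If this is what happens, the exported XML is still valid UTF-8, only with every accented character encoded twice.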

With that said, can anyone figure out what I am doing wrong? Or is this a bug?

I tried both Caché 2017 and IRIS 2019.3; both show the same issue.


Hi Rubens.

Works fine for me on IRIS 2020.1 with rusw locale, see below.

Perhaps you could try exporting directly to a file instead of using a stream.

USER>!type ..\..\csp\user\test.json

{
"a":"русский текст"
}
USER>set stream=##class(%Stream.FileCharacter).%New()   

USER>set stream.Filename = "c:\temp\qq.xml"                                      

USER>do $System.OBJ.ExportToStream("/csp/user/test.json", .stream,,,"UTF8")      
Exporting to XML started on 04/16/2020 12:13:33
Exporting CSP/CSR or file: /csp/user/test.json
Export finished successfully.

USER>w stream.%Save()
1

USER>!type c:\temp\qq.xml

<?xml version="1.0" encoding="UTF8"?>
<Export generator="IRIS" version="26" zv="IRIS for Windows (x86-64) 2020.1 (Build 215U)" ts="2020-04-16 12:13:33">
<CSP name="test.json" application="/csp/user/" default="1"><![CDATA[
{
"a":"русский текст"
}]]></CSP>
</Export>

Hello @Alexander Koblov.

I also did the test using Export instead of ExportToStream and got the same result.

First, please make sure that the file you used is indeed encoded as UTF-8.

You can check it by using the following command:

file -bi ..\..\csp\user\test.json

It should display:

charset=utf-8
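
Where the file utility is not available, a rough equivalent check can be sketched in Python. The helper name is_valid_utf8 and the commented file path are hypothetical. A clean decode does not prove the author intended UTF-8: pure ASCII passes, and so does text that has already been encoded twice.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical usage with a file of your own:
# with open("test.json", "rb") as f:
#     print(is_valid_utf8(f.read()))

# Text saved as UTF-8 passes the check.
print(is_valid_utf8("русский текст".encode("utf-8")))

# Accented text saved as ISO-8859-1 fails it.
print(is_valid_utf8("Condição".encode("iso-8859-1")))
```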

Regarding the further tests I ran, it seems there is an imposed transcoding step when exporting the file. I tried several combinations:

  • When the original file is written in UTF-8 and I export using UTF8, the encoding breaks.
  • When the original file is written in UTF-8 and I export using RAW (which is ISO-8859-1 in my case), the encoding DOES NOT break.
  • When the original file is written in ISO-8859-1 and I export using RAW, the encoding DOES NOT break.
  • When the original file is written in ISO-8859-1 and I export using UTF8, the encoding DOES NOT break.

This is very strange.
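
One hypothesis that would explain all four results: the export always reads the source file byte for byte (one byte per character, i.e. as ISO-8859-1) and only then encodes to the requested charset. The Python sketch below simulates that assumed behavior. The names simulated_export and survives are hypothetical, and nothing here is taken from the actual Export or ExportToStream implementation.

```python
TEXT = "Condição"
RAW = "iso-8859-1"  # RAW is ISO-8859-1 in the setup described above
UTF8 = "utf-8"


def simulated_export(file_bytes: bytes, export_charset: str) -> bytes:
    # Assumption: the input is always read one byte per character,
    # whatever charset the file was actually written in.
    chars = file_bytes.decode(RAW)
    return chars.encode(export_charset)


def survives(source_charset: str, export_charset: str) -> bool:
    out = simulated_export(TEXT.encode(source_charset), export_charset)
    # A RAW export passes the bytes through unchanged, so the result is
    # read back in the source charset; a UTF8 export is read as UTF-8.
    reader = source_charset if export_charset == RAW else export_charset
    return out.decode(reader, errors="replace") == TEXT


for src, dst in [(UTF8, UTF8), (UTF8, RAW), (RAW, RAW), (RAW, UTF8)]:
    print(f"source={src} export={dst} intact={survives(src, dst)}")
```

Under this assumption only the UTF-8 source exported as UTF8 comes out damaged, which matches the four observations listed above.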

Yes, what I meant is that the original file is correct. We are not the ones doing the double transcoding. The output that I posted:

"text": "Condição de pagamento sujeito a análise de crédito: "

comes straight from the call to Export and/or ExportToStream, which is why I said that these methods seem to impose a transcoding step.

This is weird: I shouldn't have to convert a file to RAW in order to export it as UTF-8. Instead, I should be able to provide the same charset for both input and output, so that the engine knows which encoding to use without transcoding.

Unless there is a way to effectively disable the hidden transcoding step that these methods perform, they are really misleading.