Exporting sources as XML to a stream
Hello,
Recently I have been required to work with a method called ExportToStream.
The task requires me to export a UTF-8-encoded JSON file as an XML to be imported on older releases. Here's how I attempted to fulfill this request:
do $System.OBJ.ExportToStream("path/to/my/json/file.json", .stream,,,"UTF8")
The file is indeed encoded as UTF-8, and the XML header even states that it was exported as UTF8:
<?xml version="1.0" encoding="UTF8"?>
The body content seems to differ:
"text": "Condição de pagamento sujeito a análise de crédito: "
I will say it again: the original file is encoded as UTF-8, and it displays correctly in any editor. Both the editor and the file utility identify the file as UTF-8 without BOM (which is correct).
With that said, can anyone figure out what I am doing wrong? Or is this a bug?
I tested on Caché 2017 and IRIS 2019.3; both presented the same issue.
Hi Rubens.
Works fine for me on IRIS 2020.1 with the rusw locale, see below.
Perhaps you can try exporting directly to a file instead of using a stream.
Hello @Alexander Koblov.
I also ran the test using Export instead of ExportToStream and got the same result.
Now, first thing: you must make sure that the file you used is indeed written in UTF-8.
You can check it with the file utility, which should identify it as UTF-8 Unicode text without BOM.
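As a hedged illustration (the exact command from the thread was not preserved), the same check can be done in Python: try to decode the raw bytes as UTF-8 and look for a BOM. The file name and sample text here are placeholders written by the script itself so the sketch is self-contained:

```python
# Write a small UTF-8 sample file so the example is self-contained
# (in practice you would point this at your own JSON file)
sample = "Condição de pagamento".encode("utf-8")
with open("file.json", "wb") as f:
    f.write(sample)

with open("file.json", "rb") as f:
    data = f.read()

# A UTF-8 BOM would be the byte sequence EF BB BF at the start of the file
has_bom = data.startswith(b"\xef\xbb\xbf")

# decode() raises UnicodeDecodeError if the bytes are not valid UTF-8
data.decode("utf-8")

print("UTF-8 with BOM" if has_bom else "UTF-8 without BOM")  # UTF-8 without BOM
```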
Now, regarding further tests I did: it seems there is an imposed transcoding step when exporting the file. I ran several simulations with many types of combinations:
This is very strange.
@Rubens Silva
That sounds to me like double encoding.
I'd suggest using a hex editor (e.g. PSPad) to examine your files.
UTF-8 means that some characters are encoded with more than 8 bits (more than one byte).
By converting an already-converted string, you may get those strange effects.
And you already found the way to avoid this yourself.
@Robert.Cemper
It's certainly not a hand-made double encoding.
We made sure of that by writing a new file in both charsets to simulate the issue.
See this example to reproduce and explain that there is an unnecessary conversion on the way,
as you showed in your example
"text": "Condição de pagamento sujeito a análise de crédito: "
Yes, what I meant to say is that the original file is correct; it's not us who did the double transcoding. The resulting output that I posted comes straight from the call to Export and/or ExportToStream, which is why I said that these methods seem to impose a transcoding step.
This is weird: I shouldn't have to convert a file to RAW in order to export it as UTF-8. Instead, I should be able to provide the same charset for both input and output so that the engine knows which encoding to use (but does not transcode).
Unless there is a way to effectively disable the hidden transcoding step these methods perform, it makes them really misleading.
So I'd suggest involving WRC to check the sources for where the double translation comes from (it has probably been there forever).
Yes, I think so too.
Just an idea, to understand:
what do you see if your .stream is a %Stream.GlobalBinary?
Yes, but if you provide a pre-instantiated object, say a %Stream.FileCharacter, it outputs to it as well.
I also tried setting the TranslateTable to UTF-8 and using the OutputToDevice method to see the result, but that gave me the same output.