Creating UTF-8 encoded files
Hello,
When we need to create a UTF-8 encoded XML file, we use the Charset property of %XML.Writer:
set writer=##class(%XML.Writer).%New()
set writer.Charset="UTF-8"
How can we create regular .txt files with the same encoding?
Our Caché installations are 8-bit, not Unicode.
Thanks,
Nael Naser eldeen
Have you tried the %Stream.FileCharacter class? For example, writing Do stream.Write("Hello " _ $CHAR(955) _ "!") to a %Stream.FileCharacter stream should create a file that contains "Hello λ!", with the Unicode character for lambda.
Though I am not sure how you can create Unicode characters in an 8-bit Caché installation.
Thanks a lot for your quick answer, Alok!
I tried it, but it didn't work.
I just got a file containing
"Hello !"
and the file is ANSI encoded.
I guess it's because we have special translation tables, and I don't know the details of that configuration.
I just thought: if it's that easy to do with XML files, why not with regular files?
Regards,
Nael
To force the file to be created as UTF-8 even if you write only ASCII characters, you need to set the BOM property of the stream object, e.g. Set stream.BOM=$C(239,187,191).
BOM is an abbreviation of Byte Order Mark; for UTF-8 it is the byte sequence 0xEF, 0xBB, 0xBF (decimal 239, 187, 191).
https://en.wikipedia.org/wiki/Byte_order_mark
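To make the byte sequence concrete, here is a small Python sketch (Python is used only to show raw bytes; it is not Caché code). It checks that 0xEF, 0xBB, 0xBF are the same values as $C(239,187,191), and shows why the BOM matters for pure ASCII: "Hello" produces identical bytes in UTF-8 and in a single-byte ANSI code page. cp1252 is an assumed example of such a code page.

```python
import codecs

# The UTF-8 BOM as raw bytes; codecs.BOM_UTF8 is Python's constant for it.
BOM = codecs.BOM_UTF8
assert BOM == bytes([0xEF, 0xBB, 0xBF]) == bytes([239, 187, 191])  # same values as $C(239,187,191)

# Pure ASCII encodes to identical bytes in UTF-8 and in a single-byte
# "ANSI" code page (cp1252 is only an example), so without a BOM
# nothing in the file marks it as UTF-8.
text = "Hello"
assert text.encode("utf-8") == text.encode("cp1252")

# Prepending the BOM is what makes the UTF-8 encoding detectable.
with_bom = BOM + text.encode("utf-8")
print(with_bom.hex(" "))  # ef bb bf 48 65 6c 6c 6f

# Python's "utf-8-sig" codec writes the same BOM automatically.
assert text.encode("utf-8-sig") == with_bom
```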
Thank you Tomas,
To clarify: setting the BOM property and
Do stream.TranslateTableSet("UTF8")
are not equivalent, correct?
But does TranslateTableSet also set the BOM?
Thanks,
Nael
Setting TranslateTable to UTF8 only translates the characters to UTF-8; it doesn't set the BOM, so you need to set both.
Hi Tomas,
Thanks.
I checked, and TranslateTable seems to set the BOM,
because when I run the following two examples,
the files I get are identical:
method 1:
Set stream=##class(%Stream.FileCharacter).%New()
Set sc=stream.LinkToFile("c:\temp\UTF8ExampleWithBom.txt")
Set stream.BOM=$C(239,187,191)
Do stream.TranslateTableSet("UTF8")
Do stream.Write("Hello שלום")
w stream.%Save()
method 2:
Set stream=##class(%Stream.FileCharacter).%New()
Set sc=stream.LinkToFile("c:\temp\UTF8ExampleNoBom.txt")
Do stream.TranslateTableSet("UTF8")
Do stream.Write("Hello שלום")
w stream.%Save()
The files look the same if you open them in, e.g., Windows Notepad. This is because Notepad recognizes a file as UTF-8 even when the BOM is missing, as long as the file contains some Unicode characters (code points above 255). If you write pure ASCII ("Hello"), the file will be opened as ANSI.
But if you open them in a hex editor, you will see that the second file is missing the BOM.
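Without a hex editor, the same check can be sketched in Python by reading the first three bytes of each file. This writes the bytes directly rather than through %Stream.FileCharacter, so it only models the two outputs described above; the files go into a temporary directory instead of c:\temp.

```python
import codecs
import os
import tempfile


def starts_with_utf8_bom(path: str) -> bool:
    """Return True if the file at `path` begins with the UTF-8 BOM (EF BB BF)."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8


# Two files with the same UTF-8 text, one with the BOM and one without,
# mirroring the "WithBom" / "NoBom" examples in the thread.
text = "Hello שלום"
tmpdir = tempfile.mkdtemp()
with_bom = os.path.join(tmpdir, "UTF8ExampleWithBom.txt")
no_bom = os.path.join(tmpdir, "UTF8ExampleNoBom.txt")

with open(with_bom, "wb") as f:
    f.write(codecs.BOM_UTF8 + text.encode("utf-8"))
with open(no_bom, "wb") as f:
    f.write(text.encode("utf-8"))

print(starts_with_utf8_bom(with_bom))  # True
print(starts_with_utf8_bom(no_bom))    # False
```

The only difference between the two files is those three leading bytes, which is exactly what a text editor hides and a hex editor shows.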
EDIT: I found that the BOM is written only if you set the BOM property AFTER setting the translation table:
Set stream=##class(%Stream.FileCharacter).%New()
Set sc=stream.LinkToFile("c:\temp\UTF8ExampleWithBom.txt")
Set stream.TranslateTable="UTF8"
Set stream.BOM=$C(239,187,191)
Do stream.Write("Hello")
w stream.%Save()
This is because setting TranslateTable resets BOM to an empty string.
This worked great for what I needed. The only change was to use WriteLine instead of Write, since I had to output a number of lines. Thanks!
Hi,
It did work after all.
When I changed the command
Do stream.Write("Hello " _ $CHAR(955) _ "!")
to one containing English and Hebrew letters, I got a UTF-8 encoded file.
Thanks for your help!
Regards,
Nael
Glad I was able to help! I think $CHAR with an argument > 255 only works on Unicode instances. Were you able to paste Hebrew letters directly into the string in Caché Studio? I don't fully understand how 8-bit instances work.
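A rough Python model of why the Hebrew text worked while $CHAR(955) gave "Hello !". It assumes, without confirmation from the thread, that the 8-bit instance stores Hebrew in a single-byte code page, using Windows-1255 (cp1255) purely as an illustrative choice. Under that assumption each Hebrew letter fits in one byte and can be translated to UTF-8 on output, while lambda (code point 955) has no slot in a 256-character set.

```python
# Assumption: the 8-bit instance uses a single-byte Hebrew code page;
# cp1255 (Windows Hebrew) is only an illustrative choice.
CODE_PAGE = "cp1255"

hebrew = "שלום"
stored = hebrew.encode(CODE_PAGE)  # one byte per Hebrew letter
print(len(stored))  # 4

# Model of a UTF8 output translation: decode from the code page,
# then re-encode as UTF-8, where each Hebrew letter takes two bytes.
utf8 = stored.decode(CODE_PAGE).encode("utf-8")
print(len(utf8))  # 8

# Lambda is code point 955, beyond the 0-255 range of an 8-bit
# character set, and cp1255 has no Greek letters to map it to.
lam = chr(955)
try:
    lam.encode(CODE_PAGE)
    representable = True
except UnicodeEncodeError:
    representable = False
print(ord(lam), representable)  # 955 False
```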