Question
· Nov 21, 2016

Creating UTF-8 encoded files

Hello,

When we need to create a UTF-8 encoded XML file, we use the Charset property of %XML.Writer:

set writer=##class(%XML.Writer).%New()
set writer.Charset="UTF-8"

 

How can we create plain text files with the same encoding?

Our Caché installations are 8-bit, not Unicode.

 

Thanks,

Nael Naser eldeen

Discussion

Have you tried the %Stream.FileCharacter class? For example, the following should create a file that contains "Hello λ!", that is, "Hello" followed by the Unicode character for lambda:

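ClassMethod WriteUTF8() As %Status
{
    Set stream=##class(%Stream.FileCharacter).%New()
    // Link the stream to the target file; stop if that fails
    Set sc=stream.LinkToFile("c:\tmp\lambda.txt")
    If $SYSTEM.Status.IsError(sc) Quit sc
    // Translate characters to UTF-8 on output
    Set stream.TranslateTable="UTF8"
    // $CHAR(955) is the Greek small letter lambda
    Do stream.Write("Hello " _ $CHAR(955) _ "!")
    // Return the save status to the caller instead of discarding it
    Quit stream.%Save()
}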

Though I am not sure how you can create Unicode characters in an 8-bit Caché installation.

Hi Tomas,

Thanks.

I checked, and it seems that setting TranslateTable also sets the BOM, because when I ran the following two examples, the files I got were identical:

Method 1:

Set stream=##class(%Stream.FileCharacter).%New()
Set sc=stream.LinkToFile("c:\temp\UTF8ExampleWithBom.txt")
Set stream.BOM=$C(239,187,191)
Do stream.TranslateTableSet("UTF8")
Do stream.Write("Hello שלום")
w stream.%Save()

 

Method 2:

Set stream=##class(%Stream.FileCharacter).%New()
Set sc=stream.LinkToFile("c:\temp\UTF8ExampleNoBom.txt")
Do stream.TranslateTableSet("UTF8")
Do stream.Write("Hello שלום")
w stream.%Save()

The files look the same if you open them in, for example, Windows Notepad. This is because Notepad recognizes a file as UTF-8 even when the BOM is missing, as long as the file contains some Unicode characters (>255). If you write pure ASCII ("Hello"), the file is opened as ANSI.

But if you open them in a hexadecimal editor, you will see that the second file is missing the BOM.
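
If no hex editor is at hand, the same check can be done from code. This is only a minimal sketch: the method name HasUTF8BOM is hypothetical, and it assumes that reading through %Stream.FileBinary returns the raw bytes of the file with no translation applied.

```objectscript
ClassMethod HasUTF8BOM(filename As %String) As %Boolean
{
    // Binary stream, so the bytes are read exactly as stored on disk
    Set stream=##class(%Stream.FileBinary).%New()
    Set sc=stream.LinkToFile(filename)
    If $SYSTEM.Status.IsError(sc) Quit 0
    // Read at most the first three bytes of the file
    Set len=3
    Set head=stream.Read(.len)
    // The UTF-8 BOM is the byte sequence EF BB BF (239,187,191)
    Quit head=$CHAR(239,187,191)
}
```

Called on each of the two example files, for instance from a hypothetical class Demo.Utils with `Write ##class(Demo.Utils).HasUTF8BOM("c:\temp\UTF8ExampleNoBom.txt")`, it returns 1 when the file starts with those three bytes and 0 otherwise.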

EDIT: I found that the BOM property only takes effect if you set it AFTER setting the translation table:

 Set stream=##class(%Stream.FileCharacter).%New()
 Set sc=stream.LinkToFile("c:\temp\UTF8ExampleWithBom.txt")
 Set stream.TranslateTable="UTF8"
 Set stream.BOM=$C(239,187,191)
 Do stream.Write("Hello")
 w stream.%Save()

This is because setting TranslateTable resets BOM to an empty string.