Converting ISO-8859-1 input document
We have a Unicode installation of Caché. A client wants to send us documents that will be machine-read and loaded automatically. They want to create the documents in ISO-8859-1 ("Latin-1"). We'd need to convert the text to UTF-8 for our system. I saw the documentation on the $ZCONVERT function, but I didn't see this option. How should it be done?
Thanks!
Product version: Caché 2018.1
$ZV: Cache for Windows (x86-64) 2018.1.4 (Build 505_1_20258U) Thu Sep 10 2020 10:22:22 EDT
In namespace %SYS you have a utility, NLS, that shows your installed conversion tables and their short names.
%SYS>d ^NLS
2) Select defaults
2) I/O tables
Items marked with (*) represent the locale's original default
   I/O table              Current default
   ---------------------  --------------------
1) Process                RAW (*)
2) Cache Terminal         UTF8 (*)
3) Other terminal         UTF8 (*)
4) File                   RAW (*)
5) Magtape                RAW (*)
6) TCP/IP                 RAW (*)
7) System call            RAW (*)
8) Printer                RAW (*)
I/O table: 4
1) RAW (*) 2) UTF8
3) UnicodeLittle 4) UnicodeBig
5) CP1250 6) CP1251
7) CP1252 8) CP1253
9) CP1255 10) CP437
11) CP850 12) CP852
13) CP866 14) CP874
15) EBCDIC 16) Latin2
17) Latin9 18) LatinC
19) LatinG 20) LatinH
21) LatinT
So you see the short names: there is no Latin1, but there is CP1252, which is almost identical.
The related problem is described here:
https://www.i18nqa.com/debug/table-iso8859-1-vs-windows-1252.html
"ISO-8859-1 (also called Latin-1) is identical to Windows-1252 (also called CP1252) except for the code points 128-159 (0x80-0x9F). ISO-8859-1 assigns several control codes in this range. Windows-1252 has several characters, punctuation, arithmetic and business symbols assigned to these code points."
See also "Encoding Problem: ISO-8859-1 vs Windows-1252".
So you should check what your customer really sends (some hide the fact that they use Windows).
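To make the difference concrete, here is a small Python sketch. Python is used only because it ships both codecs; it says nothing about how the Caché tables are implemented. The helper name `has_c1_range_bytes` is hypothetical. The sketch decodes the same bytes both ways and flags input containing bytes in 0x80-0x9F. In genuine ISO-8859-1 text those would be control codes, so their presence is a hint (not proof) that the sender is really producing Windows-1252.

```python
# Bytes 0x80, 0x93 and 0x94 sit in the range where the two encodings differ.
sample = b"\x80 100 \x93quoted\x94 caf\xe9"

as_latin1 = sample.decode("latin-1")  # 0x80-0x9F become C1 control characters
as_cp1252 = sample.decode("cp1252")   # 0x80-0x9F become printable symbols


def has_c1_range_bytes(data: bytes) -> bool:
    """Heuristic: True if any byte falls in the range 0x80-0x9F."""
    return any(0x80 <= b <= 0x9F for b in data)


print(repr(as_latin1))
print(repr(as_cp1252))
print(has_c1_range_bytes(sample))
print(has_c1_range_bytes("café".encode("latin-1")))
```

Running a check like this over a few sample files from the customer is a cheap way to decide between the two tables before loading anything.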
The appropriate table can be used in
Thank you! I followed what you said, right up to the last step, "The appropriate table can be used..." How does one use these tables, just by the name "CP1252", as in s str0=$zconvert(str,direction,"CP1252") ? Or .../IOTABLE="CP1252"
Thanks!
The format /IOTABLE="CP1252" applies only when using the OPEN command.
$ZCONVERT and %Stream.FileCharacter take the table simply by its name, "CP1252".
Thanks again!
If you receive data as ISO-8859-1 (aka Latin1) and have a Unicode (IRIS/Caché) installation, then usually you have nothing to do (except process the data). What do you mean by "convert the text to UTF-8"? In IRIS/Caché you have (and work with) Unicode code points; UTF-8 comes into play only when you export your data, but in your case the export will rather be ISO-8859-1, or am I misunderstanding something?
By the way, if you return your data to your Latin1 source (as Latin1), then you have to take some precautions because you have a Unicode installation: during processing you could mix your Latin1 data with true Unicode data from other sources!
See: https://unicode.org/charts/
Also, you may download and read:
https://www.unicode.org/versions/Unicode13.0.0/UnicodeStandard-13.0.pdf
Sorry, could you explain? If there are special characters (i.e., non-ASCII) in the input stream from ISO-8859-1, would they load correctly into our database - that is, as the correct corresponding Unicode characters - without a conversion process? Thanks.
Counter-question: do you have an example of a 'non-ASCII' character?
Code points 0x00-0x7F (0-127) are the "C0 Controls and Basic Latin" block, aka ASCII.
Code points 0x80-0xFF (128-255) are the "C1 Controls and Latin-1 Supplement" block, aka Latin1.
Take a look at https://www.unicode.org/charts/PDF/U0080.pdf
For example, Ä and ä are the German umlaut-A and umlaut-a, respectively:
$ascii("Ä") --> 196 and $ascii("ä") --> 228. Type this in a terminal session on your system: write $char(196) --> Ä
Download the above PDF and compare it with your ISO-8859-1 data; there should be no difference.
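The same claim can be checked outside Caché. A minimal Python sketch, assuming nothing beyond the standard codecs: every ISO-8859-1 byte value decodes to the Unicode code point with the same number, which is why true Latin-1 input needs no table lookup.

```python
# Decode all 256 possible byte values as ISO-8859-1.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")

# Each byte value equals the code point of the character it decodes to.
identity = all(ord(ch) == b for ch, b in zip(decoded, all_bytes))

print(identity)
print(ord("Ä"), ord("ä"))
print(chr(196), chr(228))
```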
Huh. Well, that would simplify matters. You're saying that Latin-1 is actually a subset of Unicode, backwards compatible. If so, never mind, sounds like we're good!
Not so terrible... but one more thing:
as I wrote in my first answer, you have to take care always to return 8-bit data (what I loosely called ASCII) and never WIDE data, that is, characters with code points above 255.
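What that precaution amounts to can be sketched in Python. The function name `fits_latin1` is hypothetical, and this only illustrates the check, not a Caché API: before sending text back as Latin-1, verify that every character has a code point of 255 or less.

```python
def fits_latin1(text: str) -> bool:
    """True if every character can be written as a single ISO-8859-1 byte."""
    return all(ord(ch) <= 0xFF for ch in text)


ok = "Käse"           # only code points <= 255
wide = "Preis: 5 €"   # the euro sign is U+20AC, above 255

print(fits_latin1(ok))
print(fits_latin1(wide))

# Encoding succeeds for the first string; the second would raise
# UnicodeEncodeError if encoded the same way.
print(ok.encode("latin-1"))
```

Text that fails the check has picked up characters from some other source and needs a decision (replace, reject, or transliterate) before it goes back to the Latin-1 system.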
Thank you for your help.