Question
· Oct 8, 2024

How to convert IRIS from 8-bit to Unicode?

I have an IRIS installation that uses 8-bit charset encoding (set to deu8 / Latin 1). I would like to convert everything (databases and system) to Unicode.

Charset encoding is chosen during installation. Is it possible to change it on the fly? The installer clearly says that Unicode systems cannot be converted. What about 8-bit systems?

Same question for the databases: is a conversion possible?

My current plan is the following:

- export all globals from 8-bit instance
- install a new Unicode instance
- import all globals into Unicode instance

Is there a simpler approach?

Product version: IRIS 2021.1
$ZV: IRIS for Windows (x86-64) 2021.1 (Build 215U) Wed Jun 9 2021 09:39:22 EDT
Discussion (5)

It depends on how much non-Unicode data you have. If it's not much, you can try the XML route (export to XML, then import).

Another way is to use some simple scripts that iterate over all globals and convert the values in place, skipping the indexes and then doing a full index rebuild.

I think there were multiple solutions to this task. You can try to find them.
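To make the "iterate and convert in place" idea concrete, here is a minimal sketch in Python rather than ObjectScript. Everything in it is an assumption for illustration: the globals are modelled as a plain dictionary, the names `convert_value` and `convert_globals` are hypothetical, and the source code page (`cp1251` in the usage note) is just an example. Note that for pure Latin 1 data the byte values already equal the Unicode code points, so the conversion would be a no-op.

```python
from typing import Dict, List, Set

# Hypothetical model: global name -> {subscript: value}.
# Each value is a str whose code points (0..255) are the raw 8-bit bytes.
Globals = Dict[str, Dict[str, str]]


def convert_value(raw: str, source_codec: str) -> str:
    """Reinterpret an 8-bit string as text written in source_codec."""
    return raw.encode("latin-1").decode(source_codec)


def convert_globals(data: Globals, source_codec: str,
                    index_globals: Set[str]) -> List[str]:
    """Convert all data nodes in place; return index globals to rebuild."""
    to_rebuild = []
    for name in sorted(data):
        if name in index_globals:
            # Indexes are skipped here and rebuilt in full afterwards.
            to_rebuild.append(name)
            continue
        nodes = data[name]
        for subscript in list(nodes):
            nodes[subscript] = convert_value(nodes[subscript], source_codec)
    return to_rebuild
```

For example, with `raw = "Привет".encode("cp1251").decode("latin-1")` stored in a data global, `convert_globals(data, "cp1251", {"^PersonI"})` rewrites the value as real Unicode text and returns `["^PersonI"]` as the list of indexes still to be rebuilt.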

You have to collect as much information as possible about your data:

  • Code: is it in 8-bit, or is it all in English? Some code may contain comments in a native language, and if you don't use git or other source control, you may need to convert it too
  • Data: is it legacy data or class-based? If native, is it delimiter-based, or built with $listbuild the same way classes do it?
    • These need two different strategies: data with a plain delimiter can be converted right away, while $listbuild-based data has to be taken apart and rebuilt through $listbuild
  • Any additional data. Some legacy applications may store additional information, such as TUI/CHUI forms in pseudographics, somewhere; you should look for this data as well
  • Any other possible sources of 8-bit data
  • Communication and file processing, where the encoding can be changed
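The difference between the two data strategies can be sketched as follows. This is an illustration under stated assumptions, not the real `$listbuild` layout: the packed format below is a simplified, hypothetical "one length byte plus payload" encoding, and the codecs are examples. The point it shows is that a delimited record can be decoded as a whole, while a length-prefixed record must be unpacked, converted element by element, and repacked, because the element lengths change.

```python
from typing import List


def convert_delimited(raw: bytes, codec: str) -> str:
    """A delimited record can be decoded in one go (ASCII delimiter)."""
    return raw.decode(codec)


def unpack(raw: bytes) -> List[bytes]:
    """Split a hypothetical packed record: 1 length byte, then payload."""
    items, pos = [], 0
    while pos < len(raw):
        size = raw[pos]
        items.append(raw[pos + 1:pos + 1 + size])
        pos += 1 + size
    return items


def pack(items: List[bytes]) -> bytes:
    """Build a packed record, recomputing every length byte."""
    out = bytearray()
    for item in items:
        if len(item) > 255:
            raise ValueError("element too long for this toy format")
        out.append(len(item))
        out += item
    return bytes(out)


def convert_packed(raw: bytes, src: str, dst: str) -> bytes:
    """Unpack, convert each element, and repack with new lengths."""
    return pack([item.decode(src).encode(dst) for item in unpack(raw)])
```

Decoding the packed bytes as one string would treat the length bytes as text and leave stale lengths behind, which is why the element-wise round trip is needed.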

The last time I implemented such a converter was more than 15 years ago, for an application that was 20+ years old and had a text terminal interface, and it went well.

Upgrading from an 8-bit instance to a Unicode instance is much simpler, as you can just skip your export and import steps. Instead, just reinstall your original IRIS kit as an update kit. During the update, the installation will ask you:

Do you want to convert 8-bit to Unicode <No>?

Just answer Yes and the instance will be converted.

Whenever a string value in a ^global variable contains only 8-bit characters then a Unicode IRIS instance stores that string in IRIS.DAT using 8-bit representation in order to save space.  After the update, all your existing global data items are still there and the strings are all in 8-bit.  IRIS Unicode instances use the UTF-16 Unicode encoding.  If you have any 8-bit strings encoded in UTF-8 then you can use $ZCVT to convert UTF-8 strings to the IRIS default Unicode representation which uses UTF-16.  Functions like $wlength, $wextract, etc. do not work on UTF-8 encoded 8-bit strings but they do work on the UTF-16 encoded strings.

Note: if you port IRIS.DAT files between different hardware instances, and you also port between big-endian and little-endian hardware (e.g., AIX to Windows), then there is a documented utility that converts the IRIS.DAT files between big-endian and little-endian representation.

There is no support for automatic conversion from a Unicode IRIS.DAT file back to an 8-bit IRIS.DAT file. You can imagine this working if you are very lucky and the ported Unicode IRIS.DAT files just happen to have no Unicode strings. That will not happen with the "%SYS" namespace, because the upgrade adds Unicode support to that namespace, which includes some Unicode strings. If there are only a few, easily found Unicode strings, you can use $ZCVT to convert UTF-16 to 8-bit UTF-8. If you are so lucky that those conversions completely remove all UTF-16 strings from an IRIS Unicode instance, then you can try to install a new 8-bit instance, keep its IRISSYS and IRISLIB databases, and replace the other database files with the IRIS.DAT files that now contain only 8-bit user string data. If you fail to convert all the Unicode strings while trying to go back to an 8-bit instance, then I believe you will get a <WIDE CHAR> signal if you attempt to access wide UTF-16 data.
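The "find the few wide strings and convert them" step could look like the sketch below. It is written in Python with a dictionary standing in for the global nodes, and all names are hypothetical. A value counts as wide when it contains any character above code point 255, which is the kind of data that would be expected to raise <WIDE CHAR> on an 8-bit instance.

```python
from typing import Dict, List


def has_wide_chars(value: str) -> bool:
    """True if the value cannot be stored as an 8-bit string."""
    return any(ord(ch) > 255 for ch in value)


def find_wide_nodes(nodes: Dict[str, str]) -> List[str]:
    """List the subscripts whose values still contain wide characters."""
    return [key for key, value in nodes.items() if has_wide_chars(value)]


def to_utf8_8bit(value: str) -> str:
    """Re-encode a Unicode value as UTF-8 bytes held in an 8-bit string."""
    return value.encode("utf-8").decode("latin-1")


nodes = {"name": "Müller", "price": "5 €"}
for key in find_wide_nodes(nodes):
    nodes[key] = to_utf8_8bit(nodes[key])
```

After the loop, `find_wide_nodes(nodes)` returns an empty list. Values such as "Müller" are left alone because every character already fits in 8 bits.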