User bio
404 bio not found
Member since Feb 2, 2016
Replies:

The Class Reference pages for IRIS, available in any IRIS installation, describe all the classes installed in the installation.  The Class Reference for the %Regex.Matcher class documents that the %Regex.Matcher class comes from the International Components for Unicode (ICU). The ICU maintains web pages at https://icu.unicode.org .  The class reference documentation also contains the following statement for ICU documentation specific to the IRIS %Regex:

{quote}The definition and features of the ICU regular expression package can be found in https://unicode-org.github.io/icu/userguide/strings/regexp.html .{quote}

Additional documentation on the Unicode Regex package specific to how it interacts with the Unicode character set can be found at https://www.unicode.org/reports/tr18/ .

A quick comment on performance.  A TRY ... CATCH is very efficient at run time if no exception occurs.  The only extra code that is executed is a jump at the at end of the TRY block to go around the CATCH block.  However, TRY ... CATCH is slower than using the %ZTRAP when an execption does occur.  At compile time TRY ... CATCH blocks build tables that include program counters that describe which object code locations are in each TRY block and where the corresponding CATCH block begins.  When an error is signaled (or THROWn) at run time, it is necessary to scan the frame stack searching the compile-time generated TRY ... CATCH tables to find the CATCH location to go to after popping the intermediate stack frames.  If there are no TRY ... CATCH blocks involved then it is usually faster to just jump to the most recent setting of the $ZTRAP variable.

When exceptions are "exceptional"  then TRY ... CATCH is most efficient since it avoids pushing $ZTRAP value in preparation for handling an exception and it also avoids popping the $ZTRAP value when the exception does not occur.  If exceptions are part of a usual, successful, and often executed code path then it might be better to use $ZTRAP to catch those frequently executed signals.

Upgrading from an 8-bit instance to  a Unicode instance is much simpler as you can just skip your export and import steps.  Instead, just reinstall your original IRIS kit as an update kit.  During the update, the installation will ask you:

Do you want to convert 8-bit to Unicode <No>?

Just answer Yes and the instance will be converted.

Whenever a string value in a ^global variable contains only 8-bit characters then a Unicode IRIS instance stores that string in IRIS.DAT using 8-bit representation in order to save space.  After the update, all your existing global data items are still there and the strings are all in 8-bit.  IRIS Unicode instances use the UTF-16 Unicode encoding.  If you have any 8-bit strings encoded in UTF-8 then you can use $ZCVT to convert UTF-8 strings to the IRIS default Unicode representation which uses UTF-16.  Functions like $wlength, $wextract, etc. do not work on UTF-8 encoded 8-bit strings but they do work on the UTF-16 encoded strings.

Note, if you do port IRIS.DAT files between different hardware instances and you also port between big-endian hardware and little-endian hardware  (e.g., aix to windows) then there is a documented utility that describes how to convert the IRIS.DAT files between big-endian and little-endian representation.

There is no support for automatic conversion starting from a Unicode IRIS.DAT file back to an 8-bit IRIS.DAT file.  You can imagine this working if you are very lucky and the ported Unicode IRIS.DAT files just happen to have no Unicode strings, which will not happen with the "%SYS" namespace because the upgrade will add Unicode support to that namespace which will include some Unicode strings.  With only a few, easily found Unicode strings then you can use %ZCVT to convert UTF-16 to 8-bit UTF-8.  If are so lucky that you can do those conversions to completely remove all UTF-16 strings from a IRIS Unicode instance then you can try to install a new 8-bit instance and keep the IRISSYS and IRISLIB databases and replace the other database files with IRIS.DAT files that now just contain 8-bit user string data.  If you fail to convert all the Unicode strings while trying to go back to an 8-bit instance then I believe you will get a <WIDE CHAR> signal if you attempt to access wide UTF-16 data.

Certifications & Credly badges:
Steven has no Certifications & Credly badges yet.
Global Masters badges:
Steven has no Global Masters badges yet.
Followers:
Following:
Steven has not followed anybody yet.