Question
· 16 hr ago

Non-ASCII characters are treated as a question mark by the regex engine. Is this a bug?

The following regex matches, although I think it should not:

write $match("♥","\?") // prints 1 (unexpected)

It should only match the '?' character. I think what happens is that the ♥ character gets converted to '?' (as it is outside the regular ASCII range) before the regex engine evaluates it.

In fact, this gives a clue about what happens:

write $char($ascii("♥")) // prints '?'

Is this a well-known IRIS limitation? Are there workarounds? Should I report it to InterSystems?

In my case, a way to detect non-ASCII characters in a string would be good enough. I'm not sure that is possible if all string functions treat those special characters as '?'.

Product version: IRIS 2024.1
$ZV: IRIS for Windows (x86-64) 2024.1.2 (Build 398U) Thu Oct 3 2024 14:01:59 EDT
Discussion (7)

I also ran the following test: I created a routine containing "write $ascii("♥")" and called it from outside (e.g. the Studio console). It works there, so server-side code can handle the character correctly.

However, I have an IRIS server where write $ascii("♥") always returns 63 (the code for '?'), both in code and in Terminal. Is there a setting somewhere in the portal for UTF-8 support?

EDIT: I found where this is defined: it is in the NLS (National Language Support) settings.

The server has Latin1 defined, while the local workstation where it works has UTF-8. You can define different tables per category: for Terminal, Process, ...

I seriously doubt it is a %Regex issue. The ? appears when some interface converts characters to one of the many 8-bit character sets and the source character set contains a character that does not exist in the destination character set. This can happen to about 2**20 characters of Unicode when they are converted to any 8-bit character set. It can also happen when converting from one 8-bit character set to a different one.
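To illustrate the substitution outside of IRIS, here is a minimal Python sketch. It is only an analogy: Python's errors="replace" handling stands in for whatever translation table the IRIS device or process applies, and the helper name has_non_ascii is made up for this example. It suggests why the regex sees a real '?', and why any check for non-ASCII characters has to run before the lossy conversion.

```python
import re

heart = "\u2665"  # the heart character from the question

# Converting to Latin-1 (an 8-bit character set) with errors="replace"
# substitutes "?" for every character that Latin-1 cannot represent.
# Assumption: this mimics, but is not, the IRIS NLS translation.
converted = heart.encode("latin-1", errors="replace").decode("latin-1")

# After the conversion the string really is "?" (code 63),
# so a regex for a literal question mark legitimately matches it.
matches_after = re.fullmatch(r"\?", converted) is not None
matches_before = re.fullmatch(r"\?", heart) is not None


def has_non_ascii(text: str) -> bool:
    """Hypothetical helper: True if any character lies outside 0..127."""
    return any(ord(ch) > 127 for ch in text)


print(ord(converted))            # 63
print(matches_after)             # True
print(matches_before)            # False
print(has_non_ascii(heart))      # True  (checked before conversion)
print(has_non_ascii(converted))  # False (the information is already lost)
```

Once the character has been replaced, nothing downstream can tell it apart from a genuine '?', which is consistent with the conversion happening before the regex engine ever sees the string.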

Your terminal emulator and the IRIS terminal device can both do such conversions. An IRIS file device can also do such conversions. Different platforms can use different default conversions, which explains why some people cannot reproduce the results that others see.