Question · Mar 4, 2022

UTF-8 Translation issue within IRIS

Good Afternoon My InterSystems IRIS Peers,

I have the following issue that I need help with. I have tried every option I know of, so I am reaching out to the community for some insight.

I have an SFTP service that pulls HL7 2.5.1 DFT messages from our SFTP file server. The issue is that IRIS transforms patient names and addresses that contain special (UTF-8) characters into ANSI.

Examples:

è = Ã¨
é = Ã©
í = Ã-  .......etc
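The "Ã" pattern in this kind of issue comes from the UTF-8 bytes of each accented character being read back one byte per character (Latin-1 / Windows-1252). Below is a minimal sketch of that mechanism. It is in Python rather than ObjectScript, purely to make the bytes visible, and the function name `misread_as_latin1` is hypothetical.

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode the bytes as Latin-1.

    Each accented letter is two bytes in UTF-8 (e.g. "è" -> C3 A8), and
    Latin-1 turns every byte into its own character, so "è" becomes "Ã¨".
    """
    return text.encode("utf-8").decode("latin-1")


for ch in ["è", "é", "í"]:
    # Show the character, its UTF-8 bytes in hex, and the misread result
    print(ch, ch.encode("utf-8").hex(" "), repr(misread_as_latin1(ch)))
```

For "í" the second byte is 0xAD, a soft hyphen, which is why it often shows up on screen as "Ã-".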

I would like to know whether it is possible to transform the raw HL7 before IRIS does its magic, or within IRIS itself.

I have used the $TRANSLATE() function to convert the characters, but as you can tell, that approach did not get far, given that we receive about 15K DFT messages a day.

Note:

I have experimented with the Charset and Default Char Encoding settings, but got nowhere close.

The raw content shown by View Raw Contents in the portal has the correct characters, but View Full Contents does not; it shows the ANSI translation.

I have also tried the following, without success: grabbing the HL7 message in IRIS, pulling the raw contents, and converting them into a new HL7 message. The ANSI characters are still present.

 

Please help. If you need more info, I am happy to fill you in.

Best

Leon Wilson

Product version: IRIS 2021.2

What I really need is a translation that does the following:

è     e
é     e
ê     e
ë     e
ì     i
í     i
î     i
ï     i
ð     o
ñ     n
ò     o
ó     o
ô     o
õ     o
ö     o

à     a
á     a
â     a
ã     a
ä     a
å     a

ù     u
ú     u
û     u
ü     u
ý     y

.....

This is just so our system/server doesn't throw an error every time.

Thank you again for your help.
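Instead of listing every accented letter by hand, one common approach is Unicode decomposition: NFD normalization splits "é" into "e" plus a combining accent, and dropping the combining marks leaves the base letter. Below is a minimal Python sketch of that technique. It is not an IRIS API; `strip_accents` and `FALLBACK` are hypothetical names. Letters with no decomposition (such as "ð") need an explicit fallback, and only the "ð -> o" entry comes from the table above.

```python
import unicodedata

# Letters with no Unicode decomposition are mapped explicitly.
# Only "ð" -> "o" comes from the table above; extend as needed.
FALLBACK = {"ð": "o"}


def strip_accents(text: str) -> str:
    """Map accented Latin letters to their base letter (é -> e, å -> a)."""
    out = []
    for ch in text:
        if ch in FALLBACK:
            out.append(FALLBACK[ch])
            continue
        # NFD splits "é" into "e" + U+0301 (combining acute accent);
        # dropping the combining marks leaves the base letter.
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


print(strip_accents("Gómez Jesús María"))
```

Two caveats apply. Letters without a decomposition that are not in `FALLBACK` (for example "ß" or "ø") pass through unchanged. Accent stripping also only gives sensible results on text that was decoded correctly in the first place; applied to "Ã¨" it cannot recover "e".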

This was one of two attempts, just to give you an idea:

$TRANSLATE(source.RawContent,$CHAR(223,225,233,237,241,243,250,161,191,224,226,227,228,232,233,234,236,237,238,242,244,245,249,250,251,253,169,192,193,194,195,200,201,202,204,205,206,209,210,211,212,213,217,218,219,221,8482,8242,180,8217,8216)," aeinou  aaaaeeeiiiooouuuy AAAAEEEIIINOOOOUUUY     ")

..ToUpper($TRANSLATE(source.{PID:PatientName().FamilyName},$CHAR(223,225,233,237,241,243,250,161,191,224,226,227,228,232,233,234,236,237,238,242,244,245,249,250,251,253,169,192,193,194,195,200,201,202,204,205,206,209,210,211,212,213,217,218,219,221,8482,8242,180,8217,8216)," aeinou  aaaaeeeiiiooouuuy AAAAEEEIIINOOOOUUUY     "))
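Two long parallel strings are easy to misalign. $TRANSLATE silently removes any source character that has no counterpart in the replacement string, so a single missing letter shifts every later mapping. Below is a Python sketch of the same idea with the strings grouped by base letter. Python's `str.maketrans` raises `ValueError` if the two strings differ in length. `normalize_family_name` is a hypothetical helper, and it uppercases before translating so that only capitals need a mapping. That ordering is a simplification chosen here, not what the DTL above does.

```python
# Uppercase accented letters and their replacements, kept in two parallel
# strings like the $TRANSLATE arguments, but grouped by base letter so a
# missing or extra character is easy to spot.
SRC = "ÀÁÂÃ" "ÈÉÊ" "ÌÍÎ" "Ñ" "ÒÓÔÕ" "ÙÚÛ" "Ý"
DST = "AAAA" "EEE" "III" "N" "OOOO" "UUU" "Y"

# str.maketrans raises ValueError when the two strings differ in length,
# instead of silently shifting every later mapping.
TABLE = str.maketrans(SRC, DST)


def normalize_family_name(name: str) -> str:
    """Uppercase first, then map accented capitals to plain ones."""
    return name.upper().translate(TABLE)


print(normalize_family_name("Núñez"))
```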

Hey Leon.

What perplexes me about this issue is the difference between the Raw and Full views.

Could you try sending a sample message to an HL7 File operation with the charset set to UTF-8? I'm curious to know whether the characters display as expected, stay as the ANSI characters, or become something else.

I am wondering whether the ANSI characters appearing in the Full message viewer are confined to that display, and whether any issues you see in a destination system are a separate but similar character-encoding problem.

What you see is typical of inappropriate double encoding.

 
USER>r c
Gômez Jesús María
USER>zzdump c
0000: 47 F4 6D 65 7A 20 4A 65 73 FA 73 20 4D 61 72 ED         Gômez Jesús Marí
0010: 61                                                      a
;;; this is already decoded text (ô = $C(244)), not raw UTF-8 bytes !!!!!
;;; what you call RAW is already correct !!!

USER>s z=$zcvt(c,"O","UTF8")
USER>w z
GÃ´mez JesÃºs MarÃ­a
;;; now it's just broken (double-encoded)
USER>zzdump z
0000: 47 C3 B4 6D 65 7A 20 4A 65 73 C3 BA 73 20 4D 61         GÃ´mez JesÃºs Ma
0010: 72 C3 AD 61                                             rÃ­a
USER>
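The same session can be reproduced byte for byte in Python, as a cross-check outside IRIS (this is an illustration, not ObjectScript). Encoding the already-correct text as UTF-8 a second time is what produces the "Ã" characters.

```python
text = "Gômez Jesús María"

# One byte per character: these are the bytes zzdump shows for c.
single = text.encode("latin-1")
# Like $ZCVT(c,"O","UTF8"): encode the characters as UTF-8 bytes.
utf8 = text.encode("utf-8")
# Treating those UTF-8 bytes as characters again gives the broken text.
double = utf8.decode("latin-1")

print(single.hex(" "))
print(utf8.hex(" "))
print(double)
```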

Without being able to see your environment, it's difficult to say where the disconnect is or what would need to be tweaked to decode those characters correctly.  However, if you have an opportunity to manually process the HL7 data at any point as it flows through the system, then you may be able to call $ZConvert/$ZCVT on the encoded data to decode it:

USER>s str = $C(90,111,108,195,173,118,97,114,101,115)
 
USER>w str
ZolÃ­vares
USER>w $ZCVT(str, "I", "UTF8")
Zolívares
USER>

https://docs.intersystems.com/iris20212/csp/docbook/Doc.View.cls?KEY=RCO...
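For readers outside IRIS, here is a rough Python analogue of the $ZCVT(str, "I", "UTF8") call above. Each character (code points 0-255 only) is treated as one byte, and the byte string is then decoded as UTF-8. The function name `zcvt_in_utf8` is hypothetical and is not part of any InterSystems API.

```python
def zcvt_in_utf8(s: str) -> str:
    """Rough Python analogue of $ZCVT(s, "I", "UTF8").

    Each character of s is taken as one byte (code points 0-255 only) and
    the resulting byte string is decoded as UTF-8.
    """
    return s.encode("latin-1").decode("utf-8")


# Same input as the terminal example: $C(90,111,108,195,173,118,97,114,101,115)
s = "".join(chr(c) for c in (90, 111, 108, 195, 173, 118, 97, 114, 101, 115))
print(zcvt_in_utf8(s))
```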

However, there should be a way to tell your business service the encoding of the input data so that it can decode the data for you. I would have thought that this would be done with either the "Charset" or "Default Char Encoding" settings, but it sounds like you've already tried that. I'm not sure why this wouldn't be working, but I'm fairly confident that this is how encoded data is supposed to be decoded, so it may be worth another look.
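If the data has to be repaired after the fact, decoding unconditionally is risky, because text that is already correct must not be decoded again. Below is a Python sketch of a guard heuristic. It is an assumption for illustration, not an IRIS feature, and `repair_if_double_encoded` is a hypothetical name. The idea is to decode only when every character fits in one byte and those bytes form valid UTF-8.

```python
def repair_if_double_encoded(s: str) -> str:
    """Undo one layer of UTF-8 double encoding, but only when it applies.

    Heuristic: if every character fits in one byte (Latin-1) and those
    bytes form valid UTF-8, return the decoded text; otherwise assume the
    text is already correct and return it unchanged.
    """
    try:
        return s.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return s


print(repair_if_double_encoded("GÃ³mez"))
print(repair_if_double_encoded("Gómez"))
```

Correctly decoded accented text such as "Gómez" is almost never valid UTF-8 when read as bytes, so it passes through unchanged. Setting the correct charset on the business service is still preferable to repairing messages afterwards.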