UTF-8 Translation issue within IRIS
Good afternoon, my InterSystems IRIS peers,
I have an issue that I need help with. I have tried everything I know of, so I am reaching out to the community for some insight.
I have an SFTP service that pulls HL7 2.5.1 DFT messages from our SFTP file server. The issue is that IRIS is converting the special UTF-8 characters in patient names and addresses to ANSI.
Examples:
è = Ã¨
é = Ã©
í = Ã- ... etc.
I would like to know whether it is possible to transform the RAW HL7 before IRIS does its magic, or within IRIS.
I have used the $Translate() function to convert the characters but, as you can tell, that approach ran out of road quickly: we receive about 15K DFT messages a day.
Note:
I have worked and played with the Charset and Default Char Encoding settings and got nowhere.
The raw content shown under View Raw Contents in the portal has the correct characters, but View Full Contents does not; it has the ANSI translation.
I have also tried the following without success: grabbing the HL7 in IRIS, pulling the raw contents, and converting that raw content into a new HL7 message. The ANSI characters are still present.
Please help and if you need more info I am happy to fill you in.
Best
Leon Wilson
What I really need is a translation that does the following:
è e
é e
ê e
ë e
ì i
í i
î i
ï i
ð o
ñ n
ò o
ó o
ô o
õ o
ö o
à a
á a
â a
ã a
ä a
å a
ù u
ú u
û u
ü u
ý y
.....
just so our system/server doesn't throw an error every time.
Thank you again for your help.
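For reference, a mapping like the one above does not have to be maintained character by character. The sketch below is Python rather than ObjectScript and only illustrates the idea: it uses Unicode decomposition to split each letter into a base letter plus accent marks, then drops the marks. The names `fold_accents` and `EXTRA` are hypothetical. `ð` has no decomposition, so it is mapped to `o` explicitly to match the table above. The sketch assumes the text has already been decoded correctly; it will not repair mis-decoded input.

```python
import unicodedata

# Letters without a Unicode decomposition need an explicit replacement.
# "ð" -> "o" follows the table in the post; extend this dict as needed.
EXTRA = {"ð": "o"}


def fold_accents(text: str) -> str:
    """Return text with accent marks removed from Latin letters."""
    # Apply the explicit replacements first.
    text = "".join(EXTRA.get(ch, ch) for ch in text)
    # NFD splits "é" into "e" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep everything except the combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_accents("Gômez Jesús María"))
```

Characters that are neither accented letters nor listed in `EXTRA` pass through unchanged, so punctuation such as typographic quotes would still need its own rule.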
This was one of two attempts, just to give you an idea:
$TRANSLATE(source.RawContent,$CHAR(223)_$CHAR(225)_$CHAR(233)_$CHAR(237)_$CHAR(241)_$CHAR(243)_$CHAR(250)_$CHAR(161)_$CHAR(191)_$CHAR(224)_$CHAR(226)_$CHAR(227)_$CHAR(228)_$CHAR(232)_$CHAR(233)_$CHAR(234)_$CHAR(236)_$CHAR(237)_$CHAR(238)_$CHAR(242)_$CHAR(244)_$CHAR(245)_$CHAR(249)_$CHAR(250)_$CHAR(251)_$CHAR(253)_$CHAR(169)_$CHAR(192)_$CHAR(193)_$CHAR(194)_$CHAR(195)_$CHAR(200)_$CHAR(201)_$CHAR(202)_$CHAR(204)_$CHAR(205)_$CHAR(206)_$CHAR(209)_$CHAR(210)_$CHAR(211)_$CHAR(212)_$CHAR(213)_$CHAR(217)_$CHAR(218)_$CHAR(219)_$CHAR(221)_$CHAR(8482)_$CHAR(8242)_$CHAR(180)_$CHAR(8217)_$CHAR(8216)," aeinou  aaaaeeeiiiooouuuy AAAAEEEIIINOOOOUUUY ")
..ToUpper($TRANSLATE(source.{PID:PatientName().FamilyName},$CHAR(223)_$CHAR(225)_$CHAR(233)_$CHAR(237)_$CHAR(241)_$CHAR(243)_$CHAR(250)_$CHAR(161)_$CHAR(191)_$CHAR(224)_$CHAR(226)_$CHAR(227)_$CHAR(228)_$CHAR(232)_$CHAR(233)_$CHAR(234)_$CHAR(236)_$CHAR(237)_$CHAR(238)_$CHAR(242)_$CHAR(244)_$CHAR(245)_$CHAR(249)_$CHAR(250)_$CHAR(251)_$CHAR(253)_$CHAR(169)_$CHAR(192)_$CHAR(193)_$CHAR(194)_$CHAR(195)_$CHAR(200)_$CHAR(201)_$CHAR(202)_$CHAR(204)_$CHAR(205)_$CHAR(206)_$CHAR(209)_$CHAR(210)_$CHAR(211)_$CHAR(212)_$CHAR(213)_$CHAR(217)_$CHAR(218)_$CHAR(219)_$CHAR(221)_$CHAR(8482)_$CHAR(8242)_$CHAR(180)_$CHAR(8217)_$CHAR(8216)," aeinou  aaaaeeeiiiooouuuy AAAAEEEIIINOOOOUUUY "))
Just wanted to share a test message. The RAW is the original message and the Blue one is what IRIS is transforming it into.
Hey Leon.
The element of this issue that is perplexing me is that there is a difference between the RAW and Full view.
Could you try sending a sample message to an HL7 file operation with the charset set to UTF-8? I'm curious to know whether the characters display as expected, stay as the ANSI characters, or become something else.
I am wondering whether the ANSI characters are confined to the display in the Full message viewer, and whether any issues you are seeing in a destination system are a separate but similar character-encoding problem.
Good Morning Julian,
Thank you for the response, I have added some more examples to show what happens when the RAW DFT gets pulled into IRIS.
What you see is typical of an inappropriate double encoding.
USER>r c
Gômez Jesús María
USER>zzdump c
0000: 47 F4 6D 65 7A 20 4A 65 73 FA 73 20 4D 61 72 ED         Gômez Jesús Marí
0010: 61                                                      a
;;; this is already encoded in UTF-8 !!!!!
;;; what you name RAW is already UTF-8 !!!
USER>s z=$zcvt(c,"O","UTF8")
USER>w z
Gômez Jesús MarÃa     ;;; now it's just broken
USER>zzdump z
0000: 47 C3 B4 6D 65 7A 20 4A 65 73 C3 BA 73 20 4D 61         Gômez Jesús Ma
0010: 72 C3 AD 61                                             rÃa
USER>
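The same effect can be reproduced outside IRIS. The following is a minimal Python sketch, not IRIS code, that rebuilds both byte sequences from the terminal session and shows how the broken text arises when UTF-8 bytes are treated as one character per byte. The helper name `hex_bytes` is hypothetical, and Latin-1 is assumed as the single-byte charset.

```python
def hex_bytes(data: bytes) -> str:
    """Format bytes like the hex column of a zzdump line."""
    return data.hex(" ").upper()


name = "Gômez Jesús María"

# One byte per character (Latin-1): the bytes of the first dump.
single = name.encode("latin-1")

# UTF-8 encoding: the bytes of the second dump.
utf8 = name.encode("utf-8")

# Treating each UTF-8 byte as its own character yields the broken text.
broken = utf8.decode("latin-1")

print(hex_bytes(single))
print(hex_bytes(utf8))
print(broken)
```

Each accented letter becomes two bytes in UTF-8 (for example `ô` becomes `C3 B4`), which is why every affected character shows up as `Ã` followed by a second symbol.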
Without being able to see your environment, it's difficult to say where the disconnect is or what would need to be tweaked to decode those characters correctly. However, if you have an opportunity to manually process the HL7 data at any point as it flows through the system, then you may be able to call $ZConvert/$ZCVT on the encoded data to decode it:
USER>s str = $C(90,111,108,195,173,118,97,114,101,115)
USER>w str
ZolÃvares
USER>w $ZCVT(str, "I", "UTF8")
Zolívares
USER>
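To make the input conversion above easier to follow, here is a rough Python equivalent. It is a sketch under assumptions, not IRIS code: the function name `repair_mojibake` is hypothetical, and it assumes the text was mis-read as Latin-1 (one character per byte). Input that does not look mis-decoded is returned unchanged.

```python
def repair_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were decoded one byte per character."""
    try:
        # Turn each character back into its byte, then decode as UTF-8.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable as bytes, or not valid UTF-8: leave as-is.
        return text


# Same character codes as the $C(...) call in the terminal session.
codes = (90, 111, 108, 195, 173, 118, 97, 114, 101, 115)
raw = "".join(chr(code) for code in codes)

print(repair_mojibake(raw))
```

The fallback matters for mixed traffic: a name that was already decoded correctly fails the UTF-8 step and passes through untouched, so the function can be applied to every message.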
https://docs.intersystems.com/iris20212/csp/docbook/Doc.View.cls?KEY=RCO...
However, there should be a way to specify to your business service the encoding of input data so that it can decode the data for you. I would have thought that this would be done with either the "Charset" or "Default Char Encoding" settings, but it sounds like you've already tried that. I'm not sure why this wouldn't be working, but I'm fairly confident that this is how encoded data is supposed to be decoded, so it may be worth another look.
Thank you all for your input and advice. The $ZCVT() did the trick.