Question Yone Moreno Jiménez · Feb 24

What is the most efficient way to debug character encoding issues?

Hello, how are you?

Using HealthShare for interoperability, we often see a wide variety of encoding issues. Some happen when transforming XML to HL7. Some happen the other way around, when transforming HL7 to XML.

Is there a valid way to understand which encoding needs to be used?

I ask, because we use SOAP or REST or TCP HL7 services.

And we deliver messages with TCP HL7, SOAP, HTTP Operations.

And we often face this issue where special characters like ñ or á are not encoded correctly.

Thanks for your replies.

Product version: HealthShare 2020.1

Comments

DC AI Bot · Feb 24 Bot

To debug encoding character issues:

  1. Ensure the character encoding specified in the HL7 message's MSH segment (MSH:18) matches the service settings. For example, if MSH:18 specifies UTF-8, the service's Default Char Encoding setting should also be UTF-8. Prefix the encoding name with ! (e.g., !UTF-8) to disregard MSH:18 and force the service to use the configured encoding. Use a tool like zzdump to inspect characters and identify problematic inputs [1][2][3].

  2. For transformations or specific service configurations, the DefCharEncoding property in the business service or operation must align with the message encoding. Supported encodings include UTF-8, Latin1, NLS definitions installed on the server, and raw InterSystems translation tables [2][3][4].

  3. Issues often arise from mismatched or unsupported encodings. Test alternative settings one at a time (for instance !latin1, !utf-8, or the encoding the sender explicitly declares) and make sure MSH:18 and the service configuration agree, so that special characters like ñ or á are handled properly [1][3][4].

Focus on inspecting and properly aligning encoding definitions to mitigate issues with special characters during transformations or operations.
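
To make the mismatch concrete, here is a small Python sketch (not HealthShare code; the helper name `hex_dump` is made up for illustration). It shows the bytes a dump tool would reveal for the same text in two encodings, and reproduces the typical garbling that appears when UTF-8 bytes are read as Latin-1.

```python
def hex_dump(text: str, encoding: str) -> str:
    """Return the bytes of `text` in `encoding` as space-separated hex."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))

sample = "\u00f1\u00e1"  # the characters ñ and á

print(hex_dump(sample, "utf-8"))    # two bytes per character
print(hex_dump(sample, "latin-1"))  # one byte per character

# Typical symptom: UTF-8 bytes interpreted as Latin-1.
# Each original character turns into two unrelated characters.
garbled = sample.encode("utf-8").decode("latin-1")
print(len(sample), len(garbled))
```

If a message shows two odd characters where one accented letter was expected, the receiver is most likely decoding UTF-8 as a single-byte encoding.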

Sources:

Julian Matthews · Feb 27

Depending on the issue I'm seeing, I tend to rely on these two links to help pin down the issue:

€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ.
Robert Barbiaux · Feb 28 to Julian Matthews

Another caveat is that some systems do not map all Unicode characters to their internal encoding while preserving the original character count.

This becomes problematic in combination with using fixed-length records.

I've seen this in the wild: a target system using an EBCDIC code page translated Unicode characters with diacritics that do not exist in the code page into a sequence of 2 (or more!) characters.

This occurred, for example, with Eastern European first or last names containing characters such as ǎ (Unicode U+01CE, latin small letter a with caron), which was translated to a rather ugly "a?" sequence.
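
The effect can be imitated in Python. This is only a guess at the mechanism, since the target system's real conversion is unknown: the sketch assumes the character is first decomposed into a base letter plus a combining mark, and it picks `cp500` arbitrarily as one EBCDIC code page that ships with Python. The function name `to_ebcdic_text` is hypothetical.

```python
import unicodedata

def to_ebcdic_text(text: str, codec: str = "cp500") -> str:
    """Decompose accented letters, then force the result through an EBCDIC code page.

    Anything the code page cannot represent comes back as '?'.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return decomposed.encode(codec, errors="replace").decode(codec)

name = "\u01ce"  # latin small letter a with caron
converted = to_ebcdic_text(name)

# The converted value is longer than the original, which is what
# breaks fixed-length records.
print(len(name), len(converted))
```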

Wenyi Liu · Feb 28
  1. Validate the source encoding FIRST (critical root cause):

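    1. Check MSH-18 (HL7 character set): it should name the encoding actually used, such as ISO-8859-1 (Latin-1, which covers Spanish characters like ñ and á; HL7 table 0211 value 8859/1) or UTF-8 (table value UNICODE UTF-8); never leave it blank.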

    2. Check XML prolog: <?xml version="1.0" encoding="UTF-8"?>

    3. Check SOAP/REST headers: Content-Type: text/xml; charset=UTF-8

  2. Enforce explicit encoding (disable auto-detection):

    1. HL7 TCP Services/Operations: Set the Default Char Encoding setting (DefCharEncoding) to UTF-8 in the config.

    2. XML transformations: When writing XML with %XML.Writer, set its Charset property to UTF-8 so the output encoding is explicit.

  3. Debug natively in HealthShare (fastest tool):

    1. Use the Message Viewer to inspect raw message bytes.

    2. Test encoding with IRIS terminal: WRITE $ZCONVERT("ñá","O","UTF8") / $ZCONVERT("ñá","O","LATIN1")

  4. Fix 2020.1 pitfall:

    1. Do NOT mix encodings; standardize on UTF-8 end-to-end for all interfaces (HL7/SOAP/REST/TCP).

This workflow resolves most cases of corrupted special characters in HL7↔XML transformations.
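
Step 1 of the checklist can be sketched outside HealthShare as a quick sanity check on a captured payload. The Python below is illustrative only, and all three function names are hypothetical. It reads the encoding declared in the XML prolog or in a Content-Type header, then tests whether the bytes are actually valid in that encoding.

```python
import re

def declared_xml_encoding(raw: bytes):
    """Return the encoding named in the XML prolog, or None if absent."""
    match = re.match(rb'\s*<\?xml[^>]*?encoding=["\']([A-Za-z0-9._-]+)["\']', raw)
    return match.group(1).decode("ascii") if match else None

def declared_http_charset(content_type: str):
    """Return the charset parameter of a Content-Type header value, or None."""
    match = re.search(r'charset="?([A-Za-z0-9._-]+)"?', content_type, re.IGNORECASE)
    return match.group(1) if match else None

def decodes_as(raw: bytes, encoding: str) -> bool:
    """True if `raw` is valid in `encoding` under strict decoding."""
    try:
        raw.decode(encoding)
        return True
    except (UnicodeDecodeError, LookupError):
        return False

# A payload that claims UTF-8 but was really written as Latin-1.
body = '<?xml version="1.0" encoding="UTF-8"?><name>Mu\u00f1oz</name>'.encode("latin-1")
claimed = declared_xml_encoding(body)
print(claimed, decodes_as(body, claimed))
```

The check is one-directional: a failed strict UTF-8 decode proves the declaration is wrong, but Latin-1 accepts every byte sequence, so a successful Latin-1 decode proves nothing.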


Robert Barbiaux · Feb 28

The encoding behavior of HL7 services and operations in the EnsLib.HL7 package is driven by:

  • the value in the message MSH:18 field 
  • the value of the DefCharEncoding property of EnsLib.HL7.Util.IOFraming, which is available on HL7 services and operations

If MSH:18 has a value, it is used to determine character encoding.

If MSH:18 is empty, or the DefCharEncoding property starts with "!", characters are encoded according to the DefCharEncoding value; refer to the documentation.
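
The precedence rule described above can be written down as a few lines of Python. This is a sketch of the rule as stated in this comment, not the product's implementation: the function name `effective_encoding` is hypothetical, and it does not model how MSH:18 values are mapped to translation tables.

```python
def effective_encoding(msh18: str, def_char_encoding: str) -> str:
    """Pick the encoding name following the rule described above.

    A leading "!" on the default forces it; otherwise MSH:18 wins when present.
    """
    if def_char_encoding.startswith("!"):
        return def_char_encoding[1:]
    return msh18.strip() or def_char_encoding

print(effective_encoding("UNICODE UTF-8", "Latin1"))  # MSH:18 is used
print(effective_encoding("", "Latin1"))               # falls back to the default
print(effective_encoding("8859/1", "!UTF-8"))         # default is forced
```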

Beware: more than once I've seen sloppy HL7 v2.x interface implementations that send messages with an MSH:18 value that does not match the actual character encoding (use ! to fix this).

To check the actual encoding of an incoming message, capture the message bytes and open them with a text editor such as Notepad++, or view them with a hex editor (😅 who's still using one of those?).
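
Once the bytes are captured, the comparison between the declared and the actual encoding can also be scripted. The Python sketch below is an illustration under assumptions: the function names are hypothetical, the message is assumed to be a standard HL7 v2 message with carriage-return segment terminators, and the encoding guess only distinguishes pure ASCII, valid UTF-8, and everything else.

```python
def msh18(raw: bytes) -> str:
    """Return MSH-18 from a raw HL7 v2 message (empty string if missing)."""
    first_segment = raw.lstrip(b"\x0b").split(b"\r", 1)[0]  # drop MLLP start byte
    if not first_segment.startswith(b"MSH"):
        raise ValueError("message does not start with an MSH segment")
    field_sep = first_segment[3:4]
    fields = first_segment.split(field_sep)
    # The separator itself counts as MSH-1, so MSH-n sits at index n-1.
    return fields[17].decode("ascii", errors="replace") if len(fields) > 17 else ""

def guess_encoding(raw: bytes) -> str:
    """Classify bytes as 'ascii', 'utf-8', or 'single-byte' (anything else)."""
    if all(b < 0x80 for b in raw):
        return "ascii"
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "single-byte"

# A message that declares UTF-8 in MSH-18 but was sent as Latin-1.
header = "|".join(["MSH", "^~\\&"] + [""] * 15 + ["UNICODE UTF-8"])
message = (header + "\rPID|||||Mu\u00f1oz\r").encode("latin-1")
print(msh18(message), "/", guess_encoding(message))
```

When the two values disagree, as in this example, that is the situation where forcing the encoding with ! is the practical fix.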
