Question
· Apr 1, 2022

Issues with Character Encoding when viewing messages

Hi everyone.

I was wondering how people manage viewing messages that use a character encoding incompatible with the Management Portal's use of UTF-8.

For example, a message that looks like this in Windows-1252/Latin1:

Will display as this under the message "Full Contents"

and this under the "Raw Contents"

*EDIT*

I tripped myself up early on, but this covers how I went wrong - https://www.i18nqa.com/debug/table-iso8859-1-vs-windows-1252.html

Discussion (6)

To use the Euro symbol € as an example: it can be encoded in UTF-8, but it does not exist in Latin1. Your Business Service uses the Default Char Encoding setting to read the message, so with Latin1 selected it does not read the Euro symbol as you would expect. Changing the Default Char Encoding to "unicode utf-8" would allow the Business Service to read that character correctly.

When you output your message, nothing in the message has actually changed (unless you have done some sort of Data Transformation). So, when you open the text editor, which uses UTF-8, you are able to see the Euro symbol again.

Hey Shamus. I do appreciate you replying, but I may not have been clear in my original question.

Even if I have the correct settings in the Business Service, if the content is in an encoding that conflicts with UTF-8, the display within the Management Portal will show as per my examples, because the browser is trying to display non-UTF-8 content as UTF-8. My original question was asking how others work with this.

If you spot my second top-level comment, you will see that I had looked into using the browser to change the HTTP Content-Type header, but the major browsers no longer support such a feature.
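To illustrate the kind of corruption I mean, here is a minimal Python sketch. It is only an illustration of decoding Windows-1252 bytes as UTF-8, not a claim about what the portal does internally; the sample text and variable names are made up.

```python
# Hypothetical sample message containing the Euro sign.
text = "Price: 10€"

# Stored as Windows-1252, the Euro sign becomes the single byte 0x80.
stored = text.encode("cp1252")

# A viewer that assumes UTF-8 cannot interpret a lone 0x80 byte, so with
# errors="replace" it substitutes U+FFFD (the replacement character).
shown = stored.decode("utf-8", errors="replace")

print(stored)
print(shown)
```

The byte never changes; only the interpretation applied to it does.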

I'm guessing my only option is to export a message and then review the content there.
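If it comes to that, something like the following Python sketch could help review exported bytes under several candidate encodings. The function name, the sample bytes, and the list of encodings are all my own assumptions, not anything provided by the product.

```python
from typing import Dict, Iterable, Optional


def decode_candidates(raw: bytes, encodings: Iterable[str]) -> Dict[str, Optional[str]]:
    """Decode raw bytes with each candidate encoding.

    Encodings that cannot decode the bytes map to None, so an
    impossible guess is easy to rule out.
    """
    results: Dict[str, Optional[str]] = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# Hypothetical exported content: "café €" saved as Windows-1252.
sample = b"caf\xe9 \x80"
decoded = decode_candidates(sample, ["utf-8", "cp1252", "latin-1"])

for name, value in decoded.items():
    print(name, repr(value))
```

Latin-1 will decode any byte sequence without error, so a successful decode on its own does not prove the guess was right; the output still needs eyeballing.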

Hey Shamus.

I'm back to eat some humble pie!

Turns out, I had confused myself early on and wrongly believed that Windows-1252 and Latin1 were the same, and I had my service set to Latin1. This created a scenario where I was digging myself into a hole of bad information.

In fact, they are almost the same except for the exact characters I was using in my example. Those characters sit at code points (0x80 to 0x9F) that Latin-1 uses as control codes, so when Windows-1252 is mislabeled as Latin-1 they get lost...
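A quick way to see the difference is a short Python sketch. Python is just my choice for the illustration; `cp1252` and `latin-1` are its codec names for Windows-1252 and ISO-8859-1.

```python
import unicodedata

# 0x80 is the Euro sign in Windows-1252.
raw = b"\x80"

as_cp1252 = raw.decode("cp1252")   # a printable character
as_latin1 = raw.decode("latin-1")  # a C1 control code, which displays as nothing

print(repr(as_cp1252), unicodedata.category(as_cp1252))
print(repr(as_latin1), unicodedata.category(as_latin1))

# Outside the 0x80-0x9F range the two encodings agree, e.g. 0xE9.
shared = bytes([0xE9])
same_elsewhere = shared.decode("cp1252") == shared.decode("latin-1")
print(same_elsewhere)
```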

Thanks again for replying to my initial question and comments.

I have looked at how I can change the behavior of the browser, but it seems that the big 3 (well, two) no longer have this feature. Here is an article giving some back story.

Basically, Chrome used to have an option to manually override the encoding, but this was removed. There are some 3rd party extensions out there that replace the feature, but I feel uncomfortable using random 3rd party extensions around healthcare data.

Firefox also used to have this feature, but they replaced it with a tool that attempts to repair the character encoding. It didn't do a great job in this case: it decided the encoding was "IBM866", so the content looked like this: