Question
· May 15

How to set encoding configuration when using JDBC

Hello. We are currently developing with Caché 2018.
Our team is working on extending an existing legacy program so that it can also be used on the web.

Before asking my question, here is the development environment.

  • IDE: IntelliJ
  • Framework: Spring Boot, MyBatis
  • DB Connection: JDBC (using the library provided by InterSystems)

We can successfully map global data through a %Persistent class and query it with SQL. However, all Korean text in the retrieved data is garbled. It looks like an encoding issue, but I don't know where to start fixing it.

In such cases, is it common to change Caché's locale settings, or should we adjust the encoding settings when establishing the JDBC connection? Our team does not have a Caché expert, so I would appreciate a detailed explanation of the configuration steps.

The locale settings under the Management Portal's "National Language Settings" are as follows:

  • Internal tables: Latin1
  • Input/output tables (TCP/IP): RAW

To help explain the situation, here is some sample Java code I wrote for testing. If I take the garbled string returned from Caché, convert it back to bytes using ISO-8859-1, and decode those bytes as EUC-KR, the Korean text displays correctly. However, decoding the bytes as UTF-8 still produces garbled text.

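import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
// Turn each Latin-1 char back into its raw byte, then decode those bytes as EUC-KR
Charset EUC_KR = Charset.forName("EUC-KR");
byte[] bytes = broken.getBytes(StandardCharsets.ISO_8859_1);
String fixed = new String(bytes, EUC_KR);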

(broken: ÃÖºÀ³² / fixed: 최봉남)
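A minimal sketch that wraps this workaround in both directions, assuming (as the Latin1 locale suggests) that every character the driver returns stands for exactly one raw EUC-KR byte. The class and method names (CacheKoreanText, fromCache, toCache) are hypothetical. The reverse direction, toCache, would matter when passing Korean values into queries, for example as a PreparedStatement parameter in a WHERE clause.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: converts between strings as returned by an 8-bit (Latin1)
// Caché over JDBC and normal Java strings, ASSUMING the stored bytes are EUC-KR.
final class CacheKoreanText {

    private static final Charset EUC_KR = Charset.forName("EUC-KR");

    private CacheKoreanText() {}

    // Value read from Caché -> readable Korean.
    // ISO-8859-1 maps each char U+0000..U+00FF back to exactly one byte.
    static String fromCache(String raw) {
        if (raw == null) {
            return null;
        }
        return new String(raw.getBytes(StandardCharsets.ISO_8859_1), EUC_KR);
    }

    // Korean text -> value to send to Caché (e.g. a query parameter).
    // Each EUC-KR byte becomes one Latin-1 char.
    static String toCache(String text) {
        if (text == null) {
            return null;
        }
        return new String(text.getBytes(EUC_KR), StandardCharsets.ISO_8859_1);
    }
}
```

With Spring Boot and MyBatis, the same conversion could probably be centralized in a custom type handler instead of being applied at each call site; that is a design suggestion, not something confirmed in this thread. Note that characters with no EUC-KR mapping are replaced with '?' by getBytes, so toCache is only lossless for text EUC-KR can represent.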

Product version: Caché 2018.1

Sorry for the incomplete answer; I'm not very familiar with Caché.
When I run the code below in the Caché terminal, it returns 0, so it seems we have the 8-bit version.

Write $system.Version.IsUnicode()


And the default locale appears to be Latin1. (The Management Portal shows Latin1. If there's a more accurate way to verify this, please let me know.)

That's what I thought. I have the Unicode version installed with the rusw locale, and there are no problems at all.

Example
As for a more accurate way to verify the locale:

In the SMP (System Management Portal): System Administration > Configuration > National Language Settings > Locale Definitions
You should have something like kor8, enu8, eng8, etc.

Sorry for the late reply.
When I check the locale at that path, it shows:

"Your current locale is: enu8 (English, United States, Latin1 (ISO 8859-1))"

And when I ran the Example code, the retrieved data appeared as "???".
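One plausible explanation for the "???" (an assumption, since the Example code itself is not shown here): when Korean text is pushed through an 8-bit Latin-1 channel, characters with no Latin-1 equivalent are replaced with '?'. Java shows the same effect, because String.getBytes(Charset) substitutes the charset's replacement byte for unmappable characters. The class name below is hypothetical.

```java
import java.nio.charset.StandardCharsets;

class LatinOneLoss {
    // Encodes text as ISO-8859-1 and decodes it back, mimicking a trip through a
    // Latin-1-only channel. Hangul has no Latin-1 mapping, so each syllable
    // is replaced with '?'; ASCII characters pass through unchanged.
    static String throughLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(throughLatin1("최봉남")); // prints ???
    }
}
```

Unlike ÃÖºÀ³², which still holds the original bytes, a '?' carries no information about the character it replaced, so any text that was actually stored as "???" cannot be recovered.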

To provide more context about my situation:
The legacy application was originally developed in VB6, and Korean characters display correctly in that program. There is probably some encoding configuration in the application, but since it is compiled into a DLL, it's difficult to check the details.

However, when I access the data through the Cache Management Portal, Cache Terminal, VSC, or IntelliJ, the Korean text appears broken, as I mentioned earlier.

Please give an example of what you see in the globals in the InterSystems Management Portal:
System Explorer > Globals

Since you have the 8-bit version rather than Unicode, you probably won't see the Korean characters (최봉남), but something like (ÃÖºÀ³²) instead.

This is not corrupted data; it is the string's KSC5601 (EUC-KR) bytes, displayed one byte per Latin-1 character.
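To see that nothing is lost, compare the bytes. Reading the displayed characters as ISO-8859-1 gives exactly the EUC-KR encoding of 최봉남. Those same bytes are not valid UTF-8, which would explain why decoding them as UTF-8 (as tried in the question) still produces garbage. The class and method names below are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingBytes {
    // Formats a byte array as space-separated uppercase hex, e.g. "C3 D6".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset eucKr = Charset.forName("EUC-KR");
        byte[] raw = "ÃÖºÀ³²".getBytes(StandardCharsets.ISO_8859_1);
        // Bytes behind the "broken" Latin-1 string
        System.out.println(hex(raw));                      // C3 D6 BA C0 B3 B2
        // EUC-KR encoding of the intended text: the same bytes
        System.out.println(hex("최봉남".getBytes(eucKr)));  // C3 D6 BA C0 B3 B2
        // As UTF-8, most of these bytes form malformed sequences,
        // which the decoder replaces with U+FFFD
        System.out.println(new String(raw, StandardCharsets.UTF_8));
    }
}
```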

From Caché terminal (Unicode version, locale = korw (Korean, Korea, Unicode)):

USER>w $zcvt("최봉남","O","KSC5601")
ÃÖºÀ³²
USER>w $zcvt("ÃÖºÀ³²","I","KSC5601")
최봉남

The same in Java:

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
System.out.println(new String("최봉남".getBytes(Charset.forName("EUC-KR")), StandardCharsets.ISO_8859_1)); // ÃÖºÀ³²
System.out.println(new String("ÃÖºÀ³²".getBytes(StandardCharsets.ISO_8859_1), Charset.forName("EUC-KR")));   // 최봉남