How to set encoding configuration when using JDBC

Question


Kim Jiyong · May 15

Hello. We are currently developing with Caché 2018.
Our team is working on improving an existing legacy program so that it can also be used on the web.

Before asking my question, here is the development environment.

IDE: IntelliJ
Framework: Spring Boot, MyBatis
DB Connection: JDBC (using the library provided by InterSystems)

We are successfully mapping global data through %Persistent classes and can query it with SQL. The problem is that all Korean text in the retrieved data is garbled. It definitely looks like an encoding issue, but I don't know where to start fixing it.

In such cases, is it common to change Caché's locale settings, or should we adjust the encoding settings when establishing the JDBC connection? Our team does not have a Caché expert, so I would appreciate it if you could explain the configuration steps in detail.

The locale settings under “National Language Settings” in the Management Portal are as follows:

Internal tables: Latin1
Input/output tables(TCP/IP): RAW

To help explain the situation, here is a sample of the Java code I wrote for testing. If I convert the garbled string (from Caché) back to bytes using ISO-8859-1 and then decode those bytes as EUC-KR, the Korean text appears correctly. Decoding the same bytes as UTF-8, however, still produces garbled text.

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

Charset EUC_KR = Charset.forName("EUC-KR");
byte[] bytes = broken.getBytes(StandardCharsets.ISO_8859_1);
String fixed = new String(bytes, EUC_KR);

(broken: ÃÖºÀ³² / fixed: 최봉남)
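To see why EUC-KR works and UTF-8 does not, it helps to look at the raw bytes. Below is a minimal sketch; the class name MojibakeDemo is hypothetical, and it assumes (as the workaround above suggests) that the driver returns each stored byte as the Latin-1 character with the same value. ISO-8859-1 maps U+0000..U+00FF one-to-one onto bytes, so getBytes(ISO_8859_1) recovers exactly what is stored.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    /** Recovers the stored bytes: ISO-8859-1 maps U+0000..U+00FF one-to-one onto 0x00..0xFF. */
    static byte[] rawBytes(String latin1View) {
        return latin1View.getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Formats bytes as uppercase hex, e.g. {0xC3, 0xD6} -> "C3D6". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "ÃÖºÀ³²", written with escapes so the source file encoding does not matter
        String broken = "\u00C3\u00D6\u00BA\u00C0\u00B3\u00B2";
        byte[] raw = rawBytes(broken);

        // Six bytes, two per Hangul syllable
        System.out.println(hex(raw)); // C3D6BAC0B3B2

        // Interpreted as EUC-KR the bytes form the original name
        System.out.println(new String(raw, Charset.forName("EUC-KR"))); // 최봉남

        // Not valid UTF-8: 0xC3 must be followed by a byte in 0x80..0xBF but the next is 0xD6,
        // so the decoder substitutes U+FFFD replacement characters instead of Hangul.
        System.out.println(new String(raw, StandardCharsets.UTF_8));
    }
}
```

In other words, nothing is lost in storage: the global holds valid EUC-KR bytes, and only the choice of decoder decides whether they read as Korean.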

Product version: Caché 2018.1

Discussion (8)


Vitaliy Serdtsev · May 19

If that's correct, is there a way to configure the JDBC connection to interpret this data using EUC-KR encoding?

No. See Caché JDBC Connection Properties.
But even if JDBC had encoding settings, it wouldn't help you, because the 8-bit version of Caché doesn't support Korean and doesn't know anything about EUC-KR/KSC5601.
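The 8-bit limitation can be illustrated in Java: a Latin1 instance can only hold characters in U+0000..U+00FF, and Hangul syllables lie far outside that range. A minimal sketch (the class and method names are hypothetical): CharsetEncoder.canEncode reports whether a string fits in ISO-8859-1, and String.getBytes shows what a lossy narrowing does, replacing each unmappable character with '?'. This is only an illustration of that kind of conversion; it does not show where inside Caché such a substitution might happen.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

class Latin1Check {
    /** True if every character fits in one ISO-8859-1 byte, i.e. could be held as-is by an 8-bit instance. */
    static boolean fitsLatin1(String s) {
        CharsetEncoder enc = StandardCharsets.ISO_8859_1.newEncoder(); // fresh encoder per call
        return enc.canEncode(s);
    }

    /** Lossy narrowing: getBytes replaces each unmappable character with the charset's '?' replacement. */
    static String narrowToLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String hangul = "\uCD5C\uBD09\uB0A8";                        // "최봉남"
        String eucKrAsLatin1 = "\u00C3\u00D6\u00BA\u00C0\u00B3\u00B2"; // "ÃÖºÀ³²"
        System.out.println(fitsLatin1(hangul));        // false
        System.out.println(fitsLatin1(eucKrAsLatin1)); // true
        System.out.println(narrowToLatin1(hangul));    // ???
    }
}
```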

PS: I would consider switching to the Unicode version of Caché, which already has support for the Korean language.


score 0 · Answer 1 · 2025-05-15T03:41:49-04:00

Vitaliy Serdtsev · May 15

Which version of Caché are you using: 8-bit or Unicode?
What locale is set by default?


score 0 · Answer 2 · 2025-05-15T03:58:28-04:00

Sorry for the incomplete information; I'm not very familiar with Caché.
When I run the code below in the Caché Terminal, it returns 0, so it seems to be the 8-bit version.

Write $system.Version.IsUnicode()

And the default locale appears to be set to Latin1. (When I check from the Management Portal, it shows as Latin1. If there's a more accurate way to verify this, please let me know.)

score 0 · Answer 3 · 2025-05-15T04:30:26-04:00

When I run the code below in the Caché Terminal, it returns 0, so it seems to be the 8-bit version.

That's what I thought. I have the Unicode version installed with the rusw locale, and there are no problems at all.

Example

Class dc.a Extends %Persistent
{
Index is On s;
Property s As %String;

/// d ##class(dc.a).Test()
ClassMethod Test()
{
  s t=..%New()
  s t.s="최봉남"
  
  d t.%Save()
}
}

From any JDBC client:

select * from dc.a

Result:
ID	s
1	최봉남

(When I check from the Management Portal, it shows as Latin1. If there's a more accurate way to verify this, please let me know.)

In SMP: System Administration > Configuration > National Language Settings > Locale Definitions
You should have something like kor8, enu8, eng8, etc.

score 0 · Answer 4 · 2025-05-15T20:13:59-04:00

Sorry for the late reply.
When I check the locale at that path, it shows:

"Your current locale is: enu8 (English, United States, Latin1 (ISO 8859-1))"

And when I ran the Example code, the retrieved data appeared as "???".

To provide more context about my situation:
The legacy application was originally developed in VB6, and within that program Korean characters are displayed correctly. There is probably some encoding configuration in the application, but since it was compiled into a DLL, it's difficult to check the details.

However, when I access the data through the Cache Management Portal, Cache Terminal, VSC, or IntelliJ, the Korean text appears broken, as I mentioned earlier.

score 0 · Answer 5 · 2025-05-16T06:17:32-04:00

However, when I access the data through the Cache Management Portal, Cache Terminal, VSC, or IntelliJ, the Korean text appears broken, as I mentioned earlier.

Give an example of what you see in globals in the InterSystems Management Portal:
System Explorer > Globals

Since you have the 8-bit version rather than Unicode, you probably won't see the Korean characters (최봉남), but rather something like (ÃÖºÀ³²).

This is not corrupted data, just a string stored in KSC5601/EUC-KR encoding.

From Caché terminal (Unicode version, locale = korw (Korean, Korea, Unicode)):

USER>w $zcvt("최봉남","O","KSC5601")
ÃÖºÀ³²
USER>w $zcvt("ÃÖºÀ³²","I","KSC5601")
최봉남

The same in Java:

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
System.out.println(new String("최봉남".getBytes(Charset.forName("EUC-KR")), Charset.forName("ISO-8859-1"))); // ÃÖºÀ³²
System.out.println(new String("ÃÖºÀ³²".getBytes(StandardCharsets.ISO_8859_1), Charset.forName("EUC-KR")));   // 최봉남

score 0 · Answer 6 · 2025-05-16T20:07:32-04:00

You're right!
The Korean data displayed in System Explorer > Globals appears as something like ÃÖºÀ³².

So, let me make sure I understood this correctly.
As I understand it, the data in the Caché database is stored in KSC5601/EUC-KR encoding, but the Management Portal (and the other tools on my local machine) interpret it as ISO-8859-1, which is why the characters look garbled.

If that's correct, is there a way to configure the JDBC connection to interpret this data using EUC-KR encoding?
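If the data has to stay in the 8-bit instance for a while, one possible application-level workaround is to do the EUC-KR conversion in Java around each JDBC call instead of in the driver. The sketch below is hypothetical (EucKrColumns is not part of the InterSystems driver or MyBatis) and assumes the driver passes each stored byte through as the Latin-1 character with the same value, in both directions. Values bound in WHERE clauses must be converted too, and SQL sorting and LIKE will operate on the raw bytes rather than on Korean text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical helper: converts between Java Strings and EUC-KR byte strings in a Latin1 Caché instance. */
final class EucKrColumns {
    private static final Charset EUC_KR = Charset.forName("EUC-KR");

    private EucKrColumns() {}

    /** Stored value ("ÃÖºÀ³²") -> Korean text ("최봉남"). SQL NULL stays null. */
    static String fromCache(String stored) {
        if (stored == null) return null;
        return new String(stored.getBytes(StandardCharsets.ISO_8859_1), EUC_KR);
    }

    /** Korean text -> the byte string format the legacy data uses. Unmappable chars get EUC-KR's replacement byte. */
    static String toCache(String text) {
        if (text == null) return null;
        return new String(text.getBytes(EUC_KR), StandardCharsets.ISO_8859_1);
    }

    /** Reads a string column and decodes it as EUC-KR. */
    static String getKorean(ResultSet rs, String column) throws SQLException {
        return fromCache(rs.getString(column));
    }

    /** Binds Korean text as the EUC-KR byte string, so comparisons match the stored data. */
    static void setKorean(PreparedStatement ps, int index, String text) throws SQLException {
        ps.setString(index, toCache(text));
    }
}
```

The same two conversions could, in principle, be wrapped in a custom MyBatis type handler so that mapper code stays unchanged; that wiring is not shown here. This remains a stopgap; switching to the Unicode version removes the need for it.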

score 0 · Answer 7 · 2025-05-19T20:08:37-04:00

Kim Jiyong · May 19

Then I should consider switching to the Unicode version.

Thank you so much for your helpful answer!
