Robert Cemper · Jul 24, 2017

[Fixed in 2018.1.1 + quick fix] Anomaly in German sorting

For centuries the German language has carried along a special character
that had no upper case representation. At the end of June 2017 it was
officially decided that lower case ß $c(223) now also has an upper case
representation $c($zhex("1e9e")), which looks pretty similar in most fonts.
[ISO had already defined this character in 2008.]

Investigating $zconvert(...,"U"), $system.SQL.UPPER() and $system.SQL.SQLUPPER(),
it turned out that $c(223) is left unchanged by the conversion.
That is no big issue for presentation, as both "ß" look alike.
But it is bad for sorting.

With NLS correctly set to DEUW you get for lower case:

USER>zw low
low("weiser")=1
low("weißer")=1
low("weiter")=1

but for upper case it's broken:
USER>zw up
up("WEISER")=1
up("WEITER")=1
up("WEIßER")=1

The same applies to the global collation GERMAN3.

Technically the sort is correct, because the unchanged ß is still a lower case character and lower case sorts after upper case.
The issue has already been reported as prodlog 147715.
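
As a rough illustration outside Caché, the effect can be reproduced in Python with a plain code point sort. This is only a sketch under stated assumptions: it does not use the DEUW locale or the GERMAN3 collation tables, and `upper_unpatched` is a hypothetical stand-in for an upper-case conversion that leaves $c(223) untouched.

```python
# Hypothetical stand-in for an upper-case conversion that leaves
# the sharp s (code point 223) unchanged, as described above.
def upper_unpatched(text: str) -> str:
    return "".join(ch if ch == "\u00df" else ch.upper() for ch in text)

words = ["weiser", "wei\u00dfer", "weiter"]
upper_words = [upper_unpatched(word) for word in words]

# Plain code point order: the sharp s (223) sorts after every
# ASCII upper-case letter (65..90), so its word ends up last.
sorted_upper = sorted(upper_words)
print(sorted_upper)
```

The word containing the unchanged ß lands after WEITER, which matches the broken upper case order shown above.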

I think you should be aware of this unexpected behavior in all those cases
where you don't have control over the content,
e.g. in text analysis or similar high volume input streams.

In the past, this was often worked around in typical properties by writing SS or SZ instead.
But this means changing both the content and the length.
Switzerland simply eliminated ß as a valid character some years ago, which is another approach.
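
To make the cost of the SS workaround concrete, here is a small Python sketch (not Caché code, and the variable names are only illustrative). It compares the substitution with the capital sharp s, U+1E9E.

```python
original = "wei\u00dfer"

# Classic workaround: substitute "ss" for the sharp s, then upper-case.
replaced = original.replace("\u00df", "ss").upper()

# Alternative: use the capital sharp s (U+1E9E), which keeps the length.
capital = original.replace("\u00df", "\u1e9e").upper()

print(len(original), len(replaced), len(capital))
```

The substituted string is one character longer than the original, while the variant with the capital sharp s keeps the original length.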
 

Discussion (5)

I just got this information:

The developer group worked on it, and version 2018.1.1 will contain the new pattern.


Only one file needs to be changed for it, and it is less than an hour of work.

You can fix this error yourself right now if you don't want to wait for the release of version 2018.1.1.

To do this, follow these steps:

  1. Export the locale "deuw":
    %SYS>s Locales("deuw")="" d $system.OBJ.DisplayError(##class(Config.NLS.Locales).ExportList("loc_deuw.xml",.t,.Locales)) zw Locales,t
  2. Fix the file loc_deuw.xml (by default located in the folder %CACHEHOME%\Mgr).
    Add the following new lines to the appropriate subtables:

    Subtable COL-German3-Unicode:
      <FromToItem FromToKey="55,55,1">83,83;</FromToItem>
      <FromToItem FromToKey="55,55,2">7838;</FromToItem>
      <FromToItem FromToKey="55,55,3">83,7838;</FromToItem>

    Subtable COL-Unicode-German3:
      <FromToItem FromToKey="7838">55,55;2</FromToItem>
      <FromToItem FromToKey="83,83">55,55;1</FromToItem>
      <FromToItem FromToKey="83,7838">55,55;3</FromToItem>

    Subtable LowerCase-Unicode-Unicode:
      <FromToItem FromToKey="7838">223</FromToItem>

    Subtable UpperCase-Unicode-Unicode:
      <FromToItem FromToKey="223">7838</FromToItem>
  3. Import the fixed loc_deuw.xml:
    %SYS>d $system.OBJ.DisplayError(##class(Config.NLS.Locales).ImportAll("loc_deuw.xml",.t,1+2+4)) zw t
    %SYS>d $system.OBJ.DisplayError(##class(Config.NLS.Locales).Compile("deuw"))
    %SYS>d Locale^NLSLOAD("deuw")

    Just in case, restart Caché.
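
For readers checking the decimal values in the case-mapping entries of step 2, this short Python sketch (not part of the fix itself) translates them into hex code points and characters. It covers only the Unicode values 223, 7838 and 83. The keys such as "55,55" are internal to the collation table and are not interpreted here.

```python
# Decimal Unicode values that appear in the FromToItem entries of step 2.
decimal_values = (223, 7838, 83)

# Map each decimal value to its hex form and the character it encodes.
code_points = {value: (hex(value), chr(value)) for value in decimal_values}

for value, (hex_form, char) in code_points.items():
    print(value, hex_form, char)
```

So 7838 is the same code point as $zhex("1e9e"), the capital sharp s, and 223 is the lower case ß.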

Now run a small test:

#include %systemInclude
#include %occErrors
#include %syNLS
test() public {
  
  #dim ex As %Exception.AbstractException

  try {
    $$$AddAllRoleTemporaryInTry
    n $namespace
    s $namespace="%SYS"
    
    // remember the current locale so it can be restored at the end
    s oldLocale=$$$LOCALENAME
    w "Old locale = ",oldLocale,!
    d ##class(Config.NLS.Locales).Install("deuw")
    w "Current locale = ",$$$LOCALENAME,!!
    
    k ^||low,^||up

    // store each word in lower and upper case to compare the sort order
    f w="wei"_$c(223)_"er","weiter","weiser" {
      s ^||low($zcvt(w,"L"))=1
      s ^||up($zcvt(w,"U"))=1
    }
    zw ^||low,^||up
    
    // check the case conversion of the single character
    s low=$c(223)
    s up=$zcvt(low,"U")
    zw low,up
    zzdump low,up
    
  }catch(ex) {
    w "Error ",ex.DisplayString(),!
  }
  // restore the original locale
  d ##class(Config.NLS.Locales).Install(oldLocale)

}

My result:

USER>d ^test
Old locale = rusw
Current locale = deuw
 
^||low("weiser")=1
^||low("weißer")=1
^||low("weiter")=1
^||up("WEISER")=1
^||up("WEIẞER")=1
^||up("WEITER")=1
low="ß"
up="ẞ"
 
0000: DF                                                      ß
0000: 1E9E                                                    ẞ

WOW !

Great stuff. I almost can't believe it.