Discussion
· Nov 9, 2020

Accent sensitivity and other internationalization issues

Hi,

InterSystems IRIS has long supported the obvious translation functions required to for converting to upper or lowercase to enforce case-insensitive string comparison (e.g. in ObjectScript with $zconvert) and sorting (e.g. with SQL collation functions, not to be confused with NLS collation). Customers in international contexts have at times used custom workarounds to also treat accent insensitivity or even more advanced normalization duct tape. We’re looking to address such use cases at the system and SQL level to increase convenience for this international audience, which is well represented on the Developer Community.

No rocket science here, but we hope to make this comprehensive yet at the same time keep it practical to build in and easy to use for customers (many of which will only need the basics):

  • Extend $zconvert() with an "A" flag (next to "U" and "L") for stripping accents based on the recommended ICU transformation. This will be implemented through an internal translation table which can be customized if need by (as has always been the case in IRIS)
  • Add a new %SQLGENERIC( transformation-spec [, truncate-length] ) collation function for use in DDL (defining columns or indices) or SQL queries,  using parameters to control the primary "transformations" our existing collation functions offer (uppercasing, trimming whitespace, ...), plus the new accent stripping feature.

To illustrate how this would look, here's how you could express the existing collation functions using %SQLGENERIC:

Current function Using %SQLGENERIC()
%EXACT %SQLGENERIC
%SQLSTRING(n) %SQLGENERIC('prepend-space, strip-trailing-whitespace', n)
%SQLUPPER(n) %SQLGENERIC('prepend-space, strip-trailing-whitespace, mode=U', n)
%TRUNCATE(n) %SQLGENERIC('', n)
%MINUS %SQLGENERIC('mode=-')
%PLUS %SQLGENERIC('mode=+')

If you have faced such requirements and/or implemented any of those creative workarounds in the past, please use the poll below or chime in how you've approached this in the past and how you think the above addresses this.

PLEASE NOTE: this is a proposal shared here for soliciting your valued feedback. We are not committing to this specific approach nor a particular target release at this point. We are awaiting your feedback, after all :-)

 

Thanks,
benjamin

Would this help address your Internationalization challenges?
Discussion (5)0
Log in or sign up to continue

I've never faced with accents, so, not sure about this case. But I see many useful use cases for using slugify in SQL. But this feature looks more complex in realization. There are many realizations in many languages, but no standard at all.

For the info, slug, slugify, translates string in any language to ASCII, URL compatible string.

For instance, it would help to get a cost-effective, language-independent index, but with a quite correct order.

Thanks for the reference. That's indeed a very good approach to solve the international-exact-sort question through NLS collations (see also this note). The new SQL collation described above is meant to provide an easy way to have an international-broad-brush transformation to accommodate the non-exact cases, such as using a LIKE operator that doesn't trip over a single-accent difference.