david clifte · Dec 2, 2016

How to remove accentuation?

How can I remove the accents from a word? For example:


Árvore = Arvore

você = voce

Então = entao

The words above are in Brazilian Portuguese. I need to get rid of the accents so that I can compare two sentences.

Use $translate?


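ClassMethod NoAccents(stringWithAccents As %String) As %String
{
 write "before: ",stringWithAccents,!
 // Each character in accent is replaced by the character at the same position in usual
 set accent="Áêã",usual="Aea"
 set val=$translate(stringWithAccents,accent,usual)
 write "after: ",val,!
 return val
}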


To handle this in the general case, you would decompose the string, then strip out non-spacing marks. Unicode normalization has been requested previously, and will hopefully make it into the product at some point.
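
Since that normalization is not available in ObjectScript, here is a minimal sketch of the decompose-then-strip idea in Python, using the standard `unicodedata` module. The function name `strip_accents` is made up for illustration; it only shows the technique and is not something the product provides.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Return text with combining accent marks removed (hypothetical helper)."""
    # NFD splits each accented letter into a base letter plus combining marks,
    # e.g. "Á" becomes "A" followed by U+0301 (combining acute accent).
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" is "Mark, nonspacing": the combining accents themselves.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


if __name__ == "__main__":
    for word in ("Árvore", "você", "Então"):
        print(word, "=", strip_accents(word))
```

Note that this keeps the original letter case, so for a case-insensitive comparison of two sentences you would also lower-case both sides, for instance with `str.casefold()`. Letters that have no decomposition (such as "ø") pass through unchanged.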

Another option is to use a regular expression, like this:

ClassMethod ReplaceAccents(ByRef pWord As %String) As %Status
{
  Set tSC = $$$OK
  Try {
      // Key = unaccented replacement, value = the accented characters it replaces
      Set dictionary = ##class(%ArrayOfDataTypes).%New()
      Do dictionary.SetAt("ÀÁÂÃÄÅ","A")
      Do dictionary.SetAt("àáâãäå","a")
      Do dictionary.SetAt("ÈÉÊË","E")
      //.... all the rest
      // GetNext returns the stored value, so loop until the key comes back empty
      Set key = ""
      For {
        Set chars = dictionary.GetNext(.key)
        Quit:key=""
        Set matcher = ##class(%Regex.Matcher).%New("["_ chars _ "]", pWord)
        Set pWord = matcher.ReplaceAll(key)
      }
  }
  Catch tException {
    Set:$$$ISOK(tSC) tSC = tException.AsStatus()
  }
  Quit tSC
}