Question
david clifte · Dec 2, 2016

How to remove accentuation?

How to remove accentuation of a word?

Ex:

Árvore = Arvore

você = voce

Então = entao

The words above are in Brazilian Portuguese. I need to get rid of the accents so that I can compare two sentences.

Thanks in advance.

Use $translate?

e.g.

ClassMethod NoAccents(stringWithAccents As %String) As %String
{
 // Each character in accent is replaced by the character at the same position in usual
 w "before: ",stringWithAccents,!
 set accent="Áêã",usual="Aea"
 set val=$translate(stringWithAccents,accent,usual)
 w "after: ",val,!
 return val
}

To handle this in the general case, you would decompose the string, then strip out non-spacing marks. Unicode normalization has been requested previously, and will hopefully make it into the product at some point.
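
As a rough illustration of the decompose-then-strip idea outside ObjectScript, here is a minimal Python sketch. It assumes Python's standard unicodedata module; the function name strip_accents is made up for this example and is not part of any InterSystems API.

```python
import unicodedata

def strip_accents(text: str) -> str:
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" is "Mark, nonspacing", i.e. the combining accents; drop them.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

# Compare two sentences while ignoring accents and letter case.
same = strip_accents("Então você").lower() == strip_accents("entao voce").lower()
```

Letters that have no canonical decomposition, such as "ø", pass through unchanged, so a lookup table like the $translate one above may still be needed for those.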

Another option is to use a regular expression, like this:

ClassMethod ReplaceAccents(ByRef pWord As %String) As %Status
{
  Set tSC = $$$OK
  Try {
      // Key = replacement letter, value = the accented characters it replaces
      Set dictionary = ##class(%ArrayOfDataTypes).%New()
      Do dictionary.SetAt("ÀÁÂÃÄÅ","A")
      Do dictionary.SetAt("àáâãäå","a")
      Do dictionary.SetAt("ÈÉÊË","E")
      //.... all the rest

      // GetNext returns the value and advances key; key is "" at the end of the array
      Set key = ""
      For {
        Set chars = dictionary.GetNext(.key)
        Quit:key=""
        Set matcher = ##class(%Regex.Matcher).%New("["_ chars _ "]", pWord)
        Set pWord = matcher.ReplaceAll(key)
      }
  } Catch tException {
    Set:$$$ISOK(tSC) tSC = tException.AsStatus()
  }
  Quit tSC
}