Question · Mar 7, 2017

Getting URL detector RegEx to work with Caché %Regex.Matcher

I am trying to write some code that takes a string and transforms it server-side, finding embedded URLs and replacing them with clickable links. I found the following regex for JavaScript, which is rated highly on Stack Overflow:

    replacePattern1 = /(\b(https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gim;
    replacedText = inputText.replace(replacePattern1, '<a href="$1" target="_blank">$1</a>');
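One detail matters when porting this: in a JavaScript regex literal, the surrounding slashes delimit the pattern and the trailing gim are flags, so none of them is part of the pattern text. A minimal sketch (plain JavaScript, Node or browser) comparing the literal with the same regex built from a string, and with a regex that treats the whole literal text, slashes and gim included, as the pattern:

```javascript
// Pattern body of the JavaScript literal, i.e. the text between the two slashes.
// Backslashes are doubled because this is a JS string literal.
const body = "(\\b(https?|ftp):\\/\\/[-A-Z0-9+&@#\\/%?=~_|!:,.;]*[-A-Z0-9+&@#\\/%=~_|])";

// Literal form: /.../ delimits the pattern and "gim" are the flags.
const fromLiteral = /(\b(https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gim;

// The same regex built from a string, with the flags passed separately.
const fromString = new RegExp(body, "gim");

// Whole literal text used as the pattern, with no flags: the slashes and "gim"
// become ordinary characters that the input would have to contain.
const withDelimiters = new RegExp("/" + body + "/gim");

const input = "http://www.google.com";
const template = '<a href="$1" target="_blank">$1</a>';

console.log(input.replace(fromLiteral, template));
console.log(input.replace(fromString, template));
// No "/" directly before "http" and no "gim" after the URL, so this finds nothing.
console.log(withDelimiters.test(input));
```

The first two lines print the same link; the last prints false, which mirrors a matcher that never finds a match.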

I tried the following in Caché ObjectScript, but it doesn't work:

    set matcher=##class(%Regex.Matcher).%New("/(\b(https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gim",string)
    set string = matcher.ReplaceAll("<a href='$1' target='_blank'>$1</a>")

After running the first line, matcher.Locate() always returns 0 (no match).

I tested the regex on https://regex101.com/ and confirmed that it captures the groups exactly as I expected, but it doesn't work in Caché.

I admit that I am not a regex expert (but I would like to learn).

Can anyone shed light on why this finds no matches in Caché when it does in JS? I can't even get it to work on a simple case:

    s string="http://www.google.com"

Thanks in advance for the help!

Ben


Here's a solution that works for me:

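    s string="http://www.google.com"
    // Pattern without the JS /.../gim delimiters; both a-z and A-Z listed explicitly
    set matcher=##class(%Regex.Matcher).%New("(\b(https?|ftp)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|])",string)
    // $1 refers to capture group 1, the whole URL
    w matcher.ReplaceAll("<a href='$1' target='_blank'>$1</a>")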

Key changes:

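  • Remove the enclosing / /gim: in JS, the slashes delimit a regex literal and gim are flags, so they are not part of the pattern itself.
  • Add a-z ranges next to A-Z in both character classes, because without the i flag the match is case-sensitive.
  • Remove the JS escaping of slashes; it is unnecessary because / is not a delimiter in the pattern string.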

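The same three changes can be checked outside Caché. A minimal JavaScript sketch, assuming JavaScript's RegExp treats this pattern the same way ICU does for these inputs: the pattern string has no delimiters, unescaped slashes, explicit a-z ranges, and only the g flag, so no case-insensitivity flag is in play. The function name linkifyTim is hypothetical.

```javascript
// Tim's pattern as a plain string: no /.../gim delimiters, unescaped slashes,
// and explicit A-Z / a-z ranges instead of a case-insensitivity flag.
const timPattern = "(\\b(https?|ftp)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|])";

// "g" only, so every URL is replaced; there is deliberately no "i" flag.
const timRe = new RegExp(timPattern, "g");

// Hypothetical helper: wrap each URL in an anchor, using capture group 1.
function linkifyTim(text) {
  return text.replace(timRe, "<a href='$1' target='_blank'>$1</a>");
}

console.log(linkifyTim("see http://www.Google.com/Search?q=x now"));
// The scheme alternatives (https?|ftp) are still lowercase-only, so this is left unchanged:
console.log(linkifyTim("HTTP://WWW.GOOGLE.COM"));
```

The mixed-case host and path are linked because of the added a-z ranges, while the uppercase scheme is not matched at all, which is the limitation raised in the reply.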
Thanks, Tim! Very helpful.

One question/comment: your approach doesn't make the http(s)/ftp prefix case-insensitive. I would prefer to set the case-insensitivity flag for the whole pattern.

According to the ICU documentation (http://userguide.icu-project.org/strings/regexp#TOC-Flag-Options):  

> (?ismwx-ismwx): Flag settings. Change the flag settings. Changes apply to the portion of the pattern following the setting. For example, (?i) changes to a case insensitive match.

So I was able to make it work as follows:

    // (?i) at the start makes the whole pattern case-insensitive, so A-Z alone suffices
    set matcher=##class(%Regex.Matcher).%New("(?i)(\b(https?|ftp)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|])",string)
    set string = matcher.ReplaceAll("<a href='$1' target='_blank'>$1</a>")
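For comparison, a minimal JavaScript sketch of the same idea. As far as standard JavaScript goes, there is no standalone leading (?i) like ICU's, so the case-insensitivity is supplied by the i flag on the RegExp constructor instead; the effect on this uppercase-only pattern should be the same. The name linkifyAnyCase and the ftp://files.example.org URL are hypothetical test input.

```javascript
// Ben's pattern: uppercase-only ranges. ICU gets case-insensitivity from a
// leading (?i); here the "i" flag passed to the constructor plays that role.
const benPattern = "(\\b(https?|ftp)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|])";
const benRe = new RegExp(benPattern, "gi");

// Hypothetical helper: link every URL regardless of letter case.
function linkifyAnyCase(text) {
  return text.replace(benRe, "<a href='$1' target='_blank'>$1</a>");
}

console.log(linkifyAnyCase("HTTP://WWW.GOOGLE.COM and ftp://files.example.org/readme.txt"));
```

Both the uppercase http URL and the lowercase ftp URL are linked, because the flag applies to the scheme as well as the character classes.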
 

Thanks for the tips and pointing me in the right direction!