Solution for T9 Spelling Problem in Caché ObjectScript

Hi, Community!

Last weekend we held the Final of InterSystems Contest on Caché, DeepSee and iKnow under the aegis of IT Planet Student Championship in Ekaterinburg. BTW, this year we had more than 1,400 participants in InterSystems Contest.

One of the tasks for the final was to solve T9 Spelling problem  with Caché ObjectScript and use the minimum code. 

Problem description:
The Latin alphabet contains 26 characters and telephones only have ten digits on the keypad. We would like to make it easier to write a message to your friend using a sequence of keypresses to indicate the desired characters. The letters are mapped onto the digits as shown below. To insert the character B for instance, the program would press 22. In order to insert two characters in sequence from the same key, the user must pause before pressing the key a second time. The space character ' ' should be printed to indicate a pause. For example, 2 2 indicates AA whereas 22 indicates B.

Make the method which inputs string and returns digits.
Example:

USER>write ##class(ITPlanet.Task1).ToPhone(“hi”)
44 444
USER>write ##class(ITPlanet.Task1).ToPhone(“yes”)
999337777
USER>write ##class(ITPlanet.Task1).ToPhone(“foo  bar”)
333666 6660 022 2777
USER>write ##class(ITPlanet.Task1).ToPhone(“hello world”)
4433555 555666096667775553

The best result amongst the students was 192.

What's yours? ;)

  • + 3
  • 0
  • 682
  • 14
  • 6

Answers

I count 136 symbols in my solution (thanks Dmitry Maslennikov for advice)

p(t) public
{ 
 s s="" f i=1:1:$l(t){s p=$f(" #######abc#def#ghi#jkl#mno#pqrstuv#wxyz",$e(t,i))-2 s:p\4=$e(s,*) s=s_" " f j=1:1:p#4+1 s s=s_(p\4)} q s 
}

long variant: 

p(text) public
{
s str="" 
for i=1:1:$l(text)
{
 ;_________0000111122223333444455556666777788889999
 s pos=$f(" #######abc#def#ghi#jkl#mno#pqrstuv#wxyz",$e(text,i))-2 
 s:pos\4=$e(str,*) str=str_" " 
 for j=1:1:pos#4+1 s str=str_(pos\4)
} 
q str 
}

One more variant from Russian forum:

p(t)
{
R="" %=1:1:$l(tN=0:1:3 n=$f("adgjmptw ",$c($a($e(t,%))-N)) s:n n=n#10,R=R_$e(" ",$e(R,*)=n)_$e(n*1111,1,N+1),N=9 
}

132 symbols.

 

My result is 157 :)

I think I let other's to share their size, and tomorrow or sometime later I can share it.

157 is impressive: my first result was 174, which Eduard has improved to 161. I didn't thought it might be improven even further :)

Well, my code below

toPhone(s  = "") {
  s r="",c=96,p=1 f i=3,3,3,3,3,4,3,4{s:$i(p) k="" f j=1:1:i s k=k_p,c($c($i(c)))=k} f i=1:1:$l(s){s n=$g(c($e(s,i)),0) s:$e(n)=$e(r,*) r=r_" " s r=r_n} q r
}

In first I'm generate a plane array with all variants, and then just get all numbers from it, include space between identical numbers, and zero for unknown symbols such as space

and 149, with a little bit changes

ToPhone(s = "")
{
 s r="",c=96 f p=2:1:9{s k="" f j=1:1:3+(p=7)+(p=9) s k=k_p,c($c($i(c)))=k} f i=1:1:$l(s){s n=$g(c($e(s,i)),0) s:$e(n)=$e(r,*) r=r_" " s r=r_n} q r
}

And would you please provide "readable" version too just to see the algorithm better? TIA!

 

ToPhone(s = "")
{
 set r=""
 set c=96 
 for p=2:1:9 {
     set k="" 
     for j=1:1:3+(p=7)+(p=9) { 
         set k=k_p
         set c($c($i(c)))=k
     }
 } 
 for i=1:1:$l(s) { 
     set n=$g(c($e(s,i)),0) 
     if $e(n)=$e(r,*) set r=r_" " 
     set r=r_n
 } 
 quit r
}

Let see to the keypad again, it's getting obvious instantly that keys are (mostly) located by groups of 3 symbols, and if there would not be those "s" (corresponding to "7777") and "z" (which produces "9999") then implementation will be simple formula with division by 3 and corresponding modulo. Also space is exception and is not a part of sequential numerics.

So given this assumprion let us create the 1st approximation (no compression, not name reduction, everything ie readable and commented):

0(s) public {
    set S=""
    for %=1:1:$length(s) {
        set P=""
        set = $e(s,%)
        set = $a(c) - $a("a")
        if c=" " 
            set P="0"
        elseif c="z" {
            set P="9999"
        elseif c="s" {
            set P="7777"
        elseif i<($a("s") - $a("a")) // ($a("s") - $a("a")) = 18
            set n=i\3+1,m=i#3+1
            set before2 = $a("1") ; 49
            set $p(P,$c(+ before2),m+1)=""
        else {
            set n=i-1\3+1,m=i-1#3+1
            set before2 = $a("1") ; 49
            set $p(P,$c(+ before2),m+1)=""
        }
        set:$extract(S,*)=$extract(P,1) S=S_" "
        set S=S_P
    }
    quit S
}

[Don't botther to count symbols - we will compress the code a bit]

This strange `set $piece(string,symbol,offset+1) = ""` is actually filling of a string with the given symbol.

Let us review those several ifs, we see, actually, 2 groups of them:

  • 3 ifs for exceptions fom formulae;
  • 2 ifs for disjointed rows of groupd by 3 keys. The formulae is atually the same, but witth some offset.

So let's get rid of ifs via $select and extra offset itroduced.

#; get rid of ifs, replace with $selects
1(s) public {
    set S=""
    for %=1:1:$length(s) {
        set P=""
        set = $e(s,%)
        set = $a(c) - $a("a") ; $a("a")=97
        set = '< 18 ; ($a("s") - $a("a")) = 18
        set $p(P, $c(- o\3+1 + $a("1")), - o#3+1+1)=""
        set P=$select(c=" ": "0", 
                      c="z": "9999",
                      c="s": "7777",
                      1:P)
        set:$extract(S,*)=$extract(P,1) S=S_" "
        set S=S_P
    }
    quit S
}

[I believe this step is still obvious]

[[I dislike the way I had to put expressions without pairs but we need as short as possible, sothis is inevitable evil.]]

Now this is time to get shorter, but stiill readable version:

#; name reduction
2(s) public {
    S=""
    %=1:1:$l(s) {
        P=""
        = $e(s,%)
        = $a(c) - $a("a") ; $a("a")=97
        = '< 18
        $p(P, $c(- o\3+1 + $a("1")), - o#3+1+1)=""
        P=$s(c=" ": "0", c="z": "9999", c="s": "7777", 1:P)
        s:$e(S,*)=$e(P,1) S=S_" "
        S=S_P
    }
    S
}

That was simple- select code in Studio, then press Ctrl+Shift+E. 

And the latest step, is to convert this barely readable code to 1 line mess hard-code stuff:

#; linearization
3(s) public {
 S="" %=1:1:$l(s) {P="",c=$e(s,%),i=$a(c)-97,o=i'<18,$p(P,$c(i-o\3+50),i-o#3+2)="",P=$s(c=" ":"0",c="z":"9999",c="s":"7777",1:P) s:$e(S,*)=$e(P,1) S=S_" " S=S_PS
}

That's (if you count the 1st indent symbol) 173 characters length.

P.S.

Eduard Lebedyuk has improved this result a little bit:

  • he has replaced quoted literals with numerics (because they are both represend canonical numerics
    from ObjectScript point of view);
  • and replaced (where possiblle) comparisons of characters with comparisons of their derived ordinals (minus 97, or "a")
#; +Eduard modifications
4(s) public {
 S="" %=1:1:$l(s){P="",c=$e(s,%),i=$a(c)-97,o=i>17,$p(P,$c(i-o\3+50),i-o#3+2)=P,P=$s(i<0:0,i=25:9999,i=18:7777,1:P),S=S_$s($e(S,*)=$e(P):" "_P,1:P)S
}

We are 158 now!

I only managed to get my code down to 207 characters and it's even missing the whitespace->0 mapping, I saw that too late. Butwhoneedswhitespacestodayanyway?

In case you are curious here is my code:

 #define a(%x) $LG(o,$a($E(s,%x,%x))-96)
 s (x,o)="" f m=2:1:9{s r="" f n=1:1:$S(m=7:4,m=9:4,1:3){s r=r_m,o=o_$LB(r)}} f j=1:1:$L(s)-1{s x=x_$$$a(j) s:$E($$$a(j+1),1,1)=$E(x,$L(x),*) x=x_" "} q x_$$$a($L(s))

The first part actually builds a $LIST with the mapping from characters to the corresponding keypad sequence. The second part loops over the input string and does a lookup for the keypad sequence. A look-ahead checks if I have to add a whitespace to indicate a pause. Because of the look-ahead I have to treat the last character conversion outside of my main loop.

Tricks I used:

  • Replace if statements with $SELECT
  • Use SET statement to set multiple variables to the same values
  • Built a macro for code you need multiple times (I need the lookup three times, in my main loop, for the look-ahead and for the last character conversion) 

Not an optimal algorithm and it's even lacking a piece, but still it is a fun exercise.

Here's my code, of course I could shrink this down a lot, but figured people might want to read it.

Personally, I think looking at different ways of doing the same thing are a great learning lesson.  Judging based on the length of the code, simply encourages bad programming.

ClassMethod toPhone(i As %String)
{
    set k="----abc-def-ghi-jkl-mno-pqrstuv-wxyz ---"
    set i=$zconvert(i,"L"),o=""
    f p=1:1:$l(i) s c=$e(i,p),d=$f(k,c)\4+1,r=3-(d*4-($f(k,c))) s:r<1 r=4+r,d=d-1 s:d>9 d=0 s z="",$p(z,d,r+1)="",o=o_$s($e(o,$l(o))=d:" ",1:"")_z
    q o
}

Personally, I think looking at different ways of doing the same thing are a great learning lesson.  Judging based on the length of the code, simply encourages bad programming.

Agreed. So I think is good to provide two versions: readable and the shortest.

Yet another varaint of 148 characters length (for 8-bit instances only):

ClassMethod am(t) As %String [ ProcedureBlock = 1 ]
{
 s="",a="!""#123ABCQRSabcqrstЃ‚ѓ‘’“”•" i=1:1:$l(t){p=$a(a,$a(t,i)-96),j=p\16 s:$e(s,*)=s=s_" " s=s_$s(j:$tr($j("",p#16)," ",j),1:0)s
}

Comments

Alexander Koblov, thank you for such interesting problem for the Contest Final. Your variant? ;)

One more solution from Russian forum:

ClassMethod gluconatV2(xAs %String ProcedureBlock = 1 ]
{
 y="" i=1:1:$l(x){c=$e(x,i),a=$a(c)-97-(c]"r")-(c]"y"),z=c]" ",d=a\3+2*s:d=$e(y,*) y=y_" " j=0:1:a#3+("sz"[c)*y=y_dy
}

And one "extremum" version:

ClassMethod ToPhone( t, "(t){s s="" f i=1:1:$l(t) {f N=0:1:3 s n=$f(""adgjmptw "",$c($a($e(t,i))-N)) s:n n=n#10,s=s_$e("" "",s_-1[(n_-1))_$e(n_n_n_n,1,N+1),N=9} q s}"As %String CodeMode = expression ]
{
$xecute(a,t)
}

 

> one "extremum" version:

Does not really count. The method signature must be:

ClassMethod ToPhone(t As %String) As %String {}