Hi,

I knocked up some code to extract the plain text from an RTF document. It works for my purposes, but I would like to know if anyone can find a case where it does a bad job.

Copy your .rtf file into a flat array, e.g. x(1)=first line, x(2)=second line, then:

d ##class(yourclass).StripRTF(.x,.y)

and you'll get the plain text in y, one array node per line.
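If the .rtf file is on disk, one way to build that flat array is to read it line by line with %Stream.FileCharacter. This is a minimal sketch, not part of the original code: the class Demo.RTF.Util and its FileToArray method are hypothetical names, and the error handling is deliberately thin.

```objectscript
Class Demo.RTF.Util [ Abstract ]
{

/// Hypothetical helper: copy a text file into a flat array x(1), x(2), ...
/// ready to pass to StripRTF. Returns the number of lines read,
/// or 0 if the file could not be linked.
ClassMethod FileToArray(fileName As %String, ByRef x) As %Integer
{
 kill x
 set stream=##class(%Stream.FileCharacter).%New()
 set sc=stream.LinkToFile(fileName)
 if $$$ISERR(sc) quit 0
 while 'stream.AtEnd {
  // ReadLine() drops the line terminator; $increment(x) keeps the count in the top node
  set x($increment(x))=stream.ReadLine()
 }
 quit +$get(x)
}

}
```

You would then call StripRTF with .x as above. One caveat: ReadLine() drops the line terminators, and in RTF a line break can end a control word, so a control word at the end of one line could run into text at the start of the next. That is worth checking with your own files.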

/// accepts an RTF doc in array form, doesn't care if lines are split across array members
/// returns plain text from the doc with one array item per line 
/// quit value is number of lines in array
ClassMethod StripRTF(ByRef rtfText = "", ByRef %plainText) As %Integer
{
 // use this code to view an rtf doc split into groups and indented when group level goes up
 // S IN=0,C=0 F I=1:1 S C=##class(ZSS.SMTP).EXTRACT(.x,I) Q:C="" I C="{" {S IN=IN+1 W !?IN*2,"(",IN,")"} W C I C="}" {W "(",IN,")" S IN=IN-1 W !?IN*2}
 n %line
 kill %plainText Set %line=0
 // you could add some speed here by working out how to set characters in the R array to null and
 // reducing the i pointer accordingly
 // merge (not set) so that every subscripted line is copied, not just the top node
 m R=rtfText
 d ..Brace(.R,$i(i))
 q %line
}

/// A bit like $Extract but the first argument is a single level array, like array(1)="some text", array(2)="some more text"
/// ..EXTRACT(.array,9,11) would return "tso" as the 9th, 10th, and 11th characters of the data in the array
ClassMethod EXTRACT(ByRef array, from As %Integer, to As %Integer, SetToNull = 0) As %String
{
 i '$d(to) s to=from
 s UncleLen=to-from+1 // length of string to return
 s string=""
 f index=1:1 {
  q:'$d(array(index))
  s short=array(index)
  i from'>$l(short) {
   s string=string_$e(short,from,to)
   // mimic SET $EXTRACT
   i SetToNull s $e(array(index),from,to)=""
   q:$l(string)=UncleLen
  }
  s from=from-$l(short),to=to-$l(short)
  // once the range has started, carry on from the first character of the next member
  s:from<1 from=1
 }
 q string
}
/// discard everything between two braces including the braces
/// done by either moving i to the last brace
/// or if SetToNull is passed as 1 then actually removing the characters from the RTF array
ClassMethod Discard(ByRef rtfText, ByRef i, SetToNull = 0)
{
 s inGroup=1,start=i
 f  {
  s discard=..EXTRACT(.rtfText,$i(i))
  q:discard=""
  i discard="}" s inGroup=inGroup-1
  i discard="{" s inGroup=inGroup+1
  // stop at the brace that closes the group we started in
  q:'inGroup
 }
 i SetToNull d ..EXTRACT(.rtfText,start,i,1)
}
/// return the text for a control word that starts with a backslash
ClassMethod Slash(ByRef R, ByRef i) As %String
{
 s string="\"
 F  {
  Set char=..EXTRACT(.R, $I(i))
  q:char=""  // should never happen but don't want to get stuck in a loop because of a bad file
  i char="}" s i=i-1 q
  i char=" " q
  i char="\" s i=i-1 q
  i char="{" {
   i string?1"\"1a.an {
    // everything inside something like \abc1{this stuff}
    // example {\fonttbl{\fprq2{02020603050405020304}TimesNew Roman;} should all be discarded
    d ..Discard(.R,.i)
    s string=$c(127)
    q
   }
   s char=..Brace(.R,.i)
  }
  i char="*",..EXTRACT(.R, i-2,i+1)="{\*\" {
   // "{\*\" at the start of a brace means the whole brace including nested braces can be ignored
   d ..Discard(.R,.i)
   s string=$c(127)
   q
  }
  i char="'" {
   s string=string_char_..EXTRACT(.R, $I(i),$I(i))
   q
  }
  s string=string_char
  // everything in here should disappear because it looks like {\stylesheet{
  i string="\stylesheet"!(string="\info") {
   d ..Discard(.R,.i)
   s string=$c(127)
   q
  }
 }
 //
 // add stuff here for special characters represented by \codeword
 // you could put them on a global
 //
 i string'="" {
  try {
   // catching stupid subscripts
   s string=$G(^GMAT("RTF","special characters",string),string)
  }
  catch {
   // do nowt
  }
 }
 s:string="\lquote" string="'"
 s:string="\rquote" string="'"
 s:string="\ldblquote" string=""""
 s:string="\rdblquote" string=""""
 s:string="\'93" string=""""  // $c($zh("93")) is a left double quote in Windows-1252, but a control character in Unicode
 s:string="\'94" string=""""
 s:string?1"\'"2an string=$c($zh($p(string,"'",2)))
 s:string="\par" string=$c(13,10)
 // REMOVING \codename and \codename1 and \codename1;
 s:string?1"\"1a.an.1";" string=""
 // REMOVING \codename-20
 s:string?1"\"1.a1"-"1.n.1";" string=""
 s:string="{}" string=""
 q string
}
/// return the contents of a pair of braces (troosers!)
ClassMethod Brace(ByRef R, ByRef i) As %String
{
 s string="{"
 F  {
  Set char=..EXTRACT(.R, $I(i),i+1)
  // reached the end and there will be a loose "}"
  i char="",string="}" s string=""
  q:char=""
  // escaped characters that should be allowed through to the text
  s escape=0
  i $lf($lb("\\","\{","\}"),char) s i=i+1,escape=1
  e  s char=$e(char)
  i char="\" {
   s char=..Slash(.R,.i)
   // we hit a ..Discard so remove the brace before it
   i char=$c(127) s char="",$e(string,*)=""
   s string=string_char
   continue
  }
  i char="{" {
   s char=..Brace(.R,.i)
   i char=$c(13,10) {
    //works for 1st line only s %plainText($i(%line))=string,string="",char=""
    f  {
     s %plainText($i(%line))=$p(string,$c(13,10),1)
     s string=$p(string,$c(13,10),2,*)
     q:string=""
    }
    s char=""
   }
   s string=string_char
   continue
  }
  // HEX ascii
  i char="\'" {
   s char=$C($ZHEX(..EXTRACT(.R, $I(i),$I(i))))
  }
  s string=string_$e(char,*)
  s:string="{}" string=""
  q:char="}"
 }
 i string="{}" {
  s string=""
 }
 i $e(string)="{",$e(string,*)="}" {
  s string=$e(string,2,*-1)
 }
 q string
}

Hi Julius, I was playing Devil's "avocado 😁" with that. One could argue the case for that being the correct approach. One could also argue for Paul's unit tests, since there is no obvious reason why rotational symmetry shouldn't apply. (I think my solution for that is 225, which doesn't beat my friend Paul's answer, so I don't like it.) The point is that the spiral has to get smaller somewhere, but we haven't been given a clear enough rule to follow. In the example in the question, the spiral only turns inward when it hits a used letter or a corner: a dead end, as @Robert Cemper puts it. The unit tests below could also be valid. (My solution for this is 199.)

There's an unwritten rule in the question that could be "as a human, make a decision when to turn inwards to make sense out of the order of letters". How do you code that?

        Set matrix($INCREMENT(matrix)) = "A,B,C"
        Set matrix($INCREMENT(matrix)) = "H,I,D"
        Set matrix($INCREMENT(matrix)) = "G,F,E"
        d $$$AssertEquals(..Solution(.matrix, 1, 1), "ABCDEFGHI")
        d $$$AssertEquals(..Solution(.matrix, 1, 2), "BCDEFGHA") // !!?
        d $$$AssertEquals(..Solution(.matrix, 1, 3), "CDEFGHABI")
        d $$$AssertEquals(..Solution(.matrix, 2, 3), "DEFGHABC") // !!?
        d $$$AssertEquals(..Solution(.matrix, 3, 3), "EFGHABCDI")
        d $$$AssertEquals(..Solution(.matrix, 3, 2), "FGHABCDE") // !!?
        d $$$AssertEquals(..Solution(.matrix, 3, 1), "GHABCDEFI")
        d $$$AssertEquals(..Solution(.matrix, 2, 1), "HABCDEFG") // !!?

You are correct, Paul, but there's a problem with other tests too. The one with 3,1 above could return GHABCDI, because you could argue that once the spiral leaves an outer edge it should never return to it. If the test that starts at 1,2 doesn't return to row 1, why should the one starting at 1,3 return to row 1 after leaving it? These could be the correct tests:

        Set matrix($INCREMENT(matrix)) = "A,B,C"
        Set matrix($INCREMENT(matrix)) = "H,I,D"
        Set matrix($INCREMENT(matrix)) = "G,F,E"
        d $$$AssertEquals(..Solution(.matrix, 1, 1), "ABCDEFGHI")
        d $$$AssertEquals(..Solution(.matrix, 1, 2), "BCDEFGHI")
        d $$$AssertEquals(..Solution(.matrix, 1, 3), "CDEFGHI")
        d $$$AssertEquals(..Solution(.matrix, 2, 3), "DEFGHI")
        d $$$AssertEquals(..Solution(.matrix, 3, 3), "EFGHI")
        d $$$AssertEquals(..Solution(.matrix, 3, 2), "FGHI")
        d $$$AssertEquals(..Solution(.matrix, 3, 1), "GHI")
        d $$$AssertEquals(..Solution(.matrix, 2, 1), "HI")

It took me a minute or two to spot that you had made two valid assumptions that I hadn't noticed: that n, the number of rows and columns, comes from the top node set by $I(matrix), and that the matrix is square.

I've gone for an interpretation that I think is equivalent to clockwise; the result is the same. I'm going East (right), South (down), West (left), North (up), then back to East again, heading in one direction until a dead end and then changing direction. So I have a variable V to indicate the direction of movement: V=1 for vertical (up or down, north or south) and V=0 for horizontal (left or right, west or east). Another variable D indicates increase (D=1) or decrease (D=-1). When you hit a dead end, change V with V='V, and if V becomes 0 then set D=-D. That will always spin you clockwise, but without limits, so you need code to spot a dead end. Also, as others have said, there's no way to know which direction to start in. If you are in the bottom left corner, do you start by going right or up?
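The turning rule can be checked on its own, separately from the spiral. This sketch (the class Demo.Spiral.Turns and its TurnSequence method are hypothetical names) starts heading East, applies V='V and then D=-D whenever V becomes 0, and records one letter per heading.

```objectscript
Class Demo.Spiral.Turns [ Abstract ]
{

/// Hypothetical demo: list the headings produced by the V/D turning rule,
/// starting East (V=0 horizontal, D=1 increasing), one letter per heading.
ClassMethod TurnSequence(turns As %Integer = 5) As %String
{
 set V=0,D=1,seq=""
 for k=1:1:turns {
  // V=0,D=1 East; V=1,D=1 South; V=0,D=-1 West; V=1,D=-1 North
  set seq=seq_$case(V_","_D,"0,1":"E","1,1":"S","0,-1":"W",:"N")
  // dead end: swap between horizontal and vertical...
  set V='V
  // ...and reverse the sign each time we become horizontal again
  if 'V set D=-D
 }
 quit seq
}

}
```

TurnSequence(5) returns "ESWNE": four turns bring you back to East, so the rule on its own only ever spins clockwise, and stopping is left to the dead-end test.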

Here's my code with comments:

ClassMethod Solution(ByRef m, x, y) As %String
{
 // enforcing a decreasing spiral, 210 without comments
 // Right means change y with V=0, D=1
 // Down means change x with V=1, D=1
 // Left means change y with V=0, D=-1
 // Up means change x with V=1, D=-1
 //
 // first letter starts the word
 s w=$p(m(x),",",y),V=0,D=1
 // mark the position so it can't be re-used
 // get the next letter from $$n, end when no letter comes back, extend the word if it does, and repeat
 f  s n(x,y)=w,l=$$n() q:e=3  s w=w_l
 q w
 // default start position to right and clockwise
 // quit if you've tried 2 directions
 // get the letter in that direction
 // if no letter or already got then change direction clockwise
 // to change direction, if currently vertical then you won't be for the next move and vice versa
 // if you are now horizontal then change direction
n() q:$i(e)=3 "" s X=V*D+x,Y='V*D+y,l=$p($g(m(X)),",",Y) i l=""!$d(n(X,Y)) s V='V s:'V D=-D q $$n()
 // flag the letter behind you (the anticlockwise one) as used, so the spiral can't come back to it
 s n(V*-D+x,'V*-D+y)=0,x=X,y=Y,e=0 q l
}
 

Hi Robert, I had to write something similar myself about 20 years ago, when we moved from DSM to Caché, to deal with all those pesky &$ZLIB functions. On top of that there was the problem of variable names longer than 8 characters. DSM was quite happy to truncate a variable name to 8 characters and work with it. For example, a variable named LONGVARI was the same variable as one named LONGVARIABLE just because the first 8 characters were the same. Under DSM, if you SET one of them the other was also SET, but not under Caché. All of these had to be found and corrected.
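Finding those clashes can be sketched as grouping names by their first 8 characters, assuming you already have a list of variable names (collecting them from routine source is not shown). Demo.DSM.Names and Collisions are hypothetical names; the method mimics the DSM truncation described above and returns every name that shares its 8-character prefix with a different name.

```objectscript
Class Demo.DSM.Names [ Abstract ]
{

/// Hypothetical helper: names(1), names(2), ... hold non-empty variable names.
/// Returns a $List of every name that shares its first 8 characters
/// with at least one other, different name (which DSM treated as one variable).
ClassMethod Collisions(ByRef names) As %List
{
 set result=""
 // group the full names under their 8-character prefix
 set k=""
 for {
  set k=$order(names(k),1,name)
  quit:k=""
  set seen($extract(name,1,8),name)=""
 }
 // report every prefix that has more than one distinct full name
 set key=""
 for {
  set key=$order(seen(key))
  quit:key=""
  set first=$order(seen(key,""))
  continue:$order(seen(key,first))=""
  set name=""
  for {
   set name=$order(seen(key,name))
   quit:name=""
   set result=result_$listbuild(name)
  }
 }
 quit result
}

}
```

With LONGVARI, LONGVARIABLE and SHORT as input, only the first two come back, because they both truncate to LONGVARI.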

In the absence of a method prescribed by InterSystems, I would set up a global with extra information on each namespace, e.g.

^%ZSYS("NAMESPACE","SYS")="IRIS"

^%ZSYS("NAMESPACE","LIVE")="CUSTOMER PRODUCTION"

Starting the global name with % means it can be seen from all namespaces. You can add whatever information you like to it, such as whether the namespace is production or non-production.
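Reading the information back is then just a $Get on the global. This is a hedged sketch: Demo.NamespaceInfo is a hypothetical class, and the "TYPE" sub-node used by IsProduction is an assumed convention, not part of the globals shown above.

```objectscript
Class Demo.NamespaceInfo [ Abstract ]
{

/// Free-text description stored in the top node, e.g. "CUSTOMER PRODUCTION"
ClassMethod Describe(ns As %String = {$namespace}) As %String
{
 quit $get(^%ZSYS("NAMESPACE",ns))
}

/// Assumed convention: ^%ZSYS("NAMESPACE",ns,"TYPE")="PROD" marks a production namespace
ClassMethod IsProduction(ns As %String = {$namespace}) As %Boolean
{
 quit $get(^%ZSYS("NAMESPACE",ns,"TYPE"))="PROD"
}

}
```

Keeping the production flag in its own sub-node avoids matching on the description text, where a value such as "NON-PRODUCTION" would otherwise be misread.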

Shaved down to 174. I used a dubious $Find-2 to get the length of the first string, and changed a condition in the $Select from p<c&(r<3) to r<3*c>p.

ClassMethod Type(a...) As %String
{
 f j=$i(r):1:a{s w=$tr(a(j)," "),p=$f(w,",")-2 f i=2:1:$l(w,",") s c=$l($p(w,",",i)),r=$s(p=c:r,r<3*c>p:2,r#2:3,1:4),p=c} q $p("Constant7Increasing7Decreasing7Unsorted",7,r)
}
The same change also fits into @Robert Barbiaux's attempt:

ClassMethod Type(a...) As %String
{
 f i=$i(r):1:$g(a){f j=1:1:$l(a(i),","){s l=$l($tr($p(a(i),",",j)," ")),c=$g(c,l),r=$s(l=c:r,r<3*l>c:2,r#2:3,1:4),c=l}} q $p("Constant1Increasing1Decreasing1Unsorted",1,r)
}
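Replacing p<c&(r<3) with r<3*c>p relies on ObjectScript evaluating binary operators strictly left to right, so r<3*c>p means ((r<3)*c)>p. A small brute-force check (the class Demo.Golf.Check is a hypothetical name) compares both forms for r from 1 to 4 and non-negative lengths. It only covers non-negative lengths: if p were -2 (what $Find-2 gives when there is no comma) and r were already 3 or 4, the two forms would disagree.

```objectscript
Class Demo.Golf.Check [ Abstract ]
{

/// Returns 1 if the original and golfed $Select conditions agree
/// for r=1..4 and lengths p, c from 0 to 5
ClassMethod SameCondition() As %Boolean
{
 set ok=1
 for r=1:1:4 {
  for p=0:1:5 {
   for c=0:1:5 {
    // original form: an increase (p<c) while still Constant or Increasing (r<3)
    set plain=(p<c)&(r<3)
    // golfed form, evaluated strictly left to right: ((r<3)*c)>p
    set golfed=r<3*c>p
    if plain'=golfed set ok=0
   }
  }
 }
 quit ok
}

}
```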