Wesley West · May 20, 2022

$zstrip a variable to separate the city, state and zip

I am looking to separate out a variable into 3 other variables


S REC="CANTON,TX.,75103"

S ZIP=$ZSTRIP(REC,"*","AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz,.")   RETURNS "75103"



I am having trouble stripping the other info, any help would be greatly appreciated and I do realize there is a better way to strip the zip, I just haven't found it yet.


Thanks in Advance!

Product version: Caché 2017.1
0 318
Discussion (15)2
Log in or sign up to continue

try this...

s REC="CANTON,TX,.75103"

s ZIP=$p(REC,",",3),STA=$p(REC,",",2),CTY=$p(REC,",")

I thought about that but I realized that not every entry is seperated by commas, some are 

MILFORD, OH 45150 


so you could try this: 

s REC="A B C,12345"

s REC=$tr(REC," ",",")     ; Translates all Spaces to commas

f x=1:1:$l(REC,",") s Bit=$p(REC,",",x) I Bit?.N s ZIP=Bit      ; This presumes that only one piece of the string is numeric and therefore that's your Zip code. (if there was more than one piece with numerics then it will use the last one) You could be more specific with the pattern match and specify it's ?1.5N (i.e. 1 to 5 Numerics)

sticking with your original plan of using $zstrip you could also do this:

s ZIP=$zstrip(REC,"*E'N")      ; Strips Everything (*E), Except Numeric ('N) but this is a bit dodgy as if there's any other number in the string it will appear in the result

If you can assume that the zip is always 5 digits and state is 2 digits, and if you can assume that the separator is always 1 character then you can do it by position:

set ZIP=$EXTRACT(x,*-4,*), STA=$EXTRACT(x,*-7,*-6), CTY=$EXTRACT(x,0,*-9)

If REC is consistently 3 "Words" with or without the comma then this should work...

s REC="MILFORD OH 12345"

s REC=$tr(REC," ",",")

s CTY=$p(REC,","),STA=$p(REC,",",2),ZIP=$p(REC,",",3)


Set REC="CANTON,TX.,75103"
Set li=$LFS(REC)
SET ZIP=$li(li,3)
SET STA= $li(li,2)
SET CTY=$li(li,1)


Another possibility using %Regex.Matcher :

Set REC = "MILFORD, OH 45150 "
Set m=##class(%Regex.Matcher).%New("([a-zA-Z]+)([\s,]+)([a-zA-Z]+)([\s,]+)([0-9]{5})")
Set m.Text = REC
Do m.Locate()
Set ZIP = m.Group(5)
Set CITY = m.Group(1)
Set STA = m.Group(3)

@Lorenzo Scalese You beat me to it smiley

I would suggest a different pattern though.  The pattern below will account for cities that have spaces in the names and extended zip codes.  It also performs a case insensitive match ( the (?i) at the start of the pattern)

"(?i)((?: *[\.\w])+)+[, ]+([a-z]{2})[.,\s]+(\d{5}(?:-\d{4})?)"

Try this one. The idea is, find the state (including the separators), everything before is the city and everything after is the zip code. Then we remove the separator chars (whitespaces, commas and dots).

ClassMethod Disjoin(data, cty, sta, zip)
    i $locate(data,"(\s|,|\.)[A-Za-z]{2}(\s|,|\.)",3,,sta) {
        s $lb(cty,zip)=$lfs(data,sta), sta=$$s(sta), cty=$$s(cty), zip=$$s(zip)
    } else { s (cty,sta,zip)="" }
    q sta]""
s(x)	q $zstrip(x,"<>w",",.")
Some examples
i ##class(DC.Test).Disjoin("CANTON,TX.,75103",.c,.s,.z) w c,", ",s,", ",z --> CANTON, TX, 75103
i ##class(DC.Test).Disjoin("MILFORD, OH 45150",.c,.s,.z) w c,", ",s,", ",z --> MILFORD, OH, 45150
i ##class(DC.Test).Disjoin("MILFORD OH 45150",.c,.s,.z) w c,", ",s,", ",z --> MILFORD, OH, 45150
i ##class(DC.Test).Disjoin("KANSAS CITY, MO, 12345",.c,.s,.z) w c,", ",s,", ",z --> KANSAS CITY, MO, 12345
i ##class(DC.Test).Disjoin("KANSAS CITY MO, 12345",.c,.s,.z) w c,", ",s,", ",z --> KANSAS CITY, MO, 12345
i ##class(DC.Test).Disjoin("ST. LOUIS MO, 12345",.c,.s,.z) w c,", ",s,", ",z --> ST. LOUIS, MO, 12345
i ##class(DC.Test).Disjoin("  ST. LOUIS MO, 12345",.c,.s,.z) w c,", ",s,", ",z --> ST. LOUIS, MO, 12345

OK, something like this gives a wrong result...
i ##class(DC.Test).Disjoin("   ST. LOUIS MO, 12345",.c,.s,.z) w c,", ",s,", ",z --> , ST, LOUIS MO, 12345

Great code Julius!

To fix the problem with leading spaces, you can use :

Set data = $zstrip(data,"<>"," ")

Oh, thanks for the hint, I'm aware of that. Actually one should remove the same characters as used in $locate():

if $locate($zstrip(data,"<w",",."), ...)

but the point is, to circumvent such problems, the rule number one in the electronic data processing is: you have to apply for check each and every input (at least) a formal check or you end up with problems like this. So the desired process should be:

read_data --> check_it --> proceed_if_OK_else_back_to_input

The same goes for data during an import process.


Your example is a great usage of the $Piece statement.

Set REC="CANTON,TX.,75103"

Set CTY=$Piece(REC,",",1)

Set STA=$Piece(REC,",",2)

Set ZIP=$Piece(REC,",",3)

Sorry, I didn't see all the responses before answering.


I am a retired Cache/MUMPS developer and have far too much time on my hands.  However, I think I have found a smooth and reliable way to accomplish what you need.  This was fun.  Thanks for the challenge.

 New X,CSZ
 Set CSZ($Increment(CSZ(0)))="CANTON,TX.,75103"
 Set CSZ($Increment(CSZ(0)))="MILFORD, OH 45150"
 Set CSZ($Increment(CSZ(0)))="MILFORD OH 45150"
 Set CSZ($Increment(CSZ(0)))="KANSAS CITY, MO, 12345"
 Set CSZ($Increment(CSZ(0)))="KANSAS CITY MO, 12345"
 Set CSZ($Increment(CSZ(0)))="ST. LOUIS MO, 12345"
 Set CSZ($Increment(CSZ(0)))=" ST. LOUIS MO, 12345"
 For X=1:1:CSZ(0) Write $$CSZ2(CSZ(X)),!
 Set CSZF=$Zconvert(CSZF,"U")
 Set CSZF=$Translate(CSZF,",."," ") // Translate to spaces
 Set CSZF=$Zstrip(CSZF,"<=>"," ") // Remove Leading, Trailing, and multiple spaces.
 Set SC=$Length(CSZF," ") // Count the number of spaces
 Set CITY=$Piece(CSZF," ",1,(SC-2)) // Select the multiple of city pieces
 Set STATE=$Piece(CSZF," ",(SC-1)) // Select one less the the max piece.
 Set ZIP=$Piece(CSZF," ",SC) // Select the max piece.
 Quit CITY_"|"_STATE_"|"_ZIP


Thank you everyone for the suggestions! I have a lot to digest!  :)