How to distinguish the String from Integer in COS

I get a list which is formed like below:

$lb("","2",6,6,6,"3")

I would like to distinguish "2" and "3" from 6 since both of them are String types.

Is there any way to achieve this?

Thanks.


Answers

Try this:

 set list=$lb("","2",6,6,6,"3")
 for i=1:1:$LL(list) {
    write !,$li(list,i),?5
    if $lb($li(list,i))=$lb(+$li(list,i)) { write "Integer" }
    else { write "String" }
 }

 Result:

     String
2    String
6    Integer
6    Integer
6    Integer
3    String 

Strictly speaking, the label should read "not Integer" rather than "String".
Note also that $LI($LB(2.3e3)) returns 2300, so the "e" notation is gone.

Checking for numeric values with the condition "if val=+val" is generally a bad idea.
Try this, for example:

set val="1e78509d-be2f-4164-ac3b-825d6c21c074"
if val=+val write "is number",!
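
A possible alternative for the "does this value look like a number" question is the documented $ISVALIDNUM function, which validates the whole value rather than coercing a numeric-looking prefix such as 1e78509. The loop below is only a sketch, and it answers a different question from the original one: it does not tell "2" and 2 apart, it only reports whether the value is syntactically a number.

```objectscript
 // Sketch: validate values with $ISVALIDNUM instead of comparing val=+val
 for val="2",6,"abc","1e78509d-be2f-4164-ac3b-825d6c21c074" {
   write val,?40,$select($isvalidnum(val):"valid number",1:"not a number"),!
 }
```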

A good friend of mine provided this solution, which also uses JSON types and the internal $LB() types for a more detailed type analysis.
Of course, there is no guarantee for the hidden types in $LB().
 

types ; Show COS-Datatypes ; kav ; 2018-03-04
 //
 set obj=##class(Person).%New()
 write "VALUE",?15,"JSON",?22,"$LISTBUILD",!,$TR($J("",32)," ","-"),!
 for val="453","abcd",1234,"abc"_$c(352),-34,1.3,-7.9,$double(1.25),obj {
   write val,?15,$$TypeOf1(val),?22,$$TypeOf2(val,.txt),txt,!
 }
 quit
 // Return JSON datatype by the documented way
 //
TypeOf1(val) Public
{
  quit [(val)].%GetTypeOf(0)
}
 // Return datatype by the undocumented $LB() way
 // (txt is passed by reference and receives the type description)
 //
TypeOf2(val,txt) Public
{
  if $length(val)>253 {
    set typ=$ziswide(val)+1
  } else {
    set typ=$a($lb(val),2)
  }
  set txt=$case(typ
    ,1:" 8bitString"
    ,2:" 16bitString"
    ,4:" nonNegativeInteger"
    ,5:" negativeInteger"
    ,6:" nonNegativeFloat"
    ,7:" negativeFloat"
    ,8:" double"
    ,:" ??? never seen before")
 
  quit typ
}
d ^types
VALUE          JSON   $LISTBUILD
--------------------------------
453            string 1 8bitString
abcd           string 1 8bitString
1234           number 4 nonNegativeInteger
abcŠ           string 2 16bitString
-34            number 5 negativeInteger
1.3            number 6 nonNegativeFloat
-7.9           number 7 negativeFloat
1.25           number 8 double
1@User.Person  oref   1 8bitString

 

Fixed bugs/typos and added new types.

#include %PVA
types ; Show COS-Datatypes ; kav ; 2018-03-04
 new array,i,bitstring
 
 set $bit(bitstring,1) = 1

 write "№",?4,"VALUE",?30,"JSON",?45,"$LISTBUILD",!,$TR($J("",55)," ","-"),!
 
 set array=[
          (bitstring),
          ($ZBITSET($ZBITSTR(4,1),2,0)),
          ($lb("")),
          null,
          true,
          false,
          (##class(%ZEN.proxyObject).%New()),
          [],
          {},
          "abcd",
          ($wc(35222)),
          "2",
          2,
          -2,
          2.1,
          -2.1,
          ($double(2.1))
          ]
 set array."18"=2
 set array."19"=-2
          
 for i=0:1:array.%Size() {
   write i,")",?4,array.%Get(i),?30,$$TypeOf1(.array,i),?45,$$TypeOf2(array.%Get(i)),!
 }
 
 // Return JSON datatype by the documented way
 //
TypeOf1(&array,key)
{
  set typ=array.%GetTypeCodeOf(key)
  quit typ_" "_$case(typ,
    $$$PVVALUENULL:"null",
    $$$PVVALUETRUE:"boolTrue",
    $$$PVVALUEFALSE:"boolFalse",
    $$$PVVALUEINTEGERPOS:"+int",
    $$$PVVALUEINTEGERNEG:"-int",
    $$$PVVALUEUNUSED1:"unused",
    $$$PVVALUEARRAY:"array",
    $$$PVVALUEOBJECT:"object",
    $$$PVVALUETEXT:"text",
    $$$PVVALUENUMBER:"number",
    $$$PVVALUEOVERFLOW:"overflow",
    $$$PVVALUECACHENUMERIC:"cacheNumeric",
    $$$PVVALUEOREF:"oref",
    $$$PVVALUEUNASSIGNED:"unassigned",
    $$$PVVALUELONGPOS:"+long",
    $$$PVVALUELONGNEG:"-long",
    $$$PVVALUEBYTE:"byte[]",
    $$$PVVALUEDATETIME:"dateTime",
    $$$PVVALUEDOUBLE:"double",
    $$$PVVALUESINGLE:"single",
    $$$PVVALUEUTF8:"utf8",
    $$$PVVALUENESTED:"nested",
    $$$PVVALUEEOF:"eof",
    :"unknown")
}
   
 // Return datatype by the undocumented $LB() way
 //
TypeOf2(val)
 {
  if $l(val)>253 {
    set typ=$ziswide(val)+1
  } else {
    set typ=$a($lb(val),2)
  }
  quit typ_" "_$case(typ
    ,1:"8bitString"
    ,2:"16bitString"
    ,4:"nonNegativeInteger"
    ,5:"negativeInteger"
    ,6:"nonNegativeFloat"
    ,7:"negativeFloat"
    ,8:"double"
    , :"??? never seen before")
}

Result:

USER>d ^types
№   VALUE                     JSON           $LISTBUILD
-------------------------------------------------------
0)  Ÿ                          8 text         1 8bitString
1)                            8 text         1 8bitString
2)                            8 text         1 8bitString
3)                            0 null         1 8bitString
4)  1                         1 boolTrue     1 8bitString
5)  0                         2 boolFalse    1 8bitString
6)  4@%ZEN.proxyObject        12 oref        1 8bitString
7)  2@%Library.DynamicArray   6 array        1 8bitString
8)  1@%Library.DynamicObject  7 object       1 8bitString
9)  abcd                      8 text         1 8bitString
10) 視                         8 text         2 16bitString
11) 2                         8 text         1 8bitString
12) 2                         9 number       4 nonNegativeInteger
13) -2                        9 number       5 negativeInteger
14) 2.1                       9 number       6 nonNegativeFloat
15) -2.1                      9 number       7 negativeFloat
16) 2.1000000000000000888     18 double      8 double
17)                           13 unassigned  1 8bitString
18) 2                         3 +int         4 nonNegativeInteger
19) -2                        4 -int         5 negativeInteger
20)                           31 eof         1 8bitString

Comments

Anyone know why we ended up with this strange behaviour? Why doesn't COS store 2 and "2" in the same way in lists? The rest of the programming environment is based around them being the same - (2="2") - as everything is a string until used otherwise. It may use "an optimized binary representation", but surely that's not really an excuse. Just curious.

I hope this explains it:

USER>zzdump $lb(100,200,50)

0000: 03 04 64 03 04 C8 03 04 32                              ..d..È..2

USER>zzdump $lb("100","200","50")

0000: 05 01 31 30 30 05 01 32 30 30 04 01 35 30               ..100..200..50

USER>k  set mem=$s,mem=$s set a=1234567890 write mem-$s
8

USER>k  set mem=$s,mem=$s set a="1234567890" write mem-$s
32

Storing numbers as numbers in $listbuild is a bit cheaper in memory.

But sorry, I don't understand your concern about how numbers are stored. It mostly does not matter. If I'm not mistaken, the only place where it matters whether a number is stored as a number is the $ZHEX function, which returns a different result for a string than for a number.

Just to confirm it:
It's not only space in memory.
Much more important is the space saved on disk, in buffers, and in transfer to and from disk (or what used to be a disk).
And the speed comes from disk!

In addition, the existing limit on subscripts can also be used much more efficiently.

$double and dynamic objects also poke holes in the idea that everything is a string:

USER>w [1,"2"].%ToJSON()
[1,"2"]
USER>s d=1,b=$double(d) w d," ",d/3,!,b," ",b/3
1 .3333333333333333333
1 .33333333333333331482

Reason #1) 40 years of backward compatibility
#2) In $LB not everything is a string; values have hidden data types.
While a string is a sequence of bytes, an integer gets a binary representation (~19 digits ==> 8 bytes (int64)).
try:

set x=$lb(64444,"64444")  zzdump x
04 04 BC FB 07 01 36 34 34 34 34


it's really byte saving
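
To make the dumps above easier to read, here is a small sketch that prints the first two bytes of each element's encoding: the first byte is the element's total length and the second is the hidden type code. It relies on the same undocumented layout as the $a($lb(val),2) trick in the answers, so treat it as an illustration rather than a supported API.

```objectscript
 // Sketch: show the length byte and the (undocumented) type byte per element
 set list=$lb(100,"100",64444,"64444")
 for i=1:1:$listlength(list) {
   set item=$lb($list(list,i))  // re-wrap one element as a one-item list
   write $list(list,i),?8,"bytes=",$ascii(item,1),?18,"type=",$ascii(item,2),!
 }
```

Going by the ZZDUMP output shown in this thread, one would expect 3 bytes with type 4 for 100, 5 bytes with type 1 for "100", 4 bytes with type 4 for 64444, and 7 bytes with type 1 for "64444".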

Essay on Types and their Representations within InterSystems Object Script

Backwards compatibility is a reason why some things that looked normal in 1977 (date of first ANSI MUMPS standard) now look unusual in 2017.  InterSystems Object Script is based on the ANSI MUMPS language (more recently called ANSI M) but Object Script has undergone quite a bit of extension beyond that standard (and not all those extensions were designed by InterSystems so there are some inconsistencies, see $ZHEX example below.)

The original MUMPS standard said that a subscript string containing the canonical numeric character representation of a numeric value was identical to that number so the original standard allowed an implementation where the only supported data type was the character string.  When used as global subscripts, the canonical numeric subset of string values were sorted in numeric order before the strings that did not contain the canonical character representation of a number.  Those non-numeric subscript strings sorted in textual order.  Thus when used as a subscript, the string "2", a canonical number, is sorted before "1.0", a text string different from the canonical number "1".  The canonical numeric string "1" does sort before "2".
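
The subscript ordering described above can be observed with a local array, which collates the same way. This is a minimal sketch; the variable names a and k are arbitrary.

```objectscript
 // Sketch: canonical numeric subscripts sort before non-canonical strings
 kill a
 set a("2")="",a("1.0")="",a("1")=""
 set k=""
 for {
   set k=$order(a(k))
   quit:k=""
   write k,!
 }
```

Per the rule above, the loop should visit the subscripts in the order 1, 2, 1.0.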

The original implementation of Caché could use several different internal representations for a numeric value besides also supporting a character string representation.  These additional internal representations helped improve performance when executing Object Script programs. These initial numeric representations included an integer representation and a decimal floating-point representation.

The $LIST family of functions are Object Script extensions that provide a way to encode a list of multiple Object Script values in a single string value.  Internally a $LIST string need not store "identical" values using identical internal $LIST representations.  Avoiding conversions between different representations is done for performance reasons while building a $LIST.  Thus, the following lists are represented by different packed strings $lb("230"), $lb(230), $lb(23e1), $lb(2300e-1) but the $LISTSAME function assumes that all four of these lists are identical.  E.g., the Object Script statement:

   WRITE $lb(230) = $lb(2300e-1) will write 0 while WRITE $LISTSAME($lb(230),$lb(2300e-1)) will write 1.

There are about 21 strings different from $LB(23) that $LISTSAME will assume are identical to the string value $LB(23).

The ZZDUMP command dumps the internal representation of a list and it can expose some of the different internal representations that are used for the same value.  Copying a value using string representation will always use string representation as you move it in and out of a $LIST. Copying a value using numeric representation in and out of a $LIST will not change its numeric value but you might get different internal numeric representations on different Caché instances.

InterSystems Object Script avoids conversions that change between internal numeric representations and the equivalent string representations because we have inherited features from other vendors of extended MUMPS implementations that treat a numeric value differently from the corresponding canonical string value.  E.g.,  the function calls $ZHEX(10) and $ZHEX("10") give very different answers.

USER>WRITE $zhex("10"),!,$zhex(10)
16
A

Generally you can apply the unary-plus operator, +, to an Object Script string expression to change it from string representation into an Object Script numeric representation.  The unary-plus operator is a conversion operator so it works on strings that do not contain canonical numeric representation.  (E.g., +"7.0", +"+700E-2" and +"7Dwarves" all convert to the canonical numeric value 7.)  However, applying the unary-plus operator to a canonical numeric string will sometimes involve a conversion that changes the value because the various internal numeric representations have a more limited range and a more limited accuracy than that supported by the canonical numeric strings that can be sorted using numeric ordering.
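
A quick terminal sketch of the unary-plus conversions mentioned above. The second command additionally uses the undocumented $LB() type byte from earlier in this thread to show the change of representation, so that part is an illustration only.

```objectscript
USER>WRITE +"7.0",!,+"+700E-2",!,+"7Dwarves"
7
7
7
USER>WRITE $ascii($lb("7"),2),!,$ascii($lb(+"7"),2)
1
4
```

Here 1 is the 8-bit string type code and 4 is the non-negative integer type code, matching the tables shown in the answers.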

There are other extensions added to InterSystems Object Script that extended the set of values supported by the original MUMPS standard.

InterSystems now supports two different string representations.  There are the original 8-bit character strings and there is also a representation using the 16-bit UTF-16 Unicode encoding.

Consider the Object Script expression ##class(%DynamicArray).%New().  It returns an oref value (different from a numeric or a string value) which is a reference to an object data structure defined by the %DynamicArray class.  This particular oref value references structured data that is similar to the JSON array constructor [ ].  The set of Class language defined oref values is the largest extension that InterSystems has made to the original standardized set of MUMPS values.

Also consider $double(230), which returns an IEEE 64-bit binary floating-point value which is equal to the decimal floating-point value 230.  The $DOUBLE( ) extension is useful for applications using scientific data encoded using the representation defined by the IEEE binary floating-point standard.  However, binary floating-point arithmetic gives quite different results than decimal floating-point arithmetic.  E.g.

USER>WRITE $double(230),!,230   ;; Gives identical looking answers
230
230
USER>WRITE $double(230)/100,!,230/100   ;; but this shows different computational results.
2.2999999999999998223
2.3

There are also recent extensions to the Object Script language to support JSON objects.  When a JSON constructor is written using standard JSON constant syntax then the values are stored internally as JSON values.  Retrieving a value from a JSON object or JSON array by using the %Get method in an Object Script expression will need to convert the JSON value to a compatible Object Script value.  When coding Object Script statements and expressions, the Object Script language supports an extended JSON constructor syntax where a JSON element value in a JSON object or array can be replaced with a parenthesized Object Script expression.  These parenthesized expressions are evaluated using InterSystems Object Script semantics.

E.g.,

USER>SET x=["230",230,23e1,2300E-1]  ;; JSON standard syntax

USER>SET y=[("230"),(230),(23e1),(2300E-1)] ;; extended expression syntax with parentheses

USER>WRITE x.%ToJSON(),!,y.%ToJSON()
["230",230,23e1,2300E-1]
["230",230,230,230]

Note that the %DynamicArray constructor value stored in variable x contains JSON numeric syntax while the %DynamicArray constructor value stored in variable y is using Object Script representation although the string representation has been kept separate from the numeric representations.

We can use the %Get( ) method to convert a JSON value to an Object Script value.  When the Object Script %DynamicArray method %Get is applied to variable x, the resulting value is converted to Object Script representation because the %Get method call is part of an Object Script expression.  E.g.,

USER>WRITE x.%Get(0),!,x.%Get(3)
230
230

Note:  The two output lines containing "230" look identical but internally the first output line is the result of a string write and the second output line is the result of a numeric write.  Also, note that numbers supported by JSON can exceed the capacity of the internal representations supported by Object Script.  Rounding or overflow can occur when converting a JSON numeric element for use in an Object Script expression.  E.g.,

USER>SET z=[3E400]  ;; No error placing a large JSON number into a %DynamicArray

USER>WRITE z.%Get(0)  ;; But converting the JSON number gets a <MAXNUMBER> signal

WRITE z.%Get(0)
^
<MAXNUMBER>