Question
· May 16, 2022

How to distinguish whether a variable is a pure numeric string or a number, such as "123" and 123.

Hi guys.

I want to know how to distinguish whether a variable is a pure numeric string or a number, such as "123" and 123.

I only know an inelegant way.

ClassMethod IsNumber(n)
{
	s ret=0
	try{
		s ret= $lb(n)=$lb(+n)
	}catch e{
		s ret=0
	}
	q ret
}

Is there any other way to implement it?

Product version: Caché 2016.1
$ZV: Cache for Windows (x86-64) 2016.2 (Build 736U) Fri Sep 30 2016 11:46:02 EDT
Discussion (13)
Class DC.Test Extends %RegisteredObject
{
/// Return TRUE if val contains a string
ClassMethod IsString(val) As %Boolean
{
    q $a($lb(val),2)<3
}
/// Return TRUE if val contains a number (int, real or double)
ClassMethod IsNumber(val) As %Boolean
{
    q $a($lb(val),2)>3
}
}

w ##class(DC.Test).IsString("abc") //--> 1
w ##class(DC.Test).IsString("123") //--> 1
w ##class(DC.Test).IsString(123) //--> 0
w ##class(DC.Test).IsNumber(123) //--> 1
w ##class(DC.Test).IsNumber("abc") //--> 0
w ##class(DC.Test).IsNumber("123") //--> 0
w ##class(DC.Test).IsNumber(123_345) //--> 0
w ##class(DC.Test).IsNumber(123+345) //--> 1
w ##class(DC.Test).IsString(123_456) //--> 1
w ##class(DC.Test).IsString(123+456) //--> 0

s x=123, y="123"
w ##class(DC.Test).IsString(x) //--> 0
w ##class(DC.Test).IsString(y) //--> 1

I should mention that the numeric comparisons $A($LB(val),2)<3 and $A($LB(val),2)>3 are not quite legal ObjectScript since the internal specification of the $LIST string representation is subject to extension in the future.  In previous Caché and IRIS releases the $LIST code type byte for a string could be 1 or 2.  In the future, $LISTBUILD will support compressing string values, and those compressed $LIST string elements will use $LIST code type values greater than 3.

It is legal to use such undocumented $LIST representation techniques when debugging new code on a particular Caché/IRIS release.  However, depending on such internal representations in a production application should be avoided since the internal representations can change in the future.

Example: some programmers have used the string equality operation, ls1=ls2, to test if two $LIST strings contain the same values.  They should have used the $LISTSAME(ls1,ls2) function call.  This use of string equality on $LIST strings has resulted in broken applications in the past when extensions were added to the internal specification of $LIST strings.

Once again I fail to understand the almost religious secrecy around $LIST() encoding that Support and Engineering have celebrated for decades, especially when using that knowledge is labeled "illegal".
That's just disappointing.

The problem of a possible unexpected change rather indicates incomplete Release Notes to me. 

The original ANSI M (also known as ANSI MUMPS) standard did not distinguish between a canonical numeric string versus a number.  Many early implementations of this standard used strings of numeric characters as the internal representation of a number and did arithmetic on character strings instead of converting the numeric string to a binary representation and then using the hardware binary arithmetic instructions.  The result of a numeric operation would always be a character string using the canonical numeric representation.

In such an implementation there was no way to tell the difference between the character string "123" and the literal 123 since "123" was the canonical numeric string representation for that value.  For performance reasons, InterSystems ObjectScript has several different internal representations it can use for the number 123.  It can use the three-character string "123"; it can use the 32-bit binary integer 123; it can use one of several representations of 123 in ObjectScript's decimal floating-point representation, which has a 64-bit binary integer significand combined with an 8-bit power-of-10 exponent.  The literal 12300E-2 will usually be represented with the decimal floating-point representation and not with the integer representation.

The command WRITE $LISTSAME($LISTBUILD("123"),$LISTBUILD(12300E-2)) will write 1 because the string "123" and the numeric literal 12300E-2 are the same value.  However, WRITE $LISTBUILD("123")=$LISTBUILD(12300E-2) will write 0 because $LISTBUILD will generate different string encodings in this situation even though the values are considered to be identical.  Although $LISTBUILD could convert different internal representations into identical $LIST strings, it is faster just to place the internal representation into the generated binary string.  This is why you must use $LISTSAME when you want to check if two $LIST strings contain the same value.
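
A minimal sketch of the comparison described above, wrapped in a class method so it can be compiled and run. The class name DC.ListDemo and the method name Compare are hypothetical; the expected results in the comments are the ones stated in the explanation above and may depend on the release and on how the literal is represented internally.

```objectscript
Class DC.ListDemo Extends %RegisteredObject
{

/// Compare two $LIST strings by value and by raw encoding.
ClassMethod Compare()
{
    set ls1=$listbuild("123")
    set ls2=$listbuild(12300E-2)
    // Value comparison: expected to write 1, since both elements hold the value 123
    write $listsame(ls1,ls2),!
    // Raw string comparison: expected to write 0 when the two encodings differ
    write ls1=ls2,!
}

}
```

Called as do ##class(DC.ListDemo).Compare(). The point of the sketch is only that the two tests answer different questions: $LISTSAME compares element values, while = compares the encoded bytes.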

There are a very few functions built into ObjectScript whose behavior depends on the internal representation of the argument.  All of these functions were inherited when InterSystems extended ObjectScript to be compatible with features of the languages implemented by other vendors who produced extended M/MUMPS systems.  The most common such function is $ZHEX(x), which turns a hex string into a decimal number and turns a decimal number into a hex string.  Thus, WRITE $ZHEX("10") will write the number 16 while WRITE $ZHEX(10) will write the string "A".  If you want to convert variable 'x' from an integer to hex then you need to write $ZHEX(+x) to make sure 'x' is using an internal numeric representation.  If you want to convert variable 'x' from hex to an integer then you need to write $ZHEX(x_"") to make sure 'x' is using the string internal representation.
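
The two coercions described above can be wrapped in small helpers so that the caller does not need to care how the argument is represented internally. This is only a sketch; the class name DC.HexDemo and the method names IntToHex and HexToInt are hypothetical.

```objectscript
Class DC.HexDemo Extends %RegisteredObject
{

/// Convert an integer to its hex string; unary plus forces a numeric representation.
ClassMethod IntToHex(x) As %String
{
    quit $zhex(+x)
}

/// Convert a hex string to an integer; concatenating "" forces a string representation.
ClassMethod HexToInt(x) As %Integer
{
    quit $zhex(x_"")
}

}
```

With these helpers, w ##class(DC.HexDemo).IntToHex("255") writes FF even though the argument is a string, and w ##class(DC.HexDemo).HexToInt(10) writes 16 even though the argument is a number.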

One additional note:  The literal 123.0 will use an internal numeric representation, so WRITE "123.0"=123.0 will write 0 since a string equality comparison will be done between the literal string "123.0" and the canonical numeric string "123".  However, WRITE +"123.0"=123.0 will write 1 because the unary plus operator converts the first operand to a numeric representation and then the string equality operator converts both numeric operands into their canonical numeric string representations, making the string equality operation the same as "123"="123".

I was told it's illegal to use data structure information that hasn't changed in the last 25 years (and after this many years, one could think "customary law" gives the right to use it), hence I decided on a more "legal" solution for the above problem, although this solution will work for IRIS (and recent Caché systems) only:

Class DC.Test Extends %RegisteredObject
{

/// Return TRUE if val contains a string
ClassMethod IsString2(val) As %Boolean
{
    quit {"a":(val)}.%GetTypeOf("a")="string"
}

/// Return TRUE if val contains a number (int, real or double)
ClassMethod IsNumber2(val) As %Boolean
{
    quit {"a":(val)}.%GetTypeOf("a")="number"
}

}

Actually,  the implementation of the $LIST functions has changed a lot in the past 25 years.  There have been 28 changes to the $LIST kernel code since it became part of IRIS and over 100 changes while it was part of Caché.  The changes were carefully done so that working programs that use functions of the form $LISTxxx, $Lx, and $LxS to manipulate $LIST strings will not notice the changes.  There are $LIST element encodings that are no longer generated but all the functions that examine a $LIST string will still correctly process the obsolete encodings.  New encodings were added to support new data types and to improve performance.

Code to READ compressed encodings for binary floating-point values and for Unicode strings has been available for over 5 years.  $LISTVALID in recent releases of IRIS and Caché would accept such compressed encodings even though they were not being generated.  Only the most recent releases have provided the ability to GENERATE these compressed encodings.  The $system.Process.ListFormat(n) classmethod will allow $LISTBUILD to generate these new compressed encodings.  Currently the new encodings are turned off by default, but any user who enables these $LIST optimizations will see new encodings in their $LIST strings.

Like the $ZHEX(n) function, the %DynamicAbstractObject classes that support JSON representation can do some type conversions that were not inherited from the original ANSI M standard.  It might be slightly faster to use an array instead of an object, which saves the conversion from a key name into an array index.

    quit [(val)].%GetTypeOf(0)="string"
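
As a sketch, the array variant can be dropped into the same kind of class as the earlier object-based methods. The method names IsString3 and IsNumber3 are hypothetical, and this assumes a release that supports dynamic array literals and %GetTypeOf.

```objectscript
Class DC.Test Extends %RegisteredObject
{

/// Return TRUE if val contains a string
ClassMethod IsString3(val) As %Boolean
{
    // Build a one-element dynamic array and ask for the type of element 0
    quit [(val)].%GetTypeOf(0)="string"
}

/// Return TRUE if val contains a number
ClassMethod IsNumber3(val) As %Boolean
{
    quit [(val)].%GetTypeOf(0)="number"
}

}
```

Usage mirrors the earlier examples, for instance w ##class(DC.Test).IsString3("123") and w ##class(DC.Test).IsNumber3(123).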

Thanks for the info about the evolution of the LIST functions.

I'm just a developer without any inside information about the internals of the LIST functions and, as you wrote, "The changes were carefully done so that working programs that use functions of the form $LISTxxx, $Lx, and $LxS to manipulate $LIST strings will not notice the changes.", hence I did not see any changes.
OK, there were some enhancements like adding $LV() and $LU() or a third argument to $LTS(), but those changes are not relevant for existing applications.

Regarding the speed gain from using a (JSON) array instead of a (JSON) object: yes, you are right, the array variant is about 5% faster. I just didn't run a speed test for the new solution; the goal was to have a "legal solution" and not an all-time record.

Finally, I'll ask you: why are things like the internal format of $BIT() or $LB() unpublished?
For example, $LB() can't be such a big mystery, because a simple ZZDUMP reveals the structure.

There are cases where this information would be (very) helpful. Just to name one, I have a particular case. 

I wrote a simple class, which uses $ZF() callouts, so the customer can create (i.e. export) data into an Excel file (including formatting and colors) as well as read (i.e. import) data from an Excel file (again, from *.xls and *.xlsx). The class has methods to read/write individual cells, whole rows, or whole columns.

To pass row- and column-data between the application and the callout module, I use $LB().

set exl=##class(%Zu.Library.Excel).%New()
...
// writing data
do exl.WriteNum(row,col,value,format)
do exl.WriteRow(row,data) // data is $lb(col1, col2, ...colN)
// reading
set value=exl.ReadNum(row,col)
set data=exl.ReadRow(row)	// data is $lb(col1, col2, ...colN)
...

To be able to write the corresponding callout module (Windows DLL as well as Linux SO), the information about the $LB() structure was essential.
This is just one example. Similar solutions were used for the PDF and ZIP callout modules too.
At least for $LB() it shouldn't be such a big secret, and official documentation would certainly make customers happier. But it is not my decision; it must be decided by ISC.

The ZZDUMP command is useful for debugging, especially when searching for a bug caused by an unprintable character in an ObjectScript string.  The most recent versions of the ZWRITE command are now usually easier to use than ZZDUMP for debugging purposes.

One of the legacy uses for ZZDUMP was to look at corrupted strings, especially when a $LIST string had become corrupted by the use of some non-$LIST compliant string operations.  Back when I first joined InterSystems the documentation of ZZDUMP included a description of some of the $LIST formats so programmers could use ZZDUMP to see if a $LIST string was corrupted.  However, now ZWRITE will use $LB(...) syntax when a string contains a valid $LIST and it will use $C(i,j,k,l,...) in other cases when a string contains unprintable characters.  This makes ZWRITE a much better command than ZZDUMP when checking to see if a $LIST string is corrupted.  ZWRITE can recognize a compressed $BIT string but I have very little experience with how compressed $BIT strings are encoded.  Also ZWRITE can provide a translation of a %Status value.  ZWRITE will change further in the future as we discover additional useful display formats.

Back when the ZZDUMP documentation described the $LIST format, that caused two types of problems:  (1) When new $LIST encodings were added there were complaints that older applications did not understand the new encodings; and (2) When the documentation described multiple encodings that could be used for the same data, it did not include the complicated details about which encodings could be generated and which could not.  In general, when new $LIST encodings were generated as a replacement for older encodings, all the $LIST functions would still accept the older, no longer generated encodings as well as the newer encodings for backwards compatibility.  Third party applications that used $LIST encodings without using the corresponding $LIST functions would be broken until they were modified to support both old and new encodings.  However, when there were new $LIST encodings that were not used in all possible cases, the $LIST functions might not properly decode that encoding in cases that were never generated.  Third party software would find that some $LIST functions would not work if that software chose to use a $LIST encoding that had never been generated by the InterSystems supplied $LIST functions.  Having ZZDUMP no longer document $LIST formats resulted in a great reduction in such incompatibilities.

Publishing a full specification of the $LIST encodings complete with usage restrictions would allow ObjectScript to support programs that manipulated these encodings without being limited to using only the $LIST functions.  It would also freeze the $LIST encodings on existing data types and would not allow the specification to be extended to provide future optimizations.  On the other hand, supporting customer debugging often requires the exchange of information involving these internal specifications so we often ask customers to look at these encodings.  Discussing the encoding specifications in a debugging context is useful.  However, depending on the internal details of these specifications not changing in the future is much less useful and that is discouraged.