Replies by Steven Hobbs for InterSystems Developer Community

Steven Hobbs · Dec 17, 2020

Standard class object properties have a limited string length but a property can contain a %Stream.GlobalCharacter oref which %JSON.Adaptor can export as a JSON string.

Another option is to create a %DynamicObject (%Library.DynamicObject) class object or a %DynamicArray (%Library.DynamicArray) class object instead of using a subclass of %JSON.Adaptor. You can create a %DynamicObject/%DynamicArray in ObjectScript by using a JSON object/array literal as an ObjectScript literal. The %FromJSON class method will create a %DynamicObject/%DynamicArray by importing JSON from a %Stream, file or device, and the %ToJSON method will export a %DynamicObject/%DynamicArray to a %Stream, file or device.

On IRIS, the %Set and %Get methods of %DynamicObject-s/%DynamicArray-s have been extended to take type keywords of the form "stream", "stream>base64" and "stream<base64", which can transfer %DynamicObject string elements to and from %Stream-s of unlimited length, optionally encoding or decoding Base64 during the transfer. There are also the type keywords "string", "string>base64" and "string<base64", which can set/get ObjectScript string values into/from %DynamicObject elements, but ObjectScript strings are currently limited to a length of 3,641,144 characters.

So on IRIS you can do something like:
Set DynObj = {"ID":23, "Name":"John Doe", "BirthDay":"1974-12-15"}
Do DynObj.%Set("DataFile",DataOref,"stream>base64")
Do DynObj.%ToJSON(OutputOref)
where DataOref is an oref (object reference) to a %Stream.FileBinary object that references the binary file containing data related to John Doe, and OutputOref is an oref to a %Stream.FileCharacter object that references the file that will contain the JSON text.

Steven Hobbs · Aug 4, 2020

Although the replies have already answered the question about creating JSON on Caché, I would like to add some discussion of additional features that support the %Library.DynamicArray and %Library.DynamicObject classes. I am running my examples on IRIS Version 2020.2 (Build 199U), so a few of the features will not be available in older Caché versions.

You asked how a programmer generates the following JSON: { "MyProperty":"1"}. If you are writing an ObjectScript program, then just execute the statement:

SET x = { "MyProperty":"1"}

Whenever an ObjectScript expression contains syntax enclosed either in nested curly brackets, { ... }, or in nested square brackets, [ ... ], the ObjectScript compiler parses the contents of the bracketed expression using the rules for a JSON object or a JSON array, respectively. The strings, numbers and identifiers inside the bracketed syntax are JSON literal values, with one exception: if a value inside the bracketed expression is enclosed in round parentheses, ( ... ), then the contents of the parenthesized value have the syntax of an ObjectScript expression, which is evaluated at run time.

The %DynamicArray and %DynamicObject class objects can contain property values that use JSON syntax for their representation as well as property values that are ObjectScript values. Details of all the methods that can be applied to a %DynamicArray/%DynamicObject class object can be found in the Class Reference web pages for the %Library.DynamicArray, %Library.DynamicObject and %Library.DynamicAbstractObject classes.

When you use the %ToJSON( ) method on a %DynamicArray/%DynamicObject, all the properties containing ObjectScript values are translated to the appropriate JSON literal representation. Certain ObjectScript values, such as an oref or $DOUBLE("NaN"), do not have a JSON representation, and their occurrence will cause %ToJSON to signal an <ILLEGAL VALUE> error. When you use the %Get(key) method to evaluate a property containing a JSON representation, that JSON representation is first converted into an ObjectScript value. When you use %Set(key,value) to modify a property, value is computed as a run-time ObjectScript expression and the result of that computation becomes the value of the property.
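
For readers who know Python, its standard json module behaves analogously (an analogy only, not InterSystems code): a NaN has no JSON representation, and asking json.dumps for strict output makes it raise an error instead of emitting a non-standard token. The helper name to_strict_json is hypothetical.

```python
import json
import math

def to_strict_json(value):
    """Serialize value, refusing NaN/Infinity as strict JSON requires."""
    # allow_nan=False makes json.dumps raise ValueError instead of emitting
    # the non-standard tokens NaN, Infinity or -Infinity.
    return json.dumps(value, allow_nan=False)

print(to_strict_json([0.123, "0.1230"]))   # ordinary values serialize fine
try:
    to_strict_json([math.nan])
except ValueError as err:
    print("rejected:", err)                # no JSON representation for NaN
```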

The %Get(key,default,type) method call has two optional parameters. The value of the default argument is returned (without any type conversions) if there is no property with the specified key. If there is no default argument, then a key with no assigned property returns the ObjectScript empty string. The type argument is a string that specifies what type conversion should be applied to the element with the specified key. An empty or missing type argument does the normal default conversion to ObjectScript. %Set(key,value,type) takes an optional type argument, a string which specifies how the ObjectScript value argument is converted to a property element. The supported type arguments have changed over time, so check the Class Reference web pages for documentation on what the possible type strings mean.

Consider:

USER>set x=["0.1230",12.30E-2,(.12300),("0.1230")] ;; JSON string, JSON number, ObjScr number, ObjScr string

USER>write x.%ToJSON()
["0.1230",12.30E-2,0.123,"0.1230"] ;; Everything converts to JSON representation (strings are the same)

USER>write x.%Get(0),",",x.%Get(1),",",x.%Get(2),",",x.%Get(3),"," ;; Everything converts to ObjectScript
0.1230,.123,.123,0.1230,

;; Newly added "json" type argument.
;; The 6-char string 0.1230 becomes 8 chars with leading/trailing double-quotes in JSON literal representation
;; The JSON number is not changed
;; The ObjectScript number picks up the leading zero required by JSON representation
USER>write x.%Get(0,,"json"),",",x.%Get(1,,"json"),",",x.%Get(2,,"json"),","
"0.1230",12.30E-2,0.123,
USER>set y=["null",null,""] ;; JSON 4-char string null, JSON null value, JSON empty string

USER>write y.%ToJSON( ) ;; JSON array printed as a %String
["null",null,""]
USER>write y.%Get(0),",",y.%Get(1),",",y.%Get(2),",",y.%Get(3),"," ;; Convert 4 elements of the 3-element array to ObjectScript
null,,,, ;; Note that null, "" and unassigned all convert to the empty string in ObjectScript
USER>write y.%Get(0,,"json"),",",y.%Get(1,,"json"),",",y.%Get(2,,"json"),",",y.%Get(3,,"json"),"," ;; Convert 4 elements to JSON
"null",null,"",,
;; Note: the 4-char string null is now 6 chars with 2 "-chars, the null value is the identifier null, the empty string is just 2 "-chars,
;; and the unassigned element gives the ObjectScript default empty string (there is no legal JSON representation)

Unfortunately %Set does not yet support the "json" type argument but maybe that will happen in a future IRIS release.

Steven Hobbs · May 21, 2020

There are actually two (maybe more) levels of ObjectScript. There is basic ObjectScript, which is a very extended version of the ANSI M language. [Footnote: ANSI M is the successor of the ANSI MUMPS language, and that language, without the large number of extensions supported by basic ObjectScript, could be considered a third language level, although no modern programmer would restrict their code to this much older language definition.] And there is Class Language ObjectScript, which includes things like type-name classes: %Library.String (can be abbreviated %String), %Library.Integer (can be abbreviated %Integer), etc. It also includes Class Methods (with syntax like ##class(Class.Name).ClassMethodName(arg1,arg2)), Object Methods (which look like oref.ObjectMethodName(arg1,arg2), where oref must contain an extended ObjectScript object reference) and Object Properties (which look like oref.PropName, where oref must contain an extended ObjectScript object reference).

An example of a basic ObjectScript statement is SET var1=42,var2="42". Almost every operation in basic ObjectScript treats var1 and var2 as containing identical values. So the string equality operation var1=var2 returns 1, because var1 is converted to a string and "42" equals "42". The numeric comparison operations var1<var2 and var1'<var2 return 0 and 1, because var2 is converted to a number and 42<42 is false while 42'<42 is true. We could also evaluate var1+var2 and the result will be 84 (or is that result "84"? Who can tell?).

In Class Language ObjectScript you can declare
Property LimitedInt As %Integer(MAXVAL = 10);
and if you (directly or indirectly) call the %ValidateObject() method on a class object containing the property LimitedInt, then the contents of LimitedInt may be checked to make sure they look like an integer with a value not greater than 10. However, only Class Language methods like %SerializeObject, %ValidateObject, %Save, etc. make these checks on the value of LimitedInt. If someone executes
SET oref.LimitedInt=20.95
in basic ObjectScript, the basic ObjectScript execution will not signal an error despite the fact that 20.95 is larger than the MAXVAL and despite the fact that 20.95 is NOT a %Integer. Only executing an appropriate Class Language method will detect that oref.LimitedInt does not contain a valid value. The purpose of %Library.DataType subclasses is to make it possible that a %Save() method does not save invalid property values into a data base.

Certain ObjectScript conversions may change an ObjectScript value. This can happen when converting a well-formed numeric string to a decimal number, because the decimal arithmetic implemented in ObjectScript supports no more than 19 decimal digits of precision. Consider,

USER>set a="12345678901234567890123",b="12345678901234567890124",c="123456789012345678901230"
USER>write b>a," ",a>b," ",+a," ",+b
0 0 12345678901234567890000 12345678901234567890000
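
The 19-digit effect can be reproduced with Python's decimal module, assuming ObjectScript decimal arithmetic behaves like a 19-significant-digit decimal context with round-to-nearest (a modelling assumption, not the actual IRIS implementation; to_number is a hypothetical helper):

```python
from decimal import Context

# Assumption: model ObjectScript decimal arithmetic as 19 significant digits,
# round-to-nearest. The real implementation may differ in detail.
OBJECTSCRIPT_DECIMAL = Context(prec=19)

def to_number(canonical):
    """Roughly what an arithmetic conversion does to a long numeric string."""
    return OBJECTSCRIPT_DECIMAL.create_decimal(canonical)

a = "12345678901234567890123"
b = "12345678901234567890124"
na, nb = to_number(a), to_number(b)
print(format(na, "f"), format(nb, "f"))   # 12345678901234567890000 twice
print(nb > na, na > nb)                   # False False
```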

Because the > and < operators are arithmetic-comparison operators, the string operands are converted to numeric values with only 19 significant digits of precision; the resulting numeric values for a and b end up being equal, so neither b>a nor a>b is true. However, the sorts-after operator, ]], orders the canonical numeric strings before any non-empty string that does not have canonical numeric syntax. The canonical numeric strings are sorted in numeric order, while strings that do not have canonical numeric syntax are sorted in textual string order. This is also the default rule for ordering subscript values in ObjectScript. Operands of ]] are converted to strings before doing this subscript ordering. The values a, b and c are all canonical numeric strings, and ObjectScript is perfectly capable of doing the sorts-after string comparisons on very long canonical numeric strings. Consider,

USER>write a]]b," ",b]]a," ",b]]c," ",c]]b
0 1 0 1

Now a and b are both canonical numeric strings with 23 digits whose first 22 digits are equal, but b has a 23rd digit larger than the 23rd digit of a, so b sorts after a. But b does not sort after c, because c has more significant digits, so the canonical numeric value of c is greater than the canonical numeric value of b.
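
A rough Python model of the ]] ordering for string operands follows. The canonical-number regular expression is my reading of the rules described above (no leading "+" or zeros, no trailing fractional zeros or trailing ".", and "0" rather than "-0"); textual order is modelled by Python's code-point comparison, and the helper names are hypothetical:

```python
import re
from decimal import Decimal

# Canonical numeric syntax (assumed): "0", or an optional "-" followed by a
# number with no leading zeros and no trailing zeros after a decimal point.
CANONICAL = re.compile(r"0|-?(?:[1-9][0-9]*(?:\.[0-9]*[1-9])?|\.[0-9]*[1-9])")

def collation_key(s):
    if s == "":
        return (0, Decimal(0), "")        # the empty string sorts first
    if CANONICAL.fullmatch(s):
        return (1, Decimal(s), "")        # exact numeric order, any length
    return (2, Decimal(0), s)             # everything else: textual order

def sorts_after(x, y):
    """Sketch of x]]y for string operands."""
    return collation_key(x) > collation_key(y)

a = "12345678901234567890123"
b = "12345678901234567890124"
c = "123456789012345678901230"
print([int(sorts_after(p, q)) for p, q in [(a, b), (b, a), (b, c), (c, b)]])  # [0, 1, 0, 1]
print(sorted(["1.0", "abc", "10", "2", "1"], key=collation_key))  # ['1', '2', '10', '1.0', 'abc']
```

Decimal(s) built from a string is exact, so long canonical strings are compared without the 19-digit rounding that the > operator applies.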

The basic ObjectScript language has a very small class of built-in types.

(1) There are the ObjectScript string values, which include the subclass of canonical numeric strings.

(2) There are the default decimal floating-point values, which have no more than 19 digits of decimal precision (and may sometimes have less than 19 digits of precision, because the implemented accuracy of ObjectScript decimal arithmetic operations is approximately 18.96 decimal digits). Every decimal floating-point value can be converted exactly to a canonical numeric string, but canonical numeric string values with more than 18 digit characters cannot always be converted exactly to a decimal floating-point value; the result must instead be a decimal numeric value that is an approximation.

(3) Basic ObjectScript supports a third set of "$DOUBLE values", which contains the *binary* floating-point values specified by the IEEE 64-bit binary floating-point type. The 64-bit binary floating-point arithmetic specified by the IEEE standard has approximately 15.95 decimal digits of precision, but since the representation is binary, and not decimal, an exact conversion of a 64-bit IEEE binary floating-point value to a decimal string can have over 1000 digits. Every $DOUBLE binary floating-point value could be exactly converted to a canonical numeric string, but it is not reasonable to have such a conversion produce such long strings. The default conversion of a $DOUBLE value to a canonical numeric string has no more than 20 significant digits, and the approximated 20th significant digit will never be a 5 or 0 (unless that 20th digit results in an exact conversion). Using a default canonical numeric string with 20 significant digits for $DOUBLE conversions means the ]] (sorts-after) operator will correctly order a $DOUBLE binary floating-point value in between the adjacent 19-digit decimal floating-point values of the ObjectScript default decimal numeric type.

(4) There are also the oref values which are the basic ObjectScript values created by Class Language ObjectScript. Basic ObjectScript can do property evaluation, property assignment and method calls using the basic ObjectScript oref type. Basic ObjectScript can convert oref values to the ObjectScript string type and the ObjectScript decimal arithmetic type but neither of these conversions is particularly useful unless you are debugging.
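
To see how long an exact decimal expansion of a binary64 value can get, Python's Decimal(float) conversion (which is exact, with no rounding) is a convenient sketch; exact_decimal is a hypothetical helper:

```python
from decimal import Decimal

def exact_decimal(x: float) -> str:
    """Exact fixed-point decimal expansion of an IEEE binary64 value."""
    # Decimal(float) converts exactly; format "f" writes it without an exponent.
    return format(Decimal(x), "f")

print(exact_decimal(0.3))        # 0.29999999999999998889776975374843...
tiny = exact_decimal(5e-324)     # smallest positive (subnormal) double, 2**-1074
print(len(tiny))                 # 1076 characters: "0." plus 1074 digits
```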

Steven Hobbs · Apr 13, 2020

My comment applies generally to this entire discussion, and I know your comment was not talking about Unicode character conversion. However, your comment did include actual code with a loop. I wanted to make it easy for readers to see your Base64 encoding loop over a %Stream alongside my discussion of the issues involved in adding calls to $ZCONVERT(UnicodeText,"O","UTF8",handle) to convert 16-bit UTF-16 characters into 8-bit UTF-8 bytes, which could then be Base64 encoded.

Steven Hobbs · Apr 9, 2020

I should point out that if you have a Unicode character with a code point larger than 65535 (i.e., encoded in UTF-16 as a surrogate pair of two adjacent 16-bit characters), then the statement SET BinaryText=$ZCONVERT(UnicodeText,"O","UTF8") in the encoding loop will also need a fourth "handle" argument to handle the case where the UnicodeText substring ends with the leading half of a surrogate pair. Characters with Unicode code points greater than 65535 are mostly less frequently used Chinese, Korean and Japanese ideograms, but they also include many of the emojis.
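
The carry-over idea behind the handle argument can be sketched in Python by working directly on UTF-16 code units. This is an illustration, not the $ZCONVERT implementation; utf16_units and utf8_chunks are hypothetical helpers, and the carry list plays the role of handle:

```python
def utf16_units(text):
    """Split text into 16-bit UTF-16 code units."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

def utf8_chunks(units, chunk_size):
    """Convert UTF-16 code units to UTF-8 chunk by chunk, never splitting a surrogate pair."""
    carry = []                                   # plays the role of the handle variable
    for start in range(0, len(units), chunk_size):
        chunk = carry + units[start:start + chunk_size]
        carry = []
        if chunk and 0xD800 <= chunk[-1] <= 0xDBFF:
            carry = [chunk.pop()]                # leading half of a pair: keep it for the next chunk
        raw = b"".join(u.to_bytes(2, "little") for u in chunk)
        yield raw.decode("utf-16-le").encode("utf-8")
    if carry:
        raise ValueError("text ends with an unpaired leading surrogate")

text = "ab\U0001F600cd"                          # the emoji needs a surrogate pair
print(b"".join(utf8_chunks(utf16_units(text), 3)) == text.encode("utf-8"))  # True
```

With a chunk size of 3 the first chunk ends with the leading surrogate; without the carry, decoding that chunk on its own would fail.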

Steven Hobbs · Apr 9, 2020

Base64 encoding only works on strings of 8-bit bytes. If you have a Unicode string containing a character with an encoded value greater than 255, then direct Base64 encoding is not possible. The following documentation excerpt from the %SYSTEM.Encryption.Base64Encode Class Reference page describes a first step: turning Unicode into a UTF-8 byte string and then applying the Base64 encoding to that byte string (but if your encoded stream exceeds the string length limit, then you will need to do more).

Note: Base 64 encoding is not able to encode a string which contains unicode (2 byte) characters. If you need to Base 64 encode an unicode string, you should first translate the string to UTF8 format, then encode it.

s BinaryText=$ZCONVERT(UnicodeText,"O","UTF8")
s Base64Encoded=$system.Encryption.Base64Encode(BinaryText)
Now to Decode it:
s BinaryText=$system.Encryption.Base64Decode(Base64Encoded)
s UnicodeText=$ZCONVERT(BinaryText,"I","UTF8")
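
The same two-step recipe in Python, for comparison (str.encode/bytes.decode stand in for $ZCONVERT and the base64 module for the %SYSTEM.Encryption methods; the function names are hypothetical):

```python
import base64

def encode_unicode_base64(unicode_text: str) -> str:
    binary_text = unicode_text.encode("utf-8")             # like $ZCONVERT(...,"O","UTF8")
    return base64.b64encode(binary_text).decode("ascii")   # like Base64Encode

def decode_unicode_base64(encoded: str) -> str:
    binary_text = base64.b64decode(encoded)                # like Base64Decode
    return binary_text.decode("utf-8")                     # like $ZCONVERT(...,"I","UTF8")

print(encode_unicode_base64("héllo"))   # aMOpbGxv
```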

If your UTF-8 encoded, Base64Encoded string can never be longer than 3,641,144 bytes, then you can ignore the rest of this reply; just execute the code excerpt from the %SYSTEM.Encryption.Base64Encode Class Reference documentation. If you have a %Stream that is too long, then you will need to loop over substrings of the %Stream, and you should read on.

If you read a sequence of substrings from a long %Stream, then you cannot always simply call $system.Encryption.Base64Encode(BinaryText,...) and $ZCONVERT(BinaryText,"I","UTF8") on each substring, because a substring boundary can fall between bytes that must be combined in order to do the conversion. So you will need to work around some issues in the code that was part of the %SYSTEM.Encryption.Base64Encode Class Reference documentation.

Issue one: the SET BinaryText=$ZCONVERT(UnicodeText,"O","UTF8") statement can return more characters of BinaryText than there are characters of UnicodeText, so the new string length may not be a multiple of 3. $system.Encryption.Base64Encode converts each sequence of 3 bytes into 4 bytes, so when Base64Encode(substring) is called with a substring whose byte length is not a multiple of 3, the extra 1 or 2 bytes at the end of the substring must be saved and then concatenated onto the beginning of the next substring passed to Base64Encode(substring). Only the very last call to Base64Encode(substring) can have a substring byte length that is not a multiple of 3.
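
A Python sketch of the Issue one bookkeeping, assuming the UTF-8 bytes arrive as a sequence of byte chunks of arbitrary length (base64_encode_chunks is a hypothetical name): hold back the 0, 1 or 2 trailing bytes of each chunk so that every piece handed to the encoder, except the last, is a multiple of 3 bytes.

```python
import base64

def base64_encode_chunks(byte_chunks):
    """Base64-encode byte chunks so that the encoded pieces concatenate correctly."""
    carry = b""                                  # 0, 1 or 2 bytes held back
    for chunk in byte_chunks:
        data = carry + chunk
        cut = len(data) - len(data) % 3          # largest multiple of 3
        carry = data[cut:]
        if cut:
            yield base64.b64encode(data[:cut]).decode("ascii")
    if carry:                                    # only the final piece may need "=" padding
        yield base64.b64encode(carry).decode("ascii")

payload = "Unicode text: héllo wörld \U0001F600".encode("utf-8")
chunks = [payload[i:i + 5] for i in range(0, len(payload), 5)]
print("".join(base64_encode_chunks(chunks)) == base64.b64encode(payload).decode("ascii"))  # True
```

Encoding each 5-byte chunk independently would instead put "=" padding in the middle of the output.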

Issue two: the SET UnicodeText=$ZCONVERT(BinaryText,"I","UTF8") statement might be given a BinaryText substring that ends with an incomplete UTF-8 byte sequence. That incomplete sequence must be concatenated onto the beginning of the next BinaryText substring. Fortunately, the $ZCONVERT function takes an optional fourth argument, which is a local variable name. Evaluating $ZCONVERT(BinaryText,"I","UTF8",handle) does its conversion on an input value containing the concatenation of handle with BinaryText. When $ZCONVERT is done, the new value of handle will either be the empty string or contain the unconverted substring at the end of the input value. So before your loop that decodes the %Stream containing Base64 encoded UTF-8 bytes, you should execute SET handle="", and inside your decoding loop you should call $ZCONVERT using handle as the fourth argument. If at the exit of your decoding loop the variable handle does not contain the empty string, then your input stream was ill-formed.

Note that when your decoding loop reads a Base64Encoded substring from the %Stream then I am assuming you read a perfect multiple of 4 bytes from the %Stream before you execute SET BinaryText=$system.Encryption.Base64Decode(Base64Encoded). [[ If the Base64 encoded %Stream was not generated using the code described above then I am also assuming the Base64 encoded BinaryText does not contain any additional white-space characters, or if it does contain white space then those white-space characters were removed before building a substring that contains a perfect multiple of 4 bytes. ]] Each sequence of 4 bytes in Base64Encoded will be turned into 3 bytes in BinaryText.
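
Both rules for the decoding loop (hand the Base64 decoder a multiple of 4 characters, and carry an incomplete UTF-8 sequence forward) can be sketched in Python. Here the incremental UTF-8 decoder from the codecs module stands in for $ZCONVERT with a handle argument; decode_base64_utf8_chunks is a hypothetical name, and this version carries leftover Base64 characters forward instead of requiring each read to be a multiple of 4:

```python
import base64
import codecs

def decode_base64_utf8_chunks(b64_chunks):
    """Decode Base64 text that arrives in arbitrary pieces into Unicode text."""
    decoder = codecs.getincrementaldecoder("utf-8")()   # buffers incomplete UTF-8 sequences
    pending = ""                                         # Base64 characters not yet a multiple of 4
    for chunk in b64_chunks:
        text = pending + "".join(chunk.split())          # remove any white space
        cut = len(text) - len(text) % 4
        pending = text[cut:]
        if cut:
            yield decoder.decode(base64.b64decode(text[:cut]))
    if pending:
        raise ValueError("Base64 input length is not a multiple of 4")
    decoder.decode(b"", final=True)   # raises UnicodeDecodeError if bytes are left over

original = "héllo wörld \U0001F600 " * 3
encoded = base64.b64encode(original.encode("utf-8")).decode("ascii")
pieces = [encoded[i:i + 7] + "\n" for i in range(0, len(encoded), 7)]
print("".join(decode_base64_utf8_chunks(pieces)) == original)   # True
```

The final decode call is the counterpart of checking that handle is empty after the loop: leftover bytes mean the input was ill-formed.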

Steven Hobbs · Feb 10, 2020

Extension of the %Set and %Get methods of the %DynamicAbstractObject classes to support type parameters (such as using "stream" as a type) is an IRIS feature and not a Caché feature. In Caché the %ToJSON and %FromJSON methods can accept %File-s and %Stream-s. These methods are generally limited to Dynamic Objects/Arrays whose sizes are bounded by the largest 32-bit signed integer. But in Caché the %ToJSON/%FromJSON methods work only on entire Dynamic Objects/Arrays. Using %Set/%Get in Caché to modify an element of a Dynamic Object/Array is limited by the length of an ObjectScript %String (currently 3,641,144 characters in Caché).

In recent and future IRIS releases the sizes of a %DynamicAbstractObject will be limited by the amount of virtual memory available to the process. (Please consider avoiding the activation of many multi-gigabyte Dynamic objects/arrays at the same time.) The future IRIS versions of the %Get/%Set methods will support type keywords involving streams which will allow a %Get/%Set to access Dynamic Object/Array elements involving any size that can be fetched from (or stored into) a %Stream. Future versions of these methods will also encode and decode Base64 representation while transferring byte data between a %Stream and an element of a Dynamic Object/Array.

Steven Hobbs · Apr 30, 2019

For Set MyObj.JSON=JSONString.%ToJSON() to signal <INVALID OREF>, either local variable MyObj is not an oref or local variable JSONString is not an oref. Make sure that JSONString is really a %DynamicObject by doing a WRITE JSONString before doing the Set command. The WRITE statement should write something like 99@%Library.DynamicObject .

Steven Hobbs · Apr 27, 2019

I agree that comments here will have little effect on developers other than InterSystems developers or InterSystems customers. However, processing of JSON is controlled by the European standard "ECMA-404 The JSON Data Interchange Format". If a JSON decoder deliberately ignores this standard specification then it is not a *JSON* decoder. If any customer discovers that InterSystems's %FromJSON method is violating the ECMA-404 standard then I am sure InterSystems will fix this situation. I suspect that third-party developers would also fix such a violation of ECMA-404. The ECMA-404 document actually contains examples of 4 different strings that are identical to "/". ECMA-404 does not require that JSON encoders generate the "\/" string syntax but this standard does require that all decoders accept this string syntax.
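
Python's json module, as one example of a conforming decoder, reads all of the following JSON string spellings as the one-character string "/" (these are spellings the ECMA-404 string grammar allows; I am not claiming they are the exact four examples printed in the standard):

```python
import json

# JSON string literals that a conforming decoder must read as "/".
spellings = ['"/"', '"\\/"', '"\\u002f"', '"\\u002F"']
print([json.loads(s) for s in spellings])   # ['/', '/', '/', '/']
print(json.dumps("/"))                      # "/"  (this encoder chooses not to escape it)
```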

Steven Hobbs · Apr 25, 2019

There are some issues with above C code for manipulating a string of $LIST elements.

I do not believe the above code will work on a big-endian platform, such as PowerPC running the AIX operating system. Conversions from $LIST representation to numeric representation will order the bytes backwards on big-endian hardware.

Decimal floating-point values in IRIS and Caché have almost 19 decimal digits of precision, while the above code translates these numbers to IEEE binary double-precision floating-point values, which have less than 16 decimal digits of precision. This means that $LISTBUILD(0.3) will be converted by the above code into the value 0.29999999999999998889... . The above code also introduces a double-rounding error, so decoding $listbuild(x) and $listbuild($double(x)) will not always give equal results, because the $double(x) function in IRIS/Caché does the conversion from decimal to binary without a double rounding.

The above code is inconsistent in the resulting type of integer values. Consider,
USER>set L5A=$LISTBUILD(5),L5B=$LISTBUILD(50/10)
USER>WRITE $LISTSAME(L5A,L5B)
1

The $LIST elements in L5A and L5B contain the same value, 5. However, the above code will convert L5A to the C int64_t type while it will convert L5B to the C double type. If the value of the integer is greater than 2**53 then the above C code can convert the identical integer values into different integer values.
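
Both precision problems are easy to reproduce with Python floats, which are the same IEEE binary64 type as a C double (an illustration, not the C code under discussion):

```python
from decimal import Decimal

big = 2**53 + 1
print(float(big) == float(2**53))   # True: two distinct integers become one double
print(int(float(big)))              # 9007199254740992, not 9007199254740993

# Converting the decimal value 0.3 to binary64 changes the value:
print(Decimal(0.3))                 # 0.29999999999999998889776975374843...
```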

Also, the above code does not correctly handle all the special cases when (type==FLOAT) is true.

The IRIS/Caché $LIST representation is not as simple as most people think. There are some unusual rules that must be followed if you want to get the same results as the $LISTxxx functions get in IRIS/Caché.

Steven Hobbs · Apr 25, 2019

Agreed that there is nothing wrong with escaping a solidus. However, saying "most decoders will unescape the solidus correctly" is not quite correct: "*ALL* decoders will unescape the solidus correctly." If a decoder does not correctly unescape the solidus, then it is *NOT* a JSON decoder. The %FromJSON method in the %DynamicObject class in IRIS and in Caché versions after 2016.1 will do the following:

USER>write x
{"FileStatus":"P","Path":"\/somepath\/test\/test123\/filename.txt","InterchangeOID":"100458"}
USER>set y={}.%FromJSON(x)

USER>write y.Path
/somepath/test/test123/filename.txt

So %FromJSON is the method that breaks JSON apart and removes all escaping from JSON string values. (I assume that $fromJSON will do the same in Caché version 2016.1. The JSON manipulation added to version 2016.1 was changed significantly in all later releases. Most JSON manipulation programs written in version 2016.1 will require rewriting when porting to a later release.)

Steven Hobbs · Jan 14, 2019

The ClassMethod IsNumeric(...) seems to contain three SET statements that do nothing more than copy data around. The following is equivalent (without data copies):

ClassMethod IsNumeric(value As %String) As %Boolean
{
    QUIT $ISVALIDNUM(value)
}

This implementation of IsNumeric is equivalent to the first answer, ClassMethod IsValidNumber(...) by Eduard Lebedyuk, except that the IsNumeric ClassMethod does not have the [ FINAL ] attribute.

Steven Hobbs · Nov 28, 2018

Let us assume that you are using the %DynamicAbstractObject class and its subclasses, which include %DynamicArray and %DynamicObject (instead of using the legacy support for JSON in the %ZEN package.)

If you have an element of a JSON array or JSON object that contains more characters than are supported by an ObjectScript string, then the way to manipulate that element is to send it to a %File or a %Stream. In the present releases of Caché and IRIS Data Platform the only way to do that is to use the %ToJSON(outstream) method call. If the element in question is in a nested sequence of JSON elements, then you can first use a sequence of %Get method calls to access the closest containing %DynamicArray or %DynamicObject. You can then use the %ToJSON(outstream) method call to create a %Stream containing the textual JSON of that entire object. Then process the contents of that %Stream using the Read(len,...)/ReadLine(len,...) methods, with the 'len' parameter chosen to provide blocking that will prevent exceeding ObjectScript string length limits. Some JSON parsing will be necessary to find the beginning and the end of the JSON formatted string containing the element in question, so that the required element can be copied as the only contents of a new %Stream.

If you can control how the JSON is formatted, then you can choose to enclose the long JSON string element inside a JSON array containing just that one element: the long string value requiring additional manipulation. Parsing such a %DynamicArray element is easy: send the array to a %Stream using %ToJSON and then ignore the first two characters, [", and the last two characters, "], of the %Stream. What remains is the contents of the string element, with a few special characters encoded using \-sequences. If you cannot control how the JSON is formatted, then it still might be possible to use knowledge of the application that is using the JSON representation to shortcut the parsing issues.
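
A Python sketch of the one-element-array trick. An io.StringIO stands in for the %Stream written by %ToJSON, and json.dumps stands in for %ToJSON's escaping, which may differ in details such as non-ASCII characters; string_element_body is a hypothetical helper that reads in fixed-size blocks and always holds back the last two characters seen:

```python
import io
import json

def string_element_body(stream, block_size=4096):
    """Yield the escaped text of the lone string in '["..."]', block by block."""
    if stream.read(2) != '["':
        raise ValueError('expected the text to start with ["')
    tail = ""                          # last 2 characters seen; may be the closing "]
    while True:
        block = stream.read(block_size)
        if not block:
            break
        data = tail + block
        tail = data[-2:]
        if len(data) > 2:
            yield data[:-2]
    if tail != '"]':
        raise ValueError('expected the text to end with "]')

s = 'He said "hi"\tthen\\left\n' * 50
body = "".join(string_element_body(io.StringIO(json.dumps([s])), block_size=7))
print(json.loads('"' + body + '"') == s)   # True
```

The \-sequences are left escaped here; decoding them block by block would need the same kind of carry-over, since an escape sequence can straddle two blocks.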

In a future InterSystems product release there will certainly be an extension to the %Get method that will allow any %DynamicAbstractObject element (including string and numeric elements) to be sent to a %Stream using either Raw String representation or using JSON representation. You will also be able to copy such an element from one JSON array/object into another JSON array/object without needing to copy the characters to a %Stream that is using an external representation.

Steven Hobbs · Mar 1, 2018

Essay on Types and their Representations within InterSystems Object Script

Backwards compatibility is a reason why some things that looked normal in 1977 (date of first ANSI MUMPS standard) now look unusual in 2017. InterSystems Object Script is based on the ANSI MUMPS language (more recently called ANSI M) but Object Script has undergone quite a bit of extension beyond that standard (and not all those extensions were designed by InterSystems so there are some inconsistencies, see $ZHEX example below.)

The original MUMPS standard said that a subscript string containing the canonical numeric character representation of a numeric value was identical to that number so the original standard allowed an implementation where the only supported data type was the character string. When used as global subscripts, the canonical numeric subset of string values were sorted in numeric order before the strings that did not contain the canonical character representation of a number. Those non-numeric subscript strings sorted in textual order. Thus when used as a subscript, the string "2", a canonical number, is sorted before "1.0", a text string different from the canonical number "1". The canonical numeric string "1" does sort before "2".

The original implementation of Caché could use several different internal representations for a numeric value besides also supporting a character string representation. These additional internal representations helped improve performance when executing Object Script programs. These initial numeric representations included an integer representation and a decimal floating-point representation.

The $LIST family of functions are Object Script extensions that provide a way to encode a list of multiple Object Script values in a single string value. Internally a $LIST string need not store "identical" values using identical internal $LIST representations; avoiding conversions between different representations while building a $LIST is done for performance reasons. Thus, the lists $lb("230"), $lb(230), $lb(23e1) and $lb(2300e-1) are represented by different packed strings, but the $LISTSAME function treats all four of these lists as identical. E.g., the Object Script statement:

WRITE $lb(230) = $lb(2300e-1) will write 0 while WRITE $LISTSAME($lb(230),$lb(2300e-1)) will write 1.

There are about 21 strings different from $LB(23) that $LISTSAME will assume are identical to the string value $LB(23).

The ZZDUMP command dumps the internal representation of a list, and it can expose some of the different internal representations that are used for the same value. Copying a value using string representation will always use string representation as you move it in and out of a $LIST. Copying a value using numeric representation in and out of a $LIST will not change its numeric value, but you might get different internal numeric representations on different Caché instances.

InterSystems Object Script avoids conversions that change between internal numeric representations and the equivalent string representations because we have inherited features from other vendors of extended MUMPS implementations that treat a numeric value differently from the corresponding canonical string value. E.g., the function calls $ZHEX(10) and $ZHEX("10") give very different answers.

USER>WRITE $zhex("10"),!,$zhex(10)
16
A
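
The two $ZHEX behaviours can be mimicked in Python to make the distinction concrete; zhex_like is a hypothetical helper that only handles non-negative integers:

```python
def zhex_like(value):
    """Mimic the two $ZHEX behaviours shown above (hypothetical helper)."""
    if isinstance(value, str):
        return int(value, 16)        # string argument: read it as hexadecimal
    return format(value, "X")        # numeric argument: write it in hexadecimal

print(zhex_like("10"))   # 16
print(zhex_like(10))     # A
```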

Generally you can apply the unary-plus operator, +, to an Object Script string expression to change it from string representation into an Object Script numeric representation. The unary-plus operator is a conversion operator, so it works on strings that do not contain canonical numeric representation. (E.g., +"7.0", +"+700E-2" and +"7Dwarves" all convert to the canonical numeric value 7.) However, applying the unary-plus operator to a canonical numeric string will sometimes involve a conversion that changes the value, because the various internal numeric representations have a more limited range and a more limited accuracy than that supported by the canonical numeric strings that can be sorted using numeric ordering.
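
A simplified Python model of unary-plus conversion: take the longest leading numeric prefix, treat a missing prefix as 0, and round to an assumed 19 significant decimal digits. It accepts at most one leading sign (ObjectScript also accepts runs of signs), ignores overflow, and unary_plus is a hypothetical name:

```python
import re
from decimal import Context, Decimal

# Leading numeric prefix: optional sign, digits with optional fraction, optional exponent.
NUMERIC_PREFIX = re.compile(r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[Ee][+-]?[0-9]+)?")
DECIMAL19 = Context(prec=19)    # assumed 19-significant-digit decimal arithmetic

def unary_plus(s: str) -> Decimal:
    match = NUMERIC_PREFIX.match(s)     # text after the numeric prefix is ignored
    if match is None:
        return Decimal(0)               # no numeric prefix converts to 0
    return DECIMAL19.create_decimal(match.group())

for s in ["7.0", "+700E-2", "7Dwarves"]:
    print(s, "->", unary_plus(s) == 7)                      # True for all three
print(format(unary_plus("12345678901234567890123"), "f"))  # 12345678901234567890000
```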

There are other extensions added to InterSystems Object Script that extended the set of values supported by the original MUMPS standard.

InterSystems now supports two different string representations: the original 8-bit character strings and a representation using the 16-bit UTF-16 Unicode encoding.

Consider the Object Script expression ##class(%DynamicArray).%New(). It returns an oref value (different from a numeric or a string value), which is a reference to an object data structure defined by the %DynamicArray class. This particular oref value references structured data that is similar to the JSON array constructor [ ]. The set of Class language defined oref values is the largest extension that InterSystems has made to the original standardized set of MUMPS values.

Also consider $double(230), which returns an IEEE 64-bit binary floating-point value which is equal to the decimal floating-point value 230. The $DOUBLE( ) extension is useful for applications using scientific data encoded using the representation defined by the IEEE binary floating-point standard. However, binary floating-point arithmetic gives quite different results than decimal floating-point arithmetic. E.g.

USER>WRITE $double(230),!,230 ;; Gives identical looking answers
230
230
USER>WRITE $double(230)/100,!,230/100 ;; but this shows different computational results.
2.2999999999999998223
2.3
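The same effect can be reproduced in Python, whose float is an IEEE 64-bit binary value like $DOUBLE, while its decimal module does decimal arithmetic. This is only an analogy: Python's decimal module is not the Object Script decimal format, and Python prints floats differently from WRITE, so the sketch compares exact values instead of printed text.

```python
from decimal import Decimal

binary_result = 230.0 / 100                   # IEEE binary64, like $double(230)/100
decimal_result = Decimal(230) / Decimal(100)  # decimal arithmetic, like 230/100

# Decimal(float) reveals the exact value stored in the binary double.
exact_binary = Decimal(binary_result)

print(exact_binary)    # the double nearest to 2.3, which lies slightly below 2.3
print(decimal_result)  # 2.3
```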

There are also recent extensions to the Object Script language to support JSON objects. When a JSON constructor is written using standard JSON constant syntax, the values are stored internally as JSON values. Retrieving a value from a JSON object or JSON array with the %Get method in an Object Script expression converts the JSON value to a compatible Object Script value. In Object Script statements and expressions, the language also supports an extended JSON constructor syntax in which a JSON element value in a JSON object or array can be replaced with a parenthesized Object Script expression. These parenthesized expressions are evaluated using InterSystems Object Script semantics.

E.g.,

USER>SET x=["230",230,23e1,2300E-1] ;; JSON standard syntax

USER>SET y=[("230"),(230),(23e1),(2300E-1)] ;; extended expression syntax with parentheses

USER>WRITE x.%ToJSON(),!,y.%ToJSON()
["230",230,23e1,2300E-1]
["230",230,230,230]

Note that the %DynamicArray stored in variable x retains the JSON numeric syntax, while the %DynamicArray stored in variable y holds values in Object Script representation, although the string value "230" is still kept separate from the numeric values.

We can use the %Get( ) method to convert a JSON value to an Object Script value. When the %DynamicArray method %Get is applied to variable x, the resulting value is converted to Object Script representation because the %Get method call is part of an Object Script expression. E.g.,

USER>WRITE x.%Get(0),!,x.%Get(3)
230
230

Note: The two output lines containing "230" look identical but internally the first output line is the result of a string write and the second output line is the result of a numeric write. Also, note that numbers supported by JSON can exceed the capacity of the internal representations supported by Object Script. Rounding or overflow can occur when converting a JSON numeric element for use in an Object Script expression. E.g.,

USER>SET z=[3E400] ;; No error placing a large JSON number into a %DynamicArray

USER>WRITE z.%Get(0) ;; But converting the JSON number gets a <MAXNUMBER> signal

WRITE z.%Get(0)
^
<MAXNUMBER>
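Python's json module shows the same gap between what JSON text allows and what a fixed-size number can hold, although it reacts differently: by default it reads 3E400 into a binary float, which silently becomes infinity instead of signalling an error like <MAXNUMBER>, while asking for decimal parsing keeps the value. The `strict_float` hook below is a hypothetical helper that raises an error instead, to mimic the signal.

```python
import json
import math
from decimal import Decimal

text = "[3E400]"

as_float = json.loads(text)[0]                         # overflows to inf, no error
as_decimal = json.loads(text, parse_float=Decimal)[0]  # exact Decimal('3E+400')

def strict_float(token: str) -> float:
    """Hypothetical parse_float hook: refuse numbers a binary64 cannot hold."""
    value = float(token)
    if math.isinf(value):
        raise OverflowError(f"JSON number {token} exceeds the float range")
    return value

try:
    json.loads(text, parse_float=strict_float)
    overflowed = False
except OverflowError as err:
    overflowed = True
    print("overflow:", err)
```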

Steven Hobbs · Sep 1, 2017

In regard to: "Calculating what day of the week Columbus reached on Oktober 12, 1492 in America might be incorrect." Columbus would have been using the "Julian Calendar", which reckons dates quite differently from the "Julian Day Number", despite the similarity of the names. Note that the Julian Day Number changes at noon UTC, whereas the Gregorian calendar and the Julian Calendar assume the date changes at midnight local time. October 12, 1492 (Julian Calendar) is October 21, 1492 (Gregorian Calendar), a 9-day difference: there was a 10-day difference when the Gregorian calendar started on October 15, 1582 (October 5, 1582 in the Julian Calendar), and 1500 was a leap year (with a February 29) in the Julian Calendar but not in the Gregorian Calendar. Both the Julian Calendar and the Gregorian Calendar (and the Islamic and Hebrew calendars) agree that the first Columbus Day was a Friday. I.e., whenever a calendar adds a leap day or leap month, or skips or adds days to switch between calendars, the day of the week still advances by exactly one from one sunrise to the next. Religions and countries may argue over what year of the calendar it is, what month of the year it is and what day of the month it is, but there is much less argument over which day of the week it is.
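The date arithmetic above can be checked with the widely used integer formulas that map a calendar date to its Julian Day Number (the integer day that begins at noon UTC on that date). The function names below are hypothetical; the Gregorian formula is proleptic, i.e. it is applied to 1492 as if the Gregorian calendar had already existed.

```python
from datetime import date

def julian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a Julian Calendar date (standard integer formula)."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083

def gregorian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a (proleptic) Gregorian Calendar date."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - y // 100 + y // 400 - 32045

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def weekday_from_jdn(jdn: int) -> str:
    # JDN 0 fell on a Monday, so JDN mod 7 indexes a Monday-first list.
    return WEEKDAYS[jdn % 7]

landing = julian_to_jdn(1492, 10, 12)
print(landing, gregorian_to_jdn(1492, 10, 21), weekday_from_jdn(landing))
```

Because both calendars map onto the same continuous day count, the day of the week never depends on which calendar names the date.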

Steven Hobbs · Jun 10, 2016

Yes, in my original posting I meant to use the "sorts after" operator, ]], when I incorrectly used the "follows" operator, ]. I have corrected the original post.

Steven Hobbs · Jun 10, 2016

Comment on the statement that decimal numbers with more than 18 significant digits are strings. This is often true but it does not affect the accuracy of collating canonical numeric strings.

(1) Strings containing *canonical* representations of decimal numbers collate according to their numeric value and *NOT* according to their string value. (Every non-repeating decimal value has a unique canonical representation in COS. See http://docs.intersystems.com/latest/csp/docbook/DocBook.UI.Page.cls?KEY=... for a description of the canonical numeric representation used by Caché.) This collation rule applies even to canonical numeric strings with many more than 19 significant digits.

(2) When Caché does arithmetic on decimal values it generates results with approximately 18.96 decimal digits of precision. I.e., every integer between 1 and 9223372036854775807 can be represented as an arithmetic result. All the integers between 1000000000000000000 and 9223372036854775807 are integers with 19 digits of precision that can be the result of a Caché arithmetic computation. But the next larger decimal numbers that can result from a Caché arithmetic computation are 9223372036854775810, 9223372036854775820, 9223372036854775830 and so on up to 9999999999999999990. Results in this range have only 18 digits of precision even though they are 19-digit integers. Then come the 20-digit integers, which go back to having 19 digits of precision: 10000000000000000000, followed by 10000000000000000010, 10000000000000000020, 10000000000000000030 and so on up to 92233720368547758070, which is the largest 20-digit decimal integer with 19 digits of precision. The next possible arithmetic result is 92233720368547758100, a 20-digit number with only 18 digits of precision, which is followed by 92233720368547758200, etc. [[You can ignore these details unless you really need more than 18 digits of precision when doing arithmetic computations in the COS language.]]
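The spacing described above is what you get from a value stored as a signed 64-bit integer mantissa times a power of ten, since 9223372036854775807 is exactly 2**63 - 1. Here is a Python sketch of that model; `cos_decimal` is a hypothetical name, only non-negative integers are handled, and the rule that ties round upward is an assumption.

```python
INT64_MAX = 2**63 - 1  # 9223372036854775807, the largest mantissa in this model

def cos_decimal(n: int) -> int:
    """Round a non-negative integer to the nearest mantissa * 10**exp with
    mantissa <= INT64_MAX (hypothetical model; ties round upward)."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    exp = 0
    while True:
        scale = 10 ** exp
        # Round the original n once at this scale, avoiding double rounding.
        mantissa, remainder = divmod(n, scale)
        if 2 * remainder >= scale:
            mantissa += 1
        if mantissa <= INT64_MAX:
            return mantissa * scale
        exp += 1  # mantissa too large: drop one more decimal digit

for n in (9223372036854775807, 9223372036854775808,
          92233720368547758070, 92233720368547758080):
    print(n, "->", cos_decimal(n))
```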

Now for an example using 20-digit canonical numeric strings with a mixture of 20, 19 and 18 significant digits. In this range of values Caché supports 19 digits of precision. The example also has two non-canonical numeric strings, which do not collate in numeric order but instead collate in character order after all the canonical numeric strings.

USER>set ^a("09")="noncanonical index"
USER>set ^a("01234567890123456780")="noncanonical index"
USER>set ^a("12345678901234567800")="numeric index"
USER>set ^a("12345678901234567870")="numeric index"
USER>set ^a("12345678901234567874")="string index"
USER>set ^a("12345678901234567876")="string index"
USER>set ^a("12345678901234567880")="numeric index"
USER>set ^a("12345678901234567890")="numeric index"
USER>set ^a("12345678901234567900")="numeric index"
USER>zw ^a
^a(12345678901234567800)="numeric index"
^a(12345678901234567870)="numeric index"
^a("12345678901234567874")="string index"
^a("12345678901234567876")="string index"
^a(12345678901234567880)="numeric index"
^a(12345678901234567890)="numeric index"
^a(12345678901234567900)="numeric index"
^a("01234567890123456780")="noncanonical index"
^a("09")="noncanonical index"
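The ordering in this ZWRITE output can be modeled with a two-part sort key: canonical numeric strings first, ordered by exact numeric value, then every other string in character order. In this Python sketch the regex is only an approximation of the canonical-number rules (no leading "+", no leading or trailing zeros, no trailing ".", no exponent, ".5" rather than "0.5"), and `subscript_key` is a hypothetical name. It models the order only, not which subscripts ZWRITE prints with quotes.

```python
import re
from decimal import Decimal

# Approximate canonical numeric form (assumption, not InterSystems code).
CANONICAL = re.compile(r"0|-?[1-9]\d*|-?[1-9]\d*\.\d*[1-9]|-?\.\d*[1-9]")

def subscript_key(s: str):
    """Canonical numbers sort first by exact value; other strings by characters."""
    if CANONICAL.fullmatch(s):
        return (0, Decimal(s))  # exact value, no rounding to 19 digits
    return (1, s)

subscripts = ["09", "01234567890123456780", "12345678901234567800",
              "12345678901234567870", "12345678901234567874",
              "12345678901234567876", "12345678901234567880",
              "12345678901234567890", "12345678901234567900"]

for s in sorted(subscripts, key=subscript_key):
    print(s)
```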

Note that strings containing canonical numbers with 20 significant digits (i.e., "12345678901234567874" and "12345678901234567876") have quotes around them because Caché cannot convert them to numeric format without rounding the low-order significant digit into a zero digit. Note also that strings containing non-canonical numeric representations (i.e., "01234567890123456780" and "09") appear last, with quotes around the subscript values, because these string indices collate as string values. If you do arithmetic on the 20-digit canonical numeric strings (e.g., +"12345678901234567874" and +"12345678901234567876"), they will be treated as having the value of the nearest decimal number with 19 significant digits (i.e., the values 12345678901234567870 and 12345678901234567880 respectively).

Note: given two canonical numeric strings, the COS "sorts after" operator, ]], returns 1 if the first canonical numeric string operand is greater than the second canonical numeric string operand. (I.e., the "]]"-operator returns 1 if the first string operand collates after the second string operand.) This greater-than collation, when applied to canonical numeric strings, is computed correctly regardless of the number of significant digits in the canonical numeric strings. However, the COS numeric "greater-than" operator, >, first converts the operands to numeric representation with only about 18.96 digits of precision before doing the numeric comparison.

Examples:
USER>write "12345678901234567874" ]] "12345678901234567870"
1
USER>write "12345678901234567874" > "12345678901234567870"
0
The second write statement returns 0 because the numeric ">" operator rounds its first operand to 19 significant digits, which then compares equal to the numeric value 12345678901234567870.
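The two results can be mimicked in Python. This sketch repeats a small rounding model so that it runs on its own: `to_cos_number`, `sorts_after` and `numeric_greater` are hypothetical names, only non-negative integer strings are handled, and the 64-bit mantissa with ties rounding upward is an assumption rather than documented Caché behaviour.

```python
from decimal import Decimal

INT64_MAX = 2**63 - 1

def to_cos_number(s: str) -> int:
    """Hypothetical: round a non-negative integer string to a 64-bit mantissa
    times a power of ten (ties round upward, an assumption)."""
    n, exp = int(s), 0
    while True:
        scale = 10 ** exp
        mantissa, remainder = divmod(n, scale)
        if 2 * remainder >= scale:
            mantissa += 1
        if mantissa <= INT64_MAX:
            return mantissa * scale
        exp += 1

def sorts_after(a: str, b: str) -> bool:
    # For two canonical numeric strings, ]] compares the exact numeric values.
    return Decimal(a) > Decimal(b)

def numeric_greater(a: str, b: str) -> bool:
    # '>' converts both operands first, so the extra digit is rounded away.
    return to_cos_number(a) > to_cos_number(b)

print(int(sorts_after("12345678901234567874", "12345678901234567870")))      # 1
print(int(numeric_greater("12345678901234567874", "12345678901234567870")))  # 0
```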

Summary: When it comes to collating canonical numeric strings, the number of significant digits supported is limited only by the string length. When it comes to doing decimal arithmetic in Caché, the computational results are rounded to approximately 18.96 decimal digits of precision (19 or 18 significant digits, depending on the value).