Runtime type detection

Question

Sean Connelly · Aug 25, 2017

#Object Data Model #ObjectScript #Caché

SOLVED

tl;dr: how can you tell if a number is really a string?

The original question has been updated/improved.

Equality comparisons on floating point numbers will produce different results...

"1.1"=1.1 //is true!

"0.1"=0.1 //is not true :(

This second comparison can be fixed with...

+"0.1"=+0.1 // is true!
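
The difference comes from canonical form. A numeric literal such as 0.1 is converted to its canonical form (.1, with no leading zero) before the comparison is made, while a string literal keeps its exact text, and = compares the two as strings. A quick terminal check:

```objectscript
USER>write 0.1
.1
USER>write +"0.1"
.1
USER>write "1.1"=1.1
1
USER>write "0.1"=0.1
0
USER>write ".1"=0.1
1
```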

The problem is: what if we don't realise that a value is a stringy number, or simply forget to implement this defensive check?

One solution would be to lint check %Float properties and return types that should originate from a number and not a string, as discussed here...

https://community.intersystems.com/post/compilation-gotchas-and-request-change

A second approach is to use Unit Testing to ensure not only the values of a test are the same, but also the types of those values.

If, for instance, a method should return a value of type %Float, then instead of using a normal AssertEquals() method, the unit test could implement an AssertFloatEquals() or AssertNumberEquals() that checks that the return value is a pure number and not a stringy number. This would catch problems upstream before they can happen.

So, boiling all of this down: how can you tell if a number is really a string?

A simple condition for the solution is that both of these tests should produce false (zero):

$$$AssertNumberEquals("0.1",0.1)
$$$AssertNumberEquals("1.1",1.1)

Answers that are not hitting the mark...

1. Implement (+a=+b) in the AssertNumberEquals() method.

This will create a false positive for (+"0.1"=+0.1); the point is that these tests need to fail. It also opens up tests to incorrectly pass values such as "1B" and 1.

2. Use "sorts after" (]]), such that "0.12345"]]$c(0) returns 1 and 0.12345]]$c(0) returns zero.

Whilst a brilliant and innovative answer, it turns out that it only works for fractional numbers with a leading zero.

It also turns out that comparing $length(+num) with $length(num) does the same thing, without the collation problems described below.

3. Use $IsValidNum

Whilst this will determine if a string contains a valid number, it does not tell us if the number is contained within a string.

4. Use ["0.12345"].%GetTypeOf(0) which will return "string"

I got this to work with the latest versions of Caché, but I was unable to find anything that was backwards compatible.
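
The first three approaches can be reproduced at the terminal. Unary + hides the type (and silently accepts "1B"), the $length test only fires when canonicalization changes the text (so it cannot see a stringy "1.1"), and $isvalidnum() is true for the string and the number alike:

```objectscript
USER>write +"0.1"=+0.1
1
USER>write +"1B"=+1
1
USER>set s="0.1",n=0.1
USER>write $length(+s)'=$length(s)
1
USER>write $length(+n)'=$length(n)
0
USER>set s="1.1",n=1.1
USER>write $length(+s)'=$length(s)
0
USER>write $isvalidnum("0.1")
1
USER>write $isvalidnum(0.1)
1
```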

Discussion (31)

Sean Connelly · Aug 31, 2017

Turns out that stringy numbers are treated as strings by $lb, so a simple string test can be created...

ClassMethod IsString(val) As %Boolean
{
quit $lb(val)[val
}

ClassMethod AssertNumberEquals(val1 As %Float, val2 As %Float) As %Boolean
{
if ..IsString(val1) quit 0
if ..IsString(val2) quit 0
if val1'=val2 quit 0
quit 1
}
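
A quick way to see why this works: $lb() keeps a string value byte for byte inside its encoding, whereas a value that is internally a number is stored in a binary form that does not contain its decimal text. This relies on an internal detail of the $LISTBUILD encoding, and an empty string is always reported as a string, since every string contains "". A terminal check:

```objectscript
USER>set s="0.1",n=0.1
USER>write $lb(s)[s
1
USER>write $lb(n)[n
0
USER>set s="1.1",n=1.1
USER>write $lb(s)[s
1
USER>write $lb(n)[n
0
```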

score 9 · Answer 1 · 2017-08-25T13:16:22-04:00

How about using the "sorts after" operator ]] for this?

USER>w "0.12345"]]$c(0)
1
USER>w ".12345"]]$c(0)
0
USER>

score 0 · Answer 2 · 2017-08-25T13:21:45-04:00

Sean Connelly · Aug 25, 2017

Boooooom, John with an epic Friday afternoon answer!

Thanks John, that deserves a beer...

score 1 · Answer 3 · 2017-08-25T14:03:51-04:00

Robert Cemper · Aug 25, 2017

wow, I wish I had more than 1 thumbs up

score 1 · Answer 4 · 2017-08-25T15:27:12-04:00

Note that in some cases it might produce different results:

USER>write ##class(%Library.Collate).SetLocalName("Cache standard"),!
1

USER>write "0.12345"]]$c(0)
1
USER>write ".12345"]]$c(0) 
0
USER>write ##class(%Library.Collate).SetLocalName("Cache standard string"),!
1

USER>write "0.12345"]]$c(0)                                                  
1
USER>write ".12345"]]$c(0)                                                   
1

score 1 · Answer 5 · 2017-08-29T05:19:54-04:00

This is a good point. I was relying on the doc at http://docs.intersystems.com/latest/csp/docbook/DocBook.UI.Page.cls?KEY=... being correct when it says:

Binary Sorts after tests whether the left operand sorts after the right operand in numeric subscript collation sequence. In numeric collation sequence, the null string collates first, followed by canonical numbers in numeric order with negative numbers first, zero next, and positive numbers, followed lastly by nonnumeric values.

score 1 · Answer 6 · 2017-08-29T10:30:14-04:00

I'm not sure that I have a precise definition of what you are trying to achieve. If you can define it, I might be able to help more. However, there is some confusion in your example that I think needs clarification.

What you're dealing with here is the rules about canonical numbers. (x=+x) will indeed evaluate whether a number is in canonical form, because the equals operator tests for exactly equal strings and the unary + converts a value to a number in canonical form. The reason your first example above returns true is simply that you set x equal to a numeric literal, so it was converted to canonical form before it was even stored in the variable. (If you look at the value in x, it will not have a leading zero.)

If you haven't read it before, this portion of the doc (along with the linked references) is a pretty good treatment of this subject. http://docs.intersystems.com/latest/csp/docbook/DocBook.UI.Page.cls?KEY=...

String-to-Number Conversion

score 1 · Answer 7 · 2017-08-29T21:59:44-04:00

See my other comment above, but I don't think relying on what the dynamic array implementation picks for a type to convert to is a great idea. I'd like to see you find a solution in the core of the (typeless) language. If you really are just trying to implement AssertNumericEquals(actual,expected), then that's simply "if +actual'=+expected { FAILED }". This will pass any actual value that evaluates in an arithmetic operation the same way expected would. Similarly, if you are trying to implement AssertEqualsCanonicalNumber(actual,expected), then it's "if actual'=+expected { FAILED }". That one will pass actual only if it is exactly the canonicalized expected value (and thus could be compared to that number with the = operator). If you want AssertIsCanonical(actual), that's "if actual'=+actual { FAILED }". That one, of course, will pass any number in its canonical form.
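
The three definitions above can be sketched as boolean helpers. Demo.NumberAssert and its method names are hypothetical, chosen only for illustration:

```objectscript
/// Hypothetical helper class: the three assertion definitions as boolean tests
Class Demo.NumberAssert
{

/// Passes any value that behaves like expected in arithmetic ("0.1", .1, even "0.1abc")
ClassMethod IsNumericEqual(actual, expected) As %Boolean
{
    quit +actual=+expected
}

/// Passes only when actual is exactly the canonical form of expected
ClassMethod IsEqualCanonicalNumber(actual, expected) As %Boolean
{
    quit actual=+expected
}

/// Passes any value already in canonical form, whatever its internal type
ClassMethod IsCanonical(actual) As %Boolean
{
    quit actual=+actual
}

}
```

With these, "0.1" compared to 0.1 passes the first test and fails the other two, while a canonical string such as ".1" passes all three, which is exactly the case a type-based check is meant to reject.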

score 0 · Answer 8 · 2017-08-30T05:32:19-04:00

Thanks Ray, but coercing the unit test to force a pass will cloak the underlying problem.

Let me expand on my original post. We know that in COS many variables start out as stringy values, whether or not they contain a number...

set price=$piece(^stock(321),"~",2)
set stock.price=%request.Data("price",1)

COS coercion operators do a pretty good job at dealing with numbers inside strings, except that there is an inconsistency in the equality operator when comparing floating point numbers. If "1.5"=1.5 is true, then arguably "0.5"=0.5 should also be true, but it is not. This means that developers should be wary of automatic equality coercion on floating point numbers.

To compound this problem, the COS compiler will ignore the following two potential problems...

1. Assigning a stringy value to a %Float property
2. Returning a stringy value from a method with return type of %Float

This can lead to a false understanding of what a developer is dealing with.

To make things a little more interesting, a persistent object will automatically coerce a %Float property to a true number value when saved. That's fine, but what if the developer is unaware that he / she is assigning a stringy float value and later performs a dirty check between another stringy float value and the now saved true float number. The code could potentially be tripped up into processing some dirty object logic when nothing has changed.

As developers we need to code defensively around this problem. Probably the best thing we can do is always coerce a variable / property manually at source when we know it's a floating point value...

set price=+$piece(^stock(321),"~",2)
set stock.price=+%request.Data("price",1)

But, since we are not perfect, and the compiler won't help us, it's easy to consider that a few bugs might slip through the net.

This is where unit testing and good code coverage should highlight these exact types of problems. In this instance, a unit test should fail if the two values are not both the same type and value. So the implementation of AssertNumberEquals should check both type and value. Therefore, both of the following comparisons should fail...

"1"=1

"0.12345"=0.12345

This is why, as I originally posted, (+"0.12345"=0.12345) is not the right answer: it will create a false positive.

So the question boils down to: how do you detect the runtime type of a variable or property?

One solution that I have come up with so far would roughly look like...

ClassMethod AssertNumberEquals(v1, v2) As %Boolean
{
   // a dynamic array records whether each ObjectScript value is a string or a number
   set array=[]
   do array.%Set(0,v1)
   do array.%Set(1,v2)
   if array.%GetTypeOf(0)'="number" quit 0
   if array.%GetTypeOf(1)'="number" quit 0
   if v1'=v2 quit 0
   quit 1
}

Except that it is dependent on recent versions of Caché.

What I need is a similar solution that would be backwards compatible.

score 0 · Answer 9 · 2017-09-01T04:01:14-04:00

Agreed, this is a highly specialised use case, specifically for unit testing against potential floating point equality errors. Using IsString() as a day to day function would in most cases be a bad thing.

Just to clarify, the sorts after suggestion does NOT work. Whilst it can detect stringy fractions, it does not work for even the simplest floating point number...

>w "1.1"]]$c(0)
0

> The difference is that the method above will fail numbers in canonical form

Do you have a specific condition where it will fail? I ran 1.6e+8 fractional number tests without any problem, so I'm obviously concerned that there are conditions where it fails that I have not thought about yet.

score 2 · Answer 10 · 2017-08-28T09:36:33-04:00

I hesitate to comment on this because you know the answer, but it seems that if you're trying to determine if a value is a number in canonical form, it's hard to beat testing that (x=+x).

I don't think we should be so excited about the suggestion for sorts after $c(0), because that introduces dependencies on the current local collation strategy. Whatever answer you choose, I think you should require it to be invariant.

score 0 · Answer 11 · 2017-09-01T09:57:25-04:00

On reflection I agree, unit testing simple return types is pointless.

It's only returned objects with %Float properties that would need to be unit tested for type as well as value...

Class Test.Types Extends %Persistent
{

Property Amount As %Float;

ClassMethod foo() As Test.Types
{
   set data="0.0,0.1,0.2"
   set test=..%New()
   set test.Amount=$p(data,",",2)
   quit test
}

}

>s x=##class(Test.Types).foo()
>w x.Amount
0.1
>w x.Amount=0.1
0

score 1 · Answer 12 · 2017-09-01T10:19:24-04:00

Continuing this with your example, I'm saying you have to consider what you'd want the following to return.

>s y=+x.Amount w y ; y is now canonical form, and internally a float
.1
>s $p(tmp,",",2)=y,y=$p(tmp,",",2) w y=0.1 ; y is still canonical so it's =, but internally a string
1
>w AssertNumberEquals(y,0.1)
???? what's this going to return

score 0 · Answer 13 · 2017-09-01T10:40:07-04:00

I want the float member to be a canonical number, not a string.

So a unit test would look like...

AssertNumberEquals(x.Amount,0.1)

This would fail, and fixing it would require a change in the method code to...

set test.Amount=+$p(data,",",2)

This means the unit test will now pass, and quirky things won't happen downstream.
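
A sketch of how that check might sit in a %UnitTest test case. The class Test.TypesTest, its IsString() helper (a copy of the $lb() trick posted in this thread) and the test method name are hypothetical; $$$AssertNotTrue and $$$AssertEquals are the standard %UnitTest.TestCase assertion macros:

```objectscript
/// Hypothetical test class for Test.Types
Class Test.TypesTest Extends %UnitTest.TestCase
{

/// Hypothetical helper: a string is stored byte for byte in its $lb() encoding
ClassMethod IsString(val) As %Boolean
{
    quit $lb(val)[val
}

/// Fails while foo() assigns the raw $p() result; passes once it uses +$p(data,",",2)
Method TestFooAmount()
{
    set x=##class(Test.Types).foo()
    do $$$AssertNotTrue(..IsString(x.Amount),"Amount holds a number, not a string")
    do $$$AssertEquals(x.Amount,0.1,"Amount equals 0.1")
}

}
```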

score 1 · Answer 14 · 2017-09-01T10:50:05-04:00

Ah, I think we found the confusion! Canonical number and internal type are different concepts. A canonical number can have internal string type. An internal numeric type (int, float, double) will always be canonical. What do you want your assert to say if your method did this...

 set $p(canonicaldata,",",2)=+$p(data,",",2)
 set test.Amount=$p(canonicaldata,",",2)

Now test.Amount is canonical, but also a string so

>w test.Amount=0.1,!,test.Amount=.1,!,test.Amount=".1"
1
1
1

What should your assert method say about that? OK or NOT OK. If OK, then you want v=+v. If not OK, then you want one of the tricks that breaks this abstraction.
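
The distinction is easy to see at the terminal: a value read back out of a delimited string is canonical, so v=+v accepts it, but the $lb() string test still reports it as a string:

```objectscript
USER>set $p(tmp,",",2)=.1,v=$p(tmp,",",2)
USER>write v
.1
USER>write v=+v
1
USER>write $lb(v)[v
1
```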

score 0 · Answer 16 · 2017-09-01T12:52:41-04:00

Sean Connelly · Sep 1, 2017

Thanks for the help Ray, it's been a really interesting conversation.

score 1 · Answer 17 · 2017-09-01T08:58:48-04:00

This is just definitional. By "fail" I meant generate an assertion failure and it will do so for any canonical number if it happens to be stored internally as a string. You've recently been saying this is what you want so I accept that. This is going full circle again, but on the off chance that this is helpful to you or someone else, I'll take one last shot at explaining why I think that definition is not desirable. Consider I write the following method

ClassMethod foo() As %Float
{
  set x=1.1 ; x is a number in canonical form
  set $piece(a,",",1)=x
  // ... other stuff ...
  quit $piece(a,",",1)
}

This method is perfectly correct in returning a floating point number. It will also be in canonical form, so that it will test as = against any other canonical copy of 1.1 that you have. But your assertion code will say the return value of my method doesn't equal 1.1 because it happens to internally have string type. You would tell me that I should change my code to return +$piece(a,",",1) instead, but that is strictly not necessary. The difference is only visible if you break the typeless abstraction layer and find a trick (like you've done) to peek into the internals.

You can certainly define your requirement to be stricter than this as you have and say that you want to require that the number would act as a number in one of the special functions that can tell the difference ($LB, $ZH, $ZB(), dyn arrays). That's a fine definition, but it is special. So it comes down to where you check this assertion. Most COS programmers I know would not use the unary + in my method; rather they would use the unary + upon passing that value to one of aforementioned special functions.

The definition I thought you were originally going for (when you liked sorts after) would be to accept any number that will evaluate as = to a copy of itself that had been passed through arithmetic operators, and for that the answer is to test value=+value. (Side note: v=+v is better for this than sorts after $c(0) because it is invariant and meets my definition for things like "1111222233334444555566667777".)

score 0 · Answer 18 · 2017-08-30T09:38:18-04:00

Thanks Alexander, but the AssertNumberEquals method should create a failed unit test when the values are of different type. I have updated my question to be a little more clear on this.

score 0 · Answer 19 · 2017-08-31T09:40:36-04:00

Hi Sean,

OK. I don't know of any direct way to access a variable's type. Last little bit of food for thought...

Even if there were such a function, though, I'd consider it an internal detail that wouldn't necessarily be reliable. Take as a trivial example 'set x="1234",x=x+0'. Today, under the covers, x starts out as a string and then changes to an integer when it gets assigned the result of the addition operation. You could imagine a future where a compile- or run- time optimization notices that it can just leave x unchanged as its string type 1234. This is entirely an implementation detail and the optimization wouldn't violate any rules of the language. Note that in the case of "set x="0.5",x=x+0", we would be obligated to leave x as having value ".5", not "0.5" due to the canonicalization rules, but even then we're not obligated to internally make it a floating point type rather than a string type.
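
The canonicalization half of this is directly observable, independent of whatever internal type x ends up with:

```objectscript
USER>set x="0.5",x=x+0
USER>write x
.5
USER>write x="0.5"
0
USER>write x=".5"
1
```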

Would we ever really do this? I don't know. Unfortunately because there are things like $LB and $ZHEX that expose bits of these internal details in some fashion, you'd worry about compatibility implications. But fundamentally, the internal type is just a detail for the Caché virtual machine to manage internally in doing whatever it needs to do to present the typeless COS language to the application.

score 0 · Answer 20 · 2017-08-31T09:47:26-04:00

Thanks Ray.

Btw, I found a solution earlier, I've added an answer to the post.

I accept that implementations like $lb might change in the future, but I now have a backwards compatible solution that can work alongside the dynamic object solution on future releases. Comparing the two outputs in itself will make a good unit test of the unit tester on installation.
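
A minimal sketch of that cross-check, assuming a version where %DynamicArray provides %Push() and %GetTypeOf(). The class and method names are hypothetical, and both tests depend on internal type details, as Ray warns:

```objectscript
/// Hypothetical class comparing the two runtime type tests
Class Demo.TypeCheck
{

/// Backwards-compatible test: a string is stored byte for byte in its $lb() encoding
ClassMethod IsStringViaList(val) As %Boolean
{
    quit $lb(val)[val
}

/// Newer test: let %DynamicArray report the type it records for the value
ClassMethod IsStringViaJson(val) As %Boolean
{
    set arr=[]
    do arr.%Push(val)
    quit arr.%GetTypeOf(0)="string"
}

/// Returns 1 when both tests agree; a 0 flags a behaviour change in either one
ClassMethod TestsAgree(val) As %Boolean
{
    quit ..IsStringViaList(val)=..IsStringViaJson(val)
}

}
```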

score 0 · Answer 21 · 2017-08-31T02:40:44-04:00

Evgeny Shvarov · Aug 31, 2017

Hi, Ray! I think it is fixed now.

score 1 · Answer 22 · 2017-08-31T02:35:59-04:00

Hi Ray, thanks for the long responses, these will be great for anyone new to Caché.

No coding convention being imposed here, just 20 years at the rock face with Caché/COS and a good feel for the trip hazards inherent in the language, which every language has (love COS, no bashing here).

I've been evolving a new version of a unit test framework I have been using for years and want to make sure that it handles some of these regular trip hazards.

In this instance, I have my own backwards compatible JSON library that failed a test because it was assigning a stringy number to a %Float property in its own normalisation method...

https://github.com/SeanConnelly/Cogs/blob/master/src/Cogs/Lib/Json/Cogs.Lib.Json.ClassDeserializer.cls

If I can add a new assert method as described earlier, I can catch this type of problem upstream and prevent potential bugs leaking out into live code.

So, back to the simple question: it would be great if anyone at InterSystems knows a way to check the type of a variable. I can't think of anything from my legacy ANSI M days; perhaps there is a $zu function or similar?

score 0 · Answer 23 · 2017-08-30T07:16:12-04:00

Have you looked at $IsValidNum?

ClassMethod AssertNumberEquals(v1, v2) As %Boolean
{
    if '$isvalidnum(v1) quit 0
    if '$isvalidnum(v2) quit 0
    //both are numbers -- let's compare them as numbers
    quit +v1=+v2
}

score 0 · Answer 24 · 2017-08-30T11:33:48-04:00

Sorry for all the duplicate replies... It won't seem to let me place the comment in the right place!

score 0 · Answer 26 · 2017-08-30T13:03:00-04:00

Sean, I think your post reveals a couple misunderstandings that relate to this problem. Let me comment on a couple, though at this point, I'm not sure how helpful I'm being to you...

If "1.5"=1.5 is true, then arguably "0.5"=0.5 should also be true, but it is not. This means that developers should be wary of automatic equality coercion on floating point numbers.

It's very important to understand what's going on here because it's central to your question. "1.5"=1.5 because 1.5 is a number in canonical form. "0.5" does not equal 0.5 because 0.5 is a numeric literal, and so that literal 0.5 gets canonicalized before being evaluated in the equals. This is exactly expected and well-defined and not really arguable. Literals are one thing, but programs are going to most likely get both sides of the equality from some calculation, string extraction, or user input. If one side of the equality was either a numeric literal or came through some numeric operation, then it is canonicalized, whereas the other side may or may not be, thus possibly failing the equality check unless you explicitly use the unary +.

To make things a little more interesting, a persistent object will automatically coerce a %Float property to a true number value when saved. That's fine, but what if the developer is unaware that he / she is assigning a stringy float value and later performs a dirty check between another stringy float value and the now saved true float number. The code could potentially be tripped up into processing some dirty object logic when nothing has changed.

I understand exactly what you're saying here, but I want to make sure that this behavior doesn't seem mysterious. All that's going on here is that saving an object invokes %Normalize for all the object properties before saving. You can do the same any time you want if you have a need to do so. Remember though that COS is a typeless language so developers should absolutely NOT expect to need to manage the type of their data. Consider that I store an integer as second comma-delimited piece of a string. Now I have a %Integer method where I'll return that piece. All is well and I do not need to use the unary +. However, your sample assert method would generate a false positive failure because the number I returned in this way internally has string type. That's not correct though, and you should not be writing code to try to expose the internal type of local variables. The fact that certain special operations must expose the internal type (like the internal $listbuild structure, $zhex, and this dynamic array typing stuff) is a detail specific to those particular functions and shouldn't be considered a backdoor to imposing types on COS, which is typeless. (BTW, I'm not 100% convinced that it's correct for "1" to become a string in these dynamic arrays, but I'm not going to get into that!)

If I can interpret your goals more generally, it sounds like you're trying to impose a coding convention that at certain places in your application, you want certain values to have been already normalized through the appropriate normalization for their datatype class, so that evaluation with the = operator can be used for logical equality. You're using %Float as a specific example of that, which is interesting in that it gets into how the language canonicalizes numbers. But one could easily imagine wanting the same thing for any arbitrary data type for which only the %Normalize method will do. If that's what you're really after, then you could easily write an AssertNormalizedValue(value,datatype) which generates an assertion failure if value'=$classmethod(datatype,"%Normalize",value)... or something like that.
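
A minimal sketch of that idea as a boolean check. The class and method names are hypothetical, and it assumes the datatype class exposes a Normalize() class method (as %Library.Float does) rather than the %Normalize name used above:

```objectscript
/// Hypothetical helper following the AssertNormalizedValue suggestion
Class Demo.NormalizeCheck
{

/// Returns 1 if value already equals what the datatype's Normalize() produces
ClassMethod IsNormalized(value, datatype As %String) As %Boolean
{
    quit value=$classmethod(datatype,"Normalize",value)
}

}
```

With %Library.Float, "0.1" would be expected to fail and .1 to pass; a canonical string such as ".1" also passes, which matches the typeless definition rather than a type check.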

score 0 · Answer 27 · 2017-08-29T02:44:27-04:00

Hi Ray,

The trouble is determining if a number value is also a string type or a special number type, as comparisons can give different answers for numbers starting with a zero...

USER>s x=0.12345

USER>w (x=+x)
1
USER>s x="0.12345"

USER>w (x=+x)
0

The obvious answer is to do (+x=+x) but this does not solve how to unit test the type and value.

I agree that on reflection the dependency on local collation would not work for my unit test framework as it would restrict its scope of use, but still an interesting answer.

Any more suggestions...

score 1 · Answer 28 · 2017-08-29T05:30:45-04:00

It'd be easy enough to write a method that notes the current collation of locals for the process - ##class(%Library.Collate).GetLocalNumber() - and if it's not 5 (the number for "Cache standard") then temporarily set it - ##class(%Library.Collate).SetLocalNumber(5) - before doing the ]] test, then reinstate the noted collation if necessary.

score 0 · Answer 29 · 2017-08-29T06:08:11-04:00

I figured out that $length can detect a stringy number starting with zero, without depending on local collation...

Is a string type...

USER>s x="0.12345"

USER>w $l(x)'=$l(+x)
1

Is not a string type...

USER>s x=0.12345

USER>w $l(x)'=$l(+x)
0

BUT, this or "sort after" will only work for values starting with a zero.

I could use this to fix the specific generic assertion test failure I have, but it would be nice to expand the unit test methods to have an AssertNumberEquals().

It might be that I have to settle on...

>w ["1"].%GetTypeOf(0)
string

And only enable this method in supported versions.

score 1 · Answer 30 · 2017-08-31T16:17:36-04:00

I promise this is the last thing I'll say on this topic :) But..

1. This has different results than John Murray's sorts-after suggestion that you originally liked so much. And now that I understand what you're doing, I too like that suggestion much better (just make sure the local collation is what you want) since it at least plays by the COS rules. The difference is that the method above will fail numbers in canonical form just because they happen to have string type under the covers. John's suggestion will properly pass all canonical numbers regardless of how they came to be.

2. For anyone who might come along later and encounter this answer, we should warn them that this is for Sean's highly specialized purposes, relies on internal implementation details that may change, and in general is specifically intended to break an abstraction layer that COS otherwise provides.