Compilation gotchas and a request for change

Article

Sean Connelly · Apr 13, 2017 3m read

On the back of my recent post on writing bug-less code I wanted to raise a few suggestions (to ISC) that would help prevent certain types of bugs at compile time. I've probably missed a few, but these are the main ones in my mind. Please contribute more suggestions.

Btw, these also serve as potential gotchas for new COS developers.

I except that introducing these types of changes can cause legacy code problems. Particularly where developers do some interesting overloading of method arguments. Therefore I would see these as an option that would need to be enabled and that it won't prevent compilation (it would warn not error).

As an alternative I could see these being implemented as a Lint tool. This is something I (and perhaps the community) would be willing to do if we had an open source AST tree as a starting block (if anyone is aware of one in existence for COS).

Apologies if many of these have been raised before, but it never does any harm to keep raising them. They might seem like small issues, but the accumulative time saved by the community in tracking down these bugs will be in the thousands of man days.

1. Warn on potential <COMMAND> errors

If a method has a return type then the compiler should warn of any quit code that does not return a value.

ClassMethod HelloWorld() As %String
{
if 1 quit // <- this will cause a command error
quit "Hello World"
}

(Its interesting that studio will complain when you try and quit a for loop with a value, but it wont complain when you try and quit with nothing on the method.)

The same problem happens in reverse, if a method does not return a value then the compiler should warn if the calling code is trying to use a return value

ClassMethod HelloWorld()
{
quit //with nothing
}

Elsewhere...

set foovar=##class(Example).HelloWorld() // <- the compiler should warn about this, as it will cause a command error

2. Warn on incorrect return type

If a method has a return type (e.g. As %String) then the compiler should warn of any quit code that returns the wrong type.

3. Warn on incorrect argument type

Arguments in a method can have a declared type. Unfortunately, these type declarations are not enforced in any way. If the calling code passes in the wrong type then it can cause a) crashes, b) worse, silent problems.

ClassMethod HelloWorld(pObject As Foo.Bar) As %String
{
Write !, pObject.foo // this will fall over if passed a string
}

Elsewhere...

Do ##class(Example).HelloWorld("Hello") // this should cause a compiler warning

4. Warn on incorrect number of parameters

Calling code should supply the correct number of parameters unless the Formal Spec of the arguments include a Variadic argument, e.g.

ClassMethod HelloWorld(...) As %String

However, this would also require some kind of notation that would allow arguments to be optional, for instance TypeScript fixes this problem in JavaScript by implementing a question mark after the argument.

ClassMethod HelloWorld(pRequired As %String, pAlsoRequired As %String, pOptional? As %String) As %String

intializers and optional's would not be allowed to work together.

ClassMethod HelloWorld(pRequired As %String, pAlsoRequired As %String, pOptionalBad? As %String="") As %String

As an alternative, just an empty intializer could be used to mark an argument as optional

5. Enable auto suggestion for methods that return objects as an output of the arguments.

Do ##class(Foo.Bar).DoThing(pArg1, .pArg2)

Currently if pArg2 is an object then we have to add a #dim pArg2 to the code to get the auto suggest to work

6. Warn on properties and methods that are not members of the object.

Currently the compiler will error on trying to access a non existent property or method of this class, e.g.

Do ..ThisDoesNotExist() //causes compilation error

But if I do the same on an object instance, then I will not get any type of warning...

ClassMethod HelloWorld(pObject As Foo.Bar) As %String
{
Do pObject.ThisDoesNotExist() //wont fail until run time
}

1 blocker 1 Method declared as "ProcedureBlock = 0" User.test.cls(+6): ClassMethod RestVars(zzzzzzsource As %String, bOref As %Boolean) As %Status [ ProcedureBlock = 0 ] 2 major 2 Usage of QUIT to exit a method User.test.cls(+24): quit $$$OK User.test.cls(+28): quit $$$ERROR($$$CacheError, $ze)

Clark Matthews · Apr 14, 2017

Sean,

Mapping was just the first example that came to my mind, and probably not the best one. From two different namespaces with different mappings you could end up using different versions of some routines or classes. I'm not saying it's good practice, just that it can happen.

In addition to mappings there are a number of other things to consider, like the fact that the target can be changed after compilation. There are object code only routines, or routines where, because something has not been re-compiled, the source and object do not match, so which does the compiler use? What if the target is changed after the caller was compiled and now the call is incorrect? What if the target does not exist on the development or build machine, and will only work when combined with object code from other modules of a product in a testing environment?

One likely case where the caller and function are not both available is where hooks are implemented (like ^ZJRNFILT or ^%ZSTART in the kernel) and a product calls a method or routine in a call frame specified by it, if it exists. Then someone else is free to write the function so long as it conforms to what is expected. The module making the call would either be getting compile errors, or they would be compiling using some sort of dummy to eliminate the errors. In any case the check is useless for the developer of the caller, and the developer of the target is unlike to re-compile after each upgrade unless it is specifically documented that the call had changed.

At best, in any of these cases you would get numerous "false" errors that are going to make it easier to miss "real" compilation errors. I am not sure you really want to validate the external calls in the same step as compilation. Wouldn't you want to do this once the module is complete and connected with any other modules required for the entire package to work? This way you can be sure that all the inter-object references are correct in the environment being tested.

Clark

0 0

Arto Alatalo · Apr 13, 2017

yes, I agree, all this would be really useful to have. At least as an external Lint tool.

But I know what they answer about (6): "it is not possible because of Dynamic Dispatch".

While Dynamic Dispatch is almost never needed but <method not exist> is every day experience

1 0

Evgeny Shvarov · Apr 13, 2017

Thanks, Sean!

Have you seen the CachéQuality linter?

From CacheQuality web site:

Pricing
We have different pricing model accordingly with your needs, starting at a rate of 4.200$

To me this is a way to expensive just to know that method or property doesn't exist. Moreover, I want to know it immediately at project compile time. An external tool could parse the code during the night and tell some complex statistics but simple things must tell compiler

2 0

Evgeny Shvarov · Apr 14, 2017

An external tool could parse the code during the night and tell some complex statistics but simple things must tell compiler

@Dmitry Maslennikov mentioned that guys have simple linter addon to the Studio which does the job within compilation.

Alexey Maslov · Apr 14, 2017

I've tried it with Cache 2017.1.0. The first impressions are:
- It completely ignored all "blockers" in the class with 300+ methods, while there was a many of them it should not pass by: 'not procedureblock', legacy flow control, etc; maybe some internal limit was exceeded.

- On the smaller class it popped rather funny messages, e.g.

Well, 'ProcedureBlock = 0' can be considered not a good style, but what if I need it in some special case? This very method restores local variables stored in a global, so it would loose all its sense be it 'procedureblock'. IMHO, there should be a facility to protect such special case methods.

'Usage of QUIT to exit a method': QUIT is standardized command to exit methods/routines/etc. IMHO it's better than RETURN as it prevents coders to exit their methods from inside the loops and try/catches constructs, so it encourages the modular coding: each module (method) should have one enter and one exit.

So, many of such rules seem to be subjective. Without a facility to customize the rules this linter seems to be no more than a demo of some commercial product.

P.S. As I noticed later, it never clears 'mgr/Temp/' sub-folders it creates.

Dmitry Maslennikov · Apr 14, 2017

Looks like, do you use integration in Studio? It is a Java application, and as you class is so big, maybe it working so long, and you did not wait for the result.

Looks like, do you use integration in Studio?

Yes, I do.

...maybe it working so long, and you did not wait for the result.

Not too long, just a few seconds. It provided an empty report, and AFAIR there was some record in its internal log that no problems were found.

Sean Connelly · Apr 13, 2017

It does look interesting. I might find a few of the rules a little querulous, but there are a some gems in there, this ones priceless :)

Property of type %String without a MAXLEN

For what I want a 100 line linter would suffice for studio output. But I can see value in cachéquality if I was back managing a large team of developers again.

Alexey Maslov · Apr 13, 2017

... studio will complain when you try and quit a for loop with a value

QUIT from inside a loop is considered quitting a loop rather then a function, so it should always be without a value.

...but it wont complain when you try and quit with nothing on the method

According to language definition, each function can be called as a function (set x=$$function(...)) or as a routine (do function(...)). The call type can be recognized inside the function using a construct:

quit:$quit ReturnValue quit ; $quit=1 if called as a $$function()

Methods are compiled to functions and behave the same way.

> QUIT from inside a loop is considered quitting a loop rather then a function, so it should always be without a value.

100% agreed.

My point was if the compiler will go as far as protecting the developer from this type of quit misshap, then could the compiler not also warn on potential quit <COMMAND> errors.

My point was if the compiler will go as far as protecting the developer from this type of quit misshap

It seems that this kind of checkup would be difficult to implement because of variety of methods how the code could be branched depending on its call type. E.g.

 if $quit {
     ...
     quit rc
 } else {
     ...
     quit
 }

 quit:$quit rc
 ...
 ...
 quit

Of course, both code fragments demonstrate not very good coding style, but they are semantically correct. If one gets many false positives from a (hypothetical) code checker, he would likely drop it.

I'm still trying to get my head around this.

Are you saying that developers actively use $quit inside method code?

It may depend on local coding traditions. Just for example, quick search using Studio's <Ctrl-F> through one of our apps (comprised 1000+ classes) showed:

 q:$q <lvn> q            ; Found 4501 occurrence/s in 90 file/s
 quit:$quit <lvn> quit   ; Found 126 occurrence/s in 28 file/s.
 if $quit                ; Not found.

So, it turned that about 10% of our classes used $QUIT. Not too many, but not negligible few.

wow, that's interesting, never seen it used in methods that way, I guess its one way to work around the command error

so as a fringe case, if the lint tool found a $quit in a method then it would assume the developer knows what they are doing and ignore the whole method, that way no false postives

I just wanted to add that some occurrences of `quit:$quit value quit` idiom was probably inherited from legacy coding practices, as Caché allows calling methods/functions using DO command (thus ignoring the value being returned).

From the other hand, using of `quit:$quit "" quit` construct looks like an attempt to amend erroneous $$ calls of methods/routines, which did not return any value by their initial design. One might say that such design was initially wrong, as any method should return something, at least %Status which is always $$$OK.

Clark Matthews · Apr 13, 2017

HI,

The first though I have is how, for any reference outside of the routine or class being compiled, could the compiler be sure of what the routine and package mappings are going to be like at run time. Most likely in a different environment.

For example, even in a single instance of Cache, it is possible that for one user an argument type, argument, or return type would be correct and for another it would be incorrect. This is because they are each using different mappings, maybe running is different namespaces, at run time. You will notice that if you DO a non-existent routine or label in another routine it will compile, however if you call a non-existent label within the current routine you get a <NOLINE> at compile time.

Having this only work within the current item seems to me to be of limited value. So I am thinking that this would be a show-stopper for doing this at compile-time. You would need something, like a 'lint', that could be run in a full test environment or even in production.

Sean Connelly · Apr 14, 2017

Hi Clark,

Can you provide a more concrete example.

In particular, why would mappings change the formal spec or return type of a method.

Are you using mappings in some kind of environmental polymorphism?

For me the implementation should never be affected by mappings. Mappings are just a convenience, not a logical influence.

I'm struggling to understand the limitation that you are suggesting?

Sean Connelly · Apr 15, 2017

All of the uses cases are around the implementation of instance methods and static methods.

In all cases class A either extends or uses class B. For class A to compile, class B must be present and compiled first, no matter what module or namespace it belongs to. If class B implements a generic interface, then all variances of class B should adhere to the same interface (despite being a manual verification check at the moment).

The types of examples you have raised sound like you are hot calling functional code, in which case I agree it would be very hard to lint this style of coding.

Fabian Haupt · Apr 17, 2017

Hi,

your suggestions #2 and #3 would work against the weak typed character of the language. I.e. a method like this:

ClassMethod test() as %String { q 23}

Will work perfectly fine, no matter if you use the return value a string or integer. You can actually also return values from methods that don't define a return type at all.
As you already point out, changing any of this behaviour would likely break quite a lot of existing software, so it's not really feasible.

Arto Alatalo · Apr 17, 2017

so it's not really feasible.

It is not possible to change the behaviour but I think it is possible to add some optional compiler function to give a warning when any type mismatch detected

Tomas Vaverka · Apr 18, 2017

I think you should also consider other factor - performance. If we implemented all kinds of these checks/warnings, it would slow down the compiler, which is not the way we would like to go.

Sean Connelly · Apr 18, 2017

Hi Tomas,

Its a good point, but performance doesn't have to be an issue.

In the main, the lint tool would only need to check the class that is being edited and compiled at that time. If coded optimally (e.g. memoize dictionary lookups etc) then is should only add milliseconds to a compilation cycle. This would hardly be noticed by the developer.

There is a use case where the edited class could break implementations of that class. In this instance a "lint all" process could be hooked into various other non disruptive steps, such as running a full unit test.

Losing milliseconds at compile time is a small trade off to the collective time lost on supporting these types of errors.

Evgeny Shvarov · Apr 18, 2017

Lint can be time-consuming for sure. But it can be called not with every compile but e.g. before every commit.