· Nov 30, 2016

Persistent Classes and Transient SQL Calculated/Computed properties

Hello Fellow Caché Developers,

The purpose of this post is to ask for everyone's thoughts and input around the use of transient, SQLCalculated/Computed properties within persistent classes.

This approach allows for extra data values needed in SQL queries to be available without having to join to other tables.

Very simple/basic example to illustrate the topic:

Let's say I have a persistent class named ICDAutoCodeDefn to hold ICD Auto-Coding definitions, such as:

Class ApplicationName.DB.MedicalCoding.ICDAutoCodeDefn Extends %Persistent

That has a few properties:

SpecimenPart As %String

BiopsyLocation As %String

DxCategoryCode As %String

ICDCode As %String

IsActive As %Boolean

and so on....

And let's say I would like a couple of additional (Transient - non-stored) values available for SQL queries run against this table:

DxCategoryDescription As %String [Transient, SqlComputed, Calculated, SqlComputeCode = { Set {*}=##class(AppName.BO.DatabaseFunctions).GetDxCategoryDesc({DxCategoryCode}) } ]

ICDDescription As %String [Transient, SqlComputed, Calculated, SqlComputeCode = { Set {*}=##class(AppName.BO.DatabaseFunctions).GetICDDescription({ICDCode}) } ]

Thoughts?  Comments?   About using this approach versus having to implement JOINs to other code tables?  This keeps the SQL queries simple, easy to follow, and easy to maintain.

Are there any known drawbacks to this approach?

Much Thanks in advance for everyone's time and input!

Happy Coding!!

Discussion (4)0
Log in or sign up to continue

Hi John,

Great question! There is a lot of misinformation floating around about the general topic of CALCULATED, TRANSIENT, SQLCOMPUTED, etc. I'll try to clear the air and lay a clean foundation. First, let's discuss CALCULATED and TRANSIENT.

Each property defined in a class is accessed by a "getter" and a "setter", implemented as <property>Get and <property>Set methods respectively. In most common cases these accessor methods are trivial and are implemented at a very low level in Caché simply retrieve or store a value in the instance memory allocated to the property. In fact, the <property>Get and <property>Set method generators depend on instance memory. If no instance memory is allocated then the method generators have no clue where to retrieve the property's value from or where to set (store) the property's value. The allocation of instance memory is controlled by the CALCULATED keyword. If CALCULATED is true then no instance memory is allocated and the only way to access the property value is to override the accessor methods. Commonly, only the <property>Get method is overridden. We refer to such a property (CALCULATED with overridden Get method) as a "derived property". A CALCULATED property is not implicitly settable but, in theory, such a property can be set if the <property>Set method is implemented. I don't know offhand if the theory works in practice.

Now on to TRANSIENT. TRANSIENT simply means that the property is not stored when the containing object is saved. For a subclass of %SerialObject it means the property is not part of the serialized state produced by %GetSerial. There is instance memory allocated for a TRANSIENT property, assuming the user hasn't also defined it as CALCULATED. BTW, CALCULATED implies TRANSIENT but the reverse is not true. Since instance memory is allocated to a TRANSIENT, NOT CALCULATED property, the system does generate accessors (Get and Set) and such a property is implicitly settable. 

Users can override the Get/Set accessors to inject custom behaviors for any property, CALCULATED or not, but doing so directly causes problems for SQL. (Property accessor methods run in an instance context and SQL does not.) Caché provides a mechanism for deriving a property value that can be used outside of an instance context. That mechanism is triggered by SQLCOMPUTED and SQLCOMPUTECODE. I won't go into details of how to define SQLCOMPUTECODE here.

If a property is defined as CALCULATED and SQLCOMPUTED then we refer to that property as "always computed", meaning that every <property>Get reference will cause the SQLCOMPUTECODE to be executed (more on that in a bit) to derive the current value of the property. 

If the property is defined as NOT CALCULATED and SQLCOMPUTED then we refer to that property as "triggered computed" and its value will only be derived when certain conditions are true. First, if the current value of the property is NULL then the SQLCOMPUTECODE is called to derive the current value. Second, if the value of any property in the SQLCOMPUTEONCHANGE list of properties is changed then the SQLCOMPUTECODE is called to derive an updated value. Finally, there are two special tokens that can be specified in SQLCOMPUTEONCHANGE - %%INSERT and %%UPDATE. These tokens will trigger the SQLCOMPUTECODE to derive an updated value for the property when inserting (%%INSERT)  a new instance into the extent of the class (or a new row in the SQL table) or updating (%%UPDATE) and existing instance or row.

That's the foundation. 

Determinism. The function used to derive a value is only deterministic if that function returns the same value each time it is invoked with the same inputs. (if a = b then f(a) = f(b)). A property that is NOT CALCULATED + TRANSIENT is implicitly non-deterministic in an instance context. A CALCULATED property can be deterministic but it depends on the implementation of the function to derive the value. A property whose value is derived by overriding Get, Set or both is implicitly non-deterministic in an SQL context. (this concept is similar to third and fourth normal forms in relational systems). A simple non-deterministic function is now() implemented in Caché as "return $h". A more complex example would be a function that returns a value retrieved from another table. That function is non-deterministic because the value retrieved can be updated at any time, that is, passing the same input argument to the function multiple times can produce different results.

And the short answer to your question - if you define derived properties in your class for the purpose of simplifying SQL queries then make certain you understand the implications of non-deterministic derivation. Not all non-deterministic derived values are bad. Take Age() as an example. If Age is computed based on current time then it is non-deterministic. That makes it a bad candidate for a NOT CALCULATED property but showing Age in a query is a good thing.



Another option to consider, if you have the flexibility to do so, would be putting an [IDKey] index on the code property of the code tables, having the code properties in ICDAutoCodeDefn refer to the code table classes rather than being of type %String, and then using implicit joins. I suspect (but haven't verified) that this would perform better than calculated/transient properties, and it's much easier to follow/maintain than normal JOINs.

Here's a full example. Looking at a general code table class:

Class DC.Demo.CodeTables.CodeTable Extends %Persistent [ Abstract, NoExtent ]
Index Code On Code [ IdKey ];
Property Code As %String;
Property Description As %String;

In this example, a specific code table would then extend that class, but don't need to add anything:

Class DC.Demo.CodeTables.Team Extends DC.Demo.CodeTables.CodeTable

Class DC.Demo.CodeTables.Position Extends DC.Demo.CodeTables.CodeTable

Then, in the class that refers to these code tables:

Class DC.Demo.CodeTables.Players Extends %Persistent
Property Name As %String(MAXLEN = 100);
Property Position As DC.Demo.CodeTables.Position;
Property Team As DC.Demo.CodeTables.Team;
ForeignKey Position(Position) References DC.Demo.CodeTables.Position();
ForeignKey Team(Team) References DC.Demo.CodeTables.Team();

(It's worth considering using foreign keys in cases like this.)

To demonstrate how this ends up working from an SQL perspective:

Class DC.Demo.CodeTables.Driver

ClassMethod Run()
    Do ##class(DC.Demo.CodeTables.Players).%KillExtent()
    Do ..ShowQueryResults("insert or update into DC_Demo_CodeTables.Position (Code,Description) values ('TE','Tight End')")
    Do ..ShowQueryResults("insert or update into DC_Demo_CodeTables.Team (Code,Description) values  ('NE','New England Patriots')")
    Do ..ShowQueryResults("insert into DC_Demo_CodeTables.Players (Name,Position,Team) values  ('Rob Gronkowski','TE','NE')")
    Do ..ShowQueryResults("select Name,Position->Description ""Position"",Team->Description ""Team"" from DC_Demo_CodeTables.Players")

ClassMethod ShowQueryResults(pQuery As %String, pParams...)
    Write !,"Running query: ",pQuery,!
    Do ##class(%SQL.Statement).%ExecDirect(,pQuery,pParams...).%Display()


The output from the Run() method is:

USER>Do ##class(DC.Demo.CodeTables.Driver).Run()

Running query: insert or update into DC_Demo_CodeTables.Position (Code,Description) values ('TE','Tight End')
1 Row Affected
Running query: insert or update into DC_Demo_CodeTables.Team (Code,Description) values  ('NE','New England Patriots')
1 Row Affected
Running query: insert into DC_Demo_CodeTables.Players (Name,Position,Team) values  ('Rob Gronkowski','TE','NE')
1 Row Affected
Running query: select Name,Position->Description "Position",Team->Description "Team" from DC_Demo_CodeTables.Players
Name    Position    Team
Rob Gronkowski    Tight End    New England Patriots

1 Rows(s) Affected

I like Tim's method better.  While I understand the desire to hide complexity in the SQL you are also hiding information about the work that is needed to resolve the query.  What if it would be faster if the query first looked at the AppName.BO.DatabaseFunctions table and then looked at the ApplicationName.DB.MedicalCoding.ICDAutoCodeDefn table?  When you are hiding the join in the compute code that is not an option.  Using the -> syntax keeps the query pretty simple but still leaves all options open to the Query Optimizer.