Article
Timothy Leavitt · Jun 28 2m read

Unique indices and null values in InterSystems IRIS

An interesting pattern around unique indices came up recently (in internal discussion re: isc.rest) and I'd like to highlight it for the community.

As a motivating use case: suppose you have a class representing a tree, where each node also has a name, and we want nodes to be unique by name and parent node. We want each root node to have a unique name too. A natural implementation would be:

Class DC.Demo.Node Extends %Persistent
{

Property Parent As DC.Demo.Node;
Property Name As %String [ Required ];
Index ParentAndName On (Parent, Name) [ Unique ];
Storage Default
{
<Data name="NodeDefaultData">
<Value name="1">
<Value>%%CLASSNAME</Value>
</Value>
<Value name="2">
<Value>Parent</Value>
</Value>
<Value name="3">
<Value>Name</Value>
</Value>
</Data>
<DataLocation>^DC.Demo.NodeD</DataLocation>
<DefaultData>NodeDefaultData</DefaultData>
<IdLocation>^DC.Demo.NodeD</IdLocation>
<IndexLocation>^DC.Demo.NodeI</IndexLocation>
<StreamLocation>^DC.Demo.NodeS</StreamLocation>
<Type>%Storage.Persistent</Type>
}

}

And there we go!

But there's a catch: as it stands, this implementation allows multiple root nodes to have the same name. Why? Because Parent is not (and shouldn't be) a required property, and IRIS does not treat null as a distinct value in unique indices. Some databases (e.g., SQL Server) do, but the SQL standard says they're wrong [citation needed; I saw this on StackOverflow somewhere but that doesn't really count - see also @Dan Pasco 's comment below on this and the distinction between indices and constraints].

The way to get around this is to define a calculated property that's set to a non-null value if the referenced property is null, then put the unique index on that property. For example:

Property Parent As DC.Demo.Node;
Property Name As %String [ Required ];
Property ParentOrNUL As %String [ Calculated, Required, SqlComputeCode = {Set {*} = $Case({Parent},"":$c(0),:{Parent})}, SqlComputed ];
Index ParentAndName On (ParentOrNUL, Name) [ Unique ];

This also allows you to pass $c(0) to ParentAndNameOpen/Delete/Exists to identify a root node uniquely by parent (there isn't one) and name.

As a motivating example where this behavior is very helpful, see https://github.com/intersystems/isc-rest/blob/main/cls/_pkg/isc/rest/resourceMap.cls. Many rows can have the same set of values for two fields (DispatchOrResourceClass and ResourceName), but we want at most one of them to treated as the "default", and a unique index works perfectly to enforce this if we say the "default" flag can be set to either 1 or null then put a unique index on it and the two other fields.

5
0 421
Discussion (8)3
Log in or sign up to continue

Maybe I'm missing something, but, beyond how nulls are treated, if you want parent to be unique within this definition you must define a unique index on parent (alone). The index you have defined only guarantees that the combination (parent, name) will be unique.

Even if you declare the property as required it wouldn't still solve the uniqueness requirement. 

parameter INDEXNULLMARKER;

Override this parameter value to specify what value should be used as a null marker when a property of the type is used in a subscript of an index map. The default null marker used is -1E14, if none is specfied for the datatype. However %Library.PosixTime and %Library.BigInt datatypes could have values that collate before -1E14, and this means null values would not sort before all non-NULL values.

For beauty, I would also use the value of this parameter, for example:

Class dc.test Extends %Persistent
{

Property idp As dc.test;

Property idpC As %Integer(INDEXNULLMARKER "$c(0)") [ CalculatedPrivateRequiredSqlComputeCode = {{*}=$s({idp}="":$c(0),1:{idp})}, SqlComputed ];

Property Name As %String Required ];

Index iUnq On (idpC, Name) [ Unique ];

ClassMethod Test()
{
  ..%KillExtent()
  
  &sql(insert into dc.test(Name,idp)values('n1',1)SQLCODE,! ;0
  &sql(insert into dc.test(Name,idp)values('n2',1)SQLCODE,! ;0
  &sql(insert into dc.test(Name,idp)values('n2',1)SQLCODE,!! ;-119

  &sql(insert into dc.test(Name,idp)values('n1',null)SQLCODE,! ;0
  &sql(insert into dc.test(Name,idp)values('n2',null)SQLCODE,! ;0
  &sql(insert into dc.test(Name,idp)values('n2',null)SQLCODE,!! ;-119
  
  zw ^dc.testD,^dc.testI
}

}

Output:

USER>##class(dc.test).Test()
0
0
-119
 
0
0
-119
 
^dc.testD=4
^dc.testD(1)=$lb("",1,"n1")
^dc.testD(2)=$lb("",1,"n2")
^dc.testD(3)=$lb("","","n1")
^dc.testD(4)=$lb("","","n2")
^dc.testI("iUnq",1," N1",1)=""
^dc.testI("iUnq",1," N2",2)=""
^dc.testI("iUnq",$c(0)," N1",3)=""
^dc.testI("iUnq",$c(0)," N2",4)=""

Or

Property idpC As %Integer CalculatedPrivateRequiredSqlComputeCode = {{*}=$s({idp}="":$$$NULLSubscriptMarker,1:{idp})}, SqlComputed ];

Output:

USER>##class(dc.test).Test()
0
0
-119
 
0
0
-119
 
^dc.testD=4
^dc.testD(1)=$lb("",1,"n1")
^dc.testD(2)=$lb("",1,"n2")
^dc.testD(3)=$lb("","","n1")
^dc.testD(4)=$lb("","","n2")
^dc.testI("iUnq",-100000000000000," N1",3)=""
^dc.testI("iUnq",-100000000000000," N2",4)=""
^dc.testI("iUnq",1," N1",1)=""
^dc.testI("iUnq",1," N2",2)=""

Thank you for pointing this out! I saw this in docs but believe it wouldn't work for object-valued properties.

What about defining a super parent, which is the only node with a NULL parent?

In that case every other root node has this node as a parent and filtering is easy (WHERE parent IS NULL filters only the super parent). And there's no need for an additional calculated property in that case.

There would still need to be some enforcement of the super parent being the only node with a NULL parent (and the point here is that the unique index wouldn't do that). Also finding all of the top-level nodes (assuming we could have multiple independent trees) would be a slightly more complicated.

I have to throw in my opinions and possibly a few facts regarding nulls and unique constraints.

IRIS Unique index - this is primarily a syntactical shortcut as it defines not only an index but also a unique constraint on the index key. Most pure SQL implementations don't merge the two concepts and the SQL standard doesn't define indexes. The SQL Standard does define unique constraints. Keep in mind that both IDKEY and PRIMARYKEY are modifiers of a unique constraint (and, in our world, the index defined as IDKEY is also special). There can be at most one index flagged as IDKEY and one that is flagged as PRIMARYKEY. An index can be both PRIMARYKEY and IDKEY.

There was once an SQL implementation that defined syntax for both "unique index" and "unique constraint" with different rules. The difference between them was simple - if an index is not fully populated (not all rows in the table appear in the index - we call this a conditional index) then the unique index only checked for uniqueness in the rows represented in the index. A unique constraint applies to all rows.

Also keep in mind that an index exists for a singular purpose - to improve the performance of a subset of queries. Any SQL constraint can be expressed as a query.

The SQL Standard is a bit inconsistent when it comes to null behavior. In the Framework document there is this definition:

A unique constraint specifies one or more columns of the table as unique columns. A unique constraint is satisfied if and only if no two rows in a table have the same non-null values in the unique columns.

In the Foundation document, there exists two optional features, F291 and F292. These features define a unique predicate (291) and unique null treatment (292). These features appear to provide syntax where the user can define the "distinct-ness" of nulls. Both are optional features, both are relatively recent (2003? 2008?). The rule when these features are not supported is left to the implementor.

IRIS is consistent with the Framework document statement - all constraints are enforced on non-null keys only. A "null" key is defined as a key in which any key column value is null.

Thank you Dan! The index/constraint distinction and SQL standard context are particularly helpful facts for this discussion. :)

the best part of this article is "IRIS does not treat null as a distinct value in unique indices"