Article
· Feb 21 9m read

Functional indices for lightning-fast queries on many-to-many relationship tables

Suppose you have an application that allows users to write posts and comment on them. (Wait... that sounds familiar...)

For a given user, you want to be able to list all of the published posts with which that user has interacted - that is, either authored or commented on. How do you make this as fast as possible?

Here's what our %Persistent class definitions might look like as a starting point (storage definitions are important, but omitted for brevity):

Class DC.Demo.Post Extends %Persistent
{

Property Title As %String(MAXLEN = 255) [ Required ];
Property Body As %String(MAXLEN = "") [ Required ];
Property Author As DC.Demo.Usr [ Required ];
Property Draft As %Boolean [ InitialExpression = 1, Required ];
}

Class DC.Demo.Usr Extends %Persistent
{

Property Name As %String [ Required ];
}

Class DC.Demo.Comment Extends %Persistent
{

Property Post As DC.Demo.Post [ Required ];
Property Usr As DC.Demo.Usr [ Required ];
Property Comment As %String(MAXLEN = "") [ Required ];
}

And our query, as a starting point:

select ID from DC_Demo.Post where (Author = ? or ID in (select distinct Post from DC_Demo.Comment where Usr = ?)) and Draft = 0

The naïve approach would just be:

  • Add bitmap indices on Author and Draft in DC.Demo.Post.
  • Add a standard index on (Usr, Post) in DC.Demo.Comment.

And this isn't a bad approach at all! For some use cases it might even be "good enough." What is IRIS SQL going to do under the hood? We can look at the query plan:

 Generate a stream of idkey values using the multi-index combination:
     ((bitmap index DC_Demo.Post.Draft) INTERSECT ((bitmap index DC_Demo.Post.Author) UNION (bitmap index DC_Demo.Post.Draft)))
 For each idkey value:
     Output the row.

Subquery C:
 Read index map DC_Demo.Comment.UsrPost, using the given Usr and Post, and looping on ID.
 For each row:
     Determine subquery result.

This isn't bad. Suppose that there are 50000 posts and each user has commented on 500 of them on average. How many global references will this query involve? Well, at minimum, three for the bitmap indices, and on average 500 in the subquery (iterating over the UsrPost index). Clearly, the subquery is the bottleneck. How can we make it faster?

The answer is to use a functional index (a subclass of %Library.FunctionalIndex) with the %FIND predicate condition (and a subclass of %SQL.AbstractFind). Our functional index will be defined in the Comment class, but won't actually contain the IDs of Comments as a typical bitmap index would. Instead, for each user, it will have a bitmap of Post IDs for which that user has at least one comment. We can then combine this bitmap very efficiently with other bitmap-indexed conditions in the Post table. Obviously this has some overhead for insert/update/delete of new comments, but the performance benefit for reads may well merit that overhead.

A functional index needs to define the behavior of the index for Insert, Update, and Delete operations, plus implement a few other methods (purge, sortbegin, sortend). A %SQL.AbstractFind implementation needs to implement methods for traversing and retrieving bitmap index chunks. For fun, we'll use a generic %SQL.AbstractFind implementation that looks at a standard bitmap index structure (given a global reference to its root node).

Note - if you don't know what a "bitmap chunk" is or this all sounds like Greek, consider reading the documentation on bitmap indexes, particularly the parts about their structure and manipulation.

Moving on to the code, DC.Demo.ExistenceIndex is our functional index:

Include %IFInclude
/// Given:
/// 
/// <code>
/// Class Demo.ClassC Extends %Persistent
/// {
/// Property PropA As Demo.ClassA;
/// Property PropB As Demo.ClassB;
/// Index BValuesForA On (PropA, PropB) As DC.Demo.ExistenceIndex;
/// }
/// </code>
/// 
/// Call from SQL as follows, given a value for PropA of 21532, to return values of PropB associated with PropA=21532 in ClassB:
/// <code>
/// select * from Demo.ClassC where ID %FIND Demo.ClassB_BValuesForAFind(21532) and <other-bitmap-index-conditions>
/// </code>
Class DC.Demo.ExistenceIndex Extends %Library.FunctionalIndex [ System = 3 ]
{

/// Returns a %SQL.AbstractFind subclass appropriate for this functional index
ClassMethod Find(pSearch As %Binary) As %Library.Binary [ CodeMode = generator, ServerOnly = 1, SqlProc ]
{
	If (%mode '= "method") {
	    Set tIdxGlobal = ..IndexLocationForCompile(%class,%property)
	    Set name = $Name(@tIdxGlobal@("id"))
	    Set name = $Replace(name,$$$QUOTE("id"),"pSearch")
		$$$GENERATE(" Quit ##class(DC.Demo.ReferenceFind).%New($Name("_name_"))")
	}
}

/// Retruns true iff a record with (prop1val, prop2val) exists.
ClassMethod Exists(prop1val, prop2val) [ CodeMode = generator, ServerOnly = 1 ]
{
	If (%mode '= "method") {
		Set indexProp1 = $$$comSubMemberKeyGet(%class,$$$cCLASSindex,%property,$$$cINDEXproperty,1,$$$cINDEXPROPproperty)
		Set indexProp2 = $$$comSubMemberKeyGet(%class,$$$cCLASSindex,%property,$$$cINDEXproperty,2,$$$cINDEXPROPproperty)
		Set table = $$$comClassKeyGet(%class,$$$cCLASSsqlschemaname)_"."_$$$comClassKeyGet(%class,$$$cCLASSsqltablename)
		Set prop1 = $$$comMemberKeyGet(%class,$$$cCLASSproperty,indexProp1,$$$cPROPsqlfieldname)
		If (prop1 = "") {
			Set prop1 = indexProp1
		}
		Set prop2 = $$$comMemberKeyGet(%class,$$$cCLASSproperty,indexProp2,$$$cPROPsqlfieldname)
		If (prop2 = "") {
			Set prop2 = indexProp2
		}
		$$$GENERATE(" &sql(select top 1 1 from "_table_" where "_prop1_" = :prop1val and "_prop2_" = :prop2val)")
		$$$GENERATE(" Quit (SQLCODE = 0)")
	}
}

/// This method is invoked when a new instance of a class is inserted into the database.
ClassMethod InsertIndex(pID As %CacheString, pArg... As %Binary) [ CodeMode = generator, ServerOnly = 1 ]
{
    If (%mode '= "method") {
        Set tIdxGlobal = ..IndexLocationForCompile(%class,%property)
        Set name = $Name(@tIdxGlobal@("id","chunk"))
        Set name = $Replace(name,$$$QUOTE("chunk"),"chunk")
        
        $$$GENERATE(" If ($Get(pArg(1)) '= """") && ($Get(pArg(2)) '= """") { ")
        $$$GENERATE("  $$$IFBITOFFPOS(pArg(2),chunk,position)")
        $$$GENERATE("  Set $Bit("_$Replace(name,$$$QUOTE("id"),"pArg(1)")_",position) = 1")
        $$$GENERATE(" }")
    }
}

/// This method is invoked when an existing instance of a class is updated.
ClassMethod UpdateIndex(pID As %CacheString, pArg... As %Binary) [ CodeMode = generator, ServerOnly = 1 ]
{
    If (%mode '= "method") {
        Set tIdxGlobal = ..IndexLocationForCompile(%class,%property)
        Set name = $Name(@tIdxGlobal@("id","chunk"))
        Set name = $Replace(name,$$$QUOTE("chunk"),"chunk")
        $$$GENERATE(" If ($Get(pArg(3)) '= """") && ($Get(pArg(4)) '= """") { ")
        $$$GENERATE("  $$$IFBITOFFPOS(pArg(4),chunk,position)")
        $$$GENERATE("  Set $Bit("_$Replace(name,$$$QUOTE("id"),"pArg(3)")_",position) = .."_%property_"Exists(pArg(3),pArg(4))")
        $$$GENERATE(" }")
        
        $$$GENERATE(" If ($Get(pArg(1)) '= """") && ($Get(pArg(2)) '= """") { ")
        $$$GENERATE("  $$$IFBITOFFPOS(pArg(2),chunk,position)")
        $$$GENERATE("  Set $Bit("_$Replace(name,$$$QUOTE("id"),"pArg(1)")_",position) = 1")
        $$$GENERATE(" }")
    }
}

/// This method is invoked when an existing instance of a class is deleted.
ClassMethod DeleteIndex(pID As %CacheString, pArg... As %Binary) [ CodeMode = generator, ServerOnly = 1 ]
{
    If (%mode '= "method") {
        Set tIdxGlobal = ..IndexLocationForCompile(%class,%property)
        Set name = $Name(@tIdxGlobal@("id","chunk"))
        Set name = $Replace(name,$$$QUOTE("chunk"),"chunk")
        $$$GENERATE(" If ($Get(pArg(1)) '= """") && ($Get(pArg(2)) '= """") { ")
        $$$GENERATE("  $$$IFBITOFFPOS(pArg(2),chunk,position)")
        $$$GENERATE("  Set $Bit("_$Replace(name,$$$QUOTE("id"),"pArg(1)")_",position) = .."_%property_"Exists(pArg(1),pArg(2))")
        $$$GENERATE(" }")
    }
}

/// Helper method to get the global reference for a given index's storage.
ClassMethod IndexLocationForCompile(pClassName As %String, pIndexName As %String) As %String
{
    Set tStorage = ##class(%Dictionary.ClassDefinition).%OpenId(pClassName).Storages.GetAt(1).IndexLocation
    Quit $Name(@tStorage@(pIndexName))
}

/// Purges the index
ClassMethod PurgeIndex() [ CodeMode = generator, ServerOnly = 1 ]
{
    If (%mode '= "method") {
        Set tIdxGlobal = ..IndexLocationForCompile(%class,%property)
        $$$GENERATE(" Kill " _ tIdxGlobal)
    }
}

/// Calls SortBegin prior to bulk operations
ClassMethod SortBeginIndex() [ CodeMode = generator, ServerOnly = 1 ]
{
    If (%mode '= "method") {
        Set tIdxGlobal = ..IndexLocationForCompile(%class,%property)
        // No-op
        $$$GENERATE(" Quit")
    }
}

/// Calls SortEnd following bulk operations
ClassMethod SortEndIndex(pCommit As %Integer = 1) [ CodeMode = generator, ServerOnly = 1 ]
{
    If (%mode '= "method") {
        Set tIdxGlobal = ..IndexLocationForCompile(%class,%property)
        // No-op
        $$$GENERATE(" Quit")
    }
}

}

DC.Demo.ReferenceFind is our generic %SQL.AbstractFind look-at-a-bitmap-chunk-array implementation:

/// Utility class to wrap use of %SQL.AbstractFind against a bitmap index global reference
Class DC.Demo.ReferenceFind Extends %SQL.AbstractFind [ System = 3 ]
{

/// Global reference to iterate over / consider for %SQL.AbstractFind %FIND operation methods
Property reference As %String [ Private ];
Method %OnNew(pReference As %String) As %Status [ Private, ServerOnly = 1 ]
{
	Set ..reference = pReference
	Quit $$$OK
}

Method NextChunk(ByRef pChunk As %Integer = "") As %Binary
{
	Set pChunk=$Order(@i%reference@(pChunk),1,tChunkData)
	While pChunk'="",$bitcount(tChunkData)=0 {
		Set pChunk=$Order(@i%reference@(pChunk),1,tChunkData)
	}
	Return $Get(tChunkData)
}

Method PreviousChunk(ByRef pChunk As %Integer = "") As %Binary
{
	Set pChunk=$Order(@i%reference@(pChunk),-1,tChunkData)
	While pChunk'="",$bitcount(tChunkData)=0 {
		Set pChunk=$Order(@i%reference@(pChunk),-1,tChunkData)
	}
	Return $Get(tChunkData)
}

Method GetChunk(pChunk As %Integer) As %Binary
{
	If $Data(@i%reference@(pChunk),tChunkData) {
		Return tChunkData
	} Else {
		Return ""
	}
}

}

And DC.Demo.Comment now looks like this, with two added bitmap indices (and appropriate foreign keys for good measure):

Class DC.Demo.Comment Extends %Persistent
{

Property Post As DC.Demo.Post [ Required ];
Property Usr As DC.Demo.Usr [ Required ];
Property Comment As %String(MAXLEN = "") [ Required ];
Index Usr On Usr [ Type = bitmap ];
Index Post On Post [ Type = bitmap ];
Index UserPosts On (Usr, Post) As DC.Demo.ExistenceIndex;
ForeignKey UsrKey(Usr) References DC.Demo.Usr();
ForeignKey PostKey(Post) References DC.Demo.Post() [ OnDelete = cascade ];
}

Our SQL query becomes:

select ID from DC_Demo.Post where (Author = ? or ID %FIND DC_Demo.Comment_UserPostsFind(?)) and Draft = 0

And the query plan just becomes:

 Generate a stream of idkey values using the multi-index combination:
     ((bitmap index DC_Demo.Post.Draft) INTERSECT ((bitmap index DC_Demo.Post.Author) UNION (given bitmap filter for DC_Demo.Post.%ID)))
 For each idkey value:
     Output the row.

How many global references now? One for the Author bitmap, one for the Draft bitmap, and one for the "posts for a given user" bitmap index node in DC.Demo.Comment. Now the "which posts have I been involved with?" list won't get (as much) slower as you comment more!

Disclaimer: unfortunately, the Developer Community isn't actually powered by InterSystems IRIS, so you probably shouldn't comment *that* much.

Discussion (3)3
Log in or sign up to continue