Intro
In the process of trying to get more familiar with Objectscript I decided to try to build a general priority queue since I wasn't able to find an implementation anywhere. My though process for implementing this followed this general path.
Binary Heap
My first though to try to make this work quickly was to implement a binary heap and my first attempt at this used a multidimensional to build it. This version was relatively efficient, worked for strings/numbers natively, and worked for objects be letting the user override the comparitor.
Class pqueue.Queue Extends %RegisteredObject
{
Property Data As %Any [ MultiDimensional ];
Property Size As %Integer [ InitialExpression = 0 ];
Property Comparitor As %String [ InitialExpression = "(a,b) return a < b" ];
Method Swap(i As %Integer, j As %Integer) As %Status [ Private ]
{
set temp = ..Data(i)
set ..Data(i) = ..Data(j)
set ..Data(j) = temp
}
Method Comp(x As %Any, y As %Any) As %Boolean [ Private ]
{
return $XECUTE(..Comparitor, x, y)
}
Method PercolateUp(idx As %Integer) [ Private ]
{
while idx > 0 {
set newidx = (idx-1)\2
if ..Comp( ..Data(idx), ..Data(newidx) ) do ..Swap( idx, newidx )
else Quit
set idx = newidx
}
}
Method PercolateDown() [ Private ]
{
set idx = 0
while ((idx+1)*2) < ..Size {
if ..Comp( ..Data(idx*2+2), ..Data(idx*2+1) ) set newidx = idx*2+2
else set newidx = idx*2+1
if ..Comp( ..Data(idx), ..Data(newidx) ) Quit
do ..Swap( idx, newidx )
set idx = newidx
}
if ( (idx*2+1 < ..Size) && ..Comp( ..Data(idx*2+1), ..Data(idx) ) ) do ..Swap( idx, idx*2+1 )
}
Method IsEmpty() As %Boolean
{
return ..Size = 0
}
Method Top() As %Any
{
if ..IsEmpty() return ""
return ..Data(0)
}
Method Put(inp As %Any) As %Status
{
set sts = $$$OK
set ..Data( ..Size ) = inp
do ..PercolateUp( ..Size )
set ..Size = ..Size + 1
return sts
}
Method Get(Output obj As %Any) As %Status
{
set sts = $$$OK
if ..IsEmpty() {
set obj = ""
return $$$ERROR( "Cannot Get() from empty Queue" )
}
set obj = ..Data( 0 )
set ..Size = ..Size - 1
set ..Data( 0 ) = ..Data( ..Size )
do ..PercolateDown()
kill ..Data( ..Size )
return sts
}
Method GenerateComparitor(operator As %String = "<", transform As %String = "") As %Status
{
set ..Comparitor = "(a,b) return a" _ transform _ " " _ operator _ " b" _ transform
return $$$OK
}
}
After getting this working I wanted to try out different internal arrays since I assumed multidimensional arrays must be slower than a simple integer indexed array. So I modified the above code to use
* `Property Data As list Of %Any;` : This works OK but it is about 3-4 times slower than using a multidimensional making it pointless.
* `Property Data As %DynamicArray;` : I assumed this would be relatively fast but proved to be slow to the point of absurdity. When the amount of data stored in the queue is small it is around as fast as the `list of %Any`, but as the data grows the time it takes to make inserts and gets grows linearly which makes it pointless to build a heap on top of.
## Using the Multidimensional Array's Self-sorting
On its face this idea seems pretty simple, use the fact that the multidimensional array is always sorted to grab the lowest cost (highest priority) item. A problem with this are that indexing by an object simply uses the string representation of that object which isn't useful. To handle this a customer `evaluator` function is needed to return the integer or string evaluation of an object which is then sorted, it is then stored as `data( evaluation, obj_str_rep ) = object` which ensures that objects are correctly sorted even when two objects evaluate to the same value.
Class pqueue.SparseQueue Extends %RegisteredObject
{
// Parameter Comp(a,b) As $XECUTE(..Comparitor, a, b);
Property Data As %Any [ MultiDimensional ];
Property Size As %Integer [ InitialExpression = 0 ];
Property Evaluator As %String [ InitialExpression = "(a) return a" ];
Method IsEmpty() As %Boolean
{
return ..Size = 0
}
Method Top() As %Any
{
if ..IsEmpty() return ""
return $Order( ..Data("") )
}
Method Put(inp As %Any) As %Status
{
set sts = $$$OK
set ..Data( $XECUTE(..Evaluator, inp), inp ) = inp
set ..Size = ..Size + 1
return sts
}
Method Get(Output obj As %Any) As %Status
{
set sts = $$$OK
if ..IsEmpty() {
set obj = ""
return $$$ERROR( "Cannot Get() from empty Queue" )
}
set loc = $ORDER( ..Data("") )
set obj = ..Data(loc, $ORDER( ..Data(loc, "")))
set ..Size = ..Size - 1
kill ..Data( loc, obj )
return sts
}
Method GenerateEvaluator(transform As %String = "") As %Status
{
set ..Evaluator = "(a) return a" _ transform
return $$$OK
}
}
This method proved to be by far the fastest. It has some slight disadvantages in that it can be harder to write an evaluator than a comparitor and, in the form given above, it cannot hold the same object at the same value twice (this is a rare case but could theoretically be a problem or a benefit).
## Speed Test
To test these out I wrote a simple script that generates a large randomized weighted directed graph and runs [Dijkstra's shortest path algorithm](https://www.geeksforgeeks.org/dijkstras-shortest-path-algorithm-greedy-algo-7/) using a given queue object. Using a graph with `150000` vertices where each has `10` neighbors, the output is printed below. Here while the algorithm is running an update is printed every `10000` edges checked, where the time since the last 10 thousand and size of the priority queue is printed, with some final stats at the end. The order they are run in is
* Self-sorting Multidimensional
* Heap Multidimensional
* Heap Dynamic Array
* Heap List of Object
> Generated Random Graph