Best Structure to Make a Priority Queue In Objectscript

Article

Colin Langella · Mar 11 2m read

#ObjectScript #Tips & Tricks #Documentation

Intro

In the process of trying to get more familiar with Objectscript I decided to try to build a general priority queue since I wasn't able to find an implementation anywhere. My though process for implementing this followed this general path.

Binary Heap

My first though to try to make this work quickly was to implement a binary heap and my first attempt at this used a multidimensional to build it. This version was relatively efficient, worked for strings/numbers natively, and worked for objects be letting the user override the comparitor.

Class pqueue.Queue Extends %RegisteredObject
{

Property Data As %Any [ MultiDimensional ];

Property Size As %Integer [ InitialExpression = 0 ];

Property Comparitor As %String [ InitialExpression = "(a,b) return a < b" ];

Method Swap(i As %Integer, j As %Integer) As %Status [ Private ]
{
set temp = ..Data(i)
set ..Data(i) = ..Data(j)
set ..Data(j) = temp
}

Method Comp(x As %Any, y As %Any) As %Boolean [ Private ]
{
return $XECUTE(..Comparitor, x, y)
}

Method PercolateUp(idx As %Integer) [ Private ]
{
while idx > 0 {
    set newidx = (idx-1)\2

    if ..Comp( ..Data(idx), ..Data(newidx) ) do ..Swap( idx, newidx )
    else                                     Quit

    set idx = newidx
}
}

Method PercolateDown() [ Private ]
{
set idx = 0

while ((idx+1)*2) < ..Size {
    if ..Comp( ..Data(idx*2+2), ..Data(idx*2+1) ) set newidx = idx*2+2
    else                                          set newidx = idx*2+1

    if ..Comp( ..Data(idx), ..Data(newidx) ) Quit

    do ..Swap( idx, newidx )
    set idx = newidx
}

if ( (idx*2+1 < ..Size) && ..Comp( ..Data(idx*2+1), ..Data(idx) ) ) do ..Swap( idx, idx*2+1 )
}

Method IsEmpty() As %Boolean
{
return ..Size = 0
}

Method Top() As %Any
{
if ..IsEmpty() return ""
return ..Data(0)
}

Method Put(inp As %Any) As %Status
{
set sts = $$$OK

set ..Data( ..Size ) = inp
do ..PercolateUp( ..Size )
set ..Size = ..Size + 1

return sts
}

Method Get(Output obj As %Any) As %Status
{
set sts = $$$OK
if ..IsEmpty() {
    set obj = ""
    return $$$ERROR( "Cannot Get() from empty Queue" )
}

set obj = ..Data( 0 )
set ..Size = ..Size - 1
set ..Data( 0 ) = ..Data( ..Size )
do ..PercolateDown()
kill ..Data( ..Size )
return sts
}

Method GenerateComparitor(operator As %String = "<", transform As %String = "") As %Status
{
set ..Comparitor = "(a,b) return a" _ transform _ " " _ operator _ " b" _ transform
return $$$OK
}

}

After getting this working I wanted to try out different internal arrays since I assumed multidimensional arrays must be slower than a simple integer indexed array. So I modified the above code to use

Property Data As list Of %Any; : This works OK but it is about 3-4 times slower than using a multidimensional making it pointless.
Property Data As %DynamicArray; : I assumed this would be relatively fast but proved to be slow to the point of absurdity. When the amount of data stored in the queue is small it is around as fast as the list of %Any, but as the data grows the time it takes to make inserts and gets grows linearly which makes it pointless to build a heap on top of.

Using the Multidimensional Array's Self-sorting

On its face this idea seems pretty simple, use the fact that the multidimensional array is always sorted to grab the lowest cost (highest priority) item. A problem with this are that indexing by an object simply uses the string representation of that object which isn't useful. To handle this a customer evaluator function is needed to return the integer or string evaluation of an object which is then sorted, it is then stored as data( evaluation, obj_str_rep ) = object which ensures that objects are correctly sorted even when two objects evaluate to the same value.

Class pqueue.SparseQueue Extends %RegisteredObject
{

// Parameter Comp(a,b) As $XECUTE(..Comparitor, a, b);

Property Data As %Any [ MultiDimensional ];

Property Size As %Integer [ InitialExpression = 0 ];

Property Evaluator As %String [ InitialExpression = "(a) return a" ];

Method IsEmpty() As %Boolean
{
    return ..Size = 0
}

Method Top() As %Any
{
    if ..IsEmpty() return ""
    return $Order( ..Data("") )
}

Method Put(inp As %Any) As %Status
{
    set sts = $$$OK

    set ..Data( $XECUTE(..Evaluator, inp), inp ) = inp
    set ..Size = ..Size + 1

    return sts
}

Method Get(Output obj As %Any) As %Status
{
    set sts = $$$OK
    if ..IsEmpty() {
        set obj = ""
        return $$$ERROR( "Cannot Get() from empty Queue" )
    }

    set loc = $ORDER( ..Data("") )
    set obj = ..Data(loc, $ORDER( ..Data(loc, "")))

    set ..Size = ..Size - 1
    kill ..Data( loc, obj )
    return sts
}

Method GenerateEvaluator(transform As %String = "") As %Status
{
    set ..Evaluator = "(a) return a" _ transform
    return $$$OK
}

}

This method proved to be by far the fastest. It has some slight disadvantages in that it can be harder to write an evaluator than a comparitor and, in the form given above, it cannot hold the same object at the same value twice (this is a rare case but could theoretically be a problem or a benefit).

Speed Test

To test these out I wrote a simple script that generates a large randomized weighted directed graph and runs Dijkstra's shortest path algorithm using a given queue object. Using a graph with 150000 vertices where each has 10 neighbors, the output is printed below. Here while the algorithm is running an update is printed every 10000 edges checked, where the time since the last 10 thousand and size of the priority queue is printed, with some final stats at the end. The order they are run in is

Self-sorting Multidimensional
Heap Multidimensional
Heap Dynamic Array
Heap List of Object

Generated Random Graph

Running 1664388@pqueue.SparseQueue
Done Initial Step
Checking edges

10000 - 3.1151779 - 86709
20000 - 3.6075209 - 166999
30000 - 2.9923264 - 241546
40000 - 3.4856531 - 310390
50000 - 3.1714395 - 374396
60000 - 4.9496489 - 433555
70000 - 2.3947319 - 489044
80000 - 3.2073625 - 539410
90000 - 3.5492076 - 586065
100000 - 3.8483492 - 629463
110000 - 1.9906381 - 669002
120000 - 2.4800312 - 705165
130000 - 3.2020008 - 738792

Found path of distance 134.957 after checking 138752 edges

Running 1664178@pqueue.Queue
Done Initial Step
Checking edges
10000 - 29.0445107 - 86709
20000 - 28.278281 - 166999
30000 - 27.2860397 - 241546
40000 - 26.6431472 - 310390
50000 - 26.2807624 - 374396
60000 - 28.1151768 - 433555
70000 - 27.5075059 - 489044
80000 - 29.4012881 - 539410
90000 - 27.4369653 - 586065
100000 - 26.743563 - 629453
110000 - 27.0132265 - 669002
120000 - 26.9039187 - 705175
130000 - 26.8454762 - 738782

Found path of distance 134.957 after checking 138753 edges

Running 1664308@pqueue.DynamicQueue
Done Initial Step
Checking edges
10000 - 83.6996966 - 86709
20000 - 136.8056452 - 166999
30000 - 176.4097968 - 241546
40000 - 228.9383939 - 310390
50000 - 250.5338934 - 374396
60000 - 268.3424439 - 433555
70000 - 279.7747292 - 489044
80000 - 290.0156605 - 539410
90000 - 295.9455988 - 586065
100000 - 282.0121209 - 629453
110000 - 306.5787604 - 669002
120000 - 302.9471951 - 705175
130000 - 299.4361629 - 738782

Found path of distance 134.957 after checking 138753 edges

Running 1664237@pqueue.ListOfQueue
Done Initial Step
Checking edges
10000 - 127.688423 - 86709
20000 - 140.2352706 - 166999
30000 - 131.0675351 - 241546
40000 - 137.0392259 - 310390
50000 - 129.6285655 - 374396
60000 - 133.0306415 - 433555
70000 - 138.8908825 - 489044
80000 - 136.6955307 - 539410
90000 - 136.6123544 - 586065
100000 - 126.6696681 - 629453
110000 - 131.6500795 - 669002
120000 - 126.1802136 - 705175
130000 - 133.6258408 - 738782

Found path of distance 134.957 after checking 138753 edges

Found path that takes 134.957 in 45.157 (1664388@pqueue.SparseQueue)
Found path that takes 134.957 in 381.095 (1664178@pqueue.Queue)
Found path that takes 134.957 in 3466.382 (1664308@pqueue.DynamicQueue)
Found path that takes 134.957 in 1839.445 (1664237@pqueue.ListOfQueue)

Here we can see that the dynamic array is the only one that has significant growth as its size gets bigger (this growth doesn't continue since, as more of the graph has been checked, the odds that any node has already been visited and gets skipped would grow letting more of the work get skipped). We can also see that the self-sort method checks one less edge (in other trials I've seen this number be 0 and 10+), this is a result of two paths to the same node taking the same amount of time and therefore not being able to be stored separately, which is convenient in this case.