Using the Work Queue Manager to process data on multiple cores

I have a client who is developing some tests (a short routine) to run against a large data set, possibly several hundred million records, and is looking for an efficient way to spread the workload across his CPUs to speed up execution. Any suggestions or tips I can pass on?

Absolutely investigate the Work Queue Manager:

http://docs.intersystems.com/cache20152/csp/documatic/%25CSP.Documatic.cls?PAGE=CLASS&LIBRARY=%25SYS&CLASSNAME=%25SYSTEM.WorkMgr

It’s the same code we use for parallelizing class builds and for %PARALLEL support in SQL.
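
For the original question, a minimal sketch of the usual pattern might look like the following: split the ID range into chunks, queue one work item per chunk, then wait for all of them. The class name Demo.Parallel, the method names, and the chunk sizes are hypothetical; only Initialize, Queue, and WaitForComplete come from %SYSTEM.WorkMgr.

```objectscript
Class Demo.Parallel Extends %RegisteredObject
{

/// Split ids 1..total into chunks and queue one work item per chunk.
ClassMethod RunAll(total As %Integer = 1000000, chunkSize As %Integer = 10000) As %Status
{
    set queue = $system.WorkMgr.Initialize("/multicompile=1", .sc)
    if $$$ISERR(sc) { quit sc }
    for first = 1:chunkSize:total {
        set last = first + chunkSize - 1
        if last > total { set last = total }
        set sc = queue.Queue("##class(" _ $classname() _ ").ProcessChunk", first, last)
        // argumentless quit leaves the for loop only
        if $$$ISERR(sc) { quit }
    }
    if $$$ISERR(sc) { quit sc }
    // block until every queued chunk has finished
    quit queue.WaitForComplete()
}

/// Runs in a worker job; the per-record test is a placeholder.
ClassMethod ProcessChunk(first As %Integer, last As %Integer) As %Status
{
    for id = first:1:last {
        // the client's per-record test would go here
    }
    quit $$$OK
}

}
```

Chunking keeps the number of queued items small relative to the record count, so the queuing overhead stays low compared with the work done per item.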

Comments

The first problem with the Work Queue Manager is that it does not work in CSP processes. In my project CacheBlocksExplorer, I tried to use it for an asynchronous scan of database blocks; the source in that project uses it with WebSocket. Another problem is that it does not work well if you have only one core.

As a workaround, you can force the number of jobs it uses:

Set ^||%ISC.WorkQueueMgr("ForceJobs")=2

 

Both these issues are addressed in the version we have scheduled for 2016.2.

Previously the worker jobs were owned by the process asking for the work; in the new version we have a global pool of jobs available for any work. This allows the Work Queue Manager to work in CSP processes. Also, when you ask for a work group you can now specify how many processes you require.
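
As a rough sketch of requesting a specific number of workers: in releases with the global pool, my understanding is that Initialize accepts the job count as a third argument. Treat that argument position as an assumption and check the %SYSTEM.WorkMgr class reference for your version. Demo.Sized, its method names, and the ^Demo.Sized global are hypothetical.

```objectscript
Class Demo.Sized Extends %RegisteredObject
{

ClassMethod RunWithFourJobs() As %Status
{
    // third argument (number of jobs) is assumed; verify against your version's class reference
    set queue = $system.WorkMgr.Initialize("/multicompile=1", .sc, 4)
    if $$$ISERR(sc) { quit sc }
    set sc = queue.Queue("##class(" _ $classname() _ ").Task", 1)
    if $$$ISERR(sc) { quit sc }
    quit queue.WaitForComplete()
}

/// Runs in a worker job and records which process handled the item.
ClassMethod Task(n As %Integer) As %Status
{
    set ^Demo.Sized("done", n) = $job
    quit $$$OK
}

}
```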

Note that the 'ForceJobs' setting is unsupported and undocumented, so we may remove it at some point.

One thing I am unclear on is whether the worker jobs have access to the in-memory objects of the process that initiates them. I have a potential use case for this that I am investigating.

Are there any practical examples of using this?

The worker jobs do not have access to the memory of the master process. They are just regular Caché processes. If you wish to share information between jobs, use globals, with locks to synchronize access to the pieces you need to change.
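
A minimal sketch of that approach, assuming a hypothetical class Demo.Shared and global ^Demo.Shared: each worker takes a lock before its read-modify-write of the shared node. For a plain counter, $increment is atomic and needs no lock.

```objectscript
Class Demo.Shared Extends %RegisteredObject
{

/// Called from worker jobs: add to a shared total under a lock.
ClassMethod AddToTotal(amount As %Integer) As %Status
{
    // wait up to 5 seconds for the lock; $test is 0 if the wait timed out
    lock +^Demo.Shared("total"):5
    if '$test { quit $$$ERROR($$$GeneralError, "Could not lock the shared total") }
    set ^Demo.Shared("total") = $get(^Demo.Shared("total")) + amount
    lock -^Demo.Shared("total")
    quit $$$OK
}

/// Called from worker jobs: atomic counter, no explicit lock required.
ClassMethod CountItem() As %Status
{
    set count = $increment(^Demo.Shared("count"))
    quit $$$OK
}

}
```

The master process can read ^Demo.Shared after WaitForComplete returns, since at that point no worker is still updating it.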

Have you got an example of how to use the Pause and Resume methods?

All the examples I've seen so far use the Queue and WaitForComplete methods, i.e. the process that has access to the workmgr object is waiting for everything to be completed and is therefore not accepting any other signals.

Below is my test code, which uses the QueueCallback and Wait methods and allows the work to be stopped, even if it is not very elegant. However, it looks as if the Wait method only waits for %exit=1. How would I suspend and resume?

Class DEV.WK.Tst1 Extends %RegisteredObject
{

ClassMethod Run() As %Status
{
    // clear the results of any previous run
    kill ^wk.Tst1
    set sc=..ManageQ()
    quit sc
}

ClassMethod ManageQ() As %Status
{
    set sc=$$$OK
    set queue=$SYSTEM.WorkMgr.Initialize("/multicompile=1",.sc)
    if ('sc) { set sc=$$$ERROR($$$GeneralError,"Error initialising work queues: "_$system.Status.GetErrorText(sc)) quit sc }
    // queue 100 chunks of 1000 ids each, with QCallBack as the completion callback
    for i=1:1:100 {
        set startchunk=i*1000,endchunk=((i+1)*1000)-1
        set sc=queue.QueueCallback("##class("_$classname()_").Work","##class("_$classname()_").QCallBack",startchunk,endchunk)
        if 'sc { set sc=$$$ERROR($$$GeneralError,"Error queuing up work: "_$system.Status.GetErrorText(sc)) return sc }
    }
    // Wait returns once a callback sets %exit=1; AtEnd reports whether all work is complete
    set sc=queue.Wait(,.AtEnd)
    set ^wk.Tst1("AtEnd")=AtEnd
    if sc&&'AtEnd { set sc=$$$ERROR($$$GeneralError,"Processing halted") }
    do queue.Clear()
    quit sc
}

ClassMethod Work(StartId As %Integer, StopId As %Integer) As %Status
{
    // record which worker job picked up this chunk, then simulate 60 seconds of work
    set ^wk.Tst1("work",$j,StartId)=""
    hang 60
    quit $$$OK
}

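ClassMethod QCallBack(StartId As %Integer, StopId As %Integer) As %Status
{
    // %job holds the process id of the worker that ran this chunk
    if $d(^wk.Tst1("work",%job)) { set ^wk.Tst1("work",%job,StartId)=1 }
    // setting %exit=1 makes the Wait call in ManageQ return
    if $g(^wk.Tst1("halt"))=1 { set %exit=1 }
    quit $$$OK
}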

ClassMethod Halt() As %Status
{
    // call from another process to ask the running queue to stop
    set ^wk.Tst1("halt")=1
    quit $$$OK
}

}