Ensemble Interfaces Disk-space Usage Estimation and Purge Verification Framework

One of the topics that comes up often when managing Ensemble productions is disk space:

The database (the CACHE.DAT file) grows at an unexpected rate; or the journal files build up at a fast pace; or the database keeps growing even though the system has a scheduled purge of the Ensemble runtime data.

It would be better to observe and account for these kinds of phenomena during the development and testing stage rather than on a live system.

For this purpose I created a basic framework that can aid in this task.

The fundamental idea is that the framework captures data usage information at certain points in time, providing a picture of how much space was used. The user can then take this information, perform the volume-related calculations, and arrive at the expected database and journal growth rates.
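As an illustration of those volume-related calculations, here is a minimal sketch in Python. It is not part of the framework: the function name and parameters are hypothetical, and it assumes growth is roughly linear in the number of messages processed and that a scheduled purge keeps about `retention_days` of data online.

```python
def estimate_growth(before, after, test_messages, daily_messages, retention_days):
    """Extrapolate daily and steady-state growth from one capture-run-capture test.

    Assumptions (hypothetical, for illustration only):
      * growth is linear in the number of messages processed;
      * `before` and `after` are in the same unit (e.g. MB);
      * a scheduled purge keeps roughly `retention_days` of data online.
    """
    if test_messages <= 0:
        raise ValueError("test_messages must be positive")
    per_message = (after - before) / test_messages
    daily = per_message * daily_messages
    return {
        "per_message": per_message,
        "daily": daily,
        # Data still online once the purge has reached a steady state.
        "steady_state": daily * retention_days,
    }


if __name__ == "__main__":
    # Hypothetical figures: 10 MB of growth for 5,000 test messages,
    # 200,000 messages per day in production, 7 days of retention.
    print(estimate_growth(100.0, 110.0, 5_000, 200_000, 7))
```

The same arithmetic can be applied to the journal figures, using whatever retention period your journal purge settings imply instead of the message purge window.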

Apart from capturing data usage before and after running the interface test, the framework captures it a third time after a Purge has been run. A separate post could discuss this in more detail, but in certain cases the out-of-the-box Purge might not delete all the structures related to the message bodies, so the framework also lets you find these potential "leaks" ahead of time.
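The leak check boils down to comparing the post-purge capture against the pre-test baseline. The following is a hedged Python sketch of that comparison, not the framework's own code: the function name is hypothetical, and the snapshots are assumed to be plain mappings from global name to size in a single unit.

```python
def find_purge_leaks(baseline, after_purge, tolerance=0.0):
    """Return globals that are still larger after the purge than before the test.

    `baseline` and `after_purge` map global names to sizes (same unit, e.g. MB),
    captured before the test run and after the purge respectively. Anything that
    grew and was not cleaned up by the purge is a candidate "leak".
    """
    leaks = {}
    for name, size in after_purge.items():
        growth = size - baseline.get(name, 0.0)
        if growth > tolerance:
            leaks[name] = growth
    # Largest leftovers first, to make the report easier to scan.
    return dict(sorted(leaks.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    # Hypothetical snapshots, loosely modelled on the sample report further down.
    baseline = {"Ens.MessageBodyD": 0.5, "ITest.Proxy.s0.PersonD": 0.0}
    after_purge = {
        "Ens.MessageBodyD": 0.5,           # purged back to baseline
        "ITest.Proxy.s0.AddressD": 0.004,  # new global, never cleaned up
        "ITest.Proxy.s0.PersonD": 0.003,
    }
    for name, growth in find_purge_leaks(baseline, after_purge).items():
        print(f"{name:<35}{growth}")
```

The `tolerance` parameter is there so that small block-level fluctuations can be ignored if they turn out to be noise in practice.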

I hope folks find this useful, and I invite the community to extend and fix it for everyone's benefit.

It is posted on GitHub. See more details there and in the related class reference.

Comments

Thanks for sharing the code, Tani.

IMO this type of monitoring should be done directly from core Caché, for example from sensors in SYSMONMGR, and should be provided by the system. I'm hoping to open-source SAM (System Alerting and Monitoring) soon; it was demoed last year. The idea was to have a plug-and-play component to drop on all instances to be monitored, plus an appliance that gathers those warnings and alerts.


Thanks, Luca.

Note that this is not a replacement for run-time monitoring and alerting.

This is intended for a one-time run (though it could be iterative) during the development/testing phase, to help plan the required disk space (and validate proper purging). For this I thought something Ensemble-specific and oriented around the growth process (capture, run, capture, compare, report) would be beneficial. In the live production phase one would of course put in place the more generic core monitoring functionality, and would have no use for this framework.

I added functionality to register (and report) not only the global names but also the name of the class that uses each global. Thanks, Dale, for the idea.
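The framework's own global-to-class lookup is not shown here and may well rely on the class dictionary. Purely as a rough illustration, the Python sketch below guesses the owning class from the global name, assuming classes use default storage, where data, index and stream globals are named after the class with a D, I or S suffix. The function name and the class list are hypothetical; classes with long names or customised storage may use other global names, which this heuristic would miss.

```python
# Suffixes used by default storage for data, index and stream globals
# (assumption: classes use default, non-customised storage).
STORAGE_SUFFIXES = ("D", "I", "S")


def class_for_global(global_name, known_classes):
    """Guess the class that owns a global from its name.

    Hypothetical heuristic: strip a leading "^" and a trailing D/I/S suffix,
    then check the remainder against `known_classes`. Returns None when no
    class matches, e.g. for hashed or customised storage global names.
    """
    name = global_name.lstrip("^")
    if len(name) > 1 and name[-1] in STORAGE_SUFFIXES:
        candidate = name[:-1]
        if candidate in known_classes:
            return candidate
    return None


if __name__ == "__main__":
    # Class names as they might be exported from the namespace (hypothetical list).
    classes = {"ITest.Proxy.s0.Address", "ITest.Proxy.s0.Person"}
    for g in ("ITest.Proxy.s0.AddressD", "^ITest.Proxy.s0.PersonI", "Ens.Queue"):
        print(g, class_for_global(g, classes))
```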

[Update is available via GitHub, link in the post above]

Here is also some sample output:

Data Usage Report
===========================

Database file size used: 1
Journal file size used: 0
Journal space used: 325108

Globals growth
----------------------
Ens.MessageBodyD                                                                     .007
Ens.MessageHeaderD                                                                   .02
Ens.MessageHeaderI                                                                   .003
Ens.Util.LogD                                                                        .004
ITest.Proxy.s0.AddressD            ITest.Proxy.s0.Address.cls                        .004
ITest.Proxy.s0.PersonD             ITest.Proxy.s0.Person.cls                         .003

Journal Profile
----------------------
Ens.ActiveMessage                                                                    26184
Ens.BusinessProcessD                                                                 3308
Ens.BusinessProcessI                                                                 1456
Ens.Configuration                                                                    424
Ens.JobRequest                                                                       132
Ens.JobStatus                                                                        112
Ens.MessageBodyD                                                                     19508
Ens.MessageHeaderD                                                                   34624
Ens.MessageHeaderI                                                                   89540
Ens.Queue                                                                            37996
Ens.Runtime                                                                          50920
Ens.Suspended                                                                        100
Ens.Util.LogD                                                                        10404
Ens.Util.LogI                                                                        12248
ITest.Proxy.s0.AddressD            ITest.Proxy.s0.Address.cls                        16096
ITest.Proxy.s0.PersonD             ITest.Proxy.s0.Person.cls                         8624

Globals remaining after purge
----------------------
ITest.Proxy.s0.AddressD            ITest.Proxy.s0.Address.cls                        .004
ITest.Proxy.s0.PersonD             ITest.Proxy.s0.Person.cls                         .003

Here are a few more hints:

1) Review your code for $$$LOGINFO, $$$LOGWARNING, $$$LOGERROR and $$$TRACE statements. Remove those that were useful only during development and testing.

2) If you create temporary globals, make sure they are mapped to CACHETEMP.

3) Depending on the nature of your application, there may be transactional data that can be summarized for future analysis, allowing the original class data to be purged. For example, I have a production that processes pharmacy prescriptions. When I unpack the HL7 OMP message, I create a hierarchy of parent-child class objects consisting of data from the HL7 message as well as a chunk of other data, flags, etc. Once the order has been processed, most of this data becomes irrelevant, so I extract the salient data (Patient ID, Prescription ID, Item Code, Quantities Ordered, Quantities Dispensed and an overall Order Status), write it to an archive class, and purge the original data after 7 days.
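For hint 1, a quick inventory of logging macros can make the review easier. This is a hedged Python sketch, assuming your classes have been exported as UDL `.cls` files into a directory; the function names are hypothetical and the regular expression only looks for the four macro names listed in the hint.

```python
import re
from pathlib import Path

# The Ensemble logging/trace macros listed in hint 1.
LOG_MACRO = re.compile(r"\$\$\$(LOGINFO|LOGWARNING|LOGERROR|TRACE)\b")


def count_log_macros(text):
    """Count occurrences of each logging macro in one source text."""
    counts = {}
    for match in LOG_MACRO.finditer(text):
        macro = match.group(1)
        counts[macro] = counts.get(macro, 0) + 1
    return counts


def scan_directory(root):
    """Return {file path: macro counts} for every exported .cls file under `root`."""
    report = {}
    for path in sorted(Path(root).rglob("*.cls")):
        counts = count_log_macros(path.read_text(encoding="utf-8", errors="replace"))
        if counts:
            report[str(path)] = counts
    return report
```

For example, `scan_directory("path/to/exported/classes")` returns the files that still contain logging calls, so each one can be reviewed and trimmed before go-live.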

See this related post for a class that can help auto-generate the %OnDelete method for your classes, ensuring that persistent objects referenced by your messages are deleted together with the purge of the message body.