Article
· Apr 6, 2023 7m read

A beginners guide to Orphaned data- How as a trust we cleaned up 200+gb

Purpose of this article

There are two great WRC best practice articles Ensemble Orphaned Messages | InterSystems Developer Community | Best
 and the delete helper post DeleteHelper - A Class to Help with Deleting Referenced Persistent Classes (intersystems.com)
that go into orphaned records and how to deal with orphans. This article is not a meaning to replace these written by Intersystems professionals but to build on this and how as a trust we used both of these to help get our database into a more compact size using this information and other discussions, including our methods of actually cleaning up this data 

The scenario

Our backups were getting larger and larger. We had a case of one server having to be force failed over at the start of the year and we required a restore. Having a large db made even copying this database take a very long time never mind a restore to rebuild the shadow server. So we had to decide to finally tackle this growth. It had already been identified initial cause 

  1. The out of the box task either at some point it was assumed to be ran but did not have message bodies ticked. This is because when querying one such message body we got ID 1 from about 10+ years ago . This task was the default Ens.Util.Tasks.Purge as referenced in the best practice. This leads to tip 1 in the process 

Understand your data 

What data are you storing in your database? Do you have data that has to stay in tables of records. For your transactional data by which I mean the messages linked to Message headers you can query this using 

SELECT Distinct(MessageBodyClassName) from Ens.MessageHeader

From this initially i would look at the message classes i would open up these and understand what global these are being stored in 

This gives you some level of understanding of where your data is stored.

Note, if you are saving directly to Streams such as %LIBRARY.GLOBALBINARYSTREAM this is one indication of orphans as these should be saved into stream containers we will cover this later 


Running Global size report
One tool of seeing your db sizes at a glance is runnning GSize this will give some indication of where data is stored in your db. In terminal run the following steps 

do ^%GSIZE
Directory name: NAMESAPCE/ =>
All Globals? No => YES
Show details => NO

This can be an indication of where data is being used up to help guide you. 

Step 2 Stopping future Orphans 

This is covered pretty well in the best practice guide but to break down the steps we covered. 

  1. Find classes directly sending Cache streams and migrating the code to use a stream container. 
  2. Find embedded classes not dealt with- adding on manual deletion and on delete helper tool 
  3. Looking at a few %OnSaves for hl7 messages 
Step 3 Cleaning up orphnans
  1. Running a cleanup task daily 
  2. Cleaning up Streams and other misc orphans 

 

Classes using direct stream classes 

It was noted by  Suriya Narayanan that any %libary classes should not be directly used. Streams end up in ^Ens.Stream if not contained. This data is not stored with good references so you have to look through the global to find what is the last data you want to keep. 

In our scenario it was a BP sending a message to a operation instead of sending your stream add it to a container. 

set HTMLDocument=##class(%Library.GlobalBinaryStream).%New()
set tSC=..SendRequestSync(..DocmanRouterName,HTMLDocument, .aResponse,20,"")
//needed sent instead as container
set requestContainer = ##class(Ens.StreamContainer).%New()
set tSc=requestContainer.StreamSet(HTMLDocument)
set tSC=..SendRequestSync(..DocmanRouterName,requestContainer, .aResponse,20,"")
%Saves

Not all % saves of classes before sending causes the issue, only if they are then not sent. It can happen in amending a copy of a hl7 often saved and perhaps saving an interim that does not get sent. Below was a stream example after manipulation for hl7 this was not saved so created orpahans. I will not after sometimes you have cleared your orphans and monitor some of these will be clearer, the below %save object was never used so would be an orpahan as only the hl7 was sent. There is a %new object prior to this 

 Embedded objects

This is referenced heavily in the Delete helper document  below is an example. XML messages are notorious for this

Class Messages.XML.GenericWif.fileParameters Extends (%Persistent, %XML.Adaptor)
{
Property revisionNumber As %String;
Property primaryLink As Messages.XML.GenericWif.primaryLink;
Property additionalIndexes As Messages.XML.GenericWif.additionalIndexes;

When this object is set in the tie and cleaned up only Messages.XML.GenericWif.fileParameters gets deleted you have two approaches 

1) You go into each object and add the class method and OnDelete SQL triggers it checks for subobjects exisiting and deletes them. You can check by either using objectscript or sql 

ClassMethod %OnDelete(oid As %ObjectIdentity) As %Status [ Private ]
{
      // Delete the property object references.
      Set tSC = $$$OK, tThis = ##class(Messages.XML.GenericWif.fileParameters).%Open(oid)
      If $ISOBJECT(tThis.primaryLink) Set tSC = ##class(Messages.XML.GenericWif.primaryLink).%DeleteId(tThis.primaryLink.%Id())
      If $ISOBJECT(tThis.additionalIndexes) Set tSC = ##class(Messages.XML.GenericWif.additionalIndexes).%DeleteId(tThis.additionalIndexes.%Id())
      Quit tSC
}

/// Callback/Trigger for SQL delete
Trigger OnDelete [ Event = DELETE ]
{
      // Delete the property object references. {%%ID} holds the id of the record being deleted.
      Set tID={%%ID}
      Set tThis = ##class(Messages.XML.GenericWif.fileParameters).%OpenId(tID)
      If $ISOBJECT(tThis.primaryLink) Do ##class(Messages.XML.GenericWif.primaryLink).%DeleteId(tThis.primaryLink.%Id())
      If $ISOBJECT(tThis.additionalIndexes) Do ##class(Messages.XML.GenericWif.additionalIndexes).%DeleteId(tThis.additionalIndexes.%Id())
      Quit
}

for objectscript you open up one of your ids, we copied on a test system or non production live and for instance type query an oldest record and then query the other tables in this example Messages_XML_GenericWif.primaryLink in sql. And then you can see if you can open the sub ids from that . i.e. you've added the code you delete id 1 of fileParameters which has embedded messages. 

set a = ##class(Messages.XML.GenericWif.primaryLink).%OpenId(2)
zw a 
//this output the info. Now delete parent
set tSC=##class(Messages.XML.GenericWif.fileParameters).%DeleteId(1)
set a = ##class(Messages.XML.GenericWif.primaryLink).%OpenId(2)
zw a 
//a should be ""

Option 2 is the deleteHelper. What this does is adds the %onDelete Classmethod not the sql onto the .int for the code. You can do the same testing as above basically, it is useful All you do is add the deleteSuper class in the extends of your message class i.e. 

Class Messages.BoltonRenal.Pathology.Outbound.PathologyResult Extends (Ens.Request, SRFT.Utility.DeleteHelper.OnDeleteSuper)
{
Property requestingClinician As %String;
Property department As Messages.BoltonRenal.Pathology.Outbound.Department;
// the on deletesuper is a class like this in the link
include Ensemble
/// A class to help assist in "deep" deleting of an instance, including references to other persistent classes.
/// <br><br>
/// To use simply add as a Super Class in your persistent class<br>
/// The class defines a Generator %OnDelete method that will generate code for your class, 
/// deleting, if needed, references (inclduing collections) to other persistent classes<br>
ClassMethod %OnDelete(oid As %ObjectIdentity) As %Status [ CodeMode = objectgenerator, Private, ServerOnly = 1 ]
{

	// a list ($ListBuild format) of "simple" (non-collection) property names we'll want to delete the references of (because they're Persistent)
	Set delPropNames = ""
                           

If you go to routines and the message and the .int you will start to see embedded objects getting deleted. i.e. 

 

Cleaning messages

After all this we wrote our own class to purge messages outside of the 150 days. The idea was as follows 

  • Set your custom messages types to delete in a data lookup table (an upgrade would be to store this in a table to make the code cleaner
  • Manually set the oldest ens message body to delete. the rest could be figured out from SQL 
  • Global data created to store results. 
  • Manually clean streams and other remanants

Again delete at your own risk 

The idea is you have your list of message body types (your base is probably HL7 and Message body details) (just realised misspelt the lookup) 

The code loops through this and gets minimum ID from the header (for message body you need to eyeball your oldest custom message saving to messagebody) and deletes through 

if msgBodyName="Ens.MessageBody"{
            set tMinMsgId=..MinBodyID

}else
{
    set tMinMsgId=..GetMinimumIDForMessage(rs.MessageBodyClassName)
}

// Min ID is just basically this query   set minIDQuery="SELECT  TOP(1) MessageBodyID  FROM Ens.MessageHeader where MessageBodyClassName=?"

And deletes it. Has added code around it to log 
  SET tSC1=$CLASSMETHOD(className,"%DeleteId",tResult.ID)
 

You can trial how many of each type to delete. We went 500000  for when running hl7 and message bodies and 7200000  a run when just running message bodies to run after the normal cleaning process

Again test this against a copy of your live Database first (non production or dev system) 

You don't want to delete too much to make journals too big so is a trial and error

 

class used attached- I accept no responsibility for if used and lose data required, test throughly prior to use

Cleaning Streams

We have a little task that can be used to just kill global streams. Update the numbers and run . Be very clear what the latest stream number you should keep is, also only works if you streams left are sequential 

 

ClassMethod StreamPurge() As %Status
{
	set i=1
	while (i<200000){
		k ^CacheStream(i)
		s i=i+1
	}
	q $$$OK
}
Discussion (1)1
Log in or sign up to continue
Class User.Utility.Tasks.PurgeCustomMessages Extends %SYS.Task.Definition
{

Parameter TaskName = "PurgeCustomMessages";

Property MaxDelete As %String [ InitialExpression = 1 ];

Property MinBodyID As %String [ InitialExpression = 1 ];

Method OnTask() As %Status
{
    //Get The Types of message classes
    Set sc = $$$OK
    s tMessagesToOrpahan=0
    //open up table of messages types to purge 
    try{
        Set rs = ##class(%SQL.Statement).%ExecDirect(,"select Keyname as MessageBodyClassName,Datavalue as ObjectSQLQryName from Ens_Util.LookupTable where tablename='OrpahnedMessagePurging'")
    }
    catch err {
       Set sc = err.AsStatus() 
       //quit st
    }
    
    While rs.%Next() 
    {
        set msgBodyName=rs.ObjectSQLQryName
        set className=rs.MessageBodyClassName
        if msgBodyName="Ens.MessageBody"{
            set tMinMsgId=..MinBodyID

        }else{
        set tMinMsgId=..GetMinimumIDForMessage(rs.MessageBodyClassName)
        }
        Set tSuccess=0
        Set tFailed=0  
        set dt=$translate($zdatetime($horolog,8,1), " :", "")
        if tMinMsgId '=-1
        {
            set msgQuery="SELECT top ? ID from "_msgBodyName_" where id <?"
            set tStatement = ##class(%SQL.Statement).%New()
            set qStatus = tStatement.%Prepare(.msgQuery)
            if qStatus '= 1 {
                s ^OrphanedMsgTask(dt,msgBodyName,"Error") ="Prepare Failed"
                s sc= $System.Status.DisplayError(qStatus) quit}
            
            set tResult = tStatement.%Execute(..MaxDelete,tMinMsgId)
           
           While tResult.%Next() 
           {
                s tMessagesToOrpahan=1

                            try{
                                SET tSC1=$CLASSMETHOD(className,"%DeleteId",tResult.ID)
                            
                            if tSC1=$$$OK{
                                set tSuccess=tSuccess+1

                            }
                            else{
                                set tFailed=tFailed+1
                                s ^OrphanedMsgTaskFailed(className,tResult.ID)=tSC1
                            }
                            }
                            catch 
                            {
                                set tFailed=tFailed+1
                                set sc = $$$ERROR($$$GeneralError,"DeleteID Failure"_tResult.ID)
                                s ^OrphanedMsgTask(dt,msgBodyName,"Error") ="DeleteID Failure"
                            }
           }
        }
        else
        {
            s ^OrphanedMsgTask(dt,msgBodyName,"Status")="No messages in header table for "_msgBodyName
        }
        
      s ^OrphanedMsgTask(dt,msgBodyName,"Status")="Purged for "_msgBodyName  _" "_tSuccess_" messages and failed to purge " _tFailed _ " messages"   
    }
   
    quit sc
}

/// Method to return the minimum ID for the message type you are purging 
ClassMethod GetMinimumIDForMessage(pMessageType As %String) As %String
{
    set MessageID=-1
    set minIDQuery="SELECT  TOP(1) MessageBodyID  FROM Ens.MessageHeader where MessageBodyClassName=?"
    set tStatement = ##class(%SQL.Statement).%New()
    set qStatus = tStatement.%Prepare(.minIDQuery)
    if qStatus '= 1 {write "%Prepare failed:" do $System.Status.DisplayError(qStatus) quit}
    set tResult = tStatement.%Execute(pMessageType)
    While tResult.%Next() 
    {
    set MessageID=tResult.MessageBodyID
    Write tResult.MessageBodyID, !
    }
    w MessageID
    return MessageID
}

ClassMethod StreamPurge() As %Status
{
    set i=1
    while (i<200000){
        k ^CacheStream(i)
        s i=i+1
    }
    q $$$OK
}

ClassMethod Test() As %String
{

    set msgQuery="SELECT top ? ID from EnsLib_HL7.Message where id <?"
            set tStatement = ##class(%SQL.Statement).%New()
            set qStatus = tStatement.%Prepare(.msgQuery)
            s ^OrphanedMsgTask = qStatus
            if qStatus '= 1 {write "%Prepare failed:" do $System.Status.DisplayError(qStatus) quit}
            s ^OrphanedMsgTask =tStatement
            set tResult = tStatement.%Execute(1,..GetMinimumIDForMessage("EnsLib.HL7.Message"))
            While tResult.%Next() 
            {
                w tResult.ID
            }
            quit qStatus
}

/// Classmethod to create a list of the message types 
ClassMethod MessageTypes() As %ListOfDataTypes
{
    Set sc = $$$OK
    Set rs = ##class(%SQL.Statement).%ExecDirect(,"SELECT DISTINCT (MessageBodyClassName) AS messageType FROM Ens.MessageHeader where MessageBodyClassName Like 'messages%'")
    set messageTypeList=##class(%ListOfDataTypes).%New()
    While rs.%Next() {
        do messageTypeList.Insert(rs.messageType)
    }
    w messageTypeList.Count()
    for i = 1:1:messageTypeList.Count() {
        write !, messageTypeList.GetAt(i)}
}

No idea how to upload a file so here is the class