Discussion
· Feb 21, 2022

How to efficiently store historical data, similar to current data, without mixing physical storage?

Let's say you would like to efficiently store your historical data in a structure similar to the one used for your current data, but without sharing the same physical storage (i.e., not in the same global). What is the most efficient way to do it?

Below is a simple class holding your current data:

Class data.current.person Extends (%Persistent, %Populate)
{

Parameter DEFAULTGLOBAL = "^on.person";

Property name As %String;

Property dob As %Date(FORMAT = 4);

Property activ As %Boolean [ InitialExpression = 1 ];

Property created As %TimeStamp [ InitialExpression = {$zdt($h,3)} ];
Storage Default
{
<Data name="personDefaultData">
<Value name="1">
<Value>%%CLASSNAME</Value>
</Value>
<Value name="2">
<Value>name</Value>
</Value>
<Value name="3">
<Value>dob</Value>
</Value>
<Value name="4">
<Value>activ</Value>
</Value>
<Value name="5">
<Value>created</Value>
</Value>
</Data>
<DataLocation>^on.personD</DataLocation>
<DefaultData>personDefaultData</DefaultData>
<IdLocation>^on.personD</IdLocation>
<IndexLocation>^on.personI</IndexLocation>
<StreamLocation>^on.personS</StreamLocation>
<Type>%Storage.Persistent</Type>
}

}

NB: you can omit the DEFAULTGLOBAL parameter if you're happy with the one generated automatically (which is the case most of the time); I use it in my example to emphasize the separation of globals between the current and archive classes.
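For instance, the same class compiles just as well without the parameter; the compiler then derives the globals from the class name at first compilation (presumably ^data.current.personD, ^data.current.personI and ^data.current.personS here, with very long class names getting hashed global names instead):

Class data.current.person Extends (%Persistent, %Populate)
{

// no DEFAULTGLOBAL: DataLocation, IdLocation, IndexLocation and StreamLocation
// are generated automatically when the class is first compiled

Property name As %String;

}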

 

First approach

You could first think of creating two persistent classes and copying the same definition into both (except the Storage definition, which is recreated at first compilation):

Class data.current.person Extends %Persistent {}
Class data.archive.person Extends %Persistent {}

But this approach quickly turns out to be disastrous in terms of maintenance, for the obvious reason that it duplicates the code/definitions.

 

Second approach

Another idea is to create a common class with ClassType = "" so that no storage definition is created at its level. Then you make the archive and current classes inherit from this common class and add %Persistent inheritance to both subclasses:

Class data.common.person Extends %Persistent [ ClassType = "" ] {}
Class data.current.person Extends (%Persistent, data.common.person) [ ClassType = persistent ] {}
Class data.archive.person Extends (%Persistent, data.common.person) [ ClassType = persistent ] {}

This way, you solve the problem and get separate globals for each persistent class, with a single class holding the whole definition.

But it is not really convenient to manage three classes instead of two, to have to maintain the definition in the common class, or to have to open the common class to see the details of the current class's definition.

 

Third approach

Well, a third solution could be to return to a very simple approach: just create an archive class which inherits from the current data class...

Class data.current.person Extends %Persistent {}
Class data.archive.person Extends data.current.person {}

In that case, current and historical data share the same storage definition and, as a result, are stored in the same globals. That defeats the goal of physically separating the two kinds of data, so it does not answer our need.

 

A simpler approach

You can obtain the same structure in an archive class that inherits from the current class, without inheriting its storage, simply by adding %Persistent before the class you want to inherit from in the inheritance list.

In this case, even though the superclass already contains a storage definition with a DataLocation (because it is persistent), at compile time the inheritance precedence forces the generation of a new storage definition based on the subclass, with its own DataLocation, IdLocation, IndexLocation and StreamLocation.

Class data.archive.person Extends (%Persistent, data.current.person)
{
Parameter DEFAULTGLOBAL = "^off.person";
Storage Default
{
<Data name="personDefaultData">
<Value name="1">
<Value>%%CLASSNAME</Value>
</Value>
<Value name="2">
<Value>name</Value>
</Value>
<Value name="3">
<Value>dob</Value>
</Value>
<Value name="4">
<Value>activ</Value>
</Value>
<Value name="5">
<Value>created</Value>
</Value>
</Data>
<DataLocation>^off.personD</DataLocation>
<DefaultData>personDefaultData</DefaultData>
<IdLocation>^off.personD</IdLocation>
<IndexLocation>^off.personI</IndexLocation>
<StreamLocation>^off.personS</StreamLocation>
<Type>%Storage.Persistent</Type>
}
}
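A quick way to check that both classes really write to distinct globals (just a sketch, assuming both classes compile; Populate() comes from %Populate inherited through the current class):

// generate a few current persons, then save one archive person
do ##class(data.current.person).Populate(3)

set p = ##class(data.archive.person).%New()
set p.name = "Doe"
do p.%Save()

// current rows land in ^on.personD, archived rows in ^off.personD
zwrite ^on.personD
zwrite ^off.personD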

 

Other ideas?

If you have other ways to address this question, please share your ideas and comments.

Discussion (9)

I'm using a modified second approach.

1. Create a base abstract class with all properties:

Class model.PersonBase Extends (%XML.Adaptor, %JSON.Adaptor) [ Abstract ]
{

Property Name;

}

2. Create persistent class:

Class model.Person Extends (%Persistent, model.PersonBase, Utils.Copyable)
{
/// Indices, fkeys and relationships go here
}

Utils.Copyable allows easy copy/compare (see the sketch further below).

3. Create snapshot class:

Class model.PersonSnapshot Extends (%Persistent, model.PersonBase, Utils.Copyable)
{

Index parentIndex On parent;

/// Creation timestamp (UTC)
Property createdOn As %TimeStamp(POPORDER = -1, XMLPROJECTION = "NONE") [ InitialExpression = {$ZDATETIME($ZTIMESTAMP, 3, 1, 3)}, Required ];

Property parent As model.Person(XMLPROJECTION = "NONE");

Method %OnNew(parentId As %Integer) As %Status [ Private, ServerOnly = 1 ]
{
    #dim sc As %Status = $$$OK
    
    set ..parent = ##class(model.Person).%OpenId(parentId,, .sc)
    if $$$ISERR(sc) quit sc

    quit ..copyFrom(..parent, ##class(model.PersonBase).%ClassName(1))
}

}

And done.

This way you can take a snapshot of an object at any point in time and, by calling compareTo, calculate a diff between any two versions.

There's also a hash function you can use to compute a hash from some or all object properties to speed up equivalence checks.
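The original Utils.Copyable isn't shown here, but a minimal sketch of what such a mixin could look like might be the following (the method signatures, the metadata walk and the $zcrc-based hash are assumptions of mine, chosen to match the copyFrom / compareTo / hash calls used above):

Class Utils.Copyable [ Abstract ]
{

/// Copy every property declared in baseClass from source into the current object
Method copyFrom(source As %RegisteredObject, baseClass As %String) As %Status
{
    set def = ##class(%Dictionary.CompiledClass).%OpenId(baseClass)
    if '$isobject(def) quit $$$ERROR($$$GeneralError, "unknown class: " _ baseClass)
    for i = 1:1:def.Properties.Count() {
        set prop = def.Properties.GetAt(i)
        continue:($extract(prop.Name)="%")||prop.Calculated||prop.Internal
        set $property($this, prop.Name) = $property(source, prop.Name)
    }
    quit $$$OK
}

/// Return a $list of the properties declared in baseClass whose values differ
Method compareTo(other As %RegisteredObject, baseClass As %String) As %List
{
    set diff = "", def = ##class(%Dictionary.CompiledClass).%OpenId(baseClass)
    if '$isobject(def) quit diff
    for i = 1:1:def.Properties.Count() {
        set name = def.Properties.GetAt(i).Name
        continue:$extract(name)="%"
        if $property($this, name) '= $property(other, name) set diff = diff _ $listbuild(name)
    }
    quit diff
}

/// CRC-32 over the values of the properties declared in baseClass,
/// usable as a cheap equivalence check between two versions
Method hash(baseClass As %String) As %Integer
{
    set data = "", def = ##class(%Dictionary.CompiledClass).%OpenId(baseClass)
    if '$isobject(def) quit 0
    for i = 1:1:def.Properties.Count() {
        set name = def.Properties.GetAt(i).Name
        continue:$extract(name)="%"
        set data = data _ $listbuild($property($this, name))
    }
    quit $zcrc(data, 7)
}

}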

Thanks Robert for your comment.

Merging globals is exactly what the toArchive method does here:

Class data.archive.person Extends (%Persistent, data.current.person)
{

Parameter DEFAULTGLOBAL = "^off.person";

/// Copy every row of the current (source) class into the archive class, object by object;
/// optionally purge the archive globals beforehand and the source globals afterwards
ClassMethod archive(purgeArchive As %Integer = 0, purgeSource As %Integer = 0) As %Status
{
    set sc = $$$OK , tableName = ""
    set (archived,archivedErrors, severity) = 0

    set sourceClassName = $PIECE(##class(%Dictionary.ClassDefinition).%OpenId(..%ClassName(1)).Super,",",2)
    set targetClassName = ..%ClassName(1)

    set sourceClass = ##class(%Dictionary.ClassDefinition).%OpenId(sourceClassName) 
    set targetClass = ##class(%Dictionary.ClassDefinition).%OpenId(targetClassName)

    set sourceDataLocation = sourceClass.Storages.GetAt(1).DataLocation
    set sourceIndexLocation = sourceClass.Storages.GetAt(1).IndexLocation
    set sourceStreamLocation = sourceClass.Storages.GetAt(1).StreamLocation

    set targetDataLocation = targetClass.Storages.GetAt(1).DataLocation
    set targetIndexLocation = targetClass.Storages.GetAt(1).IndexLocation
    set targetStreamLocation = targetClass.Storages.GetAt(1).StreamLocation

    set tableName = $$$CLASSsqlschemaname($$$gWRK,sourceClassName) _"."_  $$$CLASSsqltablename($$$gWRK,sourceClassName)

    if $ISOBJECT(sourceClass) 
     & $ISOBJECT(targetClass)
     & tableName '= "" {
        if $ISOBJECT(sourceClass.Storages.GetAt(1)) 
         & $ISOBJECT(targetClass.Storages.GetAt(1))
         {
            set tStatement=##class(%SQL.Statement).%New(1) 
            kill sql
            set sql($i(sql)) = "SELECT" 
            set sql($i(sql)) = "id"  
            set sql($i(sql)) = "FROM"
            set sql($i(sql)) = tableName
            set sc = tStatement.%Prepare(.sql) 
            set result = tStatement.%Execute()

            kill:purgeArchive @targetDataLocation, @targetIndexLocation, @targetStreamLocation 

            while result.%Next() {
                set source = $CLASSMETHOD(sourceClassName,"%OpenId",result.%Get("id"))

                if $ISOBJECT(source) {
                    set archive = $CLASSMETHOD(targetClassName,"%New")

                    for i = 1:1:sourceClass.Properties.Count() {
                        set propertyName = sourceClass.Properties.GetAt(i).Name
                        set $PROPERTY(archive,propertyName) = $PROPERTY(source,propertyName)
                    }

                    set sc = archive.%Save()
                    if sc {
                        set archived = archived + 1
                    } else {
                        set archivedErrors = archivedErrors + 1
                    }
                }
            }

            kill:purgeSource @sourceDataLocation, @sourceIndexLocation, @sourceStreamLocation

            set msg ="archive data from " _ sourceClassName _ " into "_ targetClassName _ " result:" _ archived _ " archived (errors:" _ archivedErrors _ ")"

       } else {
            set severity = 1
            set msg = "ERROR WHILE ARCHIVING " _ sourceClassName _ " in "_ targetClassName _ " : " _ " classes have not storage definition"
        }
    } else {
        set severity = 1
        set msg = "ERROR WHILE ARCHIVING " _ sourceClassName _ " in "_ targetClassName _ " : " _ " classes not found in %Dictionary.ClassDefinition"
    }
    do ##class(%SYS.System).WriteToConsoleLog(msg,0,severity)
    Return sc
}

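/// Archive by merging the source (current) globals directly into the target (archive) globals;
/// optionally purge the archive globals beforehand and the source globals afterwards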
ClassMethod toArchive(purgeArchive As %Integer = 0, purgeSource As %Integer = 0) As %Status
{
    set sc=$$$OK

    set sourceClassName = $PIECE(##class(%Dictionary.ClassDefinition).%OpenId(..%ClassName(1)).Super,",",2)
    set targetClassName = ..%ClassName(1)
    set sourceClass = ##class(%Dictionary.ClassDefinition).%OpenId(sourceClassName) 
    set targetClass = ##class(%Dictionary.ClassDefinition).%OpenId(targetClassName)

    if $ISOBJECT(sourceClass) 
     & $ISOBJECT(targetClass) {
        if $ISOBJECT(sourceClass.Storages.GetAt(1)) 
         & $ISOBJECT(targetClass.Storages.GetAt(1))
         {
    
            set sourceDataLocation = sourceClass.Storages.GetAt(1).DataLocation
            set sourceIndexLocation = sourceClass.Storages.GetAt(1).IndexLocation
            set sourceStreamLocation = sourceClass.Storages.GetAt(1).StreamLocation

            set targetDataLocation = targetClass.Storages.GetAt(1).DataLocation
            set targetIndexLocation = targetClass.Storages.GetAt(1).IndexLocation
            set targetStreamLocation = targetClass.Storages.GetAt(1).StreamLocation

            kill:purgeArchive @targetDataLocation, @targetIndexLocation, @targetStreamLocation 

            merge @targetDataLocation = @sourceDataLocation
            merge @targetIndexLocation = @sourceIndexLocation
            merge @targetStreamLocation = @sourceStreamLocation

            set ^mergeTrace($i(^mergeTrace)) = $lb($zdt($h,3),sourceDataLocation)

            kill:purgeSource @sourceDataLocation, @sourceIndexLocation, @sourceStreamLocation

            set severity = 0
            set msg = "ARCHIVING " _ sourceClassName _ " in "_ targetClassName _ " SUCCESSFULLY"
                    

        } else {
            set severity = 1
            set msg = "ERROR WHILE ARCHIVING " _ sourceClassName _ " in "_ targetClassName _ " : " _ " classes have not storage definition"
        }
    } else {
        set severity = 1
        set msg = "ERROR WHILE ARCHIVING " _ sourceClassName _ " in "_ targetClassName _ " : " _ " classes not found in %Dictionary.ClassDefinition"
    }
    do ##class(%SYS.System).WriteToConsoleLog(msg,0,severity)
    return sc
}

Storage Default
{
<Data name="personDefaultData">
<Value name="1">
<Value>%%CLASSNAME</Value>
</Value>
<Value name="2">
<Value>name</Value>
</Value>
<Value name="3">
<Value>dob</Value>
</Value>
<Value name="4">
<Value>activ</Value>
</Value>
<Value name="5">
<Value>created</Value>
</Value>
</Data>
<DataLocation>^off.personD</DataLocation>
<DefaultData>personDefaultData</DefaultData>
<IdLocation>^off.personD</IdLocation>
<IndexLocation>^off.personI</IndexLocation>
<StreamLocation>^off.personS</StreamLocation>
<Type>%Storage.Persistent</Type>
}

}
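Called from a terminal, this looks like the following (just a sketch; the arguments are the purgeArchive / purgeSource flags defined above):

// object-by-object copy, clearing the archive globals first
set sc = ##class(data.archive.person).archive(1, 0)

// or the global MERGE variant, purging the source globals afterwards
set sc = ##class(data.archive.person).toArchive(0, 1)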

You have a lucky design without indices. That makes life with 2 distinct globals easy.

I thought you had a common index for current and archive versions. I guess I misunderstood your point.

A simple MERGE ^archive(....) = ^source(....) just doesn't maintain any index.

I think the biggest issue is that it can store only one (previous) version. Or if you merge:

merge ^archive(id, ts) = ^source(id)

you'll need a custom storage for a composite id key.
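If the merge approach has to coexist with indices, one option (not discussed above, just a sketch) would be to merge only the data global and then let the archive class rebuild its own index structures with the standard %BuildIndices() call, assuming the archive class declares the same indices as the current one:

// copy the data global only, then regenerate the archive indices from it
merge ^off.personD = ^on.personD
set sc = ##class(data.archive.person).%BuildIndices()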