Discussion
Sylvain Guilbaud · Feb 21, 2022

How to efficiently store historical data, similar to current data, without mixing physical storage?

Let's say you would like to store your historical data efficiently, in a structure similar to the one used for your current data, but without sharing the same physical storage (i.e., not in the same global). What is the most efficient way to do it?

Below is a simple class for your current data:

Class data.current.person Extends (%Persistent, %Populate)
{

Parameter DEFAULTGLOBAL = "^on.person";

Property name As %String;

Property dob As %Date(FORMAT = 4);

Property activ As %Boolean [ InitialExpression = 1 ];

Property created As %TimeStamp [ InitialExpression = {$zdt($h,3)} ];
Storage Default
{
<Data name="personDefaultData">
<Value name="1">
<Value>%%CLASSNAME</Value>
</Value>
<Value name="2">
<Value>name</Value>
</Value>
<Value name="3">
<Value>dob</Value>
</Value>
<Value name="4">
<Value>activ</Value>
</Value>
<Value name="5">
<Value>created</Value>
</Value>
</Data>
<DataLocation>^on.personD</DataLocation>
<DefaultData>personDefaultData</DefaultData>
<IdLocation>^on.personD</IdLocation>
<IndexLocation>^on.personI</IndexLocation>
<StreamLocation>^on.personS</StreamLocation>
<Type>%Storage.Persistent</Type>
}

}

NB: you can omit the DEFAULTGLOBAL parameter if you're happy with the one generated automatically (which is the case most of the time); I use it in my example to emphasize the separation of globals between the current and archive classes.

 

First approach

Your first thought might be to create 2 persistent classes and copy the same definition into both (except the Storage clause, which will be recreated at the first compilation):

Class data.current.person Extends %Persistent {}
Class data.archive.person Extends %Persistent {}

But this approach quickly proves disastrous in terms of maintenance, for the obvious reason that the code/definitions are duplicated.

 

Second approach

Another idea is to create a common class with ClassType = "" so that no storage definition is created at its level. Then you can make the archive and current classes inherit from this common class and add %Persistent inheritance in both subclasses.

Class data.common.person Extends %Persistent [ ClassType = "" ] {}
Class data.current.person Extends (%Persistent, data.common.person) [ ClassType = persistent ] {}
Class data.archive.person Extends (%Persistent,data.common.person) [ ClassType = persistent ] {}

This way, you solve the problem and get separate globals for each persistent class, with a single class holding the whole definition.

But it is not really convenient to manage 3 classes instead of 2, to have to maintain the definition in the common class, or to have to open the common class to see the details of the current class definition.

 

Third approach

Well, a third solution is to go back to a very simple approach: just create an archive class which inherits from the current data class...

Class data.current.person Extends %Persistent {}
Class data.archive.person Extends data.current.person {}

In that case, current and historical data will share the same storage definition and, as a result, their data will be stored in the same globals. This does not physically separate the two kinds of data and thus does not answer our need.

 

A simpler approach

You can obtain a similar structure in an archive class which inherits from the current one, without inheriting its storage, simply by adding %Persistent inheritance before the class you want to inherit from.

In this case, even though the superclass already contains a storage definition with a DataLocation (because of its persistent type), at compile time the inheritance precedence forces the generation of a new storage definition based on the subclass, with its own globals for DataLocation, IdLocation, IndexLocation, and StreamLocation.

Class data.archive.person Extends (%Persistent, data.current.person)
{
Parameter DEFAULTGLOBAL = "^off.person";
Storage Default
{
<Data name="personDefaultData">
<Value name="1">
<Value>%%CLASSNAME</Value>
</Value>
<Value name="2">
<Value>name</Value>
</Value>
<Value name="3">
<Value>dob</Value>
</Value>
<Value name="4">
<Value>activ</Value>
</Value>
<Value name="5">
<Value>created</Value>
</Value>
</Data>
<DataLocation>^off.personD</DataLocation>
<DefaultData>personDefaultData</DefaultData>
<IdLocation>^off.personD</IdLocation>
<IndexLocation>^off.personI</IndexLocation>
<StreamLocation>^off.personS</StreamLocation>
<Type>%Storage.Persistent</Type>
}
}
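As a quick sanity check (a minimal sketch, assuming the two classes above are compiled as shown; Populate comes from the %Populate superclass), you can create a few current records, copy one into the archive class, and confirm that each class writes to its own global:

do ##class(data.current.person).Populate(10)      // test data goes to ^on.personD

set source = ##class(data.current.person).%OpenId(1)
set archive = ##class(data.archive.person).%New()
set archive.name = source.name
set archive.dob = source.dob
set archive.activ = source.activ
set archive.created = source.created
do archive.%Save()                                // stored in ^off.personD

zwrite ^on.personD(1)                             // record in the "current" global
zwrite ^off.personD(1)                            // record in the "archive" global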

 

Other ideas?

If you have other ways to address this question, please share your ideas and comments.

Discussion (9)

I'm using a modified second approach.

1. Create a base abstract class with all properties:

Class model.PersonBase Extends (%XML.Adaptor, %JSON.Adaptor) [ Abstract ]
{

Property Name;

}

2. Create persistent class:

Class model.Person Extends (%Persistent, model.PersonBase, Utils.Copyable)
{
/// Indices, fkeys and relationships go here
}

Utils.Copyable allows easy copy/compare.

3. Create snapshot class:

Class model.PersonSnapshot Extends (%Persistent, model.PersonBase, Utils.Copyable)
{

Index parentIndex On parent;

/// Creation timestamp (UTC)
Property createdOn As %TimeStamp(POPORDER = -1, XMLPROJECTION = "NONE") [ InitialExpression = {$ZDATETIME($ZTIMESTAMP, 3, 1, 3)}, Required ];

Property parent As model.Person(XMLPROJECTION = "NONE");

Method %OnNew(parentId As %Integer) As %Status [ Private, ServerOnly = 1 ]
{
    #dim sc As %Status = $$$OK
    
    set ..parent = ##class(model.Person).%OpenId(parentId,, .sc)
    if $$$ISERR(sc) quit sc

    quit ..copyFrom(..parent, ##class(model.PersonBase).%ClassName(1))
}

}

And done.

This way you can add a snapshot of object at any point of time and by calling compareTo calculate a diff between any two versions.

There's also a hash function you can use to calculate hash from some or all object properties to speed up equivalence checks.
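For illustration, a minimal usage sketch (the exact copyFrom/compareTo signatures depend on Utils.Copyable, which is not shown here, so treat this as an assumption):

set person = ##class(model.Person).%OpenId(1)

// snapshot the current state
set snap1 = ##class(model.PersonSnapshot).%New(person.%Id())
set sc = snap1.%Save()

// ... later, after the live object has been modified and saved again ...

// snapshot it once more
set snap2 = ##class(model.PersonSnapshot).%New(person.%Id())
set sc = snap2.%Save()

// diff any two versions (compareTo assumed to be provided by Utils.Copyable)
write snap1.compareTo(snap2)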

Thanks Eduard for sharing your code implementing a very powerful data snapshot approach.
 

You have a lucky design without indices. That makes life with 2 distinct globals easy.
The archiving action, keeping the original ID, could be achieved by a simple
MERGE from ^source to ^archive. Maintaining the indices could take slightly more
effort, requiring class methods.
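For reference, the bare-bones form of that idea on the globals used in the article (a sketch only; as noted, a plain MERGE copies the data nodes as-is and does not maintain any index):

// copy all current person data into the archive global, keeping the original IDs
merge ^off.personD = ^on.personD
// then optionally clear the source
kill ^on.personD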

 

Thanks Robert for your comment.
 

Merging globals is exactly what the toArchive method does here:


 


Class data.archive.person Extends (%Persistent, data.current.person)
{

Parameter DEFAULTGLOBAL = "^off.person";

/// Archives current data into this class by copying each object, property by property
ClassMethod archive(purgeArchive As %Integer = 0, purgeSource As %Integer = 0) As %Status
{
    set sc = $$$OK , tableName = ""
    set (archived,archivedErrors, severity) = 0

    set sourceClassName = $PIECE(##class(%Dictionary.ClassDefinition).%OpenId(..%ClassName(1)).Super,",",2)
    set targetClassName = ..%ClassName(1)

    set sourceClass = ##class(%Dictionary.ClassDefinition).%OpenId(sourceClassName) 
    set targetClass = ##class(%Dictionary.ClassDefinition).%OpenId(targetClassName)

    set sourceDataLocation = sourceClass.Storages.GetAt(1).DataLocation
    set sourceIndexLocation = sourceClass.Storages.GetAt(1).IndexLocation
    set sourceStreamLocation = sourceClass.Storages.GetAt(1).StreamLocation

    set targetDataLocation = targetClass.Storages.GetAt(1).DataLocation
    set targetIndexLocation = targetClass.Storages.GetAt(1).IndexLocation
    set targetStreamLocation = targetClass.Storages.GetAt(1).StreamLocation

    set tableName = $$$CLASSsqlschemaname($$$gWRK,sourceClassName) _"."_  $$$CLASSsqltablename($$$gWRK,sourceClassName)

    if $ISOBJECT(sourceClass) 
     & $ISOBJECT(targetClass)
     & tableName '= "" {
        if $ISOBJECT(sourceClass.Storages.GetAt(1)) 
         & $ISOBJECT(targetClass.Storages.GetAt(1))
         {
            set tStatement=##class(%SQL.Statement).%New(1) 
            kill sql
            set sql($i(sql)) = "SELECT" 
            set sql($i(sql)) = "id"  
            set sql($i(sql)) = "FROM"
            set sql($i(sql)) = tableName
            set sc = tStatement.%Prepare(.sql) 
            set result = tStatement.%Execute()

            kill:purgeArchive @targetDataLocation, @targetIndexLocation, @targetStreamLocation 

            while result.%Next() {
                set source = $CLASSMETHOD(sourceClassName,"%OpenId",result.%Get("id"))

                if $ISOBJECT(source) {
                    set archive = $CLASSMETHOD(targetClassName,"%New")

                    for i = 1:1:sourceClass.Properties.Count() {
                        set propertyName = sourceClass.Properties.GetAt(i).Name
                        set $PROPERTY(archive,propertyName) = $PROPERTY(source,propertyName)
                    }

                    set sc = archive.%Save()
                    if sc {
                        set archived = archived + 1
                    } else {
                        set archivedErrors = archivedErrors + 1
                    }
                }
            }

            kill:purgeSource @sourceDataLocation, @sourceIndexLocation, @sourceStreamLocation

            set msg ="archive data from " _ sourceClassName _ " into "_ targetClassName _ " result:" _ archived _ " archived (errors:" _ archivedErrors _ ")"

       } else {
            set severity = 1
            set msg = "ERROR WHILE ARCHIVING " _ sourceClassName _ " in "_ targetClassName _ " : " _ " classes have not storage definition"
        }
    } else {
        set severity = 1
        set msg = "ERROR WHILE ARCHIVING " _ sourceClassName _ " in "_ targetClassName _ " : " _ " classes not found in %Dictionary.ClassDefinition"
    }
    do ##class(%SYS.System).WriteToConsoleLog(msg,0,severity)
    Return sc
}

ClassMethod toArchive(purgeArchive As %Integer = 0, purgeSource As %Integer = 0) As %Status
{
    set sc=$$$OK

    set sourceClassName = $PIECE(##class(%Dictionary.ClassDefinition).%OpenId(..%ClassName(1)).Super,",",2)
    set targetClassName = ..%ClassName(1)
    set sourceClass = ##class(%Dictionary.ClassDefinition).%OpenId(sourceClassName) 
    set targetClass = ##class(%Dictionary.ClassDefinition).%OpenId(targetClassName)

    if $ISOBJECT(sourceClass) 
     & $ISOBJECT(targetClass) {
        if $ISOBJECT(sourceClass.Storages.GetAt(1)) 
         & $ISOBJECT(targetClass.Storages.GetAt(1))
         {
    
            set sourceDataLocation = sourceClass.Storages.GetAt(1).DataLocation
            set sourceIndexLocation = sourceClass.Storages.GetAt(1).IndexLocation
            set sourceStreamLocation = sourceClass.Storages.GetAt(1).StreamLocation

            set targetDataLocation = targetClass.Storages.GetAt(1).DataLocation
            set targetIndexLocation = targetClass.Storages.GetAt(1).IndexLocation
            set targetStreamLocation = targetClass.Storages.GetAt(1).StreamLocation

            kill:purgeArchive @targetDataLocation, @targetIndexLocation, @targetStreamLocation 

            merge @targetDataLocation = @sourceDataLocation
            merge @targetIndexLocation = @sourceIndexLocation
            merge @targetStreamLocation = @sourceStreamLocation

            set ^mergeTrace($i(^mergeTrace)) = $lb($zdt($h,3),sourceDataLocation)

            kill:purgeSource @sourceDataLocation, @sourceIndexLocation, @sourceStreamLocation

            set severity = 0
            set msg = "ARCHIVING " _ sourceClassName _ " in "_ targetClassName _ " SUCCESSFULLY"
                    

        } else {
            set severity = 1
            set msg = "ERROR WHILE ARCHIVING " _ sourceClassName _ " in "_ targetClassName _ " : " _ " classes have not storage definition"
        }
    } else {
        set severity = 1
        set msg = "ERROR WHILE ARCHIVING " _ sourceClassName _ " in "_ targetClassName _ " : " _ " classes not found in %Dictionary.ClassDefinition"
    }
    do ##class(%SYS.System).WriteToConsoleLog(msg,0,severity)
    return sc
}

Storage Default
{
<Data name="personDefaultData">
<Value name="1">
<Value>%%CLASSNAME</Value>
</Value>
<Value name="2">
<Value>name</Value>
</Value>
<Value name="3">
<Value>dob</Value>
</Value>
<Value name="4">
<Value>activ</Value>
</Value>
<Value name="5">
<Value>created</Value>
</Value>
</Data>
<DataLocation>^off.personD</DataLocation>
<DefaultData>personDefaultData</DefaultData>
<IdLocation>^off.personD</IdLocation>
<IndexLocation>^off.personI</IndexLocation>
<StreamLocation>^off.personS</StreamLocation>
<Type>%Storage.Persistent</Type>
}

}

Okay, why would you want to index active versions and old versions together?

In my design I explicitly create indices for the active version only. Old versions are only indexed on the parent field.

Why together??? Who wants that??? Where???
A simple MERGE ^archive(....)=^source(....) just doesn't maintain any index.
This was my message:

You have a lucky design without indices. That makes life with 2 distinct globals easy.

I thought you had a common index for current and archive versions. I guess I misunderstood your point.

A simple MERGE ^archive(....)=^source(....)  just doesn't maintain any index.

I think the biggest issue is that it can store only one (previous) version. Or if you merge:

merge ^archive(id, ts) = ^source(id)

you'll need a custom storage for a composite id key.
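One way to get that composite key without writing a fully custom storage is an IdKey index on the source ID plus a version (or timestamp) property. A hypothetical sketch (class and property names are illustrative, not from the article):

Class data.archive.personHistory Extends (%Persistent, data.current.person)
{

Parameter DEFAULTGLOBAL = "^off.personHist";

/// ID of the source row in data.current.person (hypothetical property for this sketch)
Property sourceId As %Integer [ Required ];

/// Version number of the archived copy (could also be a timestamp)
Property version As %Integer [ Required ];

/// Composite IdKey: each archived row is identified by (sourceId, version)
/// instead of a single auto-incremented integer ID
Index pk On (sourceId, version) [ IdKey, Unique ];

}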

YEAH! A history track definitely makes it more complex.