· Feb 15, 2016

Retrieving last known record (based on a date) in series of logged events


Given a log file with these records representing, hypotethically, the ‘movements’ or changes in residential city, for citizens, like this: 

 ID           CitizenRef           RelocationDate                 City                        Surname
 48          1000                       2015-04-01                          Boston                  Smith
 49          1000                       2015-07-01                          Seattle                  Smith
 50          1000                       2015-10-01                          Boston                  Smith
 51          1000                       2016-01-15                          NewYork             Smith

Note - Smith (1000) lived in Boston, Seattle, and again in Boston (2015) and finally, in NewYork (2016). He has multiple entries in this log.

I want to perform analytics based on the citizens that are currently living in NewYork. To do this, I created a dimension ‘IsLastKnownRecord’, which is populated with the value “True” when processing a row that I know represents the last known movement for the citizen, and “False” otherwise.  The sourceExpression for this dimension calls a query on the source table to determine this.

This way, to perform analytics based on a list of citizens that are currently in NewYork, (or other city for that matter) I merely have to include a filter where IsLastKnownRecord=”true”.

This seems to work, but I’m wondering if there is another (more elegant ?) way of achieving the same result without creating a the dimension ‘IsLastKnownRecord’ in the cube, but instead, directly as part of the pivot table.



Discussion (8)1
Log in or sign up to continue

Hello.  You can do it like this:

  1.  In your source fact class define a new property - IsLastKnownRecord As %Boolean
  2. Fill this property with correct values during batch imort, or fill it with a task or a trigger or a callback (many ways to calculate it, choose any approach you want)
  3. In your cube delete IsLastKnownRecord dimension
  4. In studio open your cube class and add buildRestriction to the cube definition:  <cube buildRestriction="IsLastKnownRecord=1"
  5. Save and compile your cube
  6. During cube build only facts with IsLastKnownRecord value equal to true would be used to build a cube

Thanks, but in my question, when asking if "there is another (more elegant ?) way of achieving the same result without creating a the dimension ‘IsLastKnownRecord’ in the cube, but instead, directly as part of the pivot table."

I meant, is there a way of achiving the filtered results showing the last recorded date, using, only MDX syntax - (and not creating a dimension derived from the underlying source data).

Your solution assumes I have access to the source class and data model. If I do, then yes, i'll extend the underlying source data model in one of several ways (including that mentioned in a subsequent post).

Note also - that I still want to quert all logged events so setting a cube buildRestriction is not something I would be wanting to do.

thanks anyway,


You can calculate such property on a class side, something like this. Index by CitizenRef and RelocationDate, and method which looking for the next date, and returns true if nothing found.

Index CitizenRelocation On (CitizenRef, RelocationDate);

Property IsLastKnownRecord As %Boolean [ Calculated, SqlComputeCode = {set {*}=##class({%%CLASSNAME}).IsLastKnownRecordCheck({CitizenRef}, {RelocationDate})}, SqlComputed ];

Method IsLastKnownRecordGet() As %Boolean
    quit ..IsLastKnownRecordCheck(i%CitizenRef, i%RelocationDate)

ClassMethod IsLastKnownRecordCheck(CitizenRef As %Integer, RelocationDate As %Date) As %Boolean [ CodeMode = objectgenerator ]
    set storagename="Default"
    set storageInd=%class.Storages.FindObjectId(%classname_"||"_storagename)
    set storage=%class.Storages.GetAt(storageInd)
    set indexLocation=storage.IndexLocation
    set indexLocation=$name(@indexLocation@("CitizenRelocation"))
    do %code.WriteLine($c(9)_"quit $order(@($name("_indexLocation_"))@(CitizenRef, RelocationDate))=""""")
    quit $$$OK

And result you can see below

Good way to calculate such property. One question I have is why you use indirection and $name function in the generated method?  It can be safely removed, for example like this:

ClassMethod IsLastKnownRecordCheck(CitizenRef As %Integer, RelocationDate As %Date) As %Boolean [ CodeMode = objectgenerator ]
    set storagename=&quot;Default&quot;
    set storageInd=%class.Storages.FindObjectId(%classname_&quot;||&quot;_storagename)
    set storage=%class.Storages.GetAt(storageInd)
    set indexLocation=storage.IndexLocation
    do %code.WriteLine($c(9)_&quot;quit $order(&quot;_indexLocation_&quot;(&quot;&quot;CitizenRelocation&quot;&quot;, CitizenRef, RelocationDate))=&quot;&quot;&quot;&quot;&quot;)
    quit $$$OK

What I have done in previous projects is to link the correlated records, e.g.:

 ID           CitizenRef           RelocationDate                 City                        Surname         NextLocationId
 48          1000                       2015-04-01                          Boston                  Smith               49
 49          1000                       2015-07-01                          Seattle                  Smith               50
 50          1000                       2015-10-01                          Boston                  Smith               51
 51          1000                       2016-01-15                          NewYork              Smith               NULL


You can quickly filter for current locations by setting your filter to "NextLocationId IS NULL". This is not very different from the other comments so far, but this approach is more flexible.

If you ask the question: "Who lived in Boston and relocated to Seattle" you can check for the case (pseudo-filter)

City="Boston" AND NextLocationId.City="Seattle"

You can even ask more complex questions like "Who relocated to Boston and stayed longer than a year after living in NewYork?"