Retrieving last known record (based on a date) in series of logged events

Hi,

Given a log file with these records representing, hypotethically, the ‘movements’ or changes in residential city, for citizens, like this: 

 ID           CitizenRef           RelocationDate                 City                        Surname
...
 48          1000                       2015-04-01                          Boston                  Smith
 49          1000                       2015-07-01                          Seattle                  Smith
 50          1000                       2015-10-01                          Boston                  Smith
 51          1000                       2016-01-15                          NewYork             Smith
...

Note - Smith (1000) lived in Boston, Seattle, and again in Boston (2015) and finally, in NewYork (2016). He has multiple entries in this log.

I want to perform analytics based on the citizens that are currently living in NewYork. To do this, I created a dimension ‘IsLastKnownRecord’, which is populated with the value “True” when processing a row that I know represents the last known movement for the citizen, and “False” otherwise.  The sourceExpression for this dimension calls a query on the source table to determine this.

This way, to perform analytics based on a list of citizens that are currently in NewYork, (or other city for that matter) I merely have to include a filter where IsLastKnownRecord=”true”.

This seems to work, but I’m wondering if there is another (more elegant ?) way of achieving the same result without creating a the dimension ‘IsLastKnownRecord’ in the cube, but instead, directly as part of the pivot table.

Thanks

Steve

  • 0
  • 0
  • 223
  • 8
  • 0

Comments

Hello.  You can do it like this:

  1.  In your source fact class define a new property - IsLastKnownRecord As %Boolean
  2. Fill this property with correct values during batch imort, or fill it with a task or a trigger or a callback (many ways to calculate it, choose any approach you want)
  3. In your cube delete IsLastKnownRecord dimension
  4. In studio open your cube class and add buildRestriction to the cube definition:  <cube buildRestriction="IsLastKnownRecord=1"
  5. Save and compile your cube
  6. During cube build only facts with IsLastKnownRecord value equal to true would be used to build a cube

Thanks, but in my question, when asking if "there is another (more elegant ?) way of achieving the same result without creating a the dimension ‘IsLastKnownRecord’ in the cube, but instead, directly as part of the pivot table."

I meant, is there a way of achiving the filtered results showing the last recorded date, using, only MDX syntax - (and not creating a dimension derived from the underlying source data).

Your solution assumes I have access to the source class and data model. If I do, then yes, i'll extend the underlying source data model in one of several ways (including that mentioned in a subsequent post).

Note also - that I still want to quert all logged events so setting a cube buildRestriction is not something I would be wanting to do.

thanks anyway,

Steve

You can calculate such property on a class side, something like this. Index by CitizenRef and RelocationDate, and method which looking for the next date, and returns true if nothing found.

Index CitizenRelocation On (CitizenRef, RelocationDate);

Property IsLastKnownRecord As %Boolean [ Calculated, SqlComputeCode = {set {*}=##class({%%CLASSNAME}).IsLastKnownRecordCheck({CitizenRef}, {RelocationDate})}, SqlComputed ];

Method IsLastKnownRecordGet() As %Boolean
{
    quit ..IsLastKnownRecordCheck(i%CitizenRef, i%RelocationDate)
}

ClassMethod IsLastKnownRecordCheck(CitizenRef As %Integer, RelocationDate As %Date) As %Boolean [ CodeMode = objectgenerator ]
{
    set storagename="Default"
    set storageInd=%class.Storages.FindObjectId(%classname_"||"_storagename)
    set storage=%class.Storages.GetAt(storageInd)
    set indexLocation=storage.IndexLocation
    set indexLocation=$name(@indexLocation@("CitizenRelocation"))
    do %code.WriteLine($c(9)_"quit $order(@($name("_indexLocation_"))@(CitizenRef, RelocationDate))=""""")
    quit $$$OK
}

And result you can see below

 

Good way to calculate such property. One question I have is why you use indirection and $name function in the generated method?  It can be safely removed, for example like this:

ClassMethod IsLastKnownRecordCheck(CitizenRef As %Integer, RelocationDate As %Date) As %Boolean [ CodeMode = objectgenerator ]
{
    set storagename=&quot;Default&quot;
    set storageInd=%class.Storages.FindObjectId(%classname_&quot;||&quot;_storagename)
    set storage=%class.Storages.GetAt(storageInd)
    set indexLocation=storage.IndexLocation
    do %code.WriteLine($c(9)_&quot;quit $order(&quot;_indexLocation_&quot;(&quot;&quot;CitizenRelocation&quot;&quot;, CitizenRef, RelocationDate))=&quot;&quot;&quot;&quot;&quot;)
    quit $$$OK
}

Just because, I wanted to split getting full index location, and generated code.

What I have done in previous projects is to link the correlated records, e.g.:

 ID           CitizenRef           RelocationDate                 City                        Surname         NextLocationId
...
 48          1000                       2015-04-01                          Boston                  Smith               49
 49          1000                       2015-07-01                          Seattle                  Smith               50
 50          1000                       2015-10-01                          Boston                  Smith               51
 51          1000                       2016-01-15                          NewYork              Smith               NULL

 

You can quickly filter for current locations by setting your filter to "NextLocationId IS NULL". This is not very different from the other comments so far, but this approach is more flexible.

If you ask the question: "Who lived in Boston and relocated to Seattle" you can check for the case (pseudo-filter)

City="Boston" AND NextLocationId.City="Seattle"

You can even ask more complex questions like "Who relocated to Boston and stayed longer than a year after living in NewYork?"

This is a good idea.

But once again, though, the core question was whether MDX afforded me syntax, that would show only the latest RelocationDates for distinct CitizenRef's found.

(so - assume, my source table only has ID, CitizenRef, RelocationDate, City and Surname fields, and cannot be modified).

Thanks for the suggestion though.

Steve

Hi, Steve!

Have you tried %Last function?

E.g. in a MDX query below it counts records only for the last month of sales for every product. 

SELECT NON EMPTY %Last([DateOfSale].[Actual].[MonthSold].Members) ON 0,NON EMPTY [Product].[P1].[Product Category].Members ON 1 FROM [HoleFoods]