Retrieving last known record (based on a date) in series of logged events

Question

Question

Steve Pisani · Feb 15, 2016

Hi,

Given a log file with these records representing, hypotethically, the ‘movements’ or changes in residential city, for citizens, like this:

ID         CitizenRef           RelocationDate                 City                        Surname
...
48         1000                      2015-04-01                          Boston                  Smith
49         1000                      2015-07-01                          Seattle                  Smith
50         1000                      2015-10-01                          Boston                  Smith
51         1000                       2016-01-15                          NewYork             Smith
...
…

Note - Smith (1000) lived in Boston, Seattle, and again in Boston (2015) and finally, in NewYork (2016). He has multiple entries in this log.

I want to perform analytics based on the citizens that are currently living in NewYork. To do this, I created a dimension ‘IsLastKnownRecord’, which is populated with the value “True” when processing a row that I know represents the last known movement for the citizen, and “False” otherwise. The sourceExpression for this dimension calls a query on the source table to determine this.

This way, to perform analytics based on a list of citizens that are currently in NewYork, (or other city for that matter) I merely have to include a filter where IsLastKnownRecord=”true”.

This seems to work, but I’m wondering if there is another (more elegant ?) way of achieving the same result without creating a the dimension ‘IsLastKnownRecord’ in the cube, but instead, directly as part of the pivot table.

Thanks

Steve

Discussion (8)1

Log in or sign up to continue

Dmitry Maslennikov · Feb 16, 2016

You can calculate such property on a class side, something like this. Index by CitizenRef and RelocationDate, and method which looking for the next date, and returns true if nothing found.

Index CitizenRelocation On (CitizenRef, RelocationDate);

Property IsLastKnownRecord As %Boolean [ Calculated, SqlComputeCode = {set {*}=##class({%%CLASSNAME}).IsLastKnownRecordCheck({CitizenRef}, {RelocationDate})}, SqlComputed ];

Method IsLastKnownRecordGet() As %Boolean
{
    quit ..IsLastKnownRecordCheck(i%CitizenRef, i%RelocationDate)
}

ClassMethod IsLastKnownRecordCheck(CitizenRef As %Integer, RelocationDate As %Date) As %Boolean [ CodeMode = objectgenerator ]
{
    set storagename="Default"
    set storageInd=%class.Storages.FindObjectId(%classname_"||"_storagename)
    set storage=%class.Storages.GetAt(storageInd)
    set indexLocation=storage.IndexLocation
    set indexLocation=$name(@indexLocation@("CitizenRelocation"))
    do %code.WriteLine($c(9)_"quit $order(@($name("_indexLocation_"))@(CitizenRef, RelocationDate))=""""")
    quit $$$OK
}

And result you can see below

2 0

score 2 · Answer 1 · 2016-02-16T02:13:10-05:00

Hello. You can do it like this:

In your source fact class define a new property - IsLastKnownRecord As %Boolean
Fill this property with correct values during batch imort, or fill it with a task or a trigger or a callback (many ways to calculate it, choose any approach you want)
In your cube delete IsLastKnownRecord dimension
In studio open your cube class and add buildRestriction to the cube definition: <cube buildRestriction="IsLastKnownRecord=1"
Save and compile your cube
During cube build only facts with IsLastKnownRecord value equal to true would be used to build a cube

score 0 · Answer 2 · 2016-02-16T17:44:04-05:00

Thanks, but in my question, when asking if "there is another (more elegant ?) way of achieving the same result without creating a the dimension ‘IsLastKnownRecord’ in the cube, but instead, directly as part of the pivot table."

I meant, is there a way of achiving the filtered results showing the last recorded date, using, only MDX syntax - (and not creating a dimension derived from the underlying source data).

Your solution assumes I have access to the source class and data model. If I do, then yes, i'll extend the underlying source data model in one of several ways (including that mentioned in a subsequent post).

Note also - that I still want to quert all logged events so setting a cube buildRestriction is not something I would be wanting to do.

thanks anyway,

Steve

score 1 · Answer 3 · 2016-02-16T04:38:20-05:00

Good way to calculate such property. One question I have is why you use indirection and $name function in the generated method? It can be safely removed, for example like this:

ClassMethod IsLastKnownRecordCheck(CitizenRef As %Integer, RelocationDate As %Date) As %Boolean [ CodeMode = objectgenerator ]
{
    set storagename=&quot;Default&quot;
    set storageInd=%class.Storages.FindObjectId(%classname_&quot;||&quot;_storagename)
    set storage=%class.Storages.GetAt(storageInd)
    set indexLocation=storage.IndexLocation
    do %code.WriteLine($c(9)_&quot;quit $order(&quot;_indexLocation_&quot;(&quot;&quot;CitizenRelocation&quot;&quot;, CitizenRef, RelocationDate))=&quot;&quot;&quot;&quot;&quot;)
    quit $$$OK
}

score 0 · Answer 4 · 2016-02-16T05:22:30-05:00

Dmitry Maslennikov · Feb 16, 2016

Just because, I wanted to split getting full index location, and generated code.

0 0

score 1 · Answer 5 · 2016-02-16T04:58:48-05:00

What I have done in previous projects is to link the correlated records, e.g.:

ID CitizenRef RelocationDate City Surname NextLocationId
...
48 1000 2015-04-01 Boston Smith 49
49 1000 2015-07-01 Seattle Smith 50
50 1000 2015-10-01 Boston Smith 51
51 1000 2016-01-15 NewYork Smith NULL

You can quickly filter for current locations by setting your filter to "NextLocationId IS NULL". This is not very different from the other comments so far, but this approach is more flexible.

If you ask the question: "Who lived in Boston and relocated to Seattle" you can check for the case (pseudo-filter)

City="Boston" AND NextLocationId.City="Seattle"

You can even ask more complex questions like "Who relocated to Boston and stayed longer than a year after living in NewYork?"

score 0 · Answer 6 · 2016-02-16T17:48:52-05:00

This is a good idea.

But once again, though, the core question was whether MDX afforded me syntax, that would show only the latest RelocationDates for distinct CitizenRef's found.

(so - assume, my source table only has ID, CitizenRef, RelocationDate, City and Surname fields, and cannot be modified).

Thanks for the suggestion though.

Steve

score 0 · Answer 7 · 2016-02-16T18:39:10-05:00

Hi, Steve!

Have you tried %Last function?

E.g. in a MDX query below it counts records only for the last month of sales for every product.

SELECT NON EMPTY %Last([DateOfSale].[Actual].[MonthSold].Members) ON 0,NON EMPTY [Product].[P1].[Product Category].Members ON 1 FROM [HoleFoods]