How to fill fact table in DeepSee via JDBC

Question

Question

Max Dimukhametov · Mar 21, 2018

#InterSystems IRIS BI (DeepSee) #Cubes #JDBC #SQL

Hi all.

I have some problem with getting data from external source (PostgreSQL) via JDBC and load it in DeepSee and I resolved some problems while writing this question :)

So..Below I proved my steps of working:

Created SQL Gateway Connection

Created class for loading data via JDBC

S Connection=##class(%Library.SQLGatewayConnection).%New()
S sc=Connection.Connect("jdbcurl","user","pass",0)
If $$$ISERR(sc) do $System.Status.DisplayError(sc) quit
S sc=Connection.AllocateStatement(.Statment)
S query="SELECT * FROM students"
S sc=Connection.Prepare(Statment,query)
S sc=Connection.Execute(Statment)
//Getting result set

In 2 line I got exception (not saved it) and I exclusive this option

Next I Created projections via SQL wizard and now all tables stored in Cache and I get needed data via ##class(%ResultSet.SQL).%Prepare()
After that I create model (StudentRating) in sequence by result set
Manually in Architect rebuild fact table
Next if need to update information I clear data from StudentRating by method %KillExtent() and repeat 3 and 4 points.

What I want to change in current situation?

1) Get needed information by JDBC without projections (If I understand correctly it's not recommended option https://docs.intersystems.com/latest/csp/docbook/DocBook.UI.Page.cls?KEY...)

2) Auto-rebuild fact table (2-3 times in day)

Please advise me how to resolve 1 and 2 point or suggest best practices for my question.

Discussion (7)0

Log in or sign up to continue

Eduard Lebedyuk · Mar 22, 2018

For higher performance it's better to keep the data in InterSystems platform and sync it with remote db periodically.

To download the data via xDBC you have two main approaches:

Interoperability (Ensemble) SQL inbound adapter
"Raw" access via %SQLGatewayConnection or %Net.Remote.Java.JDBCGateway

Interoperability approach is better as it solves most problems and user should just enter the query, etc. "Raw" can be faster and allows for fine-tuning.

Now, to keep data synced there are several approaches available:

If source table has UpdatedOn field, track it and get rows updated only after last sync.
Journals: some databases provide parsable journals, use them to determine which rows changed in relevant tables.
Triggers: sometimes source table has triggers (i.e. Audit) which while do not provide explicit UpdatedOn field nonetheless can be used to determine row update time.
Hashing: hash incoming data and save the hash, update the row only if hash changed.

If you can modify source database - add UpdatedOn field, it's the best6 solution.

Linked tables allow you not to store data permanently, but the cube would be rebuilt each time. With other approaches syncing cube is enough.

Also check this guide on DeepSee Data Connectors.

1 0

score 1 · Answer 1 · 2018-03-22T08:22:07-04:00

Hi, Max!

I think you have two questions here.

1. how to import data into Caché class from another DBMS.

2. How to update the cube which uses an imported table as a fact table.

About the second question: I believe you can introduce a field in the imported table with a hash-code for the fields of the record and import only new rows so DeepSee Cube update will work automatically in this case. Inviting @Eduard Lebedyuk to describe the technics in details.

Regarding the 1st question - I didn't get your problem, but you can test linked table via SQL Gateway UI in Control Panel.

score 0 · Answer 2 · 2018-03-28T00:09:26-04:00

Thanks, Evgeny!

About questions:

1) Now I use projection tables and getting result set via SQL Query after that fill it in source table. I think it's a best solution in current moment.

2) I use source table as a result set of some SQL query and before of this update table is not suitable for me because need to check which row was deleted, changed or added. Now I just delete all rows from source table and after that call SQL query for getting result and insert it in table. What do you think about this approach?

score 0 · Answer 3 · 2018-03-28T06:32:58-04:00

2) It's OK approach for relatively small external source tables because in this case, you need to build the cube for all the records and have no option to update/sync.

If you OK with timings on cube building you are very welcome to use the approach.

If the building time is sensible for the application consider use the approach what @Eduard Lebedyuk already has advised you: import only new records from the external database (hash method, sophisticated query or some other bright idea) and do the cube syncing which will perform faster than cube rebuilding.

score 0 · Answer 4 · 2018-03-22T14:16:54-04:00

Evgeny Shvarov · Mar 22, 2018

Thanks, Eduard!

Also this is a relevant article too by @Vicky Li

0 0

score 0 · Answer 5 · 2018-03-27T23:52:00-04:00

Thanks for answer, Eduard!

I have additional question:

How best to rebuild cube? Need rebuild 2 times in day for each days of week besides saturday.
Also I want to fill source table before a rebuild <= In this case I call SQLQuery for getting result set.

I know about synchronisation but I don't fully understand how to set rebuild time with the terms written above. Also synchronisation get data from source table and fill it in ^OBJ.DSTIME. What event need for update source table before filling ^OBJ.DSTIME?

My idea is call some method via TaskManager. This method update source table in first and after that rebuild cube without synchronization.

score 0 · Answer 6 · 2018-03-28T04:42:00-04:00

You're doing two separate operations:

Syncing the data
Syncing the cube

They can both be system tasks with one task dependent on other or even one task altogether.

If you're using persistent objects to store data you can specify DSTIME:

Parameter DSTIME = "AUTO";

and ^OBJ.DSTIME would be maintained automatically.

UPD. Read your other comment. DSTIME is relevant only for syncing. It does not affect full build behaviour.