Question
· May 10, 2018

Global update / Temporary global

Hi,

I wonder if you could help me with a problem with persistent globals and journals.

We have a few interfaces that use a persistent global just to translate some codes. The global is used like a lookup table: an interface calls a method that searches for a field (from the message) in the global. If found, the code is translated using another field from the global.

The format of the global is something like:

$LB("","1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27")

The method that searches for the code in the global and translates it uses an SQL query against the global:

SELECT TOP 1 %ID FROM globalxxx
 WHERE (field10 = :inputvalue) ORDER BY field2 DESC


This is possible because the global belongs to a persistent class with an SQL table linked to it, defined like:

Class XXX.Stored.globalxxx Extends %Persistent [ SqlTableName = globalxxx ]

(http://docs.intersystems.com/latest/csp/docbook/DocBook.UI.Page.cls?KEY=...)
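For illustration, the translate method is roughly along these lines (a simplified sketch: the field names and the returned property are placeholders, not the real ones):

ClassMethod Translate(inputValue As %String) As %String
{
    // sketch only: find the newest row whose field10 matches the incoming
    // code, open it and return the translated value from another field
    set id=""
    &sql(SELECT TOP 1 %ID INTO :id
         FROM XXX_Stored.globalxxx
         WHERE field10 = :inputValue
         ORDER BY field2 DESC)
    if (SQLCODE'=0)||(id="") quit ""
    set row=##class(XXX.Stored.globalxxx).%OpenId(id)
    quit $select($isobject(row):row.field1,1:"")
}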

All this is working fine. The problem is how to update that global. Because it is used by some LIVE interfaces, downtime needs to be minimal, and the difficulty is the number of entries in the file used to update the global: about 60,000. My idea was to have another, similar global, import the data from the file into it, and remove the duplicate entries there (yes, the file can contain duplicate values that need to be removed). Once the data is in the copy global, it is exported to a file and the global name in that file is changed so it points to the LIVE global. After that, the file can be imported straight into the LIVE global, reducing the downtime of the interfaces to just a few seconds (the time it takes to import the global from a global export file).
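Mechanically, what I have in mind is something like this rough sketch (I am assuming the %Library.Global Export/Import class methods here, and the global names and file path are just placeholders):

// rough sketch only: export the deduplicated copy global, point the
// exported data at the LIVE global name, then reload it
set sc=##class(%Library.Global).Export($namespace,"CopyGlobalD.gbl","/tmp/lookup.gof")
if $system.Status.IsError(sc) do $system.Status.DisplayError(sc) quit
// ...edit the exported file so its global name is the LIVE one...
kill ^XXX.Stored.globalxxxD        // drop the LIVE global
set sc=##class(%Library.Global).Import($namespace,"XXX.Stored.globalxxxD.gbl","/tmp/lookup.gof")
if $system.Status.IsError(sc) do $system.Status.DisplayError(sc)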

Problems with this: 

1) I am not sure this is the right way to do it, but it is all I could come up with given my basic knowledge.
2) The process that removes duplicate entries from the copy global performs a lot of delete operations on that global, and all of this generates journal entries that end up taking a huge amount of disk space. I was told the global can be changed to a temporary global (https://docs.intersystems.com/latest/csp/docbook/DocBook.UI.Page.cls?KEY...) so that those actions are not logged in the journal file. The problem is that I am not sure how to declare my copy global as temporary. I tried changing the global name to start with CACHETEMP (XXX.Stored.CACHETEMPglobalxxx) and changing the class name to start with CACHETEMP (CACHETEMP.Stored.globalxxx), but I am not quite sure either is working.

Suggestions, help, feedbacks, ideas...will be more than welcome!

Thanks

Discussion (6)

Hi Antonio,

Sounds like you are almost there.

The temp global name is case sensitive, so make sure it starts with ^CacheTemp and not ^CACHETEMP.

You will lose this temp global on a restart, so you will then need to make it permanent once you have dealt with all the duplicates. You can do this with the merge command...

http://docs.intersystems.com/latest/csp/docbook/DocBook.UI.Page.cls?KEY=...

You will need to kill the original global first otherwise you will merge the temp global into it.
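For example (global names here are just placeholders for your own):

kill ^LiveGlobal                      // clear the original first
merge ^LiveGlobal=^CacheTempGlobal    // then copy the deduplicated temp data in
kill ^CacheTempGlobal                 // optionally tidy up the temp copy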

Alternatively, you could key your original global in a way that makes duplicates easy to identify, e.g. ^foo(uniqueKeyValue)=data. You can then check whether an entry already exists before deciding to write anything to the database.
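Something along these lines (foo, uniqueKeyValue and data are just placeholders):

// only write (and therefore journal) when the entry does not already exist
if '$data(^foo(uniqueKeyValue)) set ^foo(uniqueKeyValue)=data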

Sean.

WOW...that was quick...thanks Sean

1) Does CacheTemp need to be in the name of the class or the name of the global? I mean:

^XXX.Global.CacheTempGlobal OR ^CacheTempGlobal

2) The process I designed reads the file with the updated data and uploads it line by line unless the line is a duplicate, so basically every line requires a query against the temp global to check whether that entry already exists. Sometimes the new entry needs to replace the value in the temp global, in which case I delete the entry in the temp global and then add the new one. (That part is quite complex and I do not think it is relevant to the global problem.) Once the temp global contains all the data from the file, I start the process of exporting the global to a file, modifying the global name in the file to point to the LIVE one, and then updating the LIVE global with that file.

I cannot use the merge command because the file with the updated data includes everything, not just the new entries, so I would rather delete the LIVE global completely and then import the new global file.

3) To avoid so many deletions in the temp global, I was thinking of loading the content of the file into an array or something similar, removing all the duplicates there, and only then uploading the final data to the temp global, something like the sketch below. But I am not sure whether that would be a real improvement over uploading/deleting in the temp global directly.
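Rough idea of what I mean (the file name and the comma-delimited layout are just assumptions for the example):

// load the file into a local array keyed by code, so duplicates simply
// overwrite each other in memory and nothing is journalled
set file=##class(%Stream.FileCharacter).%New()
do file.LinkToFile("/tmp/update.txt")
kill arr
while 'file.AtEnd {
    set line=file.ReadLine()
    continue:line=""
    set code=$piece(line,",",1)
    set arr(code)=line            // a later duplicate replaces the earlier one
}
// arr now holds one (the last) line per code; write them out in a single pass
set code=""
for {
    set code=$order(arr(code),1,line)  quit:code=""
    // insert/update the corresponding entry in the temp or LIVE global here
}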

Thanks

It will be the name of the global... ^CacheTempGlobal

Global export import seems like a good strategy.

I was just looking at your SQL again: do you have all of the lookup values in one single global node? If that is the case then it would explain why the transaction log was filling up.

What you should really do is split the data out into many global entries...

E.g.

^Lookup("UK")="United Kingdom"
^Lookup("USA")="United States"
^Lookup("ESP")="Spain"
^Lookup("ITA")="Italy"

Make the key (or keys, depending on your data) unique so that duplicates are easy to check for on those key(s).

Then, when you make updates to the global, only the node that changes will be committed to the transaction log, which will have a tiny overall impact.

1) My question is because when I create the global I have to specify the package as well:

Package: XXX

Name: XXX

I am not sure whether the CacheTemp prefix has to be included in the package name or just in the class name, i.e. XXX.CacheTempGlobal or CacheTempGlobal.

2) About how the global is structured:

    ^XXX.Stored.GlobalD    =    3
     ^XXX.Stored.GlobalD(1)    =    $lb("","Code3","TESTING3","","","FIRCROFT, LONDON ROAD","GREEN","EGHAM","",$c(0)," 0BS","1",$c(0),"A","P","H1","19740401","19910401",$c(0),$c(0),$c(0),$c(0),"0",$c(0),$c(0),$c(0),$c(0),$c(0))
     ^XXX.Stored.GlobalD(2)    =    $lb("","Code2","TESTING2","","","FIRCROFT, LONDON ROAD","GREEN","EGHAM","",$c(0)," 0BS","1",$c(0),"A","P","H2","19740401","19910401",$c(0),$c(0),$c(0),$c(0),"0",$c(0),$c(0),$c(0),$c(0),$c(0))
     ^XXX.Stored.GlobalD(3)    =    $lb("","Code10","TESTING10","","","FIRCROFT, LONDON ROAD","GREEN","EGHAM","",$c(0)," 0BS","1",$c(0),"A","P","H3","19740401","19910401",$c(0),$c(0),$c(0),$c(0),"0",$c(0),$c(0),$c(0),$c(0),$c(0))

That's how the global looks. So basically the search the interfaces run just checks whether a specific field from the message exists in global field 14 (H1, H2, H3...) and, if so, translates that code to global field 1 (Code2, Code3, Code10...). But when importing the data any of the other fields can change, so that is why it is better to remove the global completely and upload everything from the file.

The problem with the journal is that so many duplicates are found in the file, and the last one needs to be kept in the global; that means every duplicate code has to be uploaded and the existing one removed, which I think generates a journal entry for each action...

You won't need a package name for the global name; that is just a naming convention for globals that are auto-named by the %Persistent class.

If you provide your own global name then it can be anything you want, so for the transitory data just call it something like... 

^CacheTempDedup

I think there is probably an easier way to solve the dedup problem: pass over the file twice, recording the line numbers of interest in a temporary global on the first pass. On the second pass you can then skip all the earlier duplicates and minimise the transaction writes.

This is a two-minute sketch, but it should give you a general idea of how to do that...

kill ^CacheTempDedupSeen
kill ^CacheTempDedupLine
// first pass: remember, for each code, the line number of its last occurrence
set file=##class(%File).%New("file.txt")
do file.Open("R")
set lineNo=1
while 'file.AtEnd {
    set data=file.ReadLine()
    set code=$piece(data,",",1) //e.g. assuming the code is the first comma-delimited piece
    // a later occurrence wins, so drop the line number recorded for the earlier one
    if $data(^CacheTempDedupSeen(code)) kill ^CacheTempDedupLine(^CacheTempDedupSeen(code))
    set ^CacheTempDedupSeen(code)=lineNo
    set ^CacheTempDedupLine(lineNo)=code
    set lineNo=lineNo+1
}
// second pass: only lines whose numbers survived the first pass get written
do file.Rewind()
set lineNo=1
while 'file.AtEnd {
    set data=file.ReadLine()
    if $data(^CacheTempDedupLine(lineNo)) {
        //this data is the last in the file for its code, update the persistent class directly
    }
    set lineNo=lineNo+1
}
do file.Close()