How to keep a Caché table synchronized with an external table?

I have a MySQL server with a "posts" table.

I also have a Caché server with a "downloadedposts" table.

They are connected, from Caché to MySQL, via the SQL Gateway.

I want to keep the Caché table synced with the MySQL one (the MySQL "posts" table is the master copy), so Caché periodically queries the MySQL server and downloads data. So far so good: if a record appears or changes in the MySQL table, Caché downloads the changes.

The problem I'm encountering is that sometimes rows are deleted from the MySQL "posts" table.

How do I synchronize deletions?

I don't want to delete all the data in the Caché table and recreate it on each sync because:

  • It's costly
  • The Caché table is related to other tables
  • I don't want to lose data accidentally (for example, because the MySQL server went down in the middle of a query)

That said, what's the best approach to syncing deletions?

Here's the one I thought up:

  • (Once) Add a new "Deleted" column to the "downloadedposts" table
  • Before syncing, create a pool
  • During syncing, add each ID received from "posts" to the pool
  • After syncing, mark every row whose ID is not in the pool as deleted:
UPDATE downloadedposts SET Deleted = 1 WHERE Utils.Pool_NotInPool('postspool', ID) = 1
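The whole cycle could be sketched in ObjectScript roughly as follows. The gateway table projection Data.Posts, the pool name and the column names are assumptions for illustration only:

```
/// Hypothetical sync cycle; Data.Posts is an assumed gateway-linked class.
ClassMethod SyncPosts() As %Status
{
    Set poolName = "postspool"
    Set st = ##class(Utils.Pool).CreatePool(poolName)
    Quit:$$$ISERR(st) st

    // Iterate over the linked MySQL table through the SQL Gateway,
    // upserting each row and remembering its ID in the pool
    Set rs = ##class(%SQL.Statement).%ExecDirect(, "SELECT ID FROM Data.Posts")
    While rs.%Next() {
        // ... upsert the row into downloadedposts here ...
        Set st = ##class(Utils.Pool).AddToPool(poolName, rs.%Get("ID"))
        Quit:$$$ISERR(st)
    }

    // Any local row whose ID never appeared in this sync is gone upstream
    If $$$ISOK(st) {
        Do ##class(%SQL.Statement).%ExecDirect(,
            "UPDATE downloadedposts SET Deleted = 1 " _
            "WHERE Utils.Pool_NotInPool('postspool', ID) = 1")
    }
    Do ##class(Utils.Pool).DeletePool(poolName)
    Quit st
}
```

Marking rows as deleted rather than deleting them keeps the related tables intact, and a failed sync can be retried without data loss.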

Some additional considerations:

  • IDs in the "posts" and "downloadedposts" tables correspond and are equal
  • There is a lot of data, so I can't build a long string of IDs for an EXISTS/IN query against MySQL

Here's my pool code:

Include (Utils, %occErrors)
Class Utils.Pool [ Abstract ]
{
/// Create new pool
/// w $System.Status.GetErrorText(##class(Utils.Pool).CreatePool(1))
ClassMethod CreatePool(PoolName As %String(MAXLEN=50) = "") As %Status
{
    Return:((PoolName="") || ($Length(PoolName)>50)) $$$ERROR($$$GeneralError, "Empty or long pool name")
    Return:..PoolExists(PoolName) $$$ERROR($$$GeneralError, "Pool already exists")
    Lock +$$$Pool(PoolName):0
     If '$TEST {
         Return $$$ERROR($$$GeneralError, "Cannot lock the pool " _ PoolName)
     }
     Kill $$$Pool(PoolName)
     Set $$$Pool(PoolName) = 0
     Return $$$OK
}
/// Add unique element to existing pool
/// w $System.Status.GetErrorText(##class(Utils.Pool).AddToPool(1, 1))
ClassMethod AddToPool(PoolName As %String, Element As %Integer) As %Status
{
    Set st = $$$OK
    If ..PoolExists(PoolName) {
        If ..NotInPool(PoolName, Element) {
            Set $$$Pool(PoolName, Element) = ""
            Set $$$Pool(PoolName) = $$$Pool(PoolName) + 1
        }
    } Else {
        Set st = $$$ERROR($$$GeneralError, "Pool does not exist")
    }
    Return st
}
/// Check that element is in pool
/// w ##class(Utils.Pool).InPool(1, 1)
ClassMethod InPool(PoolName As %String, Element As %Integer) As %Boolean [ CodeMode = expression, SqlName = InPool, SqlProc ]
{
$Data($$$Pool(PoolName, Element))=1
}
/// Check that element is not in pool
/// w ##class(Utils.Pool).NotInPool(1, 1)
ClassMethod NotInPool(PoolName As %String, Element As %Integer) As %Boolean [ CodeMode = expression, SqlName = NotInPool, SqlProc ]
{
$Data($$$Pool(PoolName, Element))'=1
}
/// Check that pool exists
ClassMethod PoolExists(PoolName As %String) As %Boolean [ CodeMode = expression ]
{
$Data($$$Pool(PoolName))
}
/// Delete existing pool
/// w $System.Status.GetErrorText(##class(Utils.Pool).DeletePool(1))
ClassMethod DeletePool(PoolName As %String) As %Status
{
    Return:((PoolName="") || ($Length(PoolName)>50)) $$$ERROR($$$GeneralError, "Empty or long pool name")
    Kill $$$Pool(PoolName)
    Lock -$$$Pool(PoolName)
    Return $$$OK
}
}

And the Utils include file:

#Define Pool ^Utils.Pool 

The code is also available on GitHub.

To sum up, I have the following questions:

  • Is there a more effective approach to syncing deleted rows across two tables?
  • Any improvements to the pool code?

Answers

I think any answer here should consider using triggers. I don't know MySQL well enough to say whether it has linked tables, but if Caché were the master copy, this would be pretty easy. You have your master table, which contains your data, and an external table that holds a copy. You link the external table to Caché and put triggers on the master table that call INSERT/UPDATE/DELETE on the mapped table.
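For the Caché-as-master case, such a trigger could be sketched like this. The class name, the property, and the linked-table name gw.posts are all hypothetical:

```
Class Data.Posts Extends %Persistent
{

Property Title As %String;

/// Propagate a row deletion to the linked external copy.
/// gw.posts is a hypothetical linked-table projection of the remote table.
Trigger PropagateDelete [ Event = DELETE ]
{
    Set id = {ID}
    &sql(DELETE FROM gw.posts WHERE ID = :id)
}

}
```

Analogous triggers for INSERT and UPDATE would keep the copy fully in step without any periodic polling.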

Triggers are, unfortunately, not an option, as the MySQL table gets updated from a backup (so it's DROP TABLE -> CREATE TABLE -> INSERT).

Ensemble SQL adapters are pretty good at syncing. Can that code be used here?

It originally sounded like "posts" was a live table, receiving INSERTs, DELETEs and perhaps UPDATEs, and so there were suggestions such as firing a trigger whenever any of those happened (MySQL does support triggers). But it now seems that "posts" is mostly static and is periodically updated/replaced.

So, to have the Caché version duplicate the MySQL version, could you similarly empty the "downloadedposts" table, and then load it with the latest backup that was used to update "posts"? This might only make sense if the database were small enough to use a MySQL logical backup: http://dev.mysql.com/doc/refman/5.7/en/backup-types.html. Perhaps you could reconsider this approach. Caché offers options (such as no journaling) to speed such loads.