Question
Daniel Bertozzi · Apr 13

Moving API data into external SQL instance without audits/persistence

Recently our team have been getting requests to pickup a large amount of data from API nightly (e.g. using ODATA to loop through pages) and placing that into MS SQL databases.

My question is, what is considered the best practice to get data from API and route to external SQL database WITHOUT persisting any messages/traces etc?

At the moment my thinking would be to write a service that triggers an action once a day to an EnsLib.HTTP.OutboundAdapter, then map the HTTP response into a linked table (via method to keep the operation clean that managed the gateway on the way in and out of the method). 

What bothers me about this its using EnsLib.HTTP.OutboundAdapter to do something else its not originally intended for. And the gateway management of link tables is a bit of pain (especially when there is patching that affects SQL failover cluster because IRIS seems to lose sight of the linked connection until a process/operation is restarted). You also get none of the adapter benefits of EnsLib.SQL.OutboundAdapter.

 

 

Product version: IRIS 2019.1
$ZV: IRIS for Windows (x86-64) 2019.1.2 (Build 718U) Mon Mar 22 2021 06:42:36 EDT
1
0 308
Discussion (5)2
Log in or sign up to continue

First lets get the adapter concerns out of the way.  An adapter, in general, is only the "how do I connect" part of the.  What you need to do is write an Business Operation that has this set as its adapter.  Within your service you can do anything you want with the payload that is returned.

If you are not familiar with how this works I would start with our documentation.  Here are a couple of links to get you started:

https://docs.intersystems.com/iris20212/csp/docbook/DocBook.UI.Page.cls?KEY=EGDV_intro

https://docs.intersystems.com/iris20212/csp/docbook/Doc.View.cls?KEY=EGDV_busop

There are also Learning Service resources you could use.

Now to your process.  First off what is the reason you don't want messages or any tracing?  There are a number of good reasons to utilize the structure of IRIS interoperability including the ability to see a trace if something goes wrong and queue persistence to provide resilience in delivery.  

If messaging is still not viable for any reason I would consider creating a temporary holding class/Table for the data. then sending a message to another Business operation based on the SQL adapter that will pick this data up, write it to MS, and then remove it from the temp storage.  You can map this data to either a non-journaled database or to IRISTEMP depending on your needs.  Also allow for the fact that there could be multiple batches in progress so how you key the temporary storage will be important.

Some important questions:

What happens to the data if the process aborts in the middle?

Would missing any of the data being pulled have a negative effect on the business?

Is there any concern over data lineage for security or external auditing ?

Hi Rich. Sorry for the delay. Thanks for your input. In answer to some of your questions:

Re: Use of messages/tracing.

The interface would just produce too much data. For instance, if I had to call an API and retrieve a full set of data every night and then paginate through it, then every http response would have to be passed back to process or another operation and get attached to the session and become part of massive view trace that takes an age to render and causes the data namespace messages to expand too much.

What happens to the data if the process aborts in the middle?

As all our adapters have AlertGroups, this will just use the Ens.Alert an email will be generated for someone to look at so we have some visibility.

Would missing any of the data being pulled have a negative effect on the business?

In this instance no because we are making a copy of some reporting data and the business can just check the source if needed. As mentioned, at minimum we catch any exceptions and alert a technical contact. Obviously with an API pull to SQL, its possible to get non-2XX status code that ruins a whole run. We would probably use the http adapter settings here to manage that but because we are not getting a delta unfortunately, only the latest run is ever important.

Is there any concern over data lineage for security or external auditing? 

Generally no, because data from the http response is just passed into a SQL table. The actual transformation of the data happens after it reaches its destination via reporting engine external to IRIS.

Re: Temporary holding table.

I previously did this to produce 5GB of json for a MongoDB parser, it works well but you have to compact the namespace after kill the globals or just live with reserving an amount of space for subsequent runs in the data namespace.
Yes, a global mapping to IRISTEMP could work to get around the journalling. My only issue is it looks like I'd have to add this mapping manually to my namespace in every environment before I promote my code. Is there are way of do this via method so I can script it?

Re: Batching, this would only run once a day, however I'm generating a run identifier so should be unique enough.

Daniel,

Ok, so the data is basically a parsed report and each daily run replaces the previous run with no Delta or net change processing.  So on error they could just rerun the process to get the data.  

One suggestion on the alerting, use a distribution group as the address to send to and make sure a at least one of the Business consumers of the data are on that list.  Just alerting a technical resource could lead the business side to use incomplete data as they don't know that something went wrong.  Relying on the technical resource to pass on this information  can be risky.  Not a dig at the customer, just reality.

So, back to your question, I would still lean towards a solution that separates the loading of the data from the export to MS SQL Server.   Since you want to avoid the use of messages for the data I would still lean towards a global mapped to IRISTEMP, note NOT a process private global as the business hosts will be in separate processes.  This is a repetitive process so the fact that the DB will grow should not be an issue.  It will reach a size necessary to handle the load and stay there.  You can, if needed, programatically handle the mapping using the Config.MapGlobals API ( https://docs.intersystems.com/irislatest/csp/documatic/%25CSP.Documatic.cls?&LIBRARY=%25SYS&CLASSNAME=Config.MapGlobals).

So this would look something like this:

  1. Business Service accepts and processes the data into a non-journaled global on a schedule.  When the data has been consumed a message is generated that is sent to the Business Operation to trigger the next phase
  2. Business operation to write to MS SQL Server reacts to receiving a message to process, including how to identify the dataset to load.  The process consumes the data out of the Global to write to SQL Server.  Once complete the data in the Global for this run is purged.

Anyway that is my two cents on this.  I am sure others will have different ideas.

You could use a linked database (remote db projected as local tables) and then write an insert query to that linked table before it gets sent to a production.