Question
· Oct 26, 2016

Data extraction using SQL Via ODBC

Hi All,

            I have a general query in regards to developers experience on extracting data from cache databases and the most efficient way to do so. I work with a number of clients who have applications with cache databases and require the data off the host system and onto data warehouse platforms for research and analysis. Often they require the data in source state which means the extracts are often simply a table scan of the entire database table without any aggregation or manipulation.The approach we have always taken has been to use ODBC drivers with SQL queries in order to extract the data but it often leads to mixed results. Some tables can be very efficient on extract whilst others with less fields and volume take far longer (Appear to be leaf tables in complex hierarchies, assuming the query has to traverse the hierarchy in order to find the data).

Sometimes applying where clauses can make the queries perform worse than doing the raw table scan, appreciate adding indexes would help however it's not an option since we don't own the system, we simply want the data. We have done some chicanery around finding ID numbers based on dates then using these to drive queries but reluctant to pursue such an approach as complicates the process.

We've often also encountered performance degradation on the live databases possibly due to the high volume of transactions we commit with the database being fragmented across disk. We have previously resolved this by re-organising the data via the GBLOC copy routine however it can't be done frequently due to the impact on service.

We have recently deployed a shadow server specifically for extracting data which has improved performance but I am keen to understand if there are any other approaches that could be considered to improve extract time, appreciate native code may be an option but would like to hear and share experiences with the community on this.

Ps kudos to everyone for setting this up and contributing some of the content is excellent.

Discussion (2)0
Log in or sign up to continue

Paul,

Have you looked at the details in the Caché SQL manual about optimizing performance:

http://docs.intersystems.com/latest/csp/docbook/DocBook.UI.Page.cls?KEY=...

Play particular attention to the section about using TuneTable to properly set the meta-data around table extent size, selectivity, etc.  Lot of people miss this and it's very important because the query plans rely on this metadata to optimize performance.