Scott Beeson · Oct 6, 2016

ELI5: How does selecting 400,000 records take 12 seconds but selecting 500,000 records take ... to be determined?

I experience this constantly with Cache SQL.  Especially when querying the ATNA log.


SELECT TOP 400000 * FROM HS_IHE_ATNA_Repository.Aggregation ORDER BY ID DESC


That took 12 seconds.  I then upped the number to 500,000 and it took 185 seconds.


Shouldn't the execution time scale proportionately?


If I run the 500,00 query again it takes 2.4 seconds.

0 389
Discussion (4)1
Log in or sign up to continue

Hi Scott,

Nope!  The issue here is probably caching.  While things are in memory they are fast, and it is slow when they have to get off of disk. So when you get the next 100,000 rows you need to read data off of the disk and that takes some time.  If everything is in memory or on disk then you might get a proportional increase.

This explains it. And I'd heard this before, I guess it just really doesn't make sense to me.  Sometimes it seems arbitrary which queries are fast and which are not.

Sometimes such strange results are caused by ignoring the fact that usually there are several levels of caching, from high to low:

- Caché global cache

- filesystem cache (on Linux/UNIX only, as Windows version uses direct i/o)

- hdd controller cache.

So even restarting Caché can be not enough to drop the cache for clear "cold" testing. The tester should be aware of data volume involved, it should be much more than hdd controller cache (at least). mgstat can help to figure this out, besides it can show when you start reading data mostly from global cache rather than from filesystem/hdd.

Thanks for the info. Since we're hosted by Intersystems I assume all that was taken into account :)


I just want my data. :)