Dynamic SQL DisplayFormatted split files on row count

Question

Question

Stefan Cronje · Jun 10, 2021

Hi,

I have the requirement to dump large result sets into files. I am currently using %SQL.Statement and the result set created by it.

The DisplayFormatted is used to write a CSV. The query I used to test does not contain any filters, it is just a select * from a table with 16 million rows.
The file created is 2GB in size. The issue I have is that it takes 8 hours to create this file.

I have seen in the past that file streams write fast at the start. As the file continues to grow, the process' memory usage increases until it reaches the max allowed. After that, the file writing slows down significantly. By monitoring the process I have seen that some form of memory management gets done, but it seems like some sort of swapping, as the memory used drops a little and then goes back to the max.

I have resolved this in other places in our applications where we write files, by adding a sequence to the file name, and only writing 20k records per file. This keeps the memory usage under control, and subsequently much faster file output.

Is there a standard way of splitting the files on a number of rows using the DisplayFormatted method of the resultset?

Thanks,
Stefan

Product version: Ensemble 2018.1

Discussion (3)1

Log in or sign up to continue

Marc Mundt · Jun 10, 2021

There's no option for %DisplayFormatted to split the results into multiple parts automatically. You would need to write a custom method to do your CSV output with the logic you need. I can post some sample code that outputs CSV if that would be helpful.

8 hours for 16 million records at ~140 bytes/record seems really slow. I just did a test on a slow Windows VM and for a simple "SELECT *" for 16 million records @ 150 bytes/record, %DisplayFormatted took about 18 minutes to output the 2.4 GB CSV.

I suggest you contact the WRC to help look at where the bottleneck is.

1 0

score 0 · Answer 1 · 2021-06-10T11:39:16-04:00

Thanks. I do not need a sample. I was just hoping I do not have to code it. But it is simple enough to do.

It is the per process memory where the limitation comes in I think. If you do that test with limited per process memory, it might be a different story. It may even be a difference in how files are handled within Cache on Windows vs. Unix.

score 0 · Answer 2 · 2021-06-11T07:11:41-04:00

Just to keep you informed.

I did a few tests. The more columns selected, the slower the output is, and quite significantly.