What is the best practice for parsing a RecordMap (batch) when DTL is insufficient?
Hello,
For a particular problem we are trying to parse a relatively large *.csv file with a RecordMap. We do this from a BPL that starts with a REST call to acquire the file. The file needs a slight transformation, which we attempted in a DTL. However, DTLs seem to be incapable of handling larger files.
We see two alternatives: the first is to write custom code to parse the file; the second is to write SQL queries and write the output to a CSV file. We were wondering what the best practice is for parsing larger files, transforming the data, and filtering the data to different locations based on fields in the *.csv, and we would appreciate any input.
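For reference, the acquisition step is roughly the sketch below (a simplified illustration only; the class name, host, SSL configuration, and path are placeholders, not our actual code). It fetches the CSV over HTTP and leaves it in a saved character stream that the BPL can work with:

```objectscript
/// Hypothetical helper used for illustration only.
Class Demo.CSV.Fetch Extends %RegisteredObject
{

/// Fetch the CSV over HTTP and return the body as a saved character stream.
ClassMethod GetCSVStream(Output pStream As %Stream.GlobalCharacter) As %Status
{
    Set tSC = $$$OK
    Try {
        Set tRequest = ##class(%Net.HttpRequest).%New()
        Set tRequest.Server = "example.com"           // placeholder host
        Set tRequest.Https = 1
        Set tRequest.SSLConfiguration = "DefaultSSL"  // placeholder SSL config name

        Set tSC = tRequest.Get("/export/data.csv")    // placeholder path
        Quit:$$$ISERR(tSC)

        // The response body is a stream; copy it into a persistent stream
        // so it outlives the request object.
        Set pStream = ##class(%Stream.GlobalCharacter).%New()
        Set tSC = pStream.CopyFrom(tRequest.HttpResponse.Data)
        Quit:$$$ISERR(tSC)
        Set tSC = pStream.%Save()
    }
    Catch ex {
        Set tSC = ex.AsStatus()
    }
    Quit tSC
}

}
```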
Thanks in advance!
Hi,
You can use EnsLib.RecordMap.Service.FileService. It will ingest large files and generate many small messages that you can then send to a router / DTL.
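As a rough illustration only (the item names, file paths, and RecordMap class below are placeholders, so adjust them to your own environment), a production item for this service might look something like:

```objectscript
/// Sketch of a production containing a RecordMap file service that splits
/// each incoming CSV into one message per record and sends them to a router.
Class Demo.Production Extends Ens.Production
{

XData ProductionDefinition
{
<Production Name="Demo.Production">
  <Item Name="CSV.FileIn" ClassName="EnsLib.RecordMap.Service.FileService" PoolSize="1" Enabled="true">
    <Setting Target="Host" Name="RecordMap">Demo.Map.MyCSV</Setting>
    <Setting Target="Host" Name="TargetConfigNames">CSV.Router</Setting>
    <Setting Target="Adapter" Name="FilePath">/data/csv/in/</Setting>
    <Setting Target="Adapter" Name="FileSpec">*.csv</Setting>
  </Item>
</Production>
}

}
```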
Here is a video explaining how it's done:
https://learning.intersystems.com/course/view.php?id=1094
Good luck!
Hello,
Thank you for the reply. However, I don't see this working for our case. We already have an element in the BPL that uses a RecordMap to turn the CSV into records and send them to the DTL. We can only get the CSV via a REST call, so using the file-based business service for RecordMaps is not really an option. If the best way to do this is without a RecordMap, we would also like to know.
Hi,
I am not sure I know what the issue is exactly. Is it speed, DB writes, or memory (FRAMESTACK) errors?
In the meantime, there are some things we've done to help with large files. We have BPLs that process CSV files with 100k rows, and these changes helped.
After some more tries, starting the "for each" loop and ending it with an "unswizzle" inside the DTL seems to have fixed our problem.
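For anyone who hits the same thing, the idea expressed as a code action looks roughly like the sketch below ("Records" is a placeholder for the relationship on your generated RecordMap batch class, so adjust the names). Unswizzling each record once it has been processed stops the whole batch from being held in memory at the same time:

```objectscript
// Sketch only; property names depend on your generated batch class.
For i=1:1:source.Records.Count() {
    Set rec = source.Records.GetAt(i)
    // ... map the individual record here ...
    Do source.Records.%UnSwizzleAt(i)  // release the record object from memory
}
```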
Thank you all for the replies
Hi Thom,
What are you trying to do with the data?
Rereading your original post, you say you want to make a slight transformation to the file, which almost suggests the end result is another CSV file?
Perhaps if you could expand on the requirements a little, it will be easier to point you in the simplest direction.
Hi Thom,
I suggest you don't use the RecordMapper at all, or DTLs; neither is good for very wide CSVs. Use a file reader and save the input to a QuickStream. Pass the stream ID to a custom process that walks the QuickStream, changes each line, and saves the result to a new QuickStream, then pass the final QuickStream to the outbound. This avoids sending individual lines to processes or DTLs, which, if your CSV is huge, would be a massive overhead. You can contact me and I'd be happy to help.
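To make that concrete, something along these lines (a minimal sketch; the class, property, and target names are illustrative, and a plain %Stream.GlobalCharacter stands in here for the QuickStream, but the flow is the same):

```objectscript
/// Sketch of a custom process that rewrites a CSV line by line.
Class Demo.CSV.LineProcess Extends Ens.BusinessProcess
{

/// Outbound operation that receives the transformed stream.
Property TargetConfigName As %String(MAXLEN = 128);

Parameter SETTINGS = "TargetConfigName:Basic";

Method OnRequest(pRequest As Ens.StreamContainer, Output pResponse As Ens.Response) As %Status
{
    Set tSC = $$$OK
    Try {
        Set tIn = pRequest.Stream
        Do tIn.Rewind()

        Set tOut = ##class(%Stream.GlobalCharacter).%New()
        While 'tIn.AtEnd {
            Set tLine = tIn.ReadLine()
            // ... apply the line-level transformation here ...
            Do tOut.WriteLine(tLine)
        }
        Set tSC = tOut.%Save()
        Quit:$$$ISERR(tSC)

        // Wrap the new stream in a message and pass it to the outbound.
        Set tMsg = ##class(Ens.StreamContainer).%New(tOut)
        Set tSC = ..SendRequestAsync(..TargetConfigName, tMsg, 0)
    }
    Catch ex {
        Set tSC = ex.AsStatus()
    }
    Quit tSC
}

}
```

The point of the design is that only one line is in memory at a time and only one message crosses the production per file, instead of one per row.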
Hey Paul,
I half agree, if the OP's requirements turn out to be a file-to-file use case.
If not, I wanted to defend the record mapper solution a little so as not to put off other readers from this approach.
As an example, I recently implemented a solution that had a million-line CSV file. This would generate around 300,000 messages from a complex mapping. Each record was individually passed into a DTL, which in turn produced a master file update message. These were then pushed into the local EPR (as per the EPR's requirements).
Yes, that is many messages, but not what I would call overhead when taken in the context of the frequency and need for M16 messages.
Bottom line: the solution I selected was the most maintainable one. There is almost zero glue code to go wrong, and all maintenance is managed inside DTLs. This is exactly what the record mapper was designed for.