Jun 13

What are the best practices for handling large JSON?

Hello everyone,

Recently, I've been working on a Business Process that processes a large JSON FHIR message containing up to 50k requests in an array within the JSON.

Currently, the code imports the JSON as a dynamic object from the original message stream, obtains an iterator from it, and processes each request one at a time in a loop.

The performance meets the requirements, even with requests much larger than the one described above. However, I'd like to learn the best practices for handling large JSON in ObjectScript (and, why not, Embedded Python) to achieve better performance in future developments.

Here are some ideas I have considered:

  1. Iterative processing: process each element of the JSON iteratively, without loading the entire stream into memory.
  2. Chunking: split the loaded JSON into smaller chunks and process them one by one. I could split the array into parts, and I wonder whether getting an iterator from a smaller JSON chunk would reduce computation time.
  3. Parallel processing: after splitting the JSON, use parallel processing to handle multiple requests simultaneously. I have read about %SYSTEM.WorkMgr, but I'm not sure how I can apply it in this case.
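For idea 3, I don't have a %SYSTEM.WorkMgr example at hand, but the same fan-out pattern can be sketched in plain Python with the standard library. Everything here is illustrative, not part of any IRIS API: `process_request` is a hypothetical per-entry handler, and the chunk size is arbitrary.

```python
# Hedged sketch: chunk a parsed JSON array and process the chunks in parallel.
# process_request is a hypothetical placeholder for real per-request work.
import json
from concurrent.futures import ThreadPoolExecutor

def process_request(request):
    # Placeholder: in a real flow this would validate/transform one entry.
    return request["id"]

def process_chunk(chunk):
    return [process_request(r) for r in chunk]

def process_parallel(requests, chunk_size=1000, workers=4):
    # Split the array into chunks, then map the chunks onto a worker pool.
    chunks = [requests[i:i + chunk_size]
              for i in range(0, len(requests), chunk_size)]
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves chunk order, so results stay in input order.
        for partial in pool.map(process_chunk, chunks):
            results.extend(partial)
    return results

data = json.loads('[{"id": 1}, {"id": 2}, {"id": 3}]')
print(process_parallel(data, chunk_size=2, workers=2))  # [1, 2, 3]
```

A thread pool is used here for simplicity; for CPU-bound work a `ProcessPoolExecutor` (or, on the ObjectScript side, %SYSTEM.WorkMgr's worker processes) would be the closer analogue.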

I would appreciate any technical information or documentation references on the topics listed above, or any other best practices that could help handle large JSON data more efficiently in ObjectScript and Embedded Python.

Thank you!

Product version: IRIS 2023.1

%DynamicAbstractObject provides the fastest way to read JSON from an external location, and once the data is in memory it offers excellent random-access reading of its components. Large JSON arrays/objects are supported: I have run %FromJSON on a 20GB JSON file on a system with 16GB of physical memory. It required paging of memory, but performance was still acceptable. You don't need random-access performance here, though, and it is best to avoid using more memory than necessary. Maybe someday InterSystems can provide methods for reading JSON subobjects while remembering how each object was nested in its parent objects.

Assuming your JSON is in a file or stream and the outer array contains only elements which are arrays or objects, you can try the following:

  1. First read the leading '['.
  2. Then use %FromJSON to read one array element and process that JSON element (the read will stop on the element's closing ']' or '}').
  3. Read single characters from your stream/file, skipping whitespace. If the character is a ',', go back to step 2; if it is a ']', you are done.
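For comparison, the same loop can be sketched in plain Python: `json.JSONDecoder.raw_decode` stops at the end of one top-level value, playing the role of %FromJSON stopping at the element's closing ']' or '}'. This sketch works on a string already in memory; for a file too large to load at once, you would refill a buffer between elements (not shown).

```python
# Sketch of the element-by-element approach above, in plain Python:
# consume the leading '[', then repeatedly decode one array element and
# consume the ',' or ']' that follows it.
import json

def iter_top_level_elements(text):
    decoder = json.JSONDecoder()
    pos = 0
    # Step 1: skip whitespace and consume the leading '['.
    while text[pos].isspace():
        pos += 1
    assert text[pos] == "[", "expected a JSON array"
    pos += 1
    while True:
        while text[pos].isspace():
            pos += 1
        if text[pos] == "]":  # empty array
            return
        # Step 2: decode exactly one array element.
        element, pos = decoder.raw_decode(text, pos)
        yield element
        # Step 3: skip whitespace; ',' means another element, ']' means done.
        while text[pos].isspace():
            pos += 1
        if text[pos] == ",":
            pos += 1
        elif text[pos] == "]":
            return

print(list(iter_top_level_elements('[ {"a": 1}, [2, 3] ]')))
# [{'a': 1}, [2, 3]]
```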

Ciao Pietro,

as said, %DynamicAbstractObject has excellent performance and can easily handle very large JSON streams/files.
Depending on your system settings, for large JSON you may need to increase the per-process memory. Fortunately, you can adjust it in your code at runtime (e.g. by setting $ZSTORAGE), so you can write robust code that does not depend on system configuration parameters.
Note that the default value of "Maximum Per-Process Memory" has changed over time, so a new installation and an upgraded installation may have different values.

IMHO the real question here is: in what side of the JSON processing is your code?

Are you generating the JSON or are you receiving the JSON from a third party?

If you are receiving the JSON, then I don't think there is much you can do about it: just load it and IRIS will handle it.
I'm pretty sure that any attempt to "split" the loading of the JSON stream/file will result in worse performance and higher resource consumption.
To split a large JSON document you need to parse it anyway.

If you are generating the JSON, then depending on the project and its specification constraints, you may split your payload into chunks; for example, in FHIR the server can choose to break a bundle up into "pages".
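On the generating side, the idea can be sketched like this. The page size and entry shape are illustrative; a real FHIR server would page a Bundle with proper `link` entries per the specification, which is not shown here.

```python
# Sketch: emit a large array as several smaller JSON "pages" instead of
# one giant document, so no single payload holds all the entries.
import json

def paged_json(entries, page_size=1000):
    """Yield one JSON string per page (each page is a smaller array)."""
    for i in range(0, len(entries), page_size):
        yield json.dumps(entries[i:i + page_size])

pages = list(paged_json([{"id": n} for n in range(5)], page_size=2))
print(len(pages))             # 3
print(json.loads(pages[-1]))  # [{'id': 4}]
```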

I'm not sure whether your question is about loading the JSON file/stream into a %DynamicAbstractObject, or about processing the large %DynamicAbstractObject once it has been imported from JSON.

What's your problem and what's your goal?