Question
Pavithra Rajamohan · Oct 13, 2021

Method for retreiving embedded PDF file from JSON

Hello all,

I receive an embedded PDF link as part of a JSON message. The %Get method is currently used to extract the data items from the JSON message, however we have come across a problem when trying to extract a data item which is an embedded PDF file as the string is too long. Does anyone have any suggestions as to what would be the appropriate method to use to extract this particular data item?

Thanks

1
0 590
Discussion (5)2
Log in or sign up to continue

%Get() accepts a type parameter -- you can specify a type of stream and it will return a stream object, which is not limited by the max size of strings.

Thanks Marc, unfortunately we are using an older version of Healthshare (2017) and the %Get method doesnt include the type parameter and only has the key. The method listed in the system is as follows:

Method %Get(key) As %CacheString
{
    
    try {
set ans = $zu(210,10, .key)
    catch do $$$APPERROR1($$$LASTERROR) }
    ans
}

Are there any other methods that can be used?

Thanks!

Would this section of documentation help with your situation?

https://docs.intersystems.com/latest/csp/docbook/DocBook.UI.Page.cls?KEY=GJSON_create#GJSON_create_serialize_streams

(I'm aware this is from the latest documentation online, but I did confirm that this section also exists in my HealthShare 2017.2 version as well.)

That section appears to basically describe how to save the JSON to a temporary file on the filesystem, then re-open the file as an Object and access the key successfully without hitting <MAXSTRING>. Yes, I'm aware that does cause extra storage I/O, so for busy servers this could have a negative impact on I/O performance.

It looks like they also offer a possible solution changing the JSON entity to a %Stream.GlobalCharacter which can handle strings much larger than <MAXSTRING> which may help with not adding nearly so much storage I/O as the solution in the previous paragraph.

Hope this helps!

As mentioned in another Developer Community Question, a recent version of IRIS would allow you to  evaluate object.%Get("pdfKeyName",,"stream") which would return to you an %Stream object containing the JSON string in question as raw characters.  Also, %Get in IRIS can support object.%Get("pdfKeyName",,"stream<base64") which would do Base64 decoding as it creates the %Stream. However, you said you need to stick with an older version of Caché/Ensemble which predates these %Get features.

It is possible to convert the long JSON string component to a %Stream but it will take some parsing passes.

(1) First use SET json1Obj=[ ].%FromJSON(FileOrStream) to create json1OBJ containing all the elements of the original JSON coming from a File or Stream.

(2) If your pdf JSON string is nested in json1OBJ then select down to the closest containing %DynamicObject containing your pdf JSON string,  I.e. Set json2Obj=json1Obj.level1.level2 if your original JSON looks like {"level1":{"level2":{"pdfKeyName":"Very long JSON string containing pdf", ...}, ...}, ...}

(3) Create a new %Stream containing the JSON representation of that closest containing %DynamicObject.  I.e.,

   SET TempStream=##class(%Stream.TmpBinary).%New()
   DO json2Obj.%ToJSON(TempStream)

(4) Read out buffers from TempStream looking for "pdfKeyName":" .  Note that this 14-character string could span a buffer boundary.

(5) Continue reading additional buffers until you find a "-characer not preceded by a \-character; Or until you find a "-character preceded by an even number of \-characters.

(6) Take characters read in step (5) and pass them through $ZCVT(chars,"I","JSON",handle) to convert JSON string escape characters to binary characters and then write them to a %Stream.FileBinary (or some other Binary %Stream of your choosing.)

(7) If your JSON "pdfKeyName" element contains Base64 encoding  then you will also need to use $SYSTEM.Encryption.Base64Decode(string). Unfortunately $SYSTEM.Encryption.Base64Decode(string), unlike $ZCVT(chars,"I",trantable,handle), does not include a 'handle' argument to support the buffer boundary cases in those situations where you have broken a very large string into smaller buffers.  Therefore you should remove the whitespace characters from 'string' before calling $SYSTEM.Encryption.Base64Decode(string) and you must make sure the argument 'string' has a length which is a multiple of 4 so that a group of 4 data characters cannot cross a buffer boundary.

Again the more complete support in IRIS can do steps (2) through (7) without worrying about buffer boundaries by executing

   Set BinaryStreamTmp = json1Obj.level1.level2.%Get("pdfKeyName",,"stream<base64")

and you can then copy the BinaryStreamTmp contents to wherever you want to resulting .pdf file.