Question
prashanth ponugoti · Oct 19, 2022

Not able to convert pdf to base64 if pdf size is more than 4MB

 

Hi Friends,

I have a requirement to convert a PDF from a URL to Base64 format. I have created a utility method and used it in the DTL.

It works fine for small PDF files, but we got one PDF with a size of 4MB and this method fails (it creates corrupted Base64 content).

Could you please suggest a way to convert big PDFs?

set encodedData = ""
set request=##class(%Net.HttpRequest).%New()
    do request.Get(httpUrl)
    if request.HttpResponse.StatusCode = 200
    {
//set len = request.HttpResponse.Data.SizeGet()
    set content = request.HttpResponse.Data.Read()
    set encodedData = $system.Encryption.Base64Encode(content)
    set encodedData=$translate(encodedData, $c(13,10))
    }
    QUIT encodedData

 

Thanks

Prashanth

Product version: IRIS 2021.2
$ZV: IRIS2023

I am assuming your problem is that request.HttpResponse.Data.Read() fails because you are trying to read the entire PDF file into a single ObjectScript variable, and an ObjectScript string has a maximum supported length of 3,641,144 characters.  You will have to read the data out in smaller chunks that each fit into an ObjectScript string.  The chunk size is important when you pass the chunked data to $system.Encryption.Base64Encode(content): Base64 encodes each group of 3 input bytes as 4 output characters, so a chunk must not end in the middle of one of those groups.  The result of each Base64Encode call must then be written to some form of %Stream (probably %Stream.GlobalBinary or %Stream.FileBinary), since only a %Stream can hold a block of data larger than 3,641,144 characters.  Using a small, appropriate chunk size will limit the in-memory resources used by this conversion.
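To make the chunk-boundary point concrete, here is a small sketch using short literal strings in place of PDF data (the strings are only stand-ins). The second argument 1 asks Base64Encode not to insert line breaks.

```objectscript
	// Chunks whose lengths are multiples of 3 concatenate cleanly
	write $system.Encryption.Base64Encode("abc",1)_$system.Encryption.Base64Encode("def",1),!
	// prints YWJjZGVm

	// A split after 2 bytes leaves "=" padding in the middle of the output
	write $system.Encryption.Base64Encode("ab",1)_$system.Encryption.Base64Encode("cdef",1),!
	// prints YWI=Y2RlZg==

	// Encoding everything in one call, for comparison
	write $system.Encryption.Base64Encode("abcdef",1),!
	// prints YWJjZGVm
```

The first and third lines match, while the second contains padding characters that a decoder would treat as the end of the data, which is one way to end up with corrupted Base64 content.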

If you don't mind having the entire PDF file in memory at one time you can use %DynamicObject to hold and convert that data.  The %Library.DynamicObject and %Library.DynamicArray class objects are usually used to represent data that was originally JSON encoded.  These Dynamic Objects exist only in memory but you can serialize them into a JSON textual representation using the %ToJSON(output) method.  But if the JSON text representation contains more than 3,641,144 characters then you had better direct 'output' into some form of %Stream.

You can convert a binary pdf file into BASE64 encoding doing something like:

SET DynObj={}  ;; Creates an empty %DynamicObject
DO DynObj.%Set("pdffile",request.HttpResponse.Data,"stream")
SET Base64pdf=DynObj.%Get("pdffile",,"stream>base64")

Then Base64pdf will be a read-only, in-memory %Stream.DynamicBinary object which is encoded in BASE64.  You can use Base64pdf.Read(chunksize) to read the BASE64 out of Base64pdf in chunks that fit into an ObjectScript string.  You do not have to worry about making sure the chunksize is a multiple of 3 or a multiple of 4 or a multiple of 72.  You can also copy the data in Base64pdf into a writeable %Stream.FileBinary or a %Stream.GlobalBinary using the OtherStream.CopyFrom(Base64pdf) method call.
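As a rough sketch of both ways of consuming the encoded stream, assuming the lines sit inside a method: the small src stream is a hypothetical stand-in for request.HttpResponse.Data, and the chunk size of 32000 is an arbitrary choice.

```objectscript
	// Hypothetical stand-in for request.HttpResponse.Data
	set src=##class(%Stream.TmpBinary).%New()
	do src.Write("abcdef")
	do src.Rewind()

	set DynObj={}
	do DynObj.%Set("pdffile",src,"stream")
	set Base64pdf=DynObj.%Get("pdffile",,"stream>base64")

	// Option 1: read the BASE64 text out in chunks that fit into a string
	while 'Base64pdf.AtEnd {
		set chunk=Base64pdf.Read(32000)
		write chunk
	}
	write !

	// Option 2: copy the whole encoded stream into a writeable stream
	do Base64pdf.Rewind()
	set copy=##class(%Stream.GlobalBinary).%New()
	set sc=copy.CopyFrom(Base64pdf)
	if $system.Status.IsError(sc) do $system.Status.DisplayError(sc)
```

Option 2 is probably the better fit when the encoded data has to be handed on to other code as a stream, since nothing is ever held in a single string.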

If your HttpResponse contains a BASE64 encoded pdf file instead of a binary pdf file then you can do the reverse decoding by:

SET DynObj={}
DO DynObj.%Set("pdffile",request.HttpResponse.Data,"stream<base64")
SET BinaryPDF=DynObj.%Get("pdffile",,"stream")

Then BinaryPDF is a read-only %Stream.DynamicBinary containing the decoded PDF data.  You can copy it to a %Stream.FileBinary object which can then be examined using a PDF reader application.
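Putting the stream-typed %Set together with %ToJSON(output), a JSON document that carries a large BASE64 value could be written straight to a stream without ever holding it in one string. This is only a sketch: src is a hypothetical stand-in for request.HttpResponse.Data, and it assumes your IRIS version accepts the "stream>base64" type in %Set in the same way the "stream<base64" type is used above.

```objectscript
	// Hypothetical stand-in for request.HttpResponse.Data
	set src=##class(%Stream.TmpBinary).%New()
	do src.Write("abcdef")
	do src.Rewind()

	// Store the stream contents as a BASE64-encoded JSON string value
	set DynObj={}
	do DynObj.%Set("pdffile",src,"stream>base64")

	// Serialize into a character stream instead of a string
	set json=##class(%Stream.GlobalCharacter).%New()
	do DynObj.%ToJSON(json)

	// Show the beginning of the generated JSON text
	do json.Rewind()
	write json.Read(100),!
```

The json stream can then be passed on wherever a stream is accepted, so the 3,641,144 character limit never applies to the JSON text as a whole.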

Your solution is nearly perfect, here is my quick (untested) version.


ClassMethod Encode()
{
	// Read N bytes (N MUST be divisible by 3) and write N*4/3 encoded bytes
	// 3 * 8190 = 24570; 24570 * 4 / 3 = 32760; 32760 < 32768, which avoids (slow) long strings
	set CHUNK=24570
	set NOCR=1	// flag for Base64Encode: don't insert CRLF line breaks into the output
	set encodedData=##class(%Stream.TmpBinary).%New() // adapt this to your needs: %Stream.Whatever...

	set request=##class(%Net.HttpRequest).%New()
	set request.Server="..."
	do request.Get("/...")

	if request.HttpResponse.StatusCode = 200 {
		while 'request.HttpResponse.Data.AtEnd {
			do encodedData.Write($system.Encryption.Base64Encode(request.HttpResponse.Data.Read(CHUNK),NOCR))
		}
	}
	quit encodedData

	// The lines below are not reached as written. As an alternative to the
	// quit above, you could return either a string or a stream object:
	set YOURMAXSTRING = 32767 // or 3641144
	if encodedData.Size <= YOURMAXSTRING {
		do encodedData.Rewind()
		quit encodedData.Read(encodedData.Size)
	} else { quit encodedData }
}

Thanks, Julius Kavay

ENCODE to base64 is working for me now with your suggested code snippet.

Now I have another issue after the conversion.

I need to send the converted string in the JSON property "Filecontent" to the REST outbound.

The Filecontent data type is String, and the value exceeds the maximum string size when I assign the converted stream to it.

I have changed the Filecontent data type to Stream, but when I convert the object to JSON, this property is ignored.

Could you please suggest something? Any suggestions would be very helpful.

thanks,

Prashanth

REST-API is not my daily bread, so show me a few lines of your (problematic) code and I will try my best.