How to use Base64 to encrypt a string that exceeds the maximum length of string type

Primary tabs

Now I want to return a large amount of data to the front end. The string length has reached 40000 +, and the returned data needs to be encrypted by AES + Base64. I can convert the string into a stream. AES can use the AESCBCEncryptStream method to encrypt, but Base64 has no stream method。Anyone who get the solution  would you kindly share the solution please。

Any help would be appreciated. Thanks!

Replies

/// Encode a stream as BASE64
ClassMethod Base64EncodeStream(
  pStream As %Stream,
  Output pEncoded As %StreamAs %Status
{
  tSC=$$$OK
  try {
    tSC=pStream.Rewind()
    q:$$$ISERR(tSC)

    pEncoded=##class(%Stream.TmpCharacter).%New()
    while 'pStream.AtEnd {
      tLen=5700 
      tSC=pEncoded.Write($system.Encryption.Base64Encode(pStream.Read(.tLen)))
      q:$$$ISERR(tSC)
    }
    q:$$$ISERR(tSC)
        
    tSC=pEncoded.Rewind()
  catch (e{
    tSC=e.AsStatus()
  }
  tSC
}

Base64 encoding only works on strings of 8-bit bytes.  If you have a Unicode string with a character with an encoded value greater than 255 then direct Base64 encoding is not possible.  The following documentation excerpt from the %SYSTEM.Encryption.Base64Encode Class Reference page will describe a first step at turning Unicode into a UTF-8 byte string and then applying the Base64 encoding to that byte string (but if your encoded stream exceeds the string limit in size then you will need to do more.)

Note: Base 64 encoding is not able to encode a string which contains unicode (2 byte) characters. If you need to Base 64 encode an unicode string, you should first translate the string to UTF8 format, then encode it.
s BinaryText=$ZCONVERT(UnicodeText,"O","UTF8")
s Base64Encoded=$system.Encryption.Base64Encode(BinaryText)
Now to Decode it:
s BinaryText=$system.Encryption.Base64Decode(Base64Encoded)
s UnicodeText=$ZCONVERT(BinaryText,"I","UTF8")

Now if your Base64Encoded, UTF-8 encoded string cannot be longer than 3,641,144 bytes then you can ignore the rest of this reply.  Just execute the code excerpt from the %SYSTEM.Encryption.Base64Encode Class Reference documentation.  If you have a %Stream that is too long then you will need to loop over substrings of the %Stream and you should read on.

If you read from a long %Stream a sequence of substrings then it might not be possible to simply call $system.Encryption.Base64Encode(BinaryText,...) and $ZConvert(BinaryText,"I","UTF8") and because the substrings can be broken between binary bytes that must be combined in order to do the conversion.  So you will need to workaround some issues in the code that was part of the %SYSTEM.Encryption.Base64Encode Class Reference documentation.

Issue one: the SET BinaryText=$ZCONVERT(UnicodeText,"O","UTF8") statement can return more characters of BinaryText than there are characters of UnicodeText.  The new string length may not be a perfect multiple of 3 characters long and $system.Encryption.Base64Encode converts sequences of 3 bytes into 4 bytes so when the Base64Encode(substring) method is called with a substring that has a non-multiple of 3 byte length then the extra 1 or 2 bytes at the end of the substring must be saved and then concatenated onto the beginning of the next substring to be passed to the Base64Encode(substring) method.  Only the very last call on the Base64Encode(substring) method can have a substring byte length which is not a multiple of 3.

Issue two: the "SET UnicodeText=$ZCONVERT(BinaryText,"I","UTF8")" statement might be given a BinaryText substring that ends with an incomplete UTF-8 sequence of characters.  That incomplete sequence must be concatenated onto the beginning of the next BinaryText substring.  Fortunately, the $ZCONVERT function takes an optional fourth argument which is a local variable name.  Evaluating $ZCONVERT(UnicodeText,"O","UTF8",handle) will do its conversion on an input value containing the concatenation of handle with UnicodeText.  When $ZCONVERT is done then the new value of handle will either be the empty string or handle will contain the unconverted substring at the end of the input value.  So before your loop which decodes the %Stream containing Base64 encoded UTF-8 bytes you should execute SET handle="" and inside your decoding loop you should call $ZCONVERT using handle as a forth argument variable.  If at the exit of your decoding loop the variable handle does not contain the empty string then your input stream was ill-formed.

Note that when your decoding loop reads a Base64Encoded substring from the %Stream then I am assuming you read a perfect multiple of 4 bytes from the %Stream before you execute SET BinaryText=$system.Encryption.Base64Decode(Base64Encoded).  [[ If the Base64 encoded %Stream was not generated using the code described above then I am also assuming the Base64 encoded BinaryText does not contain any additional white-space characters, or if it does contain white space then those white-space characters were removed before building a substring that contains a perfect multiple of 4 bytes. ]]  Each sequence of 4 bytes in Base64Encoded will be turned into 3 bytes in BinaryText.

I should point out that if you have a Unicode character with an encoding larger than 65535 (i.e., encoded in UTF-16 using a surrogate pair of two adjacent 16-bit characters) then the statement SET BinaryText=$ZCONVERT(UnicodeText,"O","UTF8") in the encoding loop will also need a fourth "handle" argument to handle the case where the UnicodeText substring ends with the leading half of a surrogate pair.  Characters with Unicode encoding greater than 65535 consist mostly of less frequently used Chinese, Korean and Japanese ideograms but also many of the emojis.  

My comment applies generally to this entire discussion and I know your comment was not talking about Unicode character conversion.  However, your comment did include actual code with a loop.  I wanted to make it easy for readers to see your Base64 encoding which is looping over a %Stream while I discussed the issues involved with adding calls to $ZCONVERT(UnicodeText,"O","UTF8",handle) to convert 16-bit UTF-16 characters into 8-bit UTF-8 bytes which could then be Base64 encoded.

Thank you Vitaliy. why 5700? is there an optimal chunk size to split a large stream?

Vitaly did not explain this magic number, I'll do. An important thing to now about Base64 is that any thee bytes of data encoded to four bytes. So, if you need encode data you have to read in length devisable by 3 (5700/3=1900), and when you decode data, read divisible by four. But decoded data may contain line endings, which should be omitted and not counted.

So, you can use 24573 if you'd need to encode, and 32764 to decode data.

24573 appears as 32764/4*3.

24573 bytes becomes 32764 after encoding, so, it will not be more than the default string limit if no long string activated.

String limit is 3 641 144 so 40 000 symbols is quite okay for a string.

I tried your method yet。Before base64 encryption, the string is encrypted by AES. If the string is encrypted by intercepting the string, if the Chinese character is encountered, the AES encryption string of one Chinese character will be probably intercepted into two strings, and the encrypted data will have problems。

 
Example

Result:

USER>##class(dc.test).Test("test"_$c(68))
gTNg0UMkvQ3o+ehJkvr6lA==

USER>##class(dc.test).Test("test"_$c(768))
R8UuZkjDVZidYckYMTpnVg==

USER>##class(dc.test).Test("测试")
lsYxFAQgNtiXHyaeGTWJ0A==