How to force binary hash algorithm in Ensemble production?


TL;DR: If I set an Ensemble Production Service based on EnsLib.File.PassthroughService to Binary charset encoding, it breaks the file handling. Any ideas?

Full long post:

All,

I set up an Ensemble production to transfer files via SFTP, which works fine for sending the files to my Linux server. Then I was told that we needed to save a hash of each file so it can be compared at the destination to verify that the file arrived unmodified. This is the Base64-encoded hash of the file on my Linux server:

G7QWAP6FcLInFWP8ECRL/EI2NfKdaf6TtrpwYuvSOEc=

and I created a class extending Ens.StreamContainer to add a hash value:

Class SFTP.HashFile Extends Ens.StreamContainer [ Inheritance = right, ProcedureBlock ]
{ Property Hash As %String(MAXLEN = 256); }

which seems to work fine. But when the production runs, the hash is listed as:

Vn0ygZZN8DewD8KfKKYF8BTV9K1LsuFBEXnAkqXBEFs=

I narrowed that down to Ensemble automatically choosing the stream type: in this case it treats the file as an FC (File Character) stream, whereas the Linux hash is computed over the file as Binary. I verified the issue with this bit of code:

HASHCHECK ; TEST FILE AND GET HASH
 New bstream,cstream,sc,bhash,chash,zfile
 Set zfile="d:\Patch\Test_File_1.CSV"
 ; Hash the file as raw bytes
 Set bstream=##class(%Stream.FileBinary).%New()
 Set sc=bstream.LinkToFile(zfile)
 Do bstream.Rewind()
 Set bhash=$SYSTEM.Encryption.SHAHashStream(256,bstream)
 Write "Binary: "_$SYSTEM.Encryption.Base64Encode(bhash),!
 Do bstream.%Close()
 ; Hash the same file read as characters
 Set cstream=##class(%Stream.FileCharacter).%New()
 Set sc=cstream.LinkToFile(zfile)
 Do cstream.Rewind()
 Set chash=$SYSTEM.Encryption.SHAHashStream(256,cstream)
 Write "Char:   "_$SYSTEM.Encryption.Base64Encode(chash),!!
 Do cstream.%Close()
 Q

And when run, here's the output:

ENSDEMO>D ^HASHCHECK
Binary: G7QWAP6FcLInFWP8ECRL/EI2NfKdaf6TtrpwYuvSOEc=
Char:   Vn0ygZZN8DewD8KfKKYF8BTV9K1LsuFBEXnAkqXBEFs=

I found a setting in EnsLib.File.PassthroughService to set the charset to Binary manually. With that set, the production gives me the correct hash and sends the file, but then it recreates the file with a randomized OriginalFilename, waits 5 seconds (or whatever the Retry Interval parameter is set to), and sends it again... and again... and again.

When I look at the Message Viewer, the <Stream></Stream> element is empty:

<?xml version="1.0" ?>
<!-- type: SFTP.HashFile  id: 281 -->
<HashFile>
<OriginalFilename>C:\Export\aPvlvsYdsoNpng.CSV
</OriginalFilename>
<Stream></Stream>
<Type>FB
</Type>
<Attributes>
<AttributesItem AttributesKey="1" xsi:nil="true"></AttributesItem>
</Attributes>
<Hash>G7QWAP6FcLInFWP8ECRL/EI2NfKdaf6TtrpwYuvSOEc=
</Hash>
</HashFile>

This happens even though the file is sent intact. (You can see an example of the 'random' filename above.)

Any ideas that would be easier than writing a custom EnsLib.File.PassthroughService and/or Ens.StreamContainer that only handles binary-encoded streams?

Thanks for any and all input!

Answers

Try changing the TranslateTable:

Class dc.test [ Abstract ]
{

ClassMethod runtests()
{
  ;d ##class(dc.test).runtests()

  s data="тест"
  w "data: ",?15,data,!

  ; write the test data to a UTF-8 encoded file
  s cs=##class(%Stream.FileCharacter).%New(),
    cs.Filename="C:\Temp\test.txt",
    cs.TranslateTable="UTF8"
  d cs.Write(data),
    cs.%Save()

  ; hash the file as raw bytes
  s bstream=##class(%Stream.FileBinary).%New()
  d bstream.LinkToFile(cs.Filename)
  w "Binary:",?15,$system.Encryption.Base64Encode($system.Encryption.SHAHashStream(256,bstream)),!

  ; hash the file as characters, then again with the translation switched off
  s cstream=##class(%Stream.FileCharacter).%New()
  d cstream.LinkToFile(cs.Filename)
  w "Char:",?15,$system.Encryption.Base64Encode($system.Encryption.SHAHashStream(256,cstream)),!!
  s cstream.TranslateTable="" ;or RAW, SAME
  w "Char->Binary:",?15,$system.Encryption.Base64Encode($system.Encryption.SHAHashStream(256,cstream))
}

}

USER>d ##class(dc.test).runtests()
data:          тест
Binary:        409t7BLE9FmeugePMa6BOUINIbG9LXztfSKwnCB0+0g=
Char:          2eEII+27ZRfvbZvK4XNsx7WPDb+82DymPPOAdJ0p1SQ=

Char->Binary:  409t7BLE9FmeugePMa6BOUINIbG9LXztfSKwnCB0+0g=
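If changing TranslateTable on the stream held by the message is not possible, another option is to leave that stream alone and link a separate %Stream.FileBinary to the same file, which is what the HASHCHECK routine above does by hand. This is a minimal sketch, assuming the message's stream is a %Stream.FileCharacter whose Filename still points to an existing file when the hash is computed; the class SFTP.HashUtil and the method FileBinaryHash are hypothetical names, not part of Ensemble.

```objectscript
/// Hypothetical utility class; not part of Ensemble.
Class SFTP.HashUtil [ Abstract ]
{

/// Return the Base64-encoded SHA-256 of the raw bytes of the file behind pStream.
/// Assumes pStream is a %Stream.FileCharacter whose file still exists; returns "" otherwise.
ClassMethod FileBinaryHash(pStream As %Stream.Object) As %String
{
    // Only file character streams expose the file name we need
    If '$IsObject(pStream) || 'pStream.%IsA("%Stream.FileCharacter") Quit ""
    // The file may have been moved (for example to the archive path) by now
    If '##class(%File).Exists(pStream.Filename) Quit ""
    // Open the same file as a binary stream so no translate table is applied
    Set tBin = ##class(%Stream.FileBinary).%New()
    Set tSC = tBin.LinkToFile(pStream.Filename)
    If $$$ISERR(tSC) Quit ""
    Set tHash = $SYSTEM.Encryption.SHAHashStream(256, tBin)
    Quit $SYSTEM.Encryption.Base64Encode(tHash)
}

}
```

In a DTL, assigning `##class(SFTP.HashUtil).FileBinaryHash(source.Stream)` to `target.Hash` is one way it could be called. If the stream turns out not to be a %Stream.FileCharacter, the method returns an empty string rather than guessing.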

Vitaliy,

Thanks for the code, but how would I integrate changing the TranslateTable into the DTL of my Production Process? That's where the hash is created. I tried several permutations of ' set source.Stream.TranslateTable="" ', but all gave me a <PROPERTY DOES NOT EXIST> error. The CharEncodingTable is a read-only internal value, and trying to change it gives me a <CANNOT SET THIS PROPERTY> error.

Although, looking at the source code of the Ens.StreamContainer class, that might give me an idea how to rewrite the class _easily_ to accomplish what I need...

Thanks for the input!

You can solve the issue "head-on", namely by re-converting the character stream using %IO.StringStream:

Class dc.test [ Abstract ]
{

ClassMethod test()
{
  s char="тест",
    bin=$zcvt(char,"O","UTF8")

  w "dump char:" zzdump char
  w !!,"dump binary:" zzdump bin

  w !!,"Char:",?21,$system.Encryption.Base64Encode($system.Encryption.SHAHash(256,char)),!,
     "Binary:",?21,$system.Encryption.Base64Encode($system.Encryption.SHAHash(256,bin)),!!
    
  s cs=##class(%Stream.TmpCharacter).%New()
  d cs.Write(char)
  
  s stream=##class(%IO.StringStream).%New()
  s stream.CharEncoding="UTF8" ; here should be the encoding of your CSV file
  d stream.CopyFrom(cs) ; or d stream.CopyFrom(source.Stream)
  s stream.CharEncoding="Binary"

  /*

  Attention! The following code is needed to work around an error in the SHAHashStream method,
  since it expects "Rewind() As %Status", but the %IO.StringStream class uses "Rewind(Output pSC As %Status)"

  */
  
  s bs=##class(%Stream.TmpBinary).%New()
  d bs.CopyFrom(stream)

  w "Char (stream):",?21,$system.Encryption.Base64Encode($system.Encryption.SHAHashStream(256,cs)),!,
    "Binary (stream):",?21,$system.Encryption.Base64Encode($system.Encryption.SHAHashStream(256,bs)),!!
}

}

Result:

dump char:
0000: 0442 0435 0441 0442                                     тест
 
dump binary:
0000: D1 82 D0 B5 D1 81 D1 82                                 Ñ.еÑ.Ñ.
 
Char:                2eEII+27ZRfvbZvK4XNsx7WPDb+82DymPPOAdJ0p1SQ=
Binary:              409t7BLE9FmeugePMa6BOUINIbG9LXztfSKwnCB0+0g=
 
Char (stream):       2eEII+27ZRfvbZvK4XNsx7WPDb+82DymPPOAdJ0p1SQ=
Binary (stream):     409t7BLE9FmeugePMa6BOUINIbG9LXztfSKwnCB0+0g=
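A variant of the same re-conversion that avoids %IO.StringStream (and so the Rewind() workaround) is to re-encode the character stream chunk by chunk with $ZCONVERT into a %Stream.TmpBinary and hash that. This is a minimal sketch, assuming pEncoding is the table the file was actually read with and that reading the file as characters lost nothing; a byte-order mark or line endings altered on read would still change the hash. The class SFTP.HashUtil and the method BinaryHash are hypothetical names.

```objectscript
/// Hypothetical utility class; not part of Ensemble.
Class SFTP.HashUtil [ Abstract ]
{

/// Re-encode a character stream to bytes with pEncoding (assumed to match the
/// file's real encoding) and return the Base64-encoded SHA-256 of those bytes.
ClassMethod BinaryHash(pStream As %Stream.Object, pEncoding As %String = "UTF8") As %String
{
    Set tBin = ##class(%Stream.TmpBinary).%New()
    Do pStream.Rewind()
    While 'pStream.AtEnd {
        // Read a chunk of characters and write their byte encoding
        Set tChunk = pStream.Read(32000)
        Do tBin.Write($ZCONVERT(tChunk, "O", pEncoding))
    }
    // Leave the source stream at the start for whatever reads it next
    Do pStream.Rewind()
    Do tBin.Rewind()
    Set tHash = $SYSTEM.Encryption.SHAHashStream(256, tBin)
    Quit $SYSTEM.Encryption.Base64Encode(tHash)
}

}
```

Calling `##class(SFTP.HashUtil).BinaryHash(cs,"UTF8")` on the cs stream from the test above gives a value to compare against the "Binary (stream):" line.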

recreates the file with a randomized OriginalFilename

Recreates in the same folder?

What are the values of the FilePath, WorkPath, ArchivePath, and DeleteFromServer settings?

You need to solve the problem with file recreation, as specifying "binary" as a Charset setting gives you the correct hash.

Yes, recreates in the same folder, as the production then re-sends the file (with the randomized filename) via SFTP to the remote server.

If it makes a difference, the production is on a Windows 10 system, although I have access to a Linux server for testing as well.

Settings:

File Path: C:\Export\

Archive Path: C:\Export\CSV_Complete

(I have two different business services looking for two different extensions.)

Work Path: [[ Null ]]

I don't see a 'DeleteFromServer' setting... will continue digging.

Also, Confirm Complete: Readable (default)

File Access Timeout: 2

Hope this helps, and thanks!

What's your SubdirectoryLevels setting value?

Try to move Archive Path outside of File Path.

SubdirectoryLevels: 0

I honestly didn't think it would make a difference, since it works fine when Charset = Native. It only misbehaves when Charset = Binary.

That said, I changed ArchivePath to c:\Export2\CSV_Complete\, restarted the production, and tested again: no change in behaviour.

Thanks!

I'm out of ideas. There should not be any differences between binary and character file streams other than the encoding.

I think you need to share a minimal sample that reproduces this error or contact the WRC.