I use the following code to calculate the SHA1 of a file:

```objectscript
set stream = ##class(%Stream.FileBinary).%New()
do stream.LinkToFile(filename)
write $SYSTEM.Encryption.Base64Encode($SYSTEM.Encryption.SHA1HashStream(stream))
```
This code is called thousands of times and performance is critical. I have tried implementing the same logic in another (lower-level) language and it's almost twice as fast. It was unclear why, so I started investigating.
Using Process Monitor, I can see that the file is read in chunks of 1024 bytes (1K), which is suboptimal: reading a 1 MB file will require 1024 file system calls. Usually a bigger buffer is used (e.g. 4096 or 81920 bytes).
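To put numbers on that claim, here is a quick back-of-the-envelope calculation (Python, purely for illustration) of how many read calls a 1 MB file costs at various buffer sizes:

```python
import math

FILE_SIZE = 1 << 20  # 1 MB (1048576 bytes), as in the example above

# Number of OS read calls needed to consume the file at each buffer size
for buf in (1024, 4096, 32000, 81920):
    calls = math.ceil(FILE_SIZE / buf)
    print(f"buffer {buf:>6} bytes -> {calls:>5} reads")
```

With a 1024-byte buffer that is 1024 calls per megabyte; a 32000-byte buffer would need only 33.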
The SHA1HashStream() function is implemented this way:

```objectscript
do $System.Encryption.SHAHashReset(160)
set sc=stream.Rewind() If $$$ISERR(sc) Quit ""
while 'stream.AtEnd {
    do $System.Encryption.SHAHashInput(160, stream.Read(32000,.sc))
    if $$$ISERR(sc) Quit
}
quit $System.Encryption.SHAHashResult(160)
```
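For comparison, the same reset/input/result pattern can be expressed with Python's `hashlib` (a sketch, not InterSystems code; the 32000-byte chunk size mirrors the loop above):

```python
import base64
import hashlib

def sha1_of_file(path, chunk_size=32000):
    """Streamed SHA-1: feed the file to the hash in fixed-size chunks,
    mirroring the SHAHashReset/SHAHashInput/SHAHashResult loop above."""
    h = hashlib.sha1()                   # ~ SHAHashReset(160)
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)   # ~ stream.Read(32000,.sc)
            if not chunk:                # ~ stream.AtEnd
                break
            h.update(chunk)              # ~ SHAHashInput(160, ...)
    return base64.b64encode(h.digest()).decode("ascii")  # ~ Base64Encode(SHAHashResult(160))
```

In this version the chunk size passed to `f.read()` directly controls the size of the underlying reads, which is the behavior I expected from `stream.Read(32000)`.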
`stream.Read(32000)` issues the following call:

```
Read:32000
```
So I expect it to read the file in chunks of 32000 bytes, but that's not the case.
Is this expected behavior? Is there a way to change it?
EDIT: I have been able to force 1024-byte reads in the other language's implementation and it's still about twice as fast, so the performance issue is probably due to something else.