Question
· Nov 15

stream.Read() only reading in chunks of 1K

I use the following code to calculate the SHA-1 hash of a file:

set stream = ##class(%Stream.FileBinary).%New()
do stream.LinkToFile(filename)
write $SYSTEM.Encryption.Base64Encode($SYSTEM.Encryption.SHA1HashStream(stream))

This code is called thousands of times and performance is critical. I implemented the same logic in another (lower-level) language and it is almost twice as fast. It was unclear why, so I started investigating.

Process Monitor shows that files are read in chunks of 1024 bytes (1 KB), which is suboptimal: reading a 1 MB file requires 1024 file system calls. A bigger buffer is usually used (e.g. 4096 or 81920 bytes).
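The 1024-call figure is the file size divided by the buffer size, rounded up. Below is a small sketch of that arithmetic in Python, chosen because IRIS also supports Embedded Python. `read_calls` is a hypothetical helper, and it ignores the final zero-length read at end of file.

```python
def read_calls(file_size: int, buffer_size: int) -> int:
    """Number of read calls needed to consume file_size bytes with a fixed
    buffer size (ceiling division; the final empty read at EOF is not counted)."""
    return -(-file_size // buffer_size)

one_mb = 1024 * 1024
for buf in (1024, 4096, 32000, 81920):
    print(f"{buf:>6} bytes per read -> {read_calls(one_mb, buf)} calls")
```

For a 1 MB file this gives 1024 calls at 1 KB, 256 at 4 KB, 33 at 32000 bytes and 13 at 81920 bytes.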

The SHA1HashStream() function is implemented as follows:

do $System.Encryption.SHAHashReset(160)
set sc=stream.Rewind() If $$$ISERR(sc) Quit ""
while 'stream.AtEnd {
	do $System.Encryption.SHAHashInput(160, stream.Read(32000,.sc))
	if $$$ISERR(sc) Quit
}
quit $System.Encryption.SHAHashResult(160)
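For comparison outside IRIS, the same reset/input/result loop can be sketched in Python with `hashlib`. This is a hypothetical reference implementation, not the poster's other-language code. `chunk_size` plays the role of the 32000 passed to `stream.Read()`, and the final Base64 step mirrors the `Base64Encode` call in the first snippet. The file is opened with `buffering=0` so that each `read()` maps to a single OS read of up to `chunk_size` bytes, without Python's own buffering layer in between.

```python
import base64
import hashlib

def sha1_base64(path: str, chunk_size: int = 32000) -> str:
    """Incrementally hash a file and return the Base64-encoded SHA-1 digest."""
    h = hashlib.sha1()                             # like SHAHashReset(160)
    # buffering=0 returns an unbuffered raw file: one OS read per f.read() call
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)             # like stream.Read(32000)
            if not chunk:                          # empty bytes means end of file (AtEnd)
                break
            h.update(chunk)                        # like SHAHashInput(160, chunk)
    # like SHAHashResult(160) followed by Base64Encode()
    return base64.b64encode(h.digest()).decode("ascii")
```

Calling `sha1_base64(filename)` should return the same string as the ObjectScript one-liner for the same file, because both compute a plain SHA-1 over the file bytes and Base64-encode the 20-byte digest.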

In turn, stream.Read(32000) performs the following call:

Read:32000

So I expected it to read the file in chunks of 32000 bytes, but that is not the case.

Is this expected behavior? Is there a way to change it?

EDIT: I was able to force 1024-byte reads in the other language's implementation, and it is still about twice as fast, so the performance issue is probably due to something else.
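One way to isolate the effect of read size is to hold everything else constant and vary only the chunk size. The hypothetical Python harness below times repeated hashing of one file at several chunk sizes and keeps the digest so you can confirm every run hashed the same bytes. If the timings barely change between 1024 and 81920 bytes, per-call overhead is not the bottleneck, which would agree with the EDIT. Repeated runs hit the OS file cache, so this measures warm-cache behavior only.

```python
import hashlib
import time

def time_chunk_sizes(path, sizes=(1024, 4096, 32000, 81920), repeat=50):
    """Return {chunk_size: (elapsed_seconds, hex_digest)} for hashing `path`
    `repeat` times at each chunk size."""
    results = {}
    for size in sizes:
        digest = None
        start = time.perf_counter()
        for _ in range(repeat):
            h = hashlib.sha1()
            # Unbuffered: each f.read(size) is a single OS read of up to `size` bytes
            with open(path, "rb", buffering=0) as f:
                while chunk := f.read(size):
                    h.update(chunk)
            digest = h.hexdigest()
        elapsed = time.perf_counter() - start
        results[size] = (elapsed, digest)
    return results
```

For example, iterate over `time_chunk_sizes(filename).items()` and print each size with its elapsed time; all digests should be identical.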

Product version: IRIS 2023.1
$ZV: IRIS for Windows (x86-64) 2023.1.3 (Build 517U) Wed Jan 10 2024 13:36:58 EST

The documentation for the Read method says "Some stream classes use this to optimize the amount of data returned to align this with the underlying storage of the stream." I take this to mean that a file stream might try to read in units that align with how the drive is formatted. Can you run the command below and check whether Bytes Per Cluster is 1024?

C:\Windows\System32>fsutil fsinfo ntfsInfo C:
NTFS Volume Serial Number :        0x0060ba3960ba356e
NTFS Version      :                3.1
LFS Version       :                2.0
Total Sectors     :                997,918,719  (475.8 GB)
Total Clusters    :                124,739,839  (475.8 GB)
Free Clusters     :                 96,063,960  (366.5 GB)
Total Reserved Clusters :              893,290  (  3.4 GB)
Reserved For Storage Reserve :         884,043  (  3.4 GB)
Bytes Per Sector  :                512
Bytes Per Physical Sector :        512
Bytes Per Cluster :                4096  (4 KB)
Bytes Per FileRecord Segment    :  1024
Clusters Per FileRecord Segment :  0
Mft Valid Data Length :            512.50 MB
Mft Start Lcn  :                   0x00000000000c0000
Mft2 Start Lcn :                   0x0000000000000002
Mft Zone Start :                   0x0000000001d59580
Mft Zone End   :                   0x0000000001d65da0
MFT Zone Size  :                   200.13 MB
Max Device Trim Extent Count :     256
Max Device Trim Byte Count :       0xffffffff
Max Volume Trim Extent Count :     62
Max Volume Trim Byte Count :       0x40000000
Resource Manager Identifier :      BBA1AD65-C5EC-11EE-8ED5-D0AD0854D65E
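To compare the cluster size with the read size seen in Process Monitor across several machines, the relevant line can be pulled out of the `fsutil` text. `bytes_per_cluster` is a hypothetical helper. It accepts both the "4096  (4 KB)" form and the bare "4096" form that appear in this thread.

```python
import re

def bytes_per_cluster(fsutil_output: str) -> int:
    """Extract the 'Bytes Per Cluster' value from `fsutil fsinfo ntfsInfo` output."""
    match = re.search(r"^\s*Bytes Per Cluster\s*:\s*([\d,]+)",
                      fsutil_output, re.MULTILINE)
    if match is None:
        raise ValueError("'Bytes Per Cluster' line not found")
    # Some Windows versions format numbers with thousands separators
    return int(match.group(1).replace(",", ""))
```

The text can come from `subprocess.run(["fsutil", "fsinfo", "ntfsInfo", "C:"], capture_output=True, text=True).stdout` on Windows. `fsutil` may need an elevated prompt.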

Hello, I got the same value as you (4096):

D:\>fsutil fsinfo ntfsInfo D:
NTFS Volume Serial Number :        0x52a864f9a864dd4b
NTFS Version   :                   3.1
LFS Version    :                   2.0
Number Sectors :                   0x000000003e7be7ff
Total Clusters :                   0x0000000007cf7cff
Free Clusters  :                   0x0000000000f5785c
Total Reserved :                   0x0000000000000400
Bytes Per Sector  :                512
Bytes Per Physical Sector :        512
Bytes Per Cluster :                4096
Bytes Per FileRecord Segment    :  1024
Clusters Per FileRecord Segment :  0
Mft Valid Data Length :            0x0000000089b00000
Mft Start Lcn  :                   0x00000000000c0000
Mft2 Start Lcn :                   0x0000000000000002
Mft Zone Start :                   0x0000000006cb4d40
Mft Zone End   :                   0x0000000006cb7320
Max Device Trim Extent Count :     64
Max Device Trim Byte Count :       0x7fe00000
Max Volume Trim Extent Count :     62
Max Volume Trim Byte Count :       0x40000000
Resource Manager Identifier :     F59E5B7C-C569-11ED-B0AE-AC1F6B365CAA