Member since Sep 9, 2020
Replies:
  • Both systems run Windows Server 2012 R2 Standard and Hyper-V, with very similar CPUs.
  • Both systems use a core license.
  • CreateGUID is definitely not the bottleneck; I checked this very early on. Removing the global write (while keeping the CreateGUID call) lets the CPU reach 100%. Using a GUID (instead of an incremental ID) spreads the writes across many global nodes, which might affect performance. But that is not the explanation, because then both systems would be affected.
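
A minimal sketch of the kind of check described in the last bullet. It times CreateGUID on its own, then writes with GUID subscripts, then writes with incremental subscripts. The class name, the globals ^GUIDTest and ^SeqTest, and the default of 1,000,000 iterations are hypothetical. The method kills both globals, so run it in a scratch namespace.

```objectscript
/// Hypothetical benchmark class: class name, global names and default n are illustrative only
Class Demo.GlobalWriteBench
{

ClassMethod Run(n As %Integer = 1000000)
{
    // hypothetical scratch globals, wiped before each run
    kill ^GUIDTest, ^SeqTest

    // CreateGUID only, no global write (expected to be CPU-bound)
    set start = $zh
    for i=1:1:n {
        set guid = $system.Util.CreateGUID()
    }
    write "CreateGUID only:        ", $zh-start, " s", !

    // GUID subscripts: writes land on nodes scattered across the whole global
    set start = $zh
    for i=1:1:n {
        set ^GUIDTest($system.Util.CreateGUID()) = i
    }
    write "GUID subscripts:        ", $zh-start, " s", !

    // incremental subscripts: writes are appended at the end of the global
    set start = $zh
    for i=1:1:n {
        set ^SeqTest($increment(^SeqTest)) = i
    }
    write "Incremental subscripts: ", $zh-start, " s", !
}

}
```

From a terminal: do ##class(Demo.GlobalWriteBench).Run(). If only the two write loops are slow on one machine while the CreateGUID-only loop is equally fast everywhere, the GUID generation is ruled out.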

I have edited the OP to reflect those details.
I have tested this on 4 systems (all very similar), and only one behaves like that (slow DB writes).

FileSet does a lot of things under the hood. I found that it performs several QueryOpen operations per file, because of GetFileAttributesEx calls that fetch the file size, modification date and so on. One call should be enough, but FileSet makes 4 calls per file.
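
For reference, a minimal sketch of a typical FileSet loop. It shows that every row already carries attributes such as Size and DateModified whether or not the caller reads them, which is presumably where the extra attribute queries come from. The class and method names are hypothetical, and only one folder is listed (no subfolders).

```objectscript
/// Hypothetical class: lists files directly under dir using the %File:FileSet query
Class Demo.FileSetList
{

ClassMethod ListFiles(dir As %String)
{
    set rs = ##class(%ResultSet).%New("%File:FileSet")
    set sc = rs.Execute(dir, "*")
    if $system.Status.IsError(sc) {
        do $system.Status.DisplayError(sc)
        quit
    }
    while rs.Next() {
        // Type is "D" for a directory and "F" for a file
        continue:rs.Get("Type")'="F"
        // Size and DateModified are filled in for every row, even if unused
        write rs.Get("ItemName"), ?40, rs.Get("Size"), ?55, rs.Get("DateModified"), !
    }
}

}
```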



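$ZSEARCH seems more efficient (especially if you don't need extra file info like size or date). Because $ZSEARCH("") continues the previous search, the function is not meant to be called in a recursive context, so special care is needed. Instead of recursing, the code below keeps a list of folders still to visit in FILES and processes them one after another: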

kill FILES
set FILES($i(FILES))="C:\somepath\"
set key = ""
for
{
    set key = $order(FILES(key),1,searchdir)
    quit:key=""
    set filepath=$ZSEARCH(searchdir_"*")
    while filepath'=""
    {
        set filename = ##class(%File).GetFilename(filepath)
        if (filename '= ".") && (filename '= "..") //skip "." and ".." (you might exclude more folders here)
        {
            if ##class(%File).DirectoryExists(filepath)
            {
                set FILES($i(FILES)) = filepath_"\" //queue the subfolder; the outer $order loop visits it later
            }
            else
            {
                //do something with filepath
                //...
            }
        }

        set filepath=$ZSEARCH("")
    }
}

$ZSEARCH still performs one QueryOpen operation per file. AFAIK this isn't needed, since we only need the filename, which the preceding QueryDirectory operation (via FindFirstFile) already provides. But at least it does it only once.

Based on my own measurements, it's at least 5x faster (your results may vary). I am looping through 12,000 files; if you have a smaller dataset, it might not be worth the trouble.
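
One way to reproduce this kind of comparison is to time both approaches with $ZHOROLOG. This sketch is not the benchmark behind the numbers above. The class and method names are hypothetical, and to stay short it only counts the files in a single folder (no subfolders). The folder path must end with "\".

```objectscript
/// Hypothetical timing harness: compares FileSet and $ZSEARCH on one folder
Class Demo.ScanTiming
{

/// Counts files directly under dir using the %File:FileSet query
ClassMethod CountWithFileSet(dir As %String) As %Integer
{
    set count = 0
    set rs = ##class(%ResultSet).%New("%File:FileSet")
    do rs.Execute(dir, "*")
    while rs.Next() {
        if rs.Get("Type") = "F" { set count = count + 1 }
    }
    quit count
}

/// Counts files directly under dir using $ZSEARCH (dir must end with "\")
ClassMethod CountWithZSearch(dir As %String) As %Integer
{
    set count = 0
    set filepath = $zsearch(dir_"*")
    while filepath '= "" {
        // directories, including "." and "..", are not counted
        if '##class(%File).DirectoryExists(filepath) { set count = count + 1 }
        set filepath = $zsearch("")
    }
    quit count
}

/// Times both approaches and prints the file count and elapsed seconds
ClassMethod Compare(dir As %String)
{
    set start = $zh
    set n1 = ..CountWithFileSet(dir)
    write "FileSet:  ", n1, " files in ", $zh-start, " s", !

    set start = $zh
    set n2 = ..CountWithZSearch(dir)
    write "$ZSEARCH: ", n2, " files in ", $zh-start, " s", !
}

}
```

Run Compare a few times (for example do ##class(Demo.ScanTiming).Compare("C:\somepath\")). The first pass through a folder also warms the OS file cache, which can favor whichever method runs second.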

If you need extra file attributes (like size), you can use these methods:

##class(%File).GetFileDateModified(filepath)
##class(%File).GetFileSize(filepath)

Even with those calls in place, it's still faster than FileSet.
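
A minimal sketch of those calls combined with $ZSEARCH, for a single folder. The class and method names are hypothetical. GetFileDateModified is written out raw; per the class reference it returns a $HOROLOG-style value, which you can convert with $ZDATETIME if needed.

```objectscript
/// Hypothetical class: prints name, size and modification date for each file in dir
Class Demo.FileInfo
{

/// dir must end with "\" (Windows paths, as in the loop above)
ClassMethod ListFiles(dir As %String)
{
    set filepath = $zsearch(dir_"*")
    while filepath '= "" {
        if '##class(%File).DirectoryExists(filepath) {
            // size in bytes
            set size = ##class(%File).GetFileSize(filepath)
            // last modification date, written as returned
            set modified = ##class(%File).GetFileDateModified(filepath)
            write ##class(%File).GetFilename(filepath), ?40, size, ?55, modified, !
        }
        set filepath = $zsearch("")
    }
}

}
```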

Certifications & Credly badges:
Norman has no Certifications & Credly badges yet.