Question
· Feb 23, 2019

FileSpec wildcard question

For the FileSpec on a file service, can you use multiple wildcard values, like *CLINICALENCOUNTER*.xml?

It seems like you are restricted to a single wildcard value.


I just tested this using the standard HL7 File Service and was able to use multiple wildcards to select files. In my testing I used a File Spec of *Test*.hl7 and the service picked up both RadTest01.hl7 and LabTest01.hl7. I also tried a File Spec of LabTest*.hl7;RadTest*.hl7 with the same net result.
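For readers unfamiliar with the matching semantics, the behaviour described above can be illustrated with ordinary glob-style matching. This is Python's fnmatch, used purely as an illustration, not Ensemble code; the semicolon-separated File Spec behaves like an OR over the individual patterns:

```python
from fnmatch import fnmatch

files = ["RadTest01.hl7", "LabTest01.hl7", "Other01.hl7"]

# A single spec with multiple wildcards: *Test*.hl7
print([f for f in files if fnmatch(f, "*Test*.hl7")])
# -> ['RadTest01.hl7', 'LabTest01.hl7']

# A semicolon-separated spec is just an OR over the individual patterns
spec = "LabTest*.hl7;RadTest*.hl7"
print([f for f in files if any(fnmatch(f, p) for p in spec.split(";"))])
# -> ['RadTest01.hl7', 'LabTest01.hl7']
```

Both forms select the same two files, matching the test result described above.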

The file adapters all use the same core methods for searching the specified directory path, so it should work unless you're using a custom service/adapter.

What version of Ensemble/HSConnect are you using?

Hello,

I am using the format TEST_PID*_NDCA_Drug_Utilization_*_UD.txt in FileSpec to pick up files like TEST_PID2230_NDCA_Drug_Utilization_20200815_UD.txt, but the interface also picks up files like TEST_PID2230_NDCA_Drug_Utilization_20200815_RL1_UD.txt. How can I make it pick up only files like TEST_PID2230_NDCA_Drug_Utilization_20200815_UD.txt? I am using Ensemble version 2018.1.3. Thanks.
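The reason both files match is that each * matches any substring, so the second wildcard happily swallows "20200815_RL1" as well as "20200815". A glob pattern cannot pin down "exactly eight digits here"; that needs a regular expression. A Python sketch of the difference (illustration only, not Ensemble code):

```python
import re
from fnmatch import fnmatch

files = [
    "TEST_PID2230_NDCA_Drug_Utilization_20200815_UD.txt",
    "TEST_PID2230_NDCA_Drug_Utilization_20200815_RL1_UD.txt",
]

spec = "TEST_PID*_NDCA_Drug_Utilization_*_UD.txt"
# Each * matches ANY substring, so the second * also swallows "20200815_RL1"
print([f for f in files if fnmatch(f, spec)])   # both files match

# A regex can pin the segment before _UD to exactly eight digits
pattern = re.compile(r"^TEST_PID\d+_NDCA_Drug_Utilization_\d{8}_UD\.txt$")
print([f for f in files if pattern.match(f)])   # only the first file
```

Since the File Spec setting takes glob patterns rather than regular expressions, a filter like this would have to live in a custom service or adapter.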

Okay, that setting eventually worked. I guess the service takes many minutes when there is a large number of documents sitting in the OS folder.

One more question - is there a way to use negation on the file spec, e.g. create another service that looks for documents that don't match a specific pattern? I don't see anything obvious reading the code, which means I would need to create a custom service to do this.
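Conceptually, such a custom service would just invert the match test after listing the directory. A minimal Python illustration of the negation (file names and pattern are made up for the example):

```python
from fnmatch import fnmatch

files = ["RadTest01.hl7", "LabTest01.hl7", "Misc01.hl7"]

# Negation: accept every file that does NOT match the pattern
print([f for f in files if not fnmatch(f, "Rad*.hl7")])
# -> ['LabTest01.hl7', 'Misc01.hl7']
```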

I would like to have multiple services handling the same OS folder.

I could use Pool Size > 1 but then there is conflict between the jobs when they try to access the same file.

I recommend locking as a solution to that problem.

Create a table/class/global which holds {filename, lock, ownerJobId}.

All service jobs execute the same file search, take the first filename, and check the lock table. If there is no entry, write one and start processing the file.

If the lock table already has an entry for that file, move on to the next file until you find one without a lock.

After the file is processed, delete/move it and remove the entry from the lock table.

On job shutdown, purge the table records associated with that job id.

This way you can scale jobs easily.
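The claim/release logic above can be sketched in a few lines. This is a minimal illustration using an in-memory dict and a mutex as stand-ins for the lock table and the database lock; the names and structure are invented for the sketch, not Ensemble API:

```python
import threading

claims = {}                      # stands in for the {filename -> ownerJobId} lock table
claims_mutex = threading.Lock()  # serializes access, as a database lock would

def claim_next(file_list, job_id):
    """Return the first file not claimed by any job, claiming it atomically."""
    with claims_mutex:
        for name in file_list:
            if name not in claims:
                claims[name] = job_id
                return name
    return None                  # every file is already being worked on

def release(name):
    """Called after the file is processed and deleted/moved."""
    with claims_mutex:
        claims.pop(name, None)

def purge_job(job_id):
    """On job shutdown, drop every claim owned by that job."""
    with claims_mutex:
        for name in [n for n, owner in claims.items() if owner == job_id]:
            del claims[name]
```

Each pool job would run the same directory search and call claim_next on the result, so two jobs can never pick up the same file.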

Eduard, that's a good suggestion, but one issue I see is that if you are processing millions of documents, that's a lot of database overhead. Using a simple global reference with locks would work better, as at most Pool Size filenames would be persisted at any one time.

My next challenge is to find a better way for the file service to pick files off the OS folder without loading them all into a potentially massive result set, which can blow out local memory and, in extreme cases, quickly blow out the CacheTemp database size.

The solution I outlined contains at most PoolSize records at any given time, so I don't think it's a very big overhead. And you could lock the filenames directly, I suppose - why not?

The Ensemble file inbound adapter uses the FileSet query from the %File class. That query uses $zsearch to iterate over the files in a directory and populates a process-private global (PPG) with all the results at once; calling Next on the result set only advances the PPG key. You could rewrite the FileSet query to advance $zsearch on each Next call instead. I don't know how that would affect performance, though.
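The difference between populating the whole result set up front and advancing the search on each Next call is essentially eager versus lazy iteration. A rough Python analogue, where os.scandir yields directory entries one at a time instead of buffering the whole listing (illustration only, not the adapter's actual code):

```python
import fnmatch
import os
import tempfile

def iter_matching(directory, spec):
    """Yield matching filenames one at a time; nothing is buffered up front."""
    with os.scandir(directory) as entries:   # lazy, OS-level iteration
        for entry in entries:
            if entry.is_file() and fnmatch.fnmatch(entry.name, spec):
                yield entry.name

# Tiny demo in a throwaway directory
with tempfile.TemporaryDirectory() as d:
    for name in ("LabTest01.hl7", "RadTest01.hl7", "notes.txt"):
        open(os.path.join(d, name), "w").close()
    print(sorted(iter_matching(d, "*Test*.hl7")))
    # -> ['LabTest01.hl7', 'RadTest01.hl7']
```

Because the generator holds only one directory entry at a time, memory use stays flat no matter how many files are sitting in the folder, which is the property a per-Next $zsearch rewrite would aim for.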