FileSpec wildcard question
For the FileSpec on a file service, can you use multiple wildcard values, like *CLINICALENCOUNTER*.xml?
It seems like you are restricted to a single wildcard value.
Comments
I just tested this using the standard HL7 File Service and was able to use multiple wildcards to select files. In my testing I used a File Spec of *Test*.hl7 and the service picked up both RadTest01.hl7 and LabTest01.hl7. I also tried a File Spec of LabTest*.hl7;RadTest*.hl7 with the same net result.
The file adapters all use the same core methods for searching the specified directory path, so it should work unless you're using a custom service/adapter.
What version of Ensemble/HSConnect are you using?
Hello,
I am using the format TEST_PID*_NDCA_Drug_Utilization_*_UD.txt in FileSpec to pick up files like TEST_PID2230_NDCA_Drug_Utilization_20200815_UD.txt, but the interface also picks up files like TEST_PID2230_NDCA_Drug_Utilization_20200815_RL1_UD.txt. How can I make it pick up only files in the format TEST_PID2230_NDCA_Drug_Utilization_20200815_UD.txt? I am using Ensemble version 2018.1.3. Thanks.
If you know that there will always be 8 characters at the position of the 2nd "*" character, you should be able to use this pattern instead:
TEST_PID*_NDCA_Drug_Utilization_????????_UD.txt
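The behavior of the two specs can be checked with Python's `fnmatch`, which uses the same `*` / `?` wildcard semantics as the File Service's FileSpec (this is an illustrative sketch, not Ensemble code):

```python
from fnmatch import fnmatch

wanted = "TEST_PID2230_NDCA_Drug_Utilization_20200815_UD.txt"
unwanted = "TEST_PID2230_NDCA_Drug_Utilization_20200815_RL1_UD.txt"

# The original spec: the second * happily matches "20200815_RL1" too.
loose = "TEST_PID*_NDCA_Drug_Utilization_*_UD.txt"
print(fnmatch(wanted, loose))    # True
print(fnmatch(unwanted, loose))  # True - picks up the RL1 file as well

# Eight ? characters match exactly eight characters (the YYYYMMDD date).
strict = "TEST_PID*_NDCA_Drug_Utilization_????????_UD.txt"
print(fnmatch(wanted, strict))   # True
print(fnmatch(unwanted, strict)) # False - "20200815_RL1" is 12 characters
```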
Yes, it is always a date like YYYYMMDD. I will try to use ???????? instead of *. Thanks Jeffrey!
Okay, that setting eventually worked. I guess the service takes many minutes when there is a large number of documents sitting in the OS folder.
One more question: is there a way to use negation in the file spec, e.g. create another service that looks for documents that don't match a specific pattern? I don't see anything obvious reading the code, which means I would need to create a custom service to do this.
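A custom service could implement negation as a post-filter: list the directory as usual, then keep only the files that do not match the excluded spec. A minimal sketch of that idea (in Python, with hypothetical filenames; `files_not_matching` is my name, not an Ensemble API):

```python
from fnmatch import fnmatch

def files_not_matching(filenames, excluded_spec):
    """Return only the files that do NOT match the given wildcard spec."""
    return [f for f in filenames if not fnmatch(f, excluded_spec)]

files = ["LabTest01.hl7", "RadTest01.hl7", "Other01.hl7"]
# Everything except the *Test*.hl7 files:
print(files_not_matching(files, "*Test*.hl7"))  # ['Other01.hl7']
```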
I would like to have multiple services handling the same OS folder.
I could use Pool Size > 1, but then there is a conflict between the jobs when they try to access the same file.
I recommend locking as a solution to that problem.
Create a table/class/global which holds {filename, lock, ownerJobId}.
All service jobs execute the same file search, take the first filename, and check the lock table. If there is no entry, write one and start processing the file.
If the lock table already has an entry for that file, move on to the next file until you find one without a lock.
After a file is processed, delete/move it and remove its entry from the lock table.
On job shutdown, purge the table records associated with that job id.
This way you can scale jobs easily.
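A minimal in-process sketch of that claim/release protocol (Python, with a dict standing in for the table/global and threads standing in for jobs; the class and method names are mine, not an Ensemble API):

```python
import threading

class FileLockTable:
    """Tracks which job owns which file: {filename: owner_job_id}."""

    def __init__(self):
        self._owners = {}
        self._mutex = threading.Lock()

    def claim_first(self, filenames, job_id):
        """Walk the search results and claim the first unowned file."""
        with self._mutex:
            for name in filenames:
                if name not in self._owners:      # no lock entry: take it
                    self._owners[name] = job_id
                    return name
        return None                               # everything is claimed

    def release(self, filename):
        """Remove the entry after the file is processed and deleted/moved."""
        with self._mutex:
            self._owners.pop(filename, None)

    def purge_job(self, job_id):
        """On job shutdown, drop every entry owned by that job."""
        with self._mutex:
            for name, owner in list(self._owners.items()):
                if owner == job_id:
                    del self._owners[name]

table = FileLockTable()
print(table.claim_first(["a.txt", "b.txt"], job_id=1))  # a.txt
print(table.claim_first(["a.txt", "b.txt"], job_id=2))  # b.txt
table.purge_job(1)                                      # job 1 shut down
print(table.claim_first(["a.txt", "b.txt"], job_id=3))  # a.txt again
```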
Eduard, that's a good suggestion, but one issue I see is that if you are processing millions of documents, that's a lot of database overhead. Using a simple global reference with locks would work better, as you would only ever have up to Pool Size filenames persisted.
My next challenge is to determine a better way for the file service to pick off files from the OS folder without loading all the files into a potentially massive result set which can blow out local memory and in extreme cases quickly blow out CacheTemp DB size.
The solution I outlined holds at most PoolSize records at any given time, so I don't think it's a very big overhead. You could lock on filenames, I suppose, why not?
The Ensemble File inbound adapter uses the FileSet query from the %File class. That query uses $zsearch to iterate over the files in a directory and populates a process-private global with all of the results at once; calling Next on that result set only moves the key in that global. You could rewrite the FileSet query to advance $zsearch on each Next call instead. I don't know how that would affect performance, though.
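The "advance on each Next call" idea, returning one file at a time instead of materializing the whole directory listing, is essentially what a generator does. A rough analogue in Python (os.scandir streams directory entries lazily; this is a sketch of the technique, not the ObjectScript FileSet code):

```python
import os
from fnmatch import fnmatch

def iter_matching_files(directory, spec):
    """Yield matching filenames one at a time instead of
    building the full result set in memory first."""
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file() and fnmatch(entry.name, spec):
                yield entry.name

# Each iteration advances the underlying directory scan, so memory
# use stays roughly constant even with millions of files on disk:
# for name in iter_matching_files("/inbound", "*.hl7"):
#     process(name)
```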