FileSpec wildcard question
For the FileSpec on a file service, can you use multiple wildcard values, like *CLINICALENCOUNTER*.xml?
It seems like you are restricted to a single wildcard value.
Comments
I just tested this using the standard HL7 File Service and was able to use multiple wildcards to select files. In my testing I used a File Spec of *Test*.hl7 and the service picked up both RadTest01.hl7 and LabTest01.hl7. I also tried a File Spec of LabTest*.hl7;RadTest*.hl7 with the same net result.
The file adapters all use the same core methods for searching the specified directory path, so it should work unless you're using a custom service/adapter.
What version of Ensemble/HSConnect are you using?
Hello,
I am using the format TEST_PID*_NDCA_Drug_Utilization_*_UD.txt in FileSpec to pick up files like TEST_PID2230_NDCA_Drug_Utilization_20200815_UD.txt, but the interface also picks up files like TEST_PID2230_NDCA_Drug_Utilization_20200815_RL1_UD.txt. How can I make it pick up only files in the format TEST_PID2230_NDCA_Drug_Utilization_20200815_UD.txt? I am using Ensemble version 2018.1.3. Thanks.
If you know that there will always be 8 characters at the position of the 2nd "*" character, you should be able to use this pattern instead:
TEST_PID*_NDCA_Drug_Utilization_????????_UD.txt
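The behavior of the two specs can be checked with Python's `fnmatch`, which uses the same `*` / `?` wildcard semantics as the File Service's FileSpec (this is an illustrative sketch, not Ensemble code):

```python
from fnmatch import fnmatch

wanted = "TEST_PID2230_NDCA_Drug_Utilization_20200815_UD.txt"
unwanted = "TEST_PID2230_NDCA_Drug_Utilization_20200815_RL1_UD.txt"

# The original spec: the second * happily matches "20200815_RL1" too.
loose = "TEST_PID*_NDCA_Drug_Utilization_*_UD.txt"
print(fnmatch(wanted, loose))    # True
print(fnmatch(unwanted, loose))  # True - picks up the RL1 file as well

# Eight ? characters match exactly eight characters (the YYYYMMDD date).
strict = "TEST_PID*_NDCA_Drug_Utilization_????????_UD.txt"
print(fnmatch(wanted, strict))   # True
print(fnmatch(unwanted, strict)) # False - "20200815_RL1" is 12 characters
```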
Yes, it is always a date like YYYYMMDD. I will try to use ???????? instead of *. Thanks Jeffrey!
Okay, that setting eventually worked. I guess the service takes many minutes when there is a large number of documents sitting in the OS folder.
One more question: is there a way to use negation in the file spec, e.g. create another service that looks for documents that don't match a specific pattern? I don't see anything obvious reading the code, which means I would need to create a custom service to do this.
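A custom service could implement negation as a post-filter: list the directory as usual, then keep only the files that do not match the excluded spec. A minimal sketch of that idea (in Python, with hypothetical filenames; `files_not_matching` is my name, not an Ensemble API):

```python
from fnmatch import fnmatch

def files_not_matching(filenames, excluded_spec):
    """Return only the files that do NOT match the given wildcard spec."""
    return [f for f in filenames if not fnmatch(f, excluded_spec)]

files = ["LabTest01.hl7", "RadTest01.hl7", "Other01.hl7"]
# Everything except the *Test*.hl7 files:
print(files_not_matching(files, "*Test*.hl7"))  # ['Other01.hl7']
```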
I would like to have multiple services handling the same OS folder.
I could use Pool Size > 1, but then there is a conflict between the jobs when they try to access the same file.
I recommend locking as a solution to that problem.
Create a table/class/global which holds {filename, lock, ownerJobId}.
All service jobs execute the same file search, take the first filename, and check the lock table. If there is no entry, write one and start processing the file.
If the lock table already has an entry for that file, move on to the next file until you find one without a lock.
After a file is processed, delete/move it and remove its entry from the lock table.
On job shutdown, purge the table records associated with that job id.
This way you can scale jobs easily.
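A minimal in-process sketch of that claim/release protocol (Python, with a dict standing in for the table/global and threads standing in for jobs; the class and method names are mine, not an Ensemble API):

```python
import threading

class FileLockTable:
    """Tracks which job owns which file: {filename: owner_job_id}."""

    def __init__(self):
        self._owners = {}
        self._mutex = threading.Lock()

    def claim_first(self, filenames, job_id):
        """Walk the search results and claim the first unowned file."""
        with self._mutex:
            for name in filenames:
                if name not in self._owners:      # no lock entry: take it
                    self._owners[name] = job_id
                    return name
        return None                               # everything is claimed

    def release(self, filename):
        """Remove the entry after the file is processed and deleted/moved."""
        with self._mutex:
            self._owners.pop(filename, None)

    def purge_job(self, job_id):
        """On job shutdown, drop every entry owned by that job."""
        with self._mutex:
            for name, owner in list(self._owners.items()):
                if owner == job_id:
                    del self._owners[name]

table = FileLockTable()
print(table.claim_first(["a.txt", "b.txt"], job_id=1))  # a.txt
print(table.claim_first(["a.txt", "b.txt"], job_id=2))  # b.txt
table.purge_job(1)                                      # job 1 shut down
print(table.claim_first(["a.txt", "b.txt"], job_id=3))  # a.txt again
```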
Eduard, that's a good suggestion, but one issue I see is that if you are processing millions of documents, that's a lot of database overhead. Using a simple global reference with locks would work better, as you would only ever have up to Pool Size filenames persisted.
My next challenge is to determine a better way for the file service to pick off files from the OS folder without loading all the files into a potentially massive result set which can blow out local memory and in extreme cases quickly blow out CacheTemp DB size.
The solution I outlined holds at most PoolSize records at any given time, so I don't think it's a very big overhead. You could lock on filenames, I suppose, why not?
The Ensemble File inbound adapter uses the FileSet query from the %File class. That query uses $zsearch to iterate over the files in a directory and populates a process-private global with all of the results at once; calling Next on that result set only moves the key in that global. You could rewrite the FileSet query to advance $zsearch on each Next call instead. I don't know how that would affect performance, though.
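The "advance on each Next call" idea, returning one file at a time instead of materializing the whole directory listing, is essentially what a generator does. A rough analogue in Python (os.scandir streams directory entries lazily; this is a sketch of the technique, not the ObjectScript FileSet code):

```python
import os
from fnmatch import fnmatch

def iter_matching_files(directory, spec):
    """Yield matching filenames one at a time instead of
    building the full result set in memory first."""
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file() and fnmatch(entry.name, spec):
                yield entry.name

# Each iteration advances the underlying directory scan, so memory
# use stays roughly constant even with millions of files on disk:
# for name in iter_matching_files("/inbound", "*.hl7"):
#     process(name)
```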