OMOP Odyssey - FHIR® to OMOP ETL (Calypso’s Island)

sween · Feb 18 5m read

Professional Grade FHIR® to OMOP Transformation

Let's zero in on the word "professional" and put it into some context. The transformation was written by industry experts, who wrapped it up in a fee-based service with support and some guardrails around the flexible options that contribute to its behavior. I feel that is an important distinction from an open or home-grown solution (though it may do the same thing) when you need it to scale or to provide mission-critical value on the other side of it. The OHDSI community has an entire competency around ETL to the OMOP database: WhiteRabbit, for instance, profiles the source database, and Rabbit in a Hat helps design the ETL. I'd short the stock as a bet that the community tooling was applied to the InterSystems transformation stack to refine the offering.

Here I go attempting to make a data transformation interesting to a community that probably lives and breathes them, but this is for sure a quick start to the front door of the OHDSI Community and its wealth of "Weapons of Mass Solution" for meaningful large-scale analytics on your (or somebody else's) healthcare data.

Bulk FHIR

The ingestion standard for the pipeline is Bulk FHIR Export; take a peek at how InterSystems has implemented the Bulk FHIR Coordinator. The resulting payload for an export is a zip file containing ndjson files with FHIR resources, one per line.

You can do this yourself with resource files exported as JSON, one resource per file; here is an example you could also use programmatically inside something else...

Generate Simple BulkFHIR.zip

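cd hospitals/
# Compact each hospital*.json resource onto a single line (ndjson)
jq -c . hospital*.json > hospitals.ndjson
# Package the ndjson file as the bulk export payload
zip -r hospitals.zip hospitals.ndjson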
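Before shipping a hand-rolled payload, it can be worth a quick sanity check that every line really is one FHIR resource. Here is a minimal sketch of that check using only the Python standard library. The function names and the demo data are my own illustration, and the assumption that the hospital files are `Organization` resources is just that, an assumption.

```python
import io
import json
import zipfile
from collections import Counter


def summarize_bulk_zip(zip_file):
    """Count FHIR resources per resourceType across every .ndjson member.

    `zip_file` may be a path or a binary file-like object.
    Raises ValueError if a line is not a JSON object with a resourceType
    (json.JSONDecodeError is itself a ValueError).
    """
    counts = Counter()
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            if not name.endswith(".ndjson"):
                continue  # ignore anything that is not an ndjson member
            with archive.open(name) as member:
                for line_no, raw in enumerate(member, start=1):
                    if not raw.strip():
                        continue  # tolerate blank lines
                    resource = json.loads(raw)
                    if not isinstance(resource, dict) or "resourceType" not in resource:
                        raise ValueError(f"{name}:{line_no} is not a FHIR resource")
                    counts[resource["resourceType"]] += 1
    return counts


def build_demo_zip():
    """Build a tiny in-memory payload shaped like hospitals.zip (made-up data)."""
    lines = [
        json.dumps({"resourceType": "Organization", "id": "h1"}),
        json.dumps({"resourceType": "Organization", "id": "h2"}),
    ]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("hospitals.ndjson", "\n".join(lines) + "\n")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(dict(summarize_bulk_zip(build_demo_zip())))
```

Pointing `summarize_bulk_zip` at a real zip path instead of the demo buffer gives a per-resource-type count you can compare against what lands in OMOP later.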

...or we can simplify things to get the ball rolling and generate a synthetic payload using Synthea and @Dmitry Zasypkin's solution out on GitHub, which takes care of a 👣 🔫 by correcting a referential mismatch between Synthea and some FHIR servers.

So let's follow the directions and generate a small population of 10 patients to land in our OMOP database and highlight the service.

git clone https://github.com/dmitry-zasypkin/synthea-ndjson
cd synthea-ndjson
docker build . -t synthea
docker run --rm -v ./output:/synthea/output -it synthea -p 10
chmod +x patch-synthea-ndjsons.sh
sudo chmod -R 777 output # may need this
./patch-synthea-ndjsons.sh output/fhir
zip -j fhir1.zip output/fhir/*.*

If everything goes well, you should be staring at a Bulk FHIR export zip called `fhir1.zip` in your current working directory.

One thing to note here is that Bulk FHIR export is actually a supported export option in `synthea.properties`...

exporter.fhir.bulk_data=true

OK, so let's send it. Remember the bucket `arn:aws:s3:::omop-fhir` we set up previously? Let's upload the payload to commence loading our OMOP database.
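If you would rather script the upload than click through the console, a minimal sketch could look like the following. The function name is hypothetical, and so is the choice to drop the zip at the bucket root under its own file name; use whatever key your deployment's S3 configuration actually watches. The client is passed in so the logic can be exercised without touching AWS; anything exposing boto3's `upload_file(Filename, Bucket, Key)` will do.

```python
import os


def upload_bulk_export(s3_client, zip_path, bucket, key=None):
    """Upload a bulk export zip and return the s3:// URI it was written to.

    `s3_client` is any object with boto3's upload_file(Filename, Bucket, Key).
    `key` defaults to the zip's file name at the bucket root (an assumption).
    """
    if not zip_path.endswith(".zip"):
        raise ValueError("expected a .zip bulk export payload")
    key = key or os.path.basename(zip_path)
    s3_client.upload_file(zip_path, bucket, key)
    return f"s3://{bucket}/{key}"
```

With boto3 installed and AWS credentials configured, calling `upload_bulk_export(boto3.client("s3"), "fhir1.zip", "omop-fhir")` should push the payload to the bucket from the earlier setup.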

Checking our work:

Inspect the Transform

Back in the InterSystems OMOP Cloud Portal, navigate to "Metrics" in the side panel and let's inspect the transformation results; hopefully we have some.

It took just under a minute for the transform to complete, but it did. If you highlight the run, you get the stats and a report card on the import of the data. It looks like two different conditions occurred: some errors and some warnings.

Inspecting the OMOP table, you can validate that the number of landed "persons" is 15, and that 13 resources were in error and could not be landed by the ETL. To see what happened, navigate to the "Errors" navigation item in the InterSystems Portal for the OMOP Service. It's definitely not the fun way to check the OMOP database, but let's take a quick peek using the SQL Browser and confirm that 15 Patients (aka OMOP persons) are in the database.

Highlighting the run in the top pane will reveal two tables below, one for Data Errors and one for Warnings. For an exhaustive explanation of errors and warnings, the InterSystems OMOP docs can do a better job than I can.

It looks like a concept id that is critical to the transform (per the OMOP data model) was missing; you can check a table's required fields pretty quickly in the CDM documentation, in this case the table reference for Procedure.

Also, it appears the transform tried to post a location twice, and thankfully it did not.

As for the warnings, it appears Synthea filled a quantity field with a string, but the resource persisted anyway, as seen here with `{#}`.

If you are looking for the mapping under the hood to understand it, you can find it in the InterSystems Documentation (it's easy on the eyes and straightforward to understand, at least that link is :).

Transformation Options

In the Configurations tab (and also up front at provisioning) there are a few options to look at for controlling the transformation. The first two have to do with source data mapping and use FHIRPath expressions to give you control of the mapping. I found the fhirpath.js demo educational and useful for mining FHIRPaths on FHIR resource JSON.

Person Id Options

These options are important, though their value is sort of tough to see when using synthetic data. The FHIR resource id never changes and identifies the resource itself, which makes it clinically meaningless but important to the operation of the FHIR server. If you are familiar with HealthShare Patient Index and the concept of domain conflicts, the source system is just as important in determining whether a patient is unique. These options give you control over minimizing duplicates and using meaningful identifiers in the OMOP database.
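To make the duplicate problem concrete, here is a small sketch of what choosing a business identifier over the resource id buys you. It is plain dictionary access, not a FHIRPath engine and not how the service implements it; it only mimics what an expression along the lines of `Patient.identifier.where(system = '...').value.first()` would select. The identifier system URI, the patient data, and the function name are all made up for illustration, and the fallback to the resource id is my own addition.

```python
MRN_SYSTEM = "urn:example:hospital-a:mrn"  # hypothetical identifier system


def person_source_value(patient, preferred_system):
    """Pick an identifier value for a Patient, falling back to the resource id.

    Mimics Patient.identifier.where(system = preferred_system).value.first();
    the fallback to `id` is an illustrative extra, not part of that expression.
    """
    for identifier in patient.get("identifier", []):
        if identifier.get("system") == preferred_system and identifier.get("value"):
            return identifier["value"]
    return patient.get("id")


# The same human, exported by two FHIR servers that assigned different ids.
patient_from_server_1 = {
    "resourceType": "Patient",
    "id": "a1b2",
    "identifier": [{"system": MRN_SYSTEM, "value": "MRN-0001"}],
}
patient_from_server_2 = {
    "resourceType": "Patient",
    "id": "zz99",
    "identifier": [{"system": MRN_SYSTEM, "value": "MRN-0001"}],
}

if __name__ == "__main__":
    keys = {
        person_source_value(p, MRN_SYSTEM)
        for p in (patient_from_server_1, patient_from_server_2)
    }
    print(keys)  # one key when the MRN is used, two if the ids were used
```

Keyed on the resource id, these two records would land as two persons; keyed on the identifier, they collapse to one.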

Filtering

Although the OMOP CDM database is already classified as a pseudo-anonymised database, some organizations may have a problem with some of the data being shipped to it. The filtering options send either entire tables, or fields within tables, to the bit bucket.

Terminology Mapping

You can extend the OMOP data model vocabularies and concepts by uploading CSVs to the bucket key specified in the S3 configuration of your InterSystems OMOP deployment. No need to rehash this, as the docs provide ample explanation and a specification for the CSV format.

https://docs.intersystems.com/services/csp/docbook/DocBook.UI.Page.cls?K...

Load It Up!

Let's get some data up in the InterSystems OMOP CDM and simulate a run of all the US states with a randomized population of about 100-200 patients per state. The timings appear to be in the ~5 minute range per state, and this is running on the lowest spec (trial) of the service available.

Now back to the OHDSI world, let's take a peek at the data from RStudio.
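For anyone wanting to reproduce a run like this, here is a rough sketch of how the per-state generation could be planned. It only builds the `docker run` argument lists and does not execute anything. It assumes the `synthea` image built earlier forwards trailing arguments to Synthea, which accepts an optional state name after its options, and the per-state output folder layout and the function name are my own invention.

```python
import random

# A few Synthea state names for the demo; extend the list for a full run.
DEMO_STATES = ["Massachusetts", "New York", "Texas"]


def plan_state_runs(states, low=100, high=200, seed=None):
    """Return one planned Synthea run per state with a random population.

    Each entry holds the state, the population, and the docker argv to run.
    Pass a seed to get a repeatable plan.
    """
    rng = random.Random(seed)
    plan = []
    for state in states:
        population = rng.randint(low, high)  # inclusive on both ends
        out_dir = "./output/" + state.replace(" ", "_")  # hypothetical layout
        argv = [
            "docker", "run", "--rm",
            "-v", f"{out_dir}:/synthea/output",
            "synthea", "-p", str(population), state,
        ]
        plan.append({"state": state, "population": population, "argv": argv})
    return plan


if __name__ == "__main__":
    for run in plan_state_runs(DEMO_STATES, seed=42):
        print(run["state"], run["population"], " ".join(run["argv"]))
```

Each `argv` could be handed to `subprocess.run`, after which every state's output would still need the patch script and the zip step from earlier before being uploaded.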

This OMOP CDM is a fully operational Death Star!