Article
· Feb 18 4m read

OMOP Odyssey - No Code CDM Exploration with Databricks AI/BI Genie (Island of Aeolus)

No Code Text to SQL w/ Databricks Genie and InterSystems OMOP

There are quite a few tools in the OHDSI universe to help you write extremely powerful queries to surface your insights, Atlas, Achiles, DQD, ShinyApp, OMOPSketch and notably SQLRender that helps translate between dialects, of which IRIS dialect is now supported... its a tiresome list and very incomplete.  But in my perusal of such tools to learn this community I did not find one that translates natural language so to speak to sql so I took Databricks AI/BI Genie for a spin and here is a subset of the initial results.

Databricks Workspace Setup

My compute cluster is dead simple, with the InterSystems JDBC Drivers included via a volume and attached.  I am on the lag wagon here a bit with the versions, but 3.7.1 did the trick here without issue.

Secondly, you'll need an init script to run on your cluster on start, I had this in a previous post in a couple of places, but here is a direct link to import_cloudsql_certfiicate.sh again.  Basically you paste the certificate from the InterSystems OMOP Portal, and mount the script on the volume and set it to run on startup like below.  On startup we should have the environment to connect to the database security.

 

OMOP to Catalog ELT

Table to Table Extract Load Transform was done via a notebook, on the population in the InterSystems OMOP CDM we loaded up previous, the pipeline takes about an hour to run, but admittedly I am pretty cheap and used Personal Compute to get the job done.  The drug_exposure table seemed to take the longest, and note the workaround in the notebook for the reserved word collision `DOMAIN` 👣🔫.  This rips the OMOP Database schema OMOPCDM54 into a catalog schema named the same.

 
If the screenshots are getting in the way, I included the python in the below teaser.

 
OMOP Genie - InterSystems OMOP ELT.py

Most likely an unneeded visual, but Ill include it anyway.



Databricks Genie AI/BI

Does not seem that I needed to be an expert to get started with my first Genie, all and all it seemed "point and prompt", with facilities to help assert things, learn with additional queries, provide some sample questions and tell the Genie how to behave...

Make a Wish

Not sure if that is where the marketing was going with Genie AI/BI but Ill abuse it to tee up our first question, which is pretty simple to start.

How many care sites are there ?

Okay then, cross check with the deployment.

Not earth shattering I know... but we did get the query and the count was accurate as the result.

Can you draw me a pie chart of the distribution of gender in the database ?

Pictures are good.

Here is one right out of the OMOP Queries text from the repository of real world examples.

 

Count number of people who have at least one observation period in the database that is longer than 365 days.

 

For this one I was more interested in looking at the difference in style from the text to code represented in the examples...
 

Databricks Genie OMOP Query Example
SELECT
  COUNT(DISTINCT `person_id`)
FROM
  (
    SELECT
      `person_id`
    FROM
      `omopcdm54`.`iccastage`.`observation_period`
    WHERE
      DATEDIFF(`observation_period_end_date`, `observation_period_start_date`) > 365
  ) AS subquery

 

    SELECT COUNT(DISTINCT person_ID) AS NUM_persons

    FROM observation_period

    WHERE observation_period_END_DATE - observation_period_START_DATE >= 365;

 


Difference in style, schema location is different, but the result was the  basically the same.

Keep in mind too this is less than interesting as this is "just the CDM" and not the "results of the CDM" which would be way more interesting, but to get to the results we'll have to dive back into the OHDSI community tooling.

 

Discussion (0)2
Log in or sign up to continue