· Jan 30 5m read

Converting generic data into FHIR with IRIS-FHIRfy


In 2021, I participated as an InterSystems mentor in a hackathon, where a newcomer to FHIR asked me if there was a tool to transform generic JSON data containing basic patient information into FHIR format. I told her that, unfortunately, I didn't know of any such tool.

But that idea stayed in my mind...

Several months later, in 2022, I came up with an experiment: training a named entity recognition (NER) model to identify FHIR elements in generic text. The training used synthetic FHIR data generated by Synthea and the spaCy Python library.

While I achieved good initial results in recognizing basic patient FHIR elements, such as names and location information, I encountered a challenge in structuring these elements into a valid JSON format. Consequently, I decided to put the project on hold.

However, a significant development occurred in late 2022: the rise in popularity of Large Language Models (LLMs), led by OpenAI and notably ChatGPT.

New possibilities with LLMs

LLMs have demonstrated impressive capabilities in emulating the way people generate text, including computer programs. Today, there are several programming assistants like GitHub Copilot and Codium.

A few months ago, my colleagues @Henrique Dias and @Henry Pereira and I conducted experiments using OpenAI LLM services to attempt to answer analytical questions by generating Python code that uses the IRIS FHIR REST API. You can find more details about these experiments in this post.

Now, we have decided to revisit the paused project of converting generic healthcare data into FHIR format, this time harnessing the power of LLMs. This was the birth of the IRIS-FHIRfy project.

How does IRIS-FHIRfy work?

IRIS-FHIRfy applies prompt engineering techniques to LLMs, such as role-playing and prompt chaining. The project is divided into three prompts, each aimed at breaking the problem down into less complex steps. Each prompt builds on the output of the previous one, which is intended to improve reasoning and provide a clear path to implementation.

Let's look at each of them in detail:

  1. Technical Report Generation

The first prompt starts by generating a technical report from samples of data that need to be exchanged. This report serves as a foundational step for understanding the data and its requirements.

  2. Implementation Suggestions

The second prompt takes the technical report as input and provides high-level implementation suggestions. It outlines how to go about converting the raw data into the FHIR standard.

  3. Code Implementation in Python

The final prompt takes the implementation suggestions and provides a code implementation in Python. This code serves as a starting point for developers and can be refined further to suit specific project needs. It enables the conversion of the original raw data into the FHIR standard.

The user is provided with the response to each of the prompts.
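The chaining of the three prompts can be sketched roughly as follows. This is only an illustration of the technique, not the project's actual code: the prompt templates, the `run_chain` function, and the `fake_llm` stand-in are all hypothetical names invented for this example. The real prompts live in the classes mentioned in this article, and a real run would call an LLM service instead of the stub.

```python
from typing import Callable, Dict

# Hypothetical prompt templates (assumptions for illustration only).
# Each one assigns a role and embeds the output of the previous step.
REPORT_PROMPT = (
    "You are a healthcare data analyst. "
    "Write a technical report describing this sample data:\n{raw_data}"
)
SUGGESTION_PROMPT = (
    "You are a FHIR expert. Based on this report, "
    "suggest how to convert the data into FHIR:\n{report}"
)
CODE_PROMPT = (
    "You are a Python developer. "
    "Implement these suggestions as a Python module:\n{suggestions}"
)


def run_chain(raw_data: str, llm: Callable[[str], str]) -> Dict[str, str]:
    """Run the three prompts in sequence, feeding each output into the next."""
    report = llm(REPORT_PROMPT.format(raw_data=raw_data))
    suggestions = llm(SUGGESTION_PROMPT.format(report=report))
    code = llm(CODE_PROMPT.format(suggestions=suggestions))
    # All three responses are returned, since the user sees each of them.
    return {"report": report, "suggestions": suggestions, "code": code}


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    return f"<response to a prompt of {len(prompt)} characters>"


result = run_chain("id,name,birth_date\n1,Jane Doe,1980-02-01", fake_llm)
for step, response in result.items():
    print(step, "->", response)
```

Passing the LLM in as a plain callable keeps the chain independent of any particular provider, which fits a setup where the user picks the model.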

Here's a visual representation of the entire process:


You can check out the prompts in the following classes: RawDataAnalyzer, SolutionSuggestion and SolutionModuleGenerator.

Currently, we support two LLMs: gpt-3.5-turbo from OpenAI and gemini-pro from Google. The user chooses which one to use.

In contrast to the initial LLM project, where the generated Python code was expected to be flawless, this project adopts a more relaxed approach. Here, the generated code is not expected to be perfect but rather serves as a valuable starting point for developers who work with the FHIR standard for healthcare data exchange.

You can follow here the entire process of using IRIS-FHIRfy to analyze, suggest, and implement a solution for transforming simple CSV data into FHIR format, using a sample dataset. You can also follow the process of refining the code generated by the LLM and using it to convert the original raw data into FHIR and then persist it in IRIS.
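To give a feel for the kind of conversion code involved, here is a minimal hand-written sketch that maps rows of a simple CSV to FHIR Patient resources. It is not output from IRIS-FHIRfy: the sample data, the column names, and the function names are assumptions made up for this example, and only a few basic Patient elements are covered.

```python
import csv
import io
import json
from typing import Dict, List

# Hypothetical sample data; the column names are assumptions for illustration.
SAMPLE_CSV = """id,first_name,last_name,gender,birth_date
1,Jane,Doe,female,1980-02-01
2,John,Smith,male,1975-11-23
"""


def row_to_patient(row: Dict[str, str]) -> Dict:
    """Map one CSV row to a minimal FHIR Patient resource."""
    return {
        "resourceType": "Patient",
        "identifier": [{"value": row["id"]}],
        "name": [{"family": row["last_name"], "given": [row["first_name"]]}],
        # FHIR administrative gender codes: male | female | other | unknown
        "gender": row["gender"],
        # FHIR dates use the YYYY-MM-DD format
        "birthDate": row["birth_date"],
    }


def csv_to_patients(text: str) -> List[Dict]:
    """Parse CSV text and convert every row to a Patient resource."""
    reader = csv.DictReader(io.StringIO(text))
    return [row_to_patient(row) for row in reader]


patients = csv_to_patients(SAMPLE_CSV)
print(json.dumps(patients[0], indent=2))
```

Real data would need validation (for example, normalizing gender values and date formats) before the resources are sent to a FHIR server, which is not shown here.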

How to use IRIS-FHIRfy

You have three different ways to utilize the IRIS-FHIRfy project:

Example of use with IRIS Interoperability

You can check out a sample of converting a simple CSV to FHIR running in an IRIS Interoperability here.

The following is a record of it:


This project is currently in an experimental phase, so it can be expected to produce incorrect or unusual results at times. Our primary objective at this stage is to test the fundamental concept of harnessing the capabilities of LLMs to assist developers in converting generic healthcare data into the FHIR standard.

It's worth noting that we have only worked with very basic and straightforward structured data thus far.

Generating code to convert unstructured healthcare data, such as clinical notes, is likely to be much more challenging. Even so, the technical analysis report provided by the tool can still be valuable for developers seeking to understand the data they are working with.

Furthermore, it's important to emphasize that certain critical topics, such as data privacy and security, are not addressed within the scope of this project. These areas must be addressed in future research and development.

Closing Thoughts

We hope you found the concept of this project intriguing, and we encourage you to give it a try with your own data.

If you encounter any results that you believe could be improved, we would greatly appreciate your valuable feedback. Your input will help us refine our models and improve the project's outcomes.

Thank you for taking the time to read this article. Your interest is greatly appreciated!

Discussion (2)

Hi David.

Thank you for your interest in the project.

I didn't try it in GitHub Codespaces, but I just tested it on my local Windows PC and it worked:

As you can see, the command worked for me. I like to use these flags to get more information on possible issues.

However, I changed that command to just docker-compose up -d, since this command automatically builds the image if it does not exist.

Thank you for your feedback, really appreciate it!