
Enhancing FHIR Data Exploration with Local LLMs: Integrating IRIS and Ollama

Introduction

In my previous article, I introduced the FHIR Data Explorer, a proof-of-concept application that connects InterSystems IRIS, Python, and Ollama to enable semantic search and visualization over healthcare data in FHIR format. The project is currently taking part in the InterSystems External Language Contest.

In this follow-up, we’ll see how I integrated Ollama to generate patient history summaries directly from structured FHIR data stored in IRIS, using lightweight large language models (LLMs) that run locally, such as Llama 3.2:1B or Gemma 2:2B.

The goal was to build a completely local AI pipeline that can extract, format, and narrate patient histories while keeping data private and under full control.

All patient data used in this demo comes from FHIR bundles, which were parsed and loaded into IRIS via the IRIStool module. This approach makes it straightforward to query, transform, and vectorize healthcare data using familiar pandas operations in Python. If you’re curious about how I built this integration, check out my previous article Building a FHIR Vector Repository with InterSystems IRIS and Python through the IRIStool module.

Both IRIStool and FHIR Data Explorer are available on the InterSystems Open Exchange — and part of my contest submissions. If you find them useful, please consider voting for them!

1. Setup with Docker Compose

To make the setup simple and reproducible, everything runs locally via Docker Compose.
A minimal configuration looks like this:

services:
  iris:
    container_name: iris-patient-search
    build:
      context: .
      dockerfile: Dockerfile
    image: iris-patient-search:latest  
    init: true
    restart: unless-stopped
    volumes:
      - ./storage:/durable
    ports:
      - "9092:52773"  # Management Portal / REST APIs
      - "9091:1972"   # SuperServer port
    environment:
      - ISC_DATA_DIRECTORY=/durable/iris
    entrypoint: ["/opt/irisapp/entrypoint.sh"]

  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    pull_policy: always
    tty: true
    restart: unless-stopped
    ports:
      - 11424:11434
    volumes:
      - ./ollama_entrypoint.sh:/entrypoint.sh
    entrypoint: ["/entrypoint.sh"]

You can find all the configurations on the GitHub project page.

2. Integrating Ollama into the Workflow

Ollama provides a simple local REST API for running models efficiently on CPU, which makes it ideal for healthcare applications where privacy and performance matter.

To connect IRIS and Streamlit to Ollama, I implemented a lightweight Python class for streaming responses from the Ollama API:

import requests, json

class ollama_request:
    """Minimal client that streams chat completions from the Ollama REST API."""

    def __init__(self, api_url: str):
        self.api_url = api_url  # e.g. the /api/chat endpoint of the Ollama server

    def get_response(self, content, model):
        payload = {
            "model": model,
            "messages": [
                {"role": "user", "content": content}
            ]
        }
        # Ollama streams its reply as newline-delimited JSON objects
        response = requests.post(self.api_url, json=payload, stream=True)

        if response.status_code == 200:
            for line in response.iter_lines(decode_unicode=True):
                if line:
                    try:
                        json_data = json.loads(line)
                        if "message" in json_data and "content" in json_data["message"]:
                            # Yield each chunk of generated text as it arrives
                            yield json_data["message"]["content"]
                    except json.JSONDecodeError:
                        yield f"Error decoding JSON line: {line}"
        else:
            yield f"Error: {response.status_code} - {response.text}"

This allows real-time streaming of model output, giving users the feeling of “watching” the AI write clinical summaries live in the Streamlit UI.
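For instance, the generator can be consumed directly. Here is a minimal usage sketch; the endpoint URL and model tag are placeholders (the URL matches the host port mapping from the Docker Compose file above), so adjust them to your setup:

# Usage sketch: endpoint and model tag are placeholders.
# From the host, the Compose file above maps port 11424 to Ollama's 11434.
client = ollama_request("http://localhost:11424/api/chat")

for chunk in client.get_response("Say hello in one short sentence.", model="llama3.2:1b"):
    print(chunk, end="", flush=True)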

3. Preparing Patient Data for the LLM

Before sending anything to Ollama, data must be compact, structured, and clinically relevant.
For this, I wrote a class that extracts and formats the patient’s most relevant data — demographics, conditions, observations, procedures, and so on — into YAML, which is both readable and LLM-friendly.

Here’s the simplified process:

  1. Select the patient row from IRIS via pandas
  2. Extract demographics and convert them into YAML
  3. Process each medical table (Conditions, Observations, etc.)
  4. Remove unnecessary or redundant fields
  5. Output a concise YAML document used as the LLM prompt context.

This string is then passed directly to the LLM prompt, forming the structured context from which the model generates the patient’s narrative summary.
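To make the idea concrete, here is a simplified sketch of such an extraction step. The table and column names (patient_id, the per-category DataFrames) are illustrative assumptions, not the exact FHIR Data Explorer schema:

import json

import pandas as pd
import yaml

def patient_to_yaml(patient_id: str,
                    demographics: pd.DataFrame,
                    tables: dict) -> str:
    # Demographics: single row converted to a plain dict.
    # The JSON round-trip normalizes numpy and timestamp types for YAML.
    row = demographics.loc[demographics["patient_id"] == patient_id].iloc[0]
    context = {"demographics": json.loads(row.to_json(date_format="iso"))}

    # Medical tables, e.g. {"conditions": df1, "observations": df2, ...}
    for name, df in tables.items():
        records = df.loc[df["patient_id"] == patient_id]
        context[name] = json.loads(records.to_json(orient="records", date_format="iso"))

    # YAML output is compact and easy for a small model to read
    return yaml.safe_dump(context, sort_keys=False, allow_unicode=True)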


4. Why Limit the Number of Records?

While building this feature, I noticed that passing all medical records often led small LLMs to become confused or biased toward older entries, losing focus on recent events.

To mitigate this, I decided to:

  • Include only a limited number of records per category, in reverse chronological order (most recent first)
  • Use concise YAML formatting instead of raw JSON
  • Normalize datatypes (timestamps, nulls, etc.) for consistency

This design helps small LLMs focus on the most clinically relevant data while avoiding “prompt overload”.
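A small helper along these lines can enforce that cap before the YAML conversion; the date column name and the cutoff of five records are illustrative choices:

import pandas as pd

def most_recent(df: pd.DataFrame, date_col: str = "date", n: int = 5) -> pd.DataFrame:
    # Normalize timestamps so sorting works even with mixed string formats
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col], errors="coerce")
    # Keep only the n most recent entries, most recent first
    return out.sort_values(date_col, ascending=False).head(n)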


💬 5. Generating the Patient History Summary

Once the YAML-formatted data is ready, the Streamlit app sends it to Ollama with a simple prompt like:

“You are a clinical assistant. Given the following patient data, write a concise summary of their medical history, highlighting relevant conditions and recent trends.”

The output is streamed back to the UI line by line, allowing the user to watch the summary being written in real time.
Each model produces a slightly different result, even with the same prompt — revealing fascinating differences in reasoning and style.
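In the Streamlit app, this boils down to concatenating the instruction with the YAML context and streaming the reply. Below is a rough sketch: st.write_stream consumes the generator from the ollama_request class shown earlier, and the variable names (patient_yaml, the container URL) are illustrative:

import streamlit as st

PROMPT = (
    "You are a clinical assistant. Given the following patient data, "
    "write a concise summary of their medical history, highlighting "
    "relevant conditions and recent trends.\n\n"
)

# patient_yaml: the YAML context prepared in the previous step (illustrative name)
client = ollama_request("http://ollama:11434/api/chat")  # adjust URL to your network setup
summary = st.write_stream(client.get_response(PROMPT + patient_yaml, model="gemma2:2b"))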


🧠 6. Comparing Local LLMs

To evaluate the effectiveness of this approach, I tested three lightweight open models available through Ollama:

Model          Parameters   Summary Style           Notes
Llama 3.2:1B   1B           Structured, factual     Highly literal and schema-like output
Gemma 2:2B     2B           Narrative, human-like   Most coherent and contextually aware
Gemma 3:1B     1B           Concise, summarizing    Occasionally omits details but very readable

You can find example outputs in this GitHub folder. Each patient summary highlights how model size and training style influence the structure, coherence, and detail level of the narrative.

Here’s a comparative interpretation of their behavior:

  • Llama 3.2:1B tends to reproduce the data structure verbatim, almost as if performing a database export. Its summaries are technically accurate but lack narrative flow — resembling a structured clinical report rather than natural text.
  • Gemma 3:1B achieves better linguistic flow but still compresses or omits minor details. 
  • Gemma 2:2B strikes the best balance. It organizes information into meaningful sections (conditions, risk factors, care recommendations) while maintaining a fluent tone.

In short:

  • Llama 3.2:1B = factual precision
  • Gemma 3:1B = concise summaries
  • Gemma 2:2B = clinical storytelling

Even without fine-tuning, thoughtful data curation and prompt design make small, local LLMs capable of producing coherent, contextually relevant clinical narratives.
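If you want to reproduce this kind of comparison, one simple approach is to loop over the model tags and save each output. This sketch reuses the client, PROMPT, and patient_yaml names from the earlier examples, which are assumptions about your setup:

# Sketch: `client`, `PROMPT` and `patient_yaml` come from the earlier examples.
for model in ("llama3.2:1b", "gemma2:2b", "gemma3:1b"):
    summary = "".join(client.get_response(PROMPT + patient_yaml, model=model))
    with open(f"summary_{model.replace(':', '_')}.txt", "w") as f:
        f.write(summary)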


🔒 7. Why Local Models Matter

Using Ollama locally provides:

  • Full data control — no patient data ever leaves the environment
  • Predictable performance — stable latency, even on CPU
  • Lightweight deployment — works even without GPU
  • Modular design — easy to switch between models or adjust prompts

This makes it an ideal setup for hospitals, research centers, or academic environments that want to experiment safely with AI-assisted documentation and summarization.


🧭 Conclusion

This integration demonstrates that even small local models, when properly guided by structured data and clear prompts, can yield useful, human-like summaries of patient histories.

With IRIS managing data, Python handling transformations, and Ollama generating text, we get a fully local, privacy-first AI pipeline for clinical insight generation.
