Article
· Oct 9 4m read

Building a FHIR Vector Repository with InterSystems IRIS and Python through the IRIStool module

Introduction

In a previous article, I presented the IRIStool module, which integrates the pandas Python library with the IRIS database. In this one, I explain how to use IRIStool to make InterSystems IRIS the foundation for semantic search over healthcare data in FHIR format.

This article covers what I did to create the database for another of my projects, the FHIR Data Explorer. Both projects are candidates in the current InterSystems contest, so please vote for them if you find them useful.

You can find them at the Open Exchange:

In this article we'll cover:

  • Connecting to an InterSystems IRIS database from Python
  • Creating a FHIR-ready database schema
  • Importing FHIR data with vector embeddings for semantic search
  • Running a semantic search over the imported data

Prerequisites

Install IRIStool from the IRIStool and Data Manager GitHub page.

1. IRIS Connection Setup

Start by configuring your connection through environment variables in a .env file:

IRIS_HOST=localhost
IRIS_PORT=9092
IRIS_NAMESPACE=USER
IRIS_USER=_SYSTEM
IRIS_PASSWORD=SYS

Connect to IRIS using IRIStool's context manager:

from utils.iristool import IRIStool
import os
from dotenv import load_dotenv

load_dotenv()

with IRIStool(
    host=os.getenv('IRIS_HOST'),
    port=os.getenv('IRIS_PORT'),
    namespace=os.getenv('IRIS_NAMESPACE'),
    username=os.getenv('IRIS_USER'),
    password=os.getenv('IRIS_PASSWORD')
) as iris:
    # IRIStool manages the connection automatically
    pass
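
One detail to keep in mind: os.getenv returns strings, so the port arrives as '9092' rather than as an integer, and a missing variable silently becomes None. Whether IRIStool converts or validates these values internally is not covered here. The following is a minimal sketch, assuming a hypothetical helper called load_iris_settings (not part of IRIStool), that checks the variables and casts the port before the connection is opened:

```python
import os

# Names of the variables defined in the .env file above
REQUIRED_VARS = ("IRIS_HOST", "IRIS_PORT", "IRIS_NAMESPACE", "IRIS_USER", "IRIS_PASSWORD")


def load_iris_settings(env=None):
    """Hypothetical helper (not part of IRIStool): validate and collect settings."""
    env = os.environ if env is None else env

    # Fail early with a readable message if something is missing or empty
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")

    # Keys mirror the keyword arguments used in the IRIStool call above
    return {
        "host": env["IRIS_HOST"],
        "port": int(env["IRIS_PORT"]),  # cast the string to an integer
        "namespace": env["IRIS_NAMESPACE"],
        "username": env["IRIS_USER"],
        "password": env["IRIS_PASSWORD"],
    }
```

Assuming IRIStool accepts an integer port, the resulting dictionary could be unpacked straight into the constructor, for example IRIStool(**load_iris_settings()).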

2. Creating the FHIR Schema

First, create a table to store the raw FHIR bundles. Then, while extracting data from the bundles, create a table with vector search capabilities for each extracted FHIR resource type (Patient, Observation, etc.).

IRIStool simplifies table and index creation!
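
To decide which per-resource tables are needed, it helps to see which resource types a bundle actually contains. The snippet below is a minimal sketch, not the code used in the project: group_resources_by_type is a hypothetical helper that relies only on the standard FHIR Bundle layout, where each item in "entry" wraps a "resource" carrying a "resourceType".

```python
from collections import defaultdict


def group_resources_by_type(bundle):
    """Hypothetical helper: map each resourceType to the resources in a bundle."""
    grouped = defaultdict(list)
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        resource_type = resource.get("resourceType")
        if resource_type:  # skip entries without a usable resource
            grouped[resource_type].append(resource)
    return dict(grouped)


# Tiny illustrative bundle (made-up identifiers)
sample_bundle = {
    "resourceType": "Bundle",
    "type": "collection",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "patient-123"}},
        {"resource": {"resourceType": "Observation", "id": "obs-1"}},
        {"resource": {"resourceType": "Observation", "id": "obs-2"}},
    ],
}

for resource_type, resources in group_resources_by_type(sample_bundle).items():
    print(resource_type, len(resources))
```

Each key of the returned dictionary is a candidate for its own table, such as the Patient table defined later in this section.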

FHIR Repository Table

# Create main repository table for raw FHIR bundles
if not iris.table_exists("FHIRrepository", "SQLUser"):
    iris.create_table(
        table_name="FHIRrepository",
        columns={
            "patient_id": "VARCHAR(200)",
            "fhir_bundle": "CLOB"
        }
    )
    iris.quick_create_index(
        table_name="FHIRrepository",
        column_name="patient_id"
    )
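
A row for this table needs the patient identifier and the bundle serialized as text. Here is a minimal sketch of that step, assuming the bundle is already loaded as a Python dictionary and contains a Patient resource; bundle_to_row is a hypothetical helper, not an IRIStool function.

```python
import json


def bundle_to_row(bundle):
    """Hypothetical helper: build a FHIRrepository row from a parsed FHIR bundle."""
    # Take the id of the first Patient resource found in the bundle
    patient_id = next(
        (
            entry["resource"].get("id")
            for entry in bundle.get("entry", [])
            if entry.get("resource", {}).get("resourceType") == "Patient"
        ),
        None,
    )
    if patient_id is None:
        raise ValueError("Bundle does not contain a Patient resource with an id")

    # The CLOB column stores the whole bundle as a JSON string
    return {"patient_id": patient_id, "fhir_bundle": json.dumps(bundle)}
```

The returned dictionary uses the same column names as the table above, so it could be passed as keyword arguments to the insert method shown later in this article, along with table_name="FHIRrepository".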

Patient Table with Vector Support

# Create Patient table with vector column for semantic search
if not iris.table_exists("Patient", "SQLUser"):
    iris.create_table(
        table_name="Patient",
        columns={
            "patient_row_id": "INT AUTO_INCREMENT PRIMARY KEY",
            "patient_id": "VARCHAR(200)",
            "description": "CLOB",
            "description_vector": "VECTOR(FLOAT, 384)",
            "full_name": "VARCHAR(200)",
            "gender": "VARCHAR(30)",
            "age": "INTEGER",
            "birthdate": "TIMESTAMP"
        }
    )
    
    # Create standard indexes
    iris.quick_create_index(table_name="Patient", column_name="patient_id")
    iris.quick_create_index(table_name="Patient", column_name="age")
    
    # Create HNSW vector index for similarity search
    iris.create_hnsw_index(
        index_name="patient_vector_idx",
        table_name="Patient",
        column_name="description_vector",
        distance="Cosine"
    )

3. Importing FHIR Data with Vectors

Generate vector embeddings from FHIR patient descriptions and insert them into IRIS easily:

from sentence_transformers import SentenceTransformer

# Initialize transformer model
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# Example: Process patient data
patient_description = "45-year-old male with hypertension and type 2 diabetes"
patient_id = "patient-123"
# Generate vector embedding
vector = model.encode(patient_description, normalize_embeddings=True).tolist()

# Insert patient data with vector
iris.insert(
    table_name="Patient",
    patient_id=patient_id,
    description=patient_description,
    description_vector=str(vector),
    full_name="John Doe",
    gender="male",
    age=45,
    birthdate="1979-03-15"
)
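
The example above uses hard-coded values. In practice, the description and the other columns come from a FHIR Patient resource. The following is a minimal sketch of that mapping, assuming the resource has a full birthDate (YYYY-MM-DD) and at least one name; describe_patient and its description wording are hypothetical and are not necessarily what the FHIR Data Explorer does.

```python
from datetime import date


def describe_patient(patient, today=None):
    """Hypothetical helper: derive Patient table columns from a FHIR Patient resource."""
    today = today or date.today()

    # FHIR stores names as a list of HumanName objects with 'given' and 'family'
    name = (patient.get("name") or [{}])[0]
    full_name = " ".join(name.get("given", []) + [name.get("family", "")]).strip()

    # Assumes a complete ISO date; FHIR also allows partial dates such as '1979'
    birth = date.fromisoformat(patient["birthDate"])
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    age = today.year - birth.year - (0 if had_birthday else 1)

    gender = patient.get("gender", "unknown")
    return {
        "patient_id": patient.get("id"),
        "full_name": full_name,
        "gender": gender,
        "age": age,
        "birthdate": patient["birthDate"],
        "description": f"{age}-year-old {gender} patient named {full_name}",
    }
```

The description text is what gets passed to model.encode, so richer descriptions (conditions, medications, observations) should give the semantic search more to work with.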

4. Performing Semantic Search

Once your data is loaded, you can perform similarity searches:

# Search query
search_text = "patients with diabetes"
query_vector = model.encode(search_text, normalize_embeddings=True).tolist()

# Define the SQL query (a plain string; no f-string interpolation is needed)
query = """
        SELECT TOP 5
            patient_id,
            full_name,
            description,
            VECTOR_COSINE(description_vector, TO_VECTOR(?)) as similarity
        FROM Patient
        ORDER BY similarity DESC
    """
# define query parameters
parameters = [str(query_vector)]

# Find similar patients using vector search
results = iris.query(query, parameters)

# Print one line per matching patient (results is a pandas DataFrame)
if not results.empty:
    for _, row in results.iterrows():
        print(f"{row['full_name']}: {float(row['similarity']):.3f}")
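
For intuition about the score that VECTOR_COSINE returns, here is a minimal pure-Python sketch of cosine similarity. It is an illustration only, not how IRIS computes it internally. Values close to 1 mean the vectors point in nearly the same direction, which is why the query sorts by similarity in descending order.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
```

Because the embeddings were generated with normalize_embeddings=True, both norms are approximately 1, so the cosine similarity is effectively the dot product of the two vectors.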

Conclusion

  • IRIStool simplifies IRIS integration with intuitive Python methods for table and index creation
  • IRIS supports hybrid SQL + vector storage natively, enabling both traditional queries and semantic search
  • Vector embeddings enable intelligent search across FHIR healthcare data using natural language
  • HNSW indexes provide efficient similarity search at scale

This approach demonstrates how InterSystems IRIS can serve as a powerful foundation for building intelligent healthcare applications with semantic search capabilities over FHIR data.
