Article Luana Machado · Jun 9 12m read

Building a Conversational Epidemiological Intelligence Platform on InterSystems IRIS for Health

1. Introduction

Epidemiological surveillance is one of the foundational pillars of public health. Régis Júnior et al. (2026) define it as a continuous system of data collection, analysis, interpretation and dissemination of health events — a function whose effectiveness depends critically on the quality of information systems, data analysis capacity, and coordination between different levels of care. The strategic importance of this function was made undeniable during the COVID-19 pandemic: the World Health Organization (2021) documented that countries with more robust surveillance systems demonstrated consistently better outcomes, while regional disparities in surveillance capacity translated directly into heterogeneous responses and measurable differences in morbidity and mortality.

Although healthcare systems now collect vast amounts of standardized clinical data, the challenge is transforming that data into timely, actionable insights. Traditional epidemiological reporting often depends on manual analysis and custom database queries, creating delays that can hinder rapid public health responses.

EpInsights addresses this gap by combining InterSystems IRIS for Health, Embedded Python clinical vectorization, IRIS Vector Search, and a LangChain4j conversational agent. The platform lets epidemiologists and health managers ask natural-language questions and receive, within seconds, geospatial heatmaps, regional rankings, similar-cluster detection, and AI-generated recommendations, all grounded in the FHIR data stored in IRIS.


 

2. Architecture Overview

EpInsights is organized into three layers with clearly separated responsibilities.

The data layer stores FHIR R4 resources in InterSystems IRIS for Health, populated with a synthetic epidemiological dataset covering health events such as dengue, COVID-19, yellow fever, hantavirus, scorpion stings, and gastrointestinal syndromes across ten Brazilian cities. Embedded Python runs natively inside IRIS to compute regional clinical vectors, which are persisted in a dedicated IRIS class and queried using Vector Search. Custom ObjectScript stored procedures handle JSON extraction from FHIR resource strings.

The backend layer is a Quarkus application using LangChain4j that orchestrates a multi-agent pipeline. REST endpoints receive natural language questions, route them through specialized agents, execute validated SQL against the IRIS FHIR repository, and return structured responses.

The frontend layer is an Angular application with a Leaflet heatmap and NG-ZORRO components. It presents the analysis as a live dashboard that updates the map, the epicenter panel, the similar regions table, and the summary card simultaneously from each conversational turn.

 


3. Data Layer — Connecting to InterSystems FHIR Server

The application uses InterSystems FHIR Server as the source of all clinical data — patients, observations, encounters, locations, and conditions — stored as native FHIR resources.


4. Regional Clinical Vectorization — Embedded Python and Vector Search

Vector Search goes beyond answering simple SQL questions such as "How many dengue cases are in Recife?". By representing regional clinical data as semantic vectors, the platform can identify cities with similar epidemiological characteristics and disease progression patterns. This enables the discovery of regions that resemble an outbreak epicenter, helping health authorities monitor potential expansion areas and act proactively before case numbers escalate.

4.1 The Vectorization Process

dc.RegionClinicalVectorGenerator runs natively inside IRIS using Embedded Python. It queries clinical data from the last 30 days, joining Observation, Encounter, Location, Condition, and DiagnosticReport resources:

sql = """
SELECT
    l.addressCity,
    l.addressState,
    GetProp(GetJSON(ro.ResourceString,'code'),'text') AS ObservationName,
    o.code AS ObservationCode,
    c.code AS ConditionCode,
    GetProp(GetJSON(rd.ResourceString,'code'),'text') AS DiagnosticReportName,
    COUNT(*) AS Occurrences
FROM HSFHIR_X0001_S.OBSERVATION o
INNER JOIN HSFHIR_X0001_S.ENCOUNTER e
    ON o.encounter = 'Encounter/' || e._id
INNER JOIN HSFHIR_X0001_S.LOCATION l
    ON e.location LIKE '%' || l._id || '%'
LEFT JOIN HSFHIR_X0001_S.CONDITION c
    ON c.patient = o.patient
...
"""

For each row returned, clinical features are extracted and categorized:

if observation_name:
    feature = "OBS_NAME:" + observation_name.lower()
    region_symptoms[region_key][observation_name] += occurrences

if condition_code:
    feature = "COND_CODE:" + condition_code.lower()
    region_disease_codes[region_key][condition_code] += occurrences

Each feature is hashed with MD5 to an index (a bucket) in a 384-dimensional vector:

def feature_index(feature):
    digest = hashlib.md5(feature.encode("utf-8")).hexdigest()
    return int(digest, 16) % VECTOR_DIMENSION

The value at each index is weighted by occurrence frequency, and the vector is then divided by its Euclidean norm. The result is a unit vector that represents the clinical fingerprint of each city, independent of absolute case volumes:

norm = math.sqrt(sum(v * v for v in vector))
if norm > 0:
    vector = [v / norm for v in vector]
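
Put together, the hashing, weighting, and normalization steps can be sketched as follows. This is a minimal illustration rather than the code of dc.RegionClinicalVectorGenerator: the helper name build_region_vector, the choice to add occurrence counts directly into each bucket, and the sample counts are assumptions made for the example.

```python
import hashlib
import math

VECTOR_DIMENSION = 384  # matches the 384-dimensional vector described above


def feature_index(feature: str) -> int:
    """Map a feature string to a stable bucket in [0, VECTOR_DIMENSION)."""
    digest = hashlib.md5(feature.encode("utf-8")).hexdigest()
    return int(digest, 16) % VECTOR_DIMENSION


def build_region_vector(feature_counts: dict) -> list:
    """Hypothetical helper: accumulate occurrence-weighted features, then L2-normalize."""
    vector = [0.0] * VECTOR_DIMENSION
    for feature, occurrences in feature_counts.items():
        # Features that collide on the same index simply add up in that bucket.
        vector[feature_index(feature)] += float(occurrences)
    norm = math.sqrt(sum(v * v for v in vector))
    if norm > 0:
        vector = [v / norm for v in vector]
    return vector


# Illustrative counts only, not taken from the dataset.
sample = {"OBS_NAME:fever": 40, "OBS_NAME:headache": 25, "COND_CODE:a90": 67}
region_vector = build_region_vector(sample)
```

Because of the final normalization, doubling every count in sample yields the same vector, which is what makes the fingerprint independent of case volume.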

4.2 Persistence and the Vector Index

The normalized vector is persisted in dc.RegionClinicalVector, an IRIS persistent class with a %Library.Vector(LEN = 384) property:

Class dc.RegionClinicalVector Extends %Persistent
{
    Property AddressCity      As %String(MAXLEN = 120);
    Property AddressState     As %String(MAXLEN = 10);
    Property ClinicalVector   As %Library.Vector(LEN = 384);
    Property IdentifiedDiseases As %String(MAXLEN = "");
    Property MainSymptoms     As %String(MAXLEN = "");
    Property ClinicalSummary  As %String(MAXLEN = "");
    ...
}

Insertion uses TO_VECTOR():

insert_stmt = iris.sql.prepare("""
    INSERT INTO dc.RegionClinicalVector
    (AddressCity, AddressState, VectorVersion,
     ClinicalVector, FeatureCount, TotalOccurrences, ...)
    VALUES (?, ?, ?, TO_VECTOR(?, DOUBLE), ?, ?, ...)
""")

4.3 Similarity Queries with VECTOR_COSINE

Once vectors are stored, similarity queries become a single SQL statement using VECTOR_COSINE:

SELECT TOP 10
    other.AddressCity  AS SimilarCity,
    other.AddressState AS SimilarState,
    other.MainSymptoms,
    other.ClinicalSummary,
    VECTOR_COSINE(target.ClinicalVector, other.ClinicalVector) AS Similarity
FROM dc.RegionClinicalVector target
JOIN dc.RegionClinicalVector other
    ON  other.VectorVersion = target.VectorVersion
    AND NOT (other.AddressCity  = target.AddressCity
         AND other.AddressState = target.AddressState)
WHERE target.AddressCity  = :city
  AND target.AddressState = :state
ORDER BY Similarity DESC

A city with a dengue-and-fever-dominated profile will score high cosine similarity with other cities showing the same combination, even if one has 67 cases and another has 12. This is the semantic dimension that count-based SQL aggregation alone cannot provide.
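
The scale independence is easy to check outside the database. The sketch below computes cosine similarity in plain Python on toy three-feature profiles; the feature order and all counts are made up for illustration, and the function is a textbook formula rather than the IRIS implementation of VECTOR_COSINE.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch only
    return dot / (norm_a * norm_b)


# Toy profiles: [dengue, fever, cough] occurrence counts (illustrative).
large_outbreak = [67.0, 50.0, 5.0]
small_outbreak = [12.0, 9.0, 1.0]
different_profile = [2.0, 10.0, 60.0]

same_shape = cosine_similarity(large_outbreak, small_outbreak)
other_shape = cosine_similarity(large_outbreak, different_profile)
```

Here same_shape is close to 1 although the two cities differ by more than a factor of five in volume, while other_shape is much lower because the mix of features differs.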


5. Backend Layer — Quarkus and the LangChain4j Agent Pipeline

5.1 The Multi-Agent Architecture

The backend implements a pipeline of four specialized agents, each with a single clearly defined responsibility. This separation prevents any single agent from becoming responsible for too many concerns, which would degrade both the quality of outputs and the ability to debug failures.

ImproveAskAgent 

The first agent in the pipeline. It receives the raw user question and rewrites it to guarantee that the downstream agent will request geographically structured data suitable for the dashboard:

@RegisterAiService
public interface ImproveAskAgent {
    @SystemMessage("""
    You are ImproveAskAgent.
    Your job is to rewrite user questions into clear regional
    FHIR epidemiological analysis requests.
    The improved question must explicitly ask for:
    - city, state, total cases, main symptoms, latitude, longitude
    Rules:
    - Never answer the question.
    - Never generate SQL.
    - Return only the improved question as plain text.
    """)
    String improve(String question);
}

This ensures that a casual question like "tell me about dengue" becomes "Show dengue cases aggregated by city and state for the last 30 days, including total cases, main symptoms, latitude and longitude for each region."

ClinicalFhirAgent

The main orchestrator, registered with four tools: FhirSqlTool, AISummaryTool, TerminologyTool, and DateTools. It receives the improved question and decides which tools to call and in what order, returning a structured AskDashboardResponse:

@RegisterAiService(tools = {
    FhirSqlTool.class, AISummaryTool.class,
    TerminologyTool.class, DateTools.class
})
public interface ClinicalFhirAgent {
    @SystemMessage("""
    You are a FHIR IRIS assistant.
    Steps:
      1 - Generate SQL query with FhirSqlTool.
      2 - Analyze results and generate a final answer.
    """)
    AskDashboardResponse ask(String question);
}

SQLFhirBuilderAgent

A specialized SQL generation agent with deep knowledge of the IRIS FHIR schema. Its system prompt defines the allowed schemas and tables, FHIR reference normalization rules, the requirement to use LIKE for coded fields, coordinate extraction patterns, and a strict output contract for dashboard queries:

-- Dashboard output contract enforced by the agent:
l.addressCity  AS city,
l.addressState AS state,
COUNT(DISTINCT c._id) AS totalCases,
MAX(c.code) AS mainSymptoms,
CAST(GetProp(GetJSON(r.ResourceString,'position'),'latitude')  AS DOUBLE) AS latitude,
CAST(GetProp(GetJSON(r.ResourceString,'position'),'longitude') AS DOUBLE) AS longitude

AISummaryAgent

Receives the question and the query results and produces a structured AISummary with a narrative text and a list of actionable recommendations.


5.2 The SQL Generation Pipeline with Retry

FlowSqlExecuteAgents orchestrates the most critical step: generating SQL that the IRIS database will actually accept. The pipeline has three stages and a retry loop:

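public String buildSql(String question) {
    List<TerminologyResult> terminology =
        terminologyTool.discoverTerminology(question);

    String prompt = question;   // text sent to the builder; replaced on retry
    String sql = null;
    int tentativeSql = 0;
    boolean sqlValid = false;

    while (tentativeSql < 5 && !sqlValid) {
        tentativeSql++;
        sql = sqlFhirBuilderAgent.buildSql(prompt, terminology);
        try {
            sqlValidator.validateReadOnlySelect(sql);
            sqlValid = true;
        } catch (Exception ex) {
            // Keep the original question so the retry does not lose context.
            prompt = String.format("""
                Original question: %s
                The following SQL is invalid: %s
                Error: %s
                Generate a corrected SQL.
                """, question, sql, ex.getMessage());
        }
    }
    if (!sqlValid) {
        // Never hand back SQL that failed validation on every attempt.
        throw new IllegalStateException(
            "No valid SQL after " + tentativeSql + " attempts");
    }
    return sql;
}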

Stage 1 — Terminology Discovery

Before generating any SQL, TerminologyTool queries the actual FHIR data to discover which clinical codes and text values exist in the repository:

SELECT
    ResourceType,
    GetJSON(GetJSON(ResourceString,'code'),'coding') AS CodingJson,
    GetProp(GetJSON(ResourceString,'code'),'text')   AS TextValue
FROM HSFHIR_X0001_R.Rsrc
WHERE ResourceType IN ('Condition','Observation')
GROUP BY
    ResourceType,
    GetJSON(GetJSON(ResourceString,'code'),'coding'),
    GetProp(GetJSON(ResourceString,'code'),'text')

This prevents the SQL builder from fabricating codes. The agent knows that dengue is represented as A90 in this specific dataset because it read that from the database — not from its training data.
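
One way to hand the discovered values to the SQL builder is to flatten each row into a line of prompt context. The sketch below assumes rows shaped like the query result above (resource type, coding JSON array, text value); the function name, the output format, and the sample rows are hypothetical and only illustrate the idea.

```python
import json


def format_terminology(rows) -> str:
    """Hypothetical helper: render (resource_type, coding_json, text) rows as prompt lines."""
    lines = []
    for resource_type, coding_json, text_value in rows:
        # A resource may carry only code.text and no coding array.
        codings = json.loads(coding_json) if coding_json else []
        codes = ", ".join(
            f"{c.get('system', '?')}|{c.get('code', '?')}" for c in codings
        )
        lines.append(
            f"{resource_type}: text='{text_value or ''}' codes=[{codes}]"
        )
    return "\n".join(lines)


# Illustrative rows; A90 is the dengue code mentioned in the text.
rows = [
    (
        "Condition",
        '[{"system": "http://hl7.org/fhir/sid/icd-10", "code": "A90"}]',
        "Dengue",
    ),
    ("Observation", None, "Fever"),
]
context = format_terminology(rows)
```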

Stage 2 — SQL Generation

SQLFhirBuilderAgent receives the question and the real terminology context and generates a single SQL statement following all the rules defined in its system prompt.

Stage 3 — Validation

SqlValidator performs two checks:

  1. Safety check: only SELECT statements are allowed, forbidden keywords are blocked, and FHIR reference join patterns are validated.
  2. Compilation check: the query is executed with maxRows=1 against the live IRIS instance. If the database rejects it, the error message is fed back to the SQL builder agent with full context for correction. The loop retries up to five times.

private void validateSqlCompiles(String sql) {
    jdbi.useHandle(handle ->
        handle.createQuery(sql)
              .setMaxRows(1)
              .mapToMap()
              .findFirst()
    );
}

Together, the two checks (safety rules plus live compilation) are designed to keep malformed or dangerous SQL from reaching production execution.
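
The safety check can be pictured as a few string-level rules applied before the query touches the database. The following is a minimal sketch and not the project's SqlValidator: the keyword list, the function name, and the decision to reject any statement containing an inner semicolon are assumptions, and the FHIR join-pattern rules are left out.

```python
import re

# Assumed deny-list; the real validator may block a different set.
FORBIDDEN_KEYWORDS = (
    "INSERT", "UPDATE", "DELETE", "DROP", "ALTER",
    "CREATE", "TRUNCATE", "GRANT", "REVOKE", "CALL",
)
_FORBIDDEN_RE = re.compile(
    r"\b(" + "|".join(FORBIDDEN_KEYWORDS) + r")\b", re.IGNORECASE
)


def validate_read_only_select(sql: str) -> None:
    """Raise ValueError unless sql looks like a single read-only SELECT."""
    statement = sql.strip().rstrip(";").strip()
    if not statement:
        raise ValueError("empty SQL")
    if ";" in statement:
        raise ValueError("multiple statements are not allowed")
    if not re.match(r"select\b", statement, re.IGNORECASE):
        raise ValueError("only SELECT statements are allowed")
    match = _FORBIDDEN_RE.search(statement)
    if match:
        raise ValueError(f"forbidden keyword: {match.group(1).upper()}")
```

The keyword scan runs over the raw text, so a harmless string literal such as 'update' would also be rejected. That is deliberately conservative: a false rejection only costs one retry, while a false acceptance could modify data.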


5.3 The Similar Regions Endpoint

The similar regions query is handled separately from the main agent pipeline through a dedicated REST endpoint that calls RegionClinicalVectorTool directly:

@GET
@Path("/similar")
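public List<SimilarRegionData> findSimilarRegions(
        @QueryParam("city")  String city,
        @QueryParam("state") String state) {
    // Reject missing parameters instead of failing with a NullPointerException.
    // BadRequestException comes from jakarta.ws.rs.
    if (city == null || state == null) {
        throw new BadRequestException("city and state are required");
    }
    return regionClinicalVectorTool.findSimilarRegions(
        city.toUpperCase().trim(),
        state.toUpperCase().trim()
    );
}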

The tool executes the VECTOR_COSINE query against dc.RegionClinicalVector via JDBI. This is a static, predefined query — it does not go through the SQL generation pipeline. The separation is deliberate: similarity queries are always the same structure regardless of the disease being analyzed, so there is no reason to involve the LLM in their construction.


5.4 Conversation Memory

Each session maintains independent conversational context through ChatMemoryProviderFactory, which associates a MessageWindowChatMemory of 80 messages with each chatId:

@ApplicationScoped
public class ChatMemoryProviderFactory implements Supplier<ChatMemoryProvider> {
    private final Map<String, ChatMemory> memories = new ConcurrentHashMap<>();

    @Override
    public ChatMemoryProvider get() {
        return chatId -> memories.computeIfAbsent(
            chatId.toString().trim(),
            k -> MessageWindowChatMemory.withMaxMessages(80)
        );
    }
}

This allows follow-up questions to build on previous context within the same session without re-sending the full history from the client.
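
The behavior of a per-session message window can be sketched in a few lines of Python. This is an analogy for illustration, not LangChain4j code: the class and method names are hypothetical, and the sketch ignores any special handling the real memory may apply to system messages.

```python
from collections import deque

MAX_MESSAGES = 80  # window size used in the article


class ChatMemoryStore:
    """Hypothetical store: one bounded message window per chat id."""

    def __init__(self, max_messages: int = MAX_MESSAGES):
        self._max_messages = max_messages
        self._memories = {}

    def memory_for(self, chat_id) -> deque:
        key = str(chat_id).strip()
        if key not in self._memories:
            # deque(maxlen=...) silently drops the oldest entry when full.
            self._memories[key] = deque(maxlen=self._max_messages)
        return self._memories[key]

    def add(self, chat_id, message: str) -> None:
        self.memory_for(chat_id).append(message)


store = ChatMemoryStore()
for i in range(100):
    store.add("session-a", f"message {i}")
store.add("session-b", "hello")
```

After the loop, session-a holds only its 80 most recent messages, and session-b is unaffected. Note that an in-memory map like this grows with the number of sessions unless old entries are evicted.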


6. Frontend Layer — Angular Dashboard

The Angular frontend presents two interaction paradigms simultaneously: a conversational chat interface and a geospatial epidemiological dashboard. Both update from the same API response.


7. End-to-End Request Flow

A complete request cycle for "Show dengue cases by region this month" proceeds as follows:

User Question
     │
     ▼
ImproveAskAgent          → rewrites question for structured output
     │
     ▼
ClinicalFhirAgent        → orchestrates tool calls
     │
     ├── TerminologyTool  → discovers real codes from FHIR data
     │
     ├── SQLFhirBuilderAgent → generates SQL
     │
     ├── SqlValidator     → safety + live compilation check (up to 5 retries)
     │
     ├── SqlExecutor      → runs validated query against IRIS
     │
     └── AISummaryAgent   → produces narrative + recommendations
           │
           ▼
     AskDashboardResponse → Angular dashboard updates map, table, summary


8. Conclusion

EpInsights demonstrates that InterSystems IRIS for Health is more than a FHIR repository. When Embedded Python, Vector Search, and the FHIR SQL layer are combined with a disciplined AI agent architecture, the same platform that stores clinical records can power natural language outbreak detection, regional similarity analysis, and conversational epidemiological intelligence.

The architecture is deliberately auditable. Every answer the system produces can be traced to a specific SQL query against real FHIR data, validated before execution and logged for review. The LLM never accesses the database directly and never produces answers from inference alone — it orchestrates tools that retrieve real data and interprets the results.

For health authorities, this translates into a system they can trust: not because it claims to be trustworthy, but because its reasoning is visible, its data sources are explicit, and its outputs can be independently verified in the IRIS Management Portal at any time.

 

Thank you for reading our article!

If you liked it, please check out our application and give us some support! 😊

https://openexchange.intersystems.com/package/EpInsights


References:

RÉGIS JÚNIOR, J. F. et al. Vigilância Epidemiológica Como Instrumento de Gestão em Saúde Pública. Revista Tópicos, v. 4, n. 33, p. 1-21, 2026. DOI: 10.70773/revistatopicos/779251372.
