Article Yu Han Eng · Oct 5, 2025 2m read

IRIS Audio Query - Query Audio with Text using InterSystems IRIS

#InterSystems IRIS #Artificial Intelligence (AI) #Embedded Python #Vector Search

With the rapid adoption of telemedicine, remote consultations, and digital dictation, healthcare professionals are communicating through voice more than ever before. Virtual conversations with patients generate vast amounts of unstructured audio data. How can clinicians or administrators search and extract information from hours of voice recordings?

Enter IRIS Audio Query - a full-stack application that transforms audio into a searchable knowledge base. With it, you can:

Upload and store clinical conversations, consultation recordings, or dictations
Perform natural language queries (e.g., "What did the patient report about symptoms of fatigue?")
Receive a concise answer generated using Large Language Models

At its core, this application is powered by InterSystems IRIS for robust data handling and vector search, and is built on the InterSystems Interoperability framework, all developed using the Python Native SDK.

User Interface

Uploading an audio file:

Performing a query:

Tech Stack

InterSystems IRIS – Persistent object store & vector search foundation
Python (FastAPI) – Backend APIs and business logic
React – UI for upload and querying
TwelveLabs API – Generate embeddings from audio and text
OpenAI API – Generate text responses using audio content as context
Docker – Containerization

Architecture

Uploaded audio files are stored in IRIS as persistent objects. Each file is also embedded, and the resulting embedding is stored as a vector. A query then runs in four steps: the query text is embedded, a vector search finds the most relevant audio embeddings, the corresponding audio files are retrieved, and an answer is generated from the query text with those audio files as context.
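The query flow described above can be sketched in plain Python. This is a minimal, in-memory illustration and not the application's actual code: `AudioRecord`, `toy_embed`, `top_k`, `build_prompt`, and `answer_query` are hypothetical names, the keyword-count embedding is only a placeholder for the TwelveLabs embeddings, the list `STORE` stands in for IRIS, and the final prompt is what would be handed to the LLM (the real application passes audio content as context, not a text summary).

```python
import math
from dataclasses import dataclass

# Tiny fixed vocabulary so the placeholder embedding is deterministic.
VOCAB = ["fatigue", "sleep", "headache", "medication", "cough"]


def toy_embed(text: str) -> list[float]:
    """Placeholder embedding: counts vocabulary words (stands in for a real embedding API)."""
    words = [w.strip(".,?!\"'").lower() for w in text.split()]
    return [float(words.count(term)) for term in VOCAB]


@dataclass
class AudioRecord:
    """Hypothetical stand-in for an audio object persisted in IRIS."""
    audio_id: str
    summary: str              # illustrative text; the real context is the audio itself
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], records: list[AudioRecord], k: int) -> list[AudioRecord]:
    """Step 2: rank stored embeddings by similarity to the query embedding."""
    ranked = sorted(records, key=lambda r: cosine(query_vec, r.embedding), reverse=True)
    return ranked[:k]


def build_prompt(question: str, records: list[AudioRecord]) -> str:
    """Step 4 (input side): combine the question with the retrieved context."""
    context = "\n".join(f"- [{r.audio_id}] {r.summary}" for r in records)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


def answer_query(question: str, store: list[AudioRecord], k: int = 1):
    query_vec = toy_embed(question)          # Step 1: embed the query text
    hits = top_k(query_vec, store, k)        # Steps 2-3: search, then retrieve records
    return hits, build_prompt(question, hits)


def make_record(audio_id: str, summary: str) -> AudioRecord:
    """Upload side: embed the content and keep it next to the stored object."""
    return AudioRecord(audio_id, summary, toy_embed(summary))


STORE = [
    make_record("visit-001", "Patient reports fatigue and poor sleep."),
    make_record("visit-002", "Patient reports a persistent cough."),
]

hits, prompt = answer_query(
    "What did the patient report about symptoms of fatigue?", STORE
)
print(hits[0].audio_id)
print(prompt)
```

In the real system the ranking step would be delegated to IRIS vector search rather than computed in Python, and the prompt would be sent to the OpenAI API together with the retrieved audio content.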

The upload and query operations are built as Business Operations using the IRIS Native Python SDK. The FastAPI backend provides a REST API for external applications to interact with this system, while the React frontend provides a UI to interact with the backend.

[ React Frontend ]
        ↓
[ FastAPI Backend (REST API) ]
        ↓
[ IRIS Business Operations (Python SDK) ]
        ↓                      ↘
[ Store Audio in IRIS ]     [ Embed via TwelveLabs → Store vectors ]
                                ↓
                      [ Vector Search on Query Text ]
                                ↓
          [ Retrieve Relevant Audio → Answer using OpenAI ]
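
The "Vector Search on Query Text" step could be expressed as an IRIS SQL query along the following lines. This is a sketch under assumptions: the table `AudioQuery.AudioEmbedding` and the columns `AudioId` and `Embedding` are hypothetical, and the article does not say which similarity function the application uses, so `VECTOR_COSINE` is chosen here for illustration. `VECTOR_COSINE` and `TO_VECTOR` are IRIS SQL functions. The helper only builds the statement and its parameter, so it runs without a database connection.

```python
# Hypothetical table name; the real schema is not given in the article.
EMBEDDING_TABLE = "AudioQuery.AudioEmbedding"


def build_vector_search(query_embedding, k=3):
    """Return (sql, params) for a top-k cosine-similarity search.

    The embedding is passed as one comma-separated string parameter and
    converted to a vector on the server side by TO_VECTOR.
    """
    if isinstance(k, bool) or not isinstance(k, int) or k < 1:
        raise ValueError("k must be a positive integer")
    if not query_embedding:
        raise ValueError("query_embedding must not be empty")

    sql = (
        f"SELECT TOP {k} AudioId "
        f"FROM {EMBEDDING_TABLE} "
        "ORDER BY VECTOR_COSINE(Embedding, TO_VECTOR(?, DOUBLE)) DESC"
    )
    # Fixed-point formatting avoids scientific notation in the parameter string.
    params = [",".join(format(float(x), ".8f") for x in query_embedding)]
    return sql, params


sql, params = build_vector_search([0.5, 0.25], k=2)
print(sql)
print(params)
```

The returned pair could then be executed through whichever database driver or SDK call the backend uses; only the embedding travels as a bound parameter, while `k` is validated as an integer before being placed in the statement.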

Comments

Iryna Mologa · Oct 9, 2025

Hi Yu,

Your video is available on InterSystems Developers YouTube:

⏯️python-iris-audio-query
