Article
· Oct 5 2m read

IRIS Audio Query - Query Audio with Text using InterSystems IRIS

With the rapid adoption of telemedicine, remote consultations, and digital dictation, healthcare professionals are communicating more through voice than ever before. These virtual conversations generate vast amounts of unstructured audio data. How can clinicians or administrators search and extract information from hours of voice recordings?

 

Enter IRIS Audio Query - a full-stack application that transforms audio into a searchable knowledge base. With it, you can:

  • Upload and store clinical conversations, consultation recordings, or dictations
  • Perform natural language queries (e.g., "What did the patient report about symptoms of fatigue?")
  • Receive a concise answer generated using Large Language Models

At its core, this application is powered by InterSystems IRIS for robust data handling and vector search. It is built on the InterSystems Interoperability framework and developed using the Python Native SDK.

 

User Interface

Uploading an audio file:

Performing a query:

 

Tech Stack

  • InterSystems IRIS – Persistent object store & vector search foundation
  • Python (FastAPI) – Backend APIs and business logic
  • React – UI for upload and querying
  • TwelveLabs API – Generate embeddings from audio and text
  • OpenAI API – Generate text responses using audio content as context
  • Docker – Containerization 

 

Architecture

 

Uploaded audio files are stored in IRIS as persistent objects. Each file is also embedded, and the resulting vector is stored in IRIS. A query runs in four steps: the query text is embedded, a vector search finds the most relevant audio embeddings, the corresponding audio files are retrieved, and the answer is generated from the query text with those audio files as context.
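
The query flow described above can be sketched in plain Python. This is a minimal illustration, not the application's actual code: `AudioRecord`, `top_k`, and `answer_query` are hypothetical names, the vector search is an in-memory cosine ranking standing in for the IRIS vector search, and the `embed` and `generate` callables stand in for the TwelveLabs and OpenAI calls.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AudioRecord:
    """Hypothetical stand-in for a stored audio object and its embedding."""
    audio_id: str
    embedding: List[float]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: List[float], records: List[AudioRecord], k: int = 2) -> List[AudioRecord]:
    """Rank records by similarity to the query vector and keep the best k."""
    ranked = sorted(
        records,
        key=lambda r: cosine_similarity(query_vec, r.embedding),
        reverse=True,
    )
    return ranked[:k]


def answer_query(
    query_text: str,
    records: List[AudioRecord],
    embed: Callable[[str], List[float]],
    generate: Callable[[str, List[str]], str],
    k: int = 2,
) -> str:
    """Embed the query, search, retrieve, then generate an answer."""
    query_vec = embed(query_text)              # step 1: embed the query text
    hits = top_k(query_vec, records, k)        # step 2: vector search
    audio_ids = [r.audio_id for r in hits]     # step 3: retrieve matching audio
    return generate(query_text, audio_ids)     # step 4: answer with audio as context
```

Passing `embed` and `generate` in as parameters keeps the sketch independent of any particular embedding or LLM provider, so the ranking logic can be exercised with toy vectors.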

The upload and query operations are built as Business Operations using the IRIS Native Python SDK. The FastAPI backend provides a REST API for external applications to interact with this system, while the React frontend provides a UI to interact with the backend.

[ React Frontend ]
        ↓
[ FastAPI Backend (REST API) ]
        ↓
[ IRIS Business Operations (Python SDK) ]
        ↓                      ↘
[ Store Audio in IRIS ]     [ Embed via TwelveLabs → Store vectors ]
                                ↓
                      [ Vector Search on Query Text ]
                                ↓
          [ Retrieve Relevant Audio → Answer using OpenAI ]
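
The layering in this architecture, where a REST handler parses a request, hands it to an operation, and serializes the reply, might look roughly like the sketch below. It uses only the Python standard library so that it runs on its own. The message classes, `QueryOperation`, and `handle_query_endpoint` are hypothetical names chosen for illustration; they are not the FastAPI routes or IRIS Business Operation classes of the actual project.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, List


@dataclass
class QueryRequest:
    """Hypothetical request message sent from the API layer to the operation."""
    query_text: str
    top_k: int = 3


@dataclass
class QueryResponse:
    """Hypothetical response message returned to the API layer."""
    answer: str
    audio_ids: List[str]


class QueryOperation:
    """Stand-in for the query operation; search and generation are injected."""

    def __init__(
        self,
        search: Callable[[str, int], List[str]],
        generate: Callable[[str, List[str]], str],
    ) -> None:
        self._search = search
        self._generate = generate

    def on_query(self, request: QueryRequest) -> QueryResponse:
        # Find relevant audio, then generate an answer using it as context.
        audio_ids = self._search(request.query_text, request.top_k)
        answer = self._generate(request.query_text, audio_ids)
        return QueryResponse(answer=answer, audio_ids=audio_ids)


def handle_query_endpoint(body: str, operation: QueryOperation) -> str:
    """What a REST handler might do: parse JSON, dispatch, serialize the reply."""
    payload = json.loads(body)
    request = QueryRequest(
        query_text=payload["query_text"],
        top_k=payload.get("top_k", 3),
    )
    response = operation.on_query(request)
    return json.dumps(asdict(response))
```

Keeping the handler thin means the frontend and any other external client depend only on the JSON contract, while the search and generation logic stays inside the operation.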