Predictions with Integrated ML and IRIS


Luis Angel Pére... · Jun 27, 2023 11m read


#Docker #IntegratedML #InterSystems IRIS

As regular readers of the Community will know, last May InterSystems organized the JOnTheBeach2023 Hackathon in Malaga (Spain). The proposed topic was the use of the predictive analysis tools that InterSystems IRIS makes available to all developers through IntegratedML. We must thank both @Thomas Dyar and @Dmitry Maslennikov for all the work and effort they put into making it a resounding success.

Let's briefly introduce IntegratedML

IntegratedML

IntegratedML is a predictive analytics tool that enables any developer to simplify the tasks required to design, build, and test predictive models.

It allows us to go from a design model like this:

To a much faster and simpler one like this:

And it does so using SQL commands, which makes everything much easier and more comfortable to use. IntegratedML also lets us choose which engine is used to create our model, so we can pick the one that suits us best.

How to see it in action?

Whenever I've seen IntegratedML presentations I've loved its simplicity, but I was left wondering how to transfer that simplicity to a real case. Thinking about our regular clients, I remembered how common it is to use IRIS to integrate data from hospital departmental applications with a HIS, and how much information on clinical events is available in all of them, so I got down to work to assemble a complete example.

You have the source code in Open Exchange. The project starts with Docker and you only have to feed the deployed production with the attached files that we will show.

As you can see, the project contains ObjectScript classes that will be loaded automatically when the image is built. To do this you just need to open the VS Code terminal and run the following commands (with Docker running).

docker-compose build
docker-compose up -d

When the container starts, a namespace called MLTEST is created and a production is started containing all the business components needed to ingest the raw data, create the model, train it and then put it to use on receipt of HL7 messages.

But let's not get ahead of ourselves; let's follow the predictive analytics chart.

Data acquisition

Alright, let's narrow down the target of our prediction. Searching through the pages of the Public Administration of Spain I found a few CSVs that fit perfectly with the universe of integrations with origin and destination in a HIS. In this case, the file that I chose was the one related to the data on admissions and hospital discharges due to hip fracture in Castilla y León (Spanish autonomous region) between the years 2020 and 2022.

As you can see we have data such as the age and sex of the patient, dates of admission and discharge and hospital center. Perfect, with these data we could try to predict the hospital stay of each patient, that is, the number of days between admission and discharge.
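As a rough illustration of the target we want to predict, the stay is simply the difference in days between the two dates. The following is a minimal Python sketch, not the code used in the project (there the calculation is done in the BP); the function name and the sample dates are assumptions made for the example.

```python
from datetime import date


def stay_in_days(admission: date, discharge: date) -> int:
    """Return the number of days between admission and discharge."""
    if discharge < admission:
        raise ValueError("discharge date is earlier than admission date")
    return (discharge - admission).days


# Hypothetical episode: admitted on 3 March 2021, discharged on 12 March 2021.
print(stay_in_days(date(2021, 3, 3), date(2021, 3, 12)))  # prints 9
```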

We have a CSV but we need to store it in our IRIS and nothing is easier than using the IRIS Record Mapper. You can see the result of using Record Mapper in the Business Services column of the MLTEST production:

CSVToEpisodeTrain is the BS in charge of reading the CSV and sending the data to the BP MLTEST.BP.RecordToEpisodeTrain that we will explain later. The data obtained by this BS will be used to train our model.
CSVToEpisode is the BS that will read the data from the CSV that we will use later to launch test predictions before running our predictions obtained from HL7 messages.

Both BSs create an object of the User.IctusMap.Record.cls class for each line of the CSV and send it to their respective BPs, where the necessary transformations are carried out to finally obtain records in our MLTEST_Data.Episode and MLTEST_Data.EpisodeTrain tables. The latter is the table we will use to generate the prediction model, while the former is where we will store our episodes.

Data preparation

Before creating our model we must transform the CSV records into objects that are easily usable by the prediction engine, and for this we will use the following BPs:

MLTEST.BP.RecordToEpisode: which will perform the transformation of the CSV record to our episode table MLTEST_Data.Episode
MLTEST.BP.RecordToEpisodeTrain: which performs the same transformation as in the previous case but storing the episode in MLTEST_Data.EpisodeTrain.

We could have used a single BP for the record in both tables, but to make the process clearer we will leave it as is. In the transformation carried out by the BP we have replaced all the text fields with numerical values to speed up the training of the model.
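To give an idea of what replacing text fields with numerical values can look like, here is a minimal Python sketch. It is only an assumption about the general technique (mapping each distinct text value to an integer code); the actual mapping applied by the BP may differ, and the column names and values below are hypothetical.

```python
def build_codes(values):
    """Assign an integer code to each distinct text value, in order of first appearance."""
    codes = {}
    for value in values:
        if value not in codes:
            codes[value] = len(codes) + 1
    return codes


# Hypothetical rows; the real CSV column names and values may differ.
rows = [
    {"sex": "Mujer", "hospital": "Hospital A"},
    {"sex": "Hombre", "hospital": "Hospital B"},
    {"sex": "Mujer", "hospital": "Hospital B"},
]

sex_codes = build_codes(row["sex"] for row in rows)
hospital_codes = build_codes(row["hospital"] for row in rows)

# Replace every text field with its numeric code.
encoded = [
    {"sex": sex_codes[row["sex"]], "hospital": hospital_codes[row["hospital"]]}
    for row in rows
]
print(encoded)
```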

Alright, we have our BS and BP working, so let's feed them by copying the project file /shared/train-data.csv to the path /shared/csv/trainIn:

Here we have all the records from our file consumed, transformed and recorded in our training table. Let's repeat the operation with the records that we are going to use for a first test of predictions. By copying /shared/test-data.csv to the path /shared/csv/newIn we already have everything ready to create our model.

In this project you do not need to execute the creation and training instructions yourself, since they are included in the BO that handles the data received through HL7 messaging, but so that you can see them in more detail we are going to run them before testing the integration with HL7 messages.

AutoML

We have our training data and our test data, so let's create our model. To do this, we will access the SQL screen of our IRIS (System Explorer --> SQL) from the MLTEST namespace and execute the following commands:

CREATE MODEL StayModel PREDICTING (Stay) FROM MLTEST_Data.EpisodeTrain

In this query we are creating a prediction model called StayModel that will predict the value of the Stay column of our table with training episodes. The stay column did not come in our CSV but we have calculated it in the BP in charge of transforming the CSV record.

Next we proceed to train the model:

TRAIN MODEL StayModel

This instruction will take a while but once the training is complete we can validate the model with our test data by running the following instruction:

VALIDATE MODEL StayModel FROM MLTEST_Data.Episode

This query will calculate how close our estimates are to the real values. As you can imagine, with the data we have they will not be anything to get excited about. You can view the result of the validation with the following query:

SELECT * FROM INFORMATION_SCHEMA.ML_VALIDATION_METRICS

From the statistics obtained we can see that the model used by AutoML is a classification model instead of a regression model. Let us explain what the results obtained mean (thank you @Yuri Marx for your article!):

Precision: it is calculated by dividing the number of true positives by the number of predicted positives (sum of true positives and false positives).
Recall: it is calculated by dividing the number of true positives by the number of actual positives (sum of true positives and false negatives).
F-Measure: calculated by the following expression: F = 2 * (Precision * Recall) / (Precision + Recall)
Accuracy: calculated by dividing the number of true positives and true negatives by the total number of rows (sum of true positives, false positives, true negatives, and false negatives) of the entire data set.
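The four definitions above can be checked with a few lines of Python. This is a minimal sketch for a single class treated as positive; the function name and the counts are invented for the example and are not taken from the validation of StayModel.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute precision, recall, F-measure and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * (precision * recall) / (precision + recall)
    else:
        f_measure = 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(tp=30, fp=20, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these counts, precision is 30 / 50 = 0.60, recall is 30 / 40 = 0.75, the F-measure is about 0.67 and accuracy is 70 / 100 = 0.70.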

With this explanation we can already understand how good the generated model is:

As you can see, in overall numbers our model is quite bad: we barely reach 35% hits. Going into more detail, we see that for short stays the accuracy is between 35% and 60%, so we would surely need to expand our data with information about any pathologies the patient may have and the triage of the fracture.

Since we do not have these data, which would refine our model much more, we are going to imagine that what we have is more than enough for our objective, so we can start feeding our production with ADT_A01 patient admission messages and see the predictions we obtain.

Running in a production

With the model already trained, all that remains is to prepare the production to create a record in our MLTEST_Data.Episode table for each message received. Let's look at the components of our production:

HL7ToEpisode: it is the BS that will capture the file with HL7 messages. This BS will redirect messages to the MLTEST.BP.RecordToEpisodeBPL BP
MLTEST.BP.RecordToEpisodeBPL: this BPL will have the following steps.
- Transform the HL7 message into an MLTEST.Data.Episode object.
- Save the Episode object to the database.
- Call the MLTEST.BO.PredictStayEpisode BO to get the predicted days of hospitalization.
- Write a trace with the prediction obtained.

MLTEST.BO.PredictStayEpisode: the BO in charge of automatically launching the necessary queries against the prediction model. If the model does not exist, the BO creates and trains it automatically, so it is not necessary to execute the SQL commands by hand. Let's take a look at the code.

Class MLTEST.BO.PredictStayEpisode Extends Ens.BusinessOperation
{

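/// Name of the IntegratedML model to query (and to create if it does not exist yet)
Property ModelName As %String(MAXLEN = 100);

Parameter SETTINGS = "ModelName";

Parameter INVOCATION = "Queue";

/// Returns the predicted stay for the episode referenced in the request,
/// creating, training and validating the model first when it does not exist
Method PredictStay(pRequest As MLTEST.Data.PredictionRequest, Output pResponse As MLTEST.Data.PredictionResponse) As %Status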
{
    set predictionRequest = pRequest
    set pResponse = ##class("MLTEST.Data.PredictionResponse").%New()
    set pResponse.EpisodeId = predictionRequest.EpisodeId
    set tSC = $$$OK
    // CHECK IF MODEL EXISTS 
    set sql = "SELECT MODEL_NAME FROM INFORMATION_SCHEMA.ML_MODELS WHERE MODEL_NAME = '"_..ModelName_"'"
    set statement = ##class(%SQL.Statement).%New()
    set status = statement.%Prepare(sql)
    if ($$$ISOK(status)) {
        set resultSet = statement.%Execute()
        if (resultSet.%SQLCODE = 0) {
            set modelExists = 0
            while (resultSet.%Next() '= 0) {
                if (resultSet.%GetData(1) '= "") {
                    set modelExists = 1
                    // GET STAY PREDICTION WITH THE LAST EPISODE PERSISTED
                    set sqlPredict = "SELECT PREDICT("_..ModelName_") AS PredictedStay FROM MLTEST_Data.Episode WHERE %ID = ?"
                    set statementPredict = ##class(%SQL.Statement).%New(), statementPredict.%ObjectSelectMode = 1
                    set statusPredict = statementPredict.%Prepare(sqlPredict)
                    if ($$$ISOK(statusPredict)) {
                        set resultSetPredict = statementPredict.%Execute(predictionRequest.EpisodeId)
                        if (resultSetPredict.%SQLCODE = 0) {
                            while (resultSetPredict.%Next() '= 0) {
                                set pResponse.PredictedStay = resultSetPredict.%GetData(1)
                            }
                        }
                    }
                    else {
                        set tSC = statusPredict
                    }
                }
            }
            if (modelExists = 0) {
                // CREATION OF THE PREDICTION MODEL
                set sqlCreate = "CREATE MODEL "_..ModelName_" PREDICTING (Stay) FROM MLTEST_Data.EpisodeTrain"
                set statementCreate = ##class(%SQL.Statement).%New()
                set statusCreate = statementCreate.%Prepare(sqlCreate)
                if ($$$ISOK(statusCreate)) {
                    set resultSetCreate = statementCreate.%Execute()
                    if (resultSetCreate.%SQLCODE = 0) {
                        // MODEL IS TRAINED WITH THE CSV DATA PRE-LOADED
                        set sqlTrain = "TRAIN MODEL "_..ModelName
                        set statementTrain = ##class(%SQL.Statement).%New()
                        set statusTrain = statementTrain.%Prepare(sqlTrain)
                        if ($$$ISOK(statusTrain)) {
                            set resultSetTrain = statementTrain.%Execute()
                            if (resultSetTrain.%SQLCODE = 0) {
                                // VALIDATION OF THE MODEL WITH THE PRE-LOADED EPISODES
                                set sqlValidate = "VALIDATE MODEL "_..ModelName_" FROM MLTEST_Data.Episode"
                                set statementValidate = ##class(%SQL.Statement).%New()
                                set statusValidate = statementValidate.%Prepare(sqlValidate)
                                if ($$$ISOK(statusValidate)) {
                                    set resultSetValidate = statementValidate.%Execute()
                                    if (resultSetValidate.%SQLCODE = 0) {
                                        // GET STAY PREDICTION WITH THE LAST EPISODE PERSISTED
                                        set sqlPredict = "SELECT PREDICT("_..ModelName_") AS PredictedStay FROM MLTEST_Data.Episode WHERE %ID = ?"
                                        set statementPredict = ##class(%SQL.Statement).%New(), statementPredict.%ObjectSelectMode = 1
                                        set statusPredict = statementPredict.%Prepare(sqlPredict)
                                        if ($$$ISOK(statusPredict)) {
                                            set resultSetPredict = statementPredict.%Execute(predictionRequest.EpisodeId)
                                            if (resultSetPredict.%SQLCODE = 0) {
                                                while (resultSetPredict.%Next() '= 0) {
                                                    set pResponse.PredictedStay = resultSetPredict.%GetData(1)
                                                }
                                            }
                                        }
                                        else {
                                            set tSC = statusPredict
                                        }
                                    }
                                }
                                else {
                                    set tSC = statusValidate
                                }
                            }
                        }
                        else {
                            set tSC = statusTrain
                        }
                    }
                }
                else {
                    set tSC = statusCreate
                }
            }
        }
    }
    else {
        set tSC = status
    }
    quit tSC
}

XData MessageMap
{
<MapItems>
  <MapItem MessageType="MLTEST.Data.PredictionRequest">
    <Method>PredictStay</Method>
  </MapItem>
</MapItems>
}

}

As you can see, we have a property that lets us define the name we want for our prediction model, and the first thing the method does is query the ML_MODELS table to check whether the model exists.
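Stripped of the nested status checks, the decision the BO makes is quite small: if the model exists, predict; otherwise create, train and validate it first, then predict. The following Python sketch only builds the ordered list of SQL statements for each case, reusing the statements from the class above; the function name is hypothetical and nothing here connects to IRIS.

```python
def statements_for_prediction(model_name: str, model_exists: bool) -> list:
    """Return, in order, the SQL statements needed to obtain a prediction."""
    predict = (
        f"SELECT PREDICT({model_name}) AS PredictedStay "
        "FROM MLTEST_Data.Episode WHERE %ID = ?"
    )
    if model_exists:
        # The model is already there: a single query is enough.
        return [predict]
    # Otherwise the model is created, trained and validated before predicting.
    return [
        f"CREATE MODEL {model_name} PREDICTING (Stay) FROM MLTEST_Data.EpisodeTrain",
        f"TRAIN MODEL {model_name}",
        f"VALIDATE MODEL {model_name} FROM MLTEST_Data.Episode",
        predict,
    ]


for sql in statements_for_prediction("StayModel", model_exists=False):
    print(sql)
```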

Well then, we are ready to launch our messages. To do this, we copy the project file /shared/messagesa01.hl7 to the folder /shared/hl7/in; this sends 50 generated messages to our production. Let's look at some of the predictions.

For our patient Sonia Martínez, 2 months old, we will have a stay of...

8 days! Get well soon!

Let's look at another patient:

Ana Torres Fernandez, 50 years old...

9 days of stay for her.

Well, that's all for today. The least important thing in this example is the numerical value of the prediction; you can see from the statistics we obtained that it is quite poor. But this very cool IntegratedML functionality could be very useful in cases where you have a good data set to apply it to.

If you want to tinker with it you can download the Community version or use the one configured in the Open Exchange project associated with this article.

If you have any questions or need clarification, do not hesitate to ask in the comments.

Go to the original post written by @Luis Angel Pérez Ramos