Article
· Sep 27, 2023 6m read

Fun with IntegratedML Time Series

Recently @Anastasia Dyubaylo published a post (this one) showing a new IntegratedML functionality for time series predictions that @Thomas Dyar already presented to us at the Global Summit 2023 so, let's go to set up a small workshop to test it!

Introduction

For this workshop we have chosen as the topic the prediction of monthly users of the Valencia Metro line by line. To do this, we have monthly data broken down by lines since 2022 as well as annual data by lines since 2017 that we will extrapolate monthly.

Once compiled from the monthly amount of passengers, we will be able to create a CSV file that looks like this:

Year,Month,Passengers,Line
2017,1,549007,1
2017,1,515947,2
2017,1,770564,3
2017,1,621256,5
2017,1,429640,7
2017,1,520692,9
2017,1,322392,4
2017,1,95473,6
2017,1,18188,8
2017,1,0,10

In our csv file we have included a final line ",,," which will allow us to know when the file has finished.

Mapping CSV files

Although the mapping has already been done in the project included in this article, we are going to detail it for those of you who are not familiar with Record Mapper.

As you can see in the previous image, the first row will be the header of the file and we have the data for the year, month, passenger flow and line. To register the CSV in our IRIS database we will use the Record Mapper functionality, which we will access through the Management Portal from the menu options Interoperability -> Build -> CSV Record Wizard. Once we have accessed we will fill in the fields as indicated:

Keep in mind that our CSV file has headers, so we have checked the Sample has header row option. With this information IRIS will automatically generate the CSV mapping, displaying the following screen:

We will not need to change anything and by clicking on Generate the system will automatically create all the necessary classes for CSV mapping from our productions.

Production configuration

Let's take a look at the production that is configured and running in our project:

As we see, our production is made up of 3 Business Components. Let's see in a little more detail what each one does:

CSVIn

This Business Service is of the class EnsLib.RecordMap.Service.FileService and will help us capture the CSV files found in a path indicated in its configuration. For each line of the CSV file that it captures, it will create an object of the type MLTEST.PassengersMap.Record defined by the RecordMap that has been configured, in our case, PassengersMap, and send it to the Business Component indicated in TargetConfigNames.

PredictionModelGeneration

This Business Process is defined by the BPL MLTEST.BP.PopulateTable from which we will transform the raw data received into the format that interests us most to train our model. Let's take a look at the BPL:

Let's see in detail in the diagram:

Firstly, we will check if the object received corresponds to a real data or is the indicator of the end of the file that we have previously defined.

The object is NOT the end of file:

In case it is not the end of the file, we will proceed to transform the raw data received into something more usable by our model and for this we have defined a small DTL: MLTEST.DT.ParseDate

This DTL will only generate an object of type MLTEST.Data.PassengersInfo with a date of type DateTime formed with the data from the CSV. Once the transformation is completed, the BPL will just to save the object (in the PassengersInfo context variable) in our database.

 set sc = context.PassengersInfo.%Save()
 if ($$$ISERR(sc))
 {
   set context.Result = "Error"
   set context.ErrorMessage = sc
 }
 else
 {
   set context.Result = "Success"
 }

The object is the end of the file:

If we have finished reading the file, we must proceed to format the data so that the Time Series model understands it. ATTENTION!! We must take into consideration the following two points (and that I did not take into account 🙄):

  1. It is necessary to have a DateTime type field that will be used to continue the time series, a Date type will not be enough.
  2. The records must be ordered from the oldest record to the most recent for the training to work correctly

In the event that we already have the model generated, we will delete it, as well as the data in the training table. The process will continue as follows:

 

Let's see each step in detail:

  1. We perform a massive insertion with the transformed data.
    INSERT INTO MLTEST_Data.PassengersLine (DateOfData, Line1, Line2, Line3, Line4, Line5, Line6, Line7, Line8, Line9, Line10) 
    SELECT DateOfData, 
    (SELECT MAX(s1.Passengers) FROM MLTEST_Data.PassengersInfo s1 WHERE s1.Line = 1 AND s1.DateOfData = s.DateOfData) as Line1, 
    (SELECT MAX(s2.Passengers) FROM MLTEST_Data.PassengersInfo s2 WHERE s2.Line = 2 AND s2.DateOfData = s.DateOfData) as Line2, 
    (SELECT MAX(s3.Passengers) FROM MLTEST_Data.PassengersInfo s3 WHERE s3.Line = 3 AND s3.DateOfData = s.DateOfData) as Line3, 
    (SELECT MAX(s4.Passengers) FROM MLTEST_Data.PassengersInfo s4 WHERE s4.Line = 4 AND s4.DateOfData = s.DateOfData) as Line4, 
    (SELECT MAX(s5.Passengers) FROM MLTEST_Data.PassengersInfo s5 WHERE s5.Line = 5 AND s5.DateOfData = s.DateOfData) as Line5,
    (SELECT MAX(s6.Passengers) FROM MLTEST_Data.PassengersInfo s6 WHERE s6.Line = 6 AND s6.DateOfData = s.DateOfData) as Line6, 
    (SELECT MAX(s7.Passengers) FROM MLTEST_Data.PassengersInfo s7 WHERE s7.Line = 7 AND s7.DateOfData = s.DateOfData) as Line7, 
    (SELECT MAX(s8.Passengers) FROM MLTEST_Data.PassengersInfo s8 WHERE s8.Line = 8 AND s8.DateOfData = s.DateOfData) as Line8, 
    (SELECT MAX(s9.Passengers) FROM MLTEST_Data.PassengersInfo s9 WHERE s9.Line = 9 AND s9.DateOfData = s.DateOfData) as Line9, 
    (SELECT MAX(s10.Passengers) FROM MLTEST_Data.PassengersInfo s10 WHERE s10.Line = 10 AND s10.DateOfData = s.DateOfData) as Line10 
    FROM MLTEST_Data.PassengersInfo s GROUP BY DateOfData
  2. We create our model of type TIME SERIES, defining the values to predict and the column that contains the time series of type DateTime (DateOfData):
    CREATE TIME SERIES MODEL PassengersPrediction 
    PREDICTING (Line1,Line2,Line3,Line4,Line5,Line6,Line7,Line8,Line9,Line10) 
    BY (DateOfData) 
    FROM MLTEST_Data.PassengersLine USING {"Forward":3}
    
  3. We train our model:
    TRAIN MODEL PassengersPrediction
  4. And finally we call the Business Operation PredictionToCSV that will generate the prediction and record it in a CSV.

PredictionToCSV

This Business Operation is extremely simple and all thanks to @Evgeny Shvarov csvgen project that allows us to directly export the results of a SQL query to a CSV file in a direct way. Here you can see the code of said BO:

Class MLTEST.BO.PredictionToCSV Extends Ens.BusinessOperation
{

Parameter INVOCATION = "Queue";
Method ExportPrediction(pRequest As Ens.Request, pResponse As Ens.Response) As %Status
{
    set query="SELECT WITH PREDICTIONS (PassengersPrediction) * FROM MLTEST_Data.PassengersLine"
    w ##class(community.csvgen).SQLToCSV(",",1,"/shared/prediction.csv",query)

    Quit $$$OK
}

XData MessageMap
{
<MapItems>
  <MapItem MessageType="Ens.Request">
    <Method>ExportPrediction</Method>
  </MapItem>
</MapItems>
}

}

As you can see, the query responsible for obtaining the prediction is not complicated at all.

SELECT WITH PREDICTIONS (PassengersPrediction) * FROM MLTEST_Data.PassengersLine

Let's see what the result is generated in the file /shared/prediction.csv

Conclusions

As you can see, creating time series prediction models is relatively simple, you just have to make sure you have a DateTime type column ordered correctly so that the training works without problems.

If you have any suggestions or comments, do not hesitate to leave them in the comments section. Thank you very much for your time!

Discussion (1)2
Log in or sign up to continue