
Embedded Python vs. ObjectScript: Performance Testing XML Parsing

Since the introduction of Embedded Python, there have always been doubts about its performance compared to ObjectScript, and I have discussed this with @Guillaume Rongier on more than one occasion. While building a small application that captures data from public tenders in Spain and lets me search it using the capabilities of VectorSearch, I saw an opportunity to run a small test.

Data for the test

Public tender information is published monthly as XML files at this URL, and a typical tender entry looks like this:

 

As you can see, each tender entry is quite large, and each file contains about 450 of them. At this size it is not practical to map the whole structure to an ObjectScript class (it could be done... but I'm not in the mood).
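The benchmarked Python method loads each file with `ET.parse`, so the whole tree stays in memory while it is processed. If you only want to inspect a file, for example to check how many tenders it holds, a streaming pass is lighter. A minimal sketch, assuming only the Atom `entry` tag name that appears in the code further down; `count_entries` is a hypothetical helper, not part of the application:

```python
import xml.etree.ElementTree as ET

# Atom <entry> tag in Clark notation, as compared in the Python mapping code
ATOM_ENTRY = "{http://www.w3.org/2005/Atom}entry"

def count_entries(source):
    """Count <entry> elements while streaming, without keeping the full tree.

    `source` is a file path or a binary file-like object.
    """
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == ATOM_ENTRY:
            count += 1
            elem.clear()  # release the entry's children and text once counted
    return count
```

Clearing each entry after use keeps memory roughly flat regardless of file size, which matters when a month of data adds up to more than a gigabyte.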

Code for testing

My idea is to capture only the fields relevant for later searches, so I created the following class to store the captured information:

Class Inquisidor.Object.Licitacion Extends (%Persistent, %XML.Adaptor) [ DdlAllowed ]
{

Property IdLicitacion As %String(MAXLEN = 200);
Property Titulo As %String(MAXLEN = 2000);
Property URL As %String(MAXLEN = 1000);
Property Resumen As %String(MAXLEN = 2000);
Property TituloVectorizado As %Vector(DATATYPE = "DECIMAL", LEN = 384);
Property Contratante As %String(MAXLEN = 2000);
Property URLContratante As %String(MAXLEN = 2000);
Property ValorEstimado As %Numeric(STORAGEDEFAULT = "columnar");
Property ImporteTotal As %Numeric(STORAGEDEFAULT = "columnar");
Property ImporteTotalSinImpuestos As %Numeric(STORAGEDEFAULT = "columnar");
Property FechaAdjudicacion As %Date;
Property Estado As %String;
Property Ganador As %String(MAXLEN = 200);
Property ImporteGanador As %Numeric(STORAGEDEFAULT = "columnar");
Property ImporteGanadorSinImpuestos As %Numeric(STORAGEDEFAULT = "columnar");
Property Clasificacion As %String(MAXLEN = 10);
Property Localizacion As %String(MAXLEN = 200);
Index IndexContratante On Contratante;
Index IndexGanador On Ganador;
Index IndexClasificacion On Clasificacion;
Index IndexLocalizacion On Localizacion;
Index IndexIdLicitation On IdLicitacion [ PrimaryKey ];
}
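Both parsers below read the same 16 values from each Atom `entry`. As a compact reference, here is a sketch of where each value lives, written as ElementTree paths. The prefixes follow the ObjectScript `Path` strings and the namespace URIs are taken from the Python tag comparisons, except the `cbc-place-ext` URI, which is an assumption by analogy with `cac-place-ext`. `map_entry` is a hypothetical helper. Note that `findtext` returns the first match, whereas the loops in the benchmarked code keep the last one when an element repeats.

```python
import xml.etree.ElementTree as ET

# Prefix map. The cbc-place-ext URI is ASSUMED (by analogy with cac-place-ext).
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "cac": "urn:dgpe:names:draft:codice:schema:xsd:CommonAggregateComponents-2",
    "cbc": "urn:dgpe:names:draft:codice:schema:xsd:CommonBasicComponents-2",
    "cac-place-ext": "urn:dgpe:names:draft:codice-place-ext:schema:xsd:CommonAggregateComponents-2",
    "cbc-place-ext": "urn:dgpe:names:draft:codice-place-ext:schema:xsd:CommonBasicComponents-2",
}

CFS = "cac-place-ext:ContractFolderStatus"
PARTY = f"{CFS}/cac-place-ext:LocatedContractingParty/cac:Party"
PROJECT = f"{CFS}/cac:ProcurementProject"
BUDGET = f"{PROJECT}/cac:BudgetAmount"
RESULT = f"{CFS}/cac:TenderResult"
TOTAL = f"{RESULT}/cac:AwardedTenderedProject/cac:LegalMonetaryTotal"

# Lower-cased property name -> ElementPath relative to an Atom <entry>
FIELDS = {
    "titulo": "atom:title",
    "resumen": "atom:summary",
    "idlicitacion": "atom:id",
    "contratante": f"{PARTY}/cac:PartyName/cbc:Name",
    "urlcontratante": f"{PARTY}/cbc:WebsiteURI",
    "estado": f"{CFS}/cbc-place-ext:ContractFolderStatusCode",
    "valorestimado": f"{BUDGET}/cbc:EstimatedOverallContractAmount",
    "importetotal": f"{BUDGET}/cbc:TotalAmount",
    "importetotalsinimpuestos": f"{BUDGET}/cbc:TaxExclusiveAmount",
    "clasificacion": f"{PROJECT}/cac:RequiredCommodityClassification/cbc:ItemClassificationCode",
    "localizacion": f"{PROJECT}/cac:RealizedLocation/cbc:CountrySubentity",
    "fechaadjudicacion": f"{RESULT}/cbc:AwardDate",
    "ganador": f"{RESULT}/cac:WinningParty/cac:PartyName/cbc:Name",
    "importeganadorsinimpuestos": f"{TOTAL}/cbc:TaxExclusiveAmount",
    "importeganador": f"{TOTAL}/cbc:PayableAmount",
}

def map_entry(entry):
    """Return the 16 captured values of one <entry>; missing elements become ""."""
    record = {key: entry.findtext(path, default="", namespaces=NS)
              for key, path in FIELDS.items()}
    # The URL is an attribute, so it cannot be read with findtext
    link = entry.find("atom:link", NS)
    record["url"] = link.get("href", "") if link is not None else ""
    return record
```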

To capture the data with Embedded Python I used the xml.etree.ElementTree library, which lets us extract the values node by node. Here is the Python method I used to map the XML:

Method ReadXML(xmlPath As %String) As %String [ Language = python ]
{
    import xml.etree.ElementTree as ET
    import iris

    try:
        tree = ET.parse(xmlPath)
        root = tree.getroot()
        for entry in root.iter("{http://www.w3.org/2005/Atom}entry"):
            licitacion = {"titulo": "", "resumen": "", "idlicitacion": "", "url": "", "contratante": "", "urlcontratante": "", "estado": "", "valorestimado": "", "importetotal": "", "importetotalsinimpuestos": "", "clasificacion": "", "localizacion": "", "fechaadjudicacion": "", "ganador": "", "importeganadorsinimpuestos": "", "importeganador": ""}
            for tags in entry:
                if tags.tag == "{http://www.w3.org/2005/Atom}title":
                    licitacion["titulo"] = tags.text
                if tags.tag == "{http://www.w3.org/2005/Atom}summary":
                    licitacion["resumen"] = tags.text
                if tags.tag == "{http://www.w3.org/2005/Atom}id":
                    licitacion["idlicitacion"] = tags.text
                if tags.tag == "{http://www.w3.org/2005/Atom}link":
                    licitacion["url"] = tags.attrib["href"]
                if tags.tag == "{urn:dgpe:names:draft:codice-place-ext:schema:xsd:CommonAggregateComponents-2}ContractFolderStatus":
                    for detailTags in tags:
                        if detailTags.tag == "{urn:dgpe:names:draft:codice-place-ext:schema:xsd:CommonAggregateComponents-2}LocatedContractingParty":
                            for infoContractor in detailTags:
                                if infoContractor.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonAggregateComponents-2}Party":
                                    for contractorDetails in infoContractor:
                                        if contractorDetails.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonAggregateComponents-2}PartyName" :
                                            for name in contractorDetails:
                                                licitacion["contratante"] = name.text
                                        elif contractorDetails.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonBasicComponents-2}WebsiteURI":
                                            licitacion["urlcontratante"] = contractorDetails.text
                        elif detailTags.tag == "{urn:dgpe:names:draft:codice-place-ext:schema:xsd:CommonBasicComponents-2}ContractFolderStatusCode":
                            licitacion["estado"] = detailTags.text
                        elif detailTags.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonAggregateComponents-2}ProcurementProject":
                            for infoProcurement in detailTags:
                                if infoProcurement.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonAggregateComponents-2}BudgetAmount":
                                    for detailBudget in infoProcurement:
                                        if detailBudget.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonBasicComponents-2}EstimatedOverallContractAmount":
                                            licitacion["valorestimado"] = detailBudget.text
                                        elif detailBudget.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonBasicComponents-2}TotalAmount":
                                            licitacion["importetotal"] = detailBudget.text
                                        elif detailBudget.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonBasicComponents-2}TaxExclusiveAmount":
                                            licitacion["importetotalsinimpuestos"] = detailBudget.text
                                elif infoProcurement.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonAggregateComponents-2}RequiredCommodityClassification":
                                    for detailClassification in infoProcurement:
                                        if detailClassification.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonBasicComponents-2}ItemClassificationCode":
                                            licitacion["clasificacion"] = detailClassification.text
                                elif infoProcurement.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonAggregateComponents-2}RealizedLocation":
                                    for detailLocalization in infoProcurement:
                                        if detailLocalization.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonBasicComponents-2}CountrySubentity":
                                            licitacion["localizacion"] = detailLocalization.text
                        elif detailTags.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonAggregateComponents-2}TenderResult":
                            for infoResult in detailTags:
                                if infoResult.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonBasicComponents-2}AwardDate":
                                    licitacion["fechaadjudicacion"] = infoResult.text
                                elif infoResult.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonAggregateComponents-2}WinningParty":
                                    for detailWinner in infoResult:
                                        if detailWinner.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonAggregateComponents-2}PartyName":
                                            for detailName in detailWinner:
                                                if detailName.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonBasicComponents-2}Name":
                                                    licitacion["ganador"] = detailName.text
                                elif infoResult.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonAggregateComponents-2}AwardedTenderedProject":
                                    for detailTender in infoResult:
                                        if detailTender.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonAggregateComponents-2}LegalMonetaryTotal":
                                            for detailWinnerAmount in detailTender:
                                                if detailWinnerAmount.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonBasicComponents-2}TaxExclusiveAmount":
                                                    licitacion["importeganadorsinimpuestos"] = detailWinnerAmount.text
                                                elif detailWinnerAmount.tag == "{urn:dgpe:names:draft:codice:schema:xsd:CommonBasicComponents-2}PayableAmount":
                                                    licitacion["importeganador"] = detailWinnerAmount.text
            iris.cls("Ens.Util.Log").LogInfo("Inquisidor.BP.XMLToLicitacion", "VectorizePatient", "Terminado mapeo "+licitacion["titulo"])
            if licitacion["importeganador"]:  # only awarded tenders: skips None and ""
                iris.cls("Ens.Util.Log").LogInfo("Inquisidor.BP.XMLToLicitacion", "VectorizePatient", "Lanzando insert "+licitacion["titulo"])
                stmt = iris.sql.prepare("INSERT INTO INQUISIDOR_Object.Licitacion (Titulo, Resumen, IdLicitacion, URL, Contratante, URLContratante, Estado, ValorEstimado, ImporteTotal, ImporteTotalSinImpuestos, Clasificacion, Localizacion, FechaAdjudicacion, Ganador, ImporteGanadorSinImpuestos, ImporteGanador) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,TO_DATE(?,'YYYY-MM-DD'),?,?,?)")
                try:
                    rs = stmt.execute(licitacion["titulo"], licitacion["resumen"], licitacion["idlicitacion"], licitacion["url"], licitacion["contratante"], licitacion["urlcontratante"], licitacion["estado"], licitacion["valorestimado"], licitacion["importetotal"], licitacion["importetotalsinimpuestos"], licitacion["clasificacion"], licitacion["localizacion"], licitacion["fechaadjudicacion"], licitacion["ganador"], licitacion["importeganadorsinimpuestos"], licitacion["importeganador"])
                except Exception as err:
                    iris.cls("Ens.Util.Log").LogInfo("Inquisidor.BP.XMLToLicitacion", "VectorizePatient", repr(err))
        return "Success"
    except Exception as err:
        iris.cls("Ens.Util.Log").LogInfo("Inquisidor.BP.XMLToLicitacion", "VectorizePatient", repr(err))
        return "Error"
}

Once each entry has been mapped, the method simply inserts the record with a prepared SQL statement.
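Both methods write the 16-column INSERT by hand and pass 16 arguments in the same order, which is easy to get out of step. A minimal sketch of generating both from one column list; `build_insert` and `build_params` are hypothetical helpers, and the only assumption is that the record dictionary is keyed by the lower-cased column names, as `ReadXML` does:

```python
# Column order used by the INSERT in ReadXML
COLUMNS = [
    "Titulo", "Resumen", "IdLicitacion", "URL", "Contratante", "URLContratante",
    "Estado", "ValorEstimado", "ImporteTotal", "ImporteTotalSinImpuestos",
    "Clasificacion", "Localizacion", "FechaAdjudicacion", "Ganador",
    "ImporteGanadorSinImpuestos", "ImporteGanador",
]

def build_insert(table, columns, markers=None):
    """Build a parameterised INSERT; `markers` overrides the "?" for given columns."""
    markers = markers or {}
    values = ",".join(markers.get(col, "?") for col in columns)
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({values})"

def build_params(record, columns):
    """Arguments in column order, read from a dict keyed by lower-cased column name."""
    return [record[col.lower()] for col in columns]

SQL = build_insert("INQUISIDOR_Object.Licitacion", COLUMNS,
                   {"FechaAdjudicacion": "TO_DATE(?,'YYYY-MM-DD')"})
```

Inside `ReadXML` this would become `stmt = iris.sql.prepare(SQL)` and `stmt.execute(*build_params(licitacion, COLUMNS))`. Since the statement text no longer depends on the entry, it could also be prepared once per file instead of once per entry; that change was not part of the benchmarked code.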

For the ObjectScript mapping I used %XML.TextReader. This version writes to a separate table, INQUISIDOR_Object.LicitacionOS, with the same columns. Let's see the method:

Method OnRequest(pRequest As Ens.StreamContainer, Output pResponse As Ens.Response) As %Status
{
    set filename = pRequest.OriginalFilename

    set status=##class(%XML.TextReader).ParseFile(filename,.textreader)
    //check status
    if $$$ISERR(status) {do $System.Status.DisplayError(status) return status}
    set tStatement = ##class(%SQL.Statement).%New()
    //iterate through document, node by node
    while textreader.Read()
    {        

        if ((textreader.NodeType = "element") && (textreader.Depth = 2) && (textreader.Path = "/feed/entry")) {
            if ($DATA(licitacion))
            {                
                if (licitacion.ImporteGanador '= ""){
                    //set sc = licitacion.%Save()
                    set myquery = "INSERT INTO INQUISIDOR_Object.LicitacionOS (Titulo, Resumen, IdLicitacion, URL, Contratante, URLContratante, Estado, ValorEstimado, ImporteTotal, ImporteTotalSinImpuestos, Clasificacion, Localizacion, FechaAdjudicacion, Ganador, ImporteGanadorSinImpuestos, ImporteGanador) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)"
                    set qStatus = tStatement.%Prepare(myquery)
                    if $$$ISERR(qStatus) {
                        write "%Prepare failed:" do $System.Status.DisplayError(qStatus)
                        return qStatus
                    }
                    set rset = tStatement.%Execute(licitacion.Titulo, licitacion.Resumen, licitacion.IdLicitacion, licitacion.URL, licitacion.Contratante, licitacion.URLContratante, licitacion.Estado, licitacion.ValorEstimado, licitacion.ImporteTotal, licitacion.ImporteTotalSinImpuestos, licitacion.Clasificacion, licitacion.Localizacion, licitacion.FechaAdjudicacion, licitacion.Ganador, licitacion.ImporteGanadorSinImpuestos, licitacion.ImporteGanador)
                }                
            }
            set licitacion = ##class(Inquisidor.Object.LicitacionOS).%New()
        }        

        if (textreader.Path = "/feed/entry/title"){
            if (textreader.Value '= ""){
                set licitacion.Titulo = textreader.Value
            }
        }
        if (textreader.Path = "/feed/entry/summary"){
            if (textreader.Value '= ""){
                set licitacion.Resumen = textreader.Value
            }
        }
        if (textreader.Path = "/feed/entry/id"){
            if (textreader.Value '= ""){
                set licitacion.IdLicitacion = textreader.Value
            }
        }
        if (textreader.Path = "/feed/entry/link"){            
            if (textreader.MoveToAttributeName("href")) {
                set licitacion.URL = textreader.Value                
            }
        }
        if (textreader.Path = "/feed/entry/cac-place-ext:ContractFolderStatus/cbc-place-ext:ContractFolderStatusCode"){
            if (textreader.Value '= ""){
                set licitacion.Estado = textreader.Value
            }
        }
        if (textreader.Path = "/feed/entry/cac-place-ext:ContractFolderStatus/cac-place-ext:LocatedContractingParty/cac:Party/cac:PartyName/cbc:Name"){
            if (textreader.Value '= ""){
                set licitacion.Contratante = textreader.Value
            }
        }
        if (textreader.Path = "/feed/entry/cac-place-ext:ContractFolderStatus/cac-place-ext:LocatedContractingParty/cac:Party/cbc:WebsiteURI"){
            if (textreader.Value '= ""){
                set licitacion.URLContratante = textreader.Value
            }
        }
        if (textreader.Path = "/feed/entry/cac-place-ext:ContractFolderStatus/cac:ProcurementProject/cac:BudgetAmount/cbc:EstimatedOverallContractAmount"){
            if (textreader.Value '= ""){
                set licitacion.ValorEstimado = textreader.Value
            }
        }
        if (textreader.Path = "/feed/entry/cac-place-ext:ContractFolderStatus/cac:ProcurementProject/cac:BudgetAmount/cbc:TotalAmount"){
            if (textreader.Value '= ""){
                set licitacion.ImporteTotal = textreader.Value
            }
        }
        if (textreader.Path = "/feed/entry/cac-place-ext:ContractFolderStatus/cac:ProcurementProject/cac:BudgetAmount/cbc:TaxExclusiveAmount"){
            if (textreader.Value '= ""){
                set licitacion.ImporteTotalSinImpuestos = textreader.Value
            }
        }
        if (textreader.Path = "/feed/entry/cac-place-ext:ContractFolderStatus/cac:ProcurementProject/cac:RequiredCommodityClassification/cbc:ItemClassificationCode"){
            if (textreader.Value '= ""){
                set licitacion.Clasificacion = textreader.Value
            }
        }
        if (textreader.Path = "/feed/entry/cac-place-ext:ContractFolderStatus/cac:ProcurementProject/cac:RealizedLocation/cbc:CountrySubentity"){
            if (textreader.Value '= ""){
                set licitacion.Localizacion = textreader.Value
            }
        }
        if (textreader.Path = "/feed/entry/cac-place-ext:ContractFolderStatus/cac:TenderResult/cbc:AwardDate"){
            if (textreader.Value '= ""){
                set licitacion.FechaAdjudicacion = $System.SQL.Functions.TODATE(textreader.Value,"YYYY-MM-DD")
            }
        }
        if (textreader.Path = "/feed/entry/cac-place-ext:ContractFolderStatus/cac:TenderResult/cac:WinningParty/cac:PartyName/cbc:Name"){
            if (textreader.Value '= ""){
                set licitacion.Ganador = textreader.Value
            }
        }
        if (textreader.Path = "/feed/entry/cac-place-ext:ContractFolderStatus/cac:TenderResult/cac:AwardedTenderedProject/cac:LegalMonetaryTotal/cbc:TaxExclusiveAmount"){
            if (textreader.Value '= ""){
                set licitacion.ImporteGanadorSinImpuestos = textreader.Value
            }
        }
        if (textreader.Path = "/feed/entry/cac-place-ext:ContractFolderStatus/cac:TenderResult/cac:AwardedTenderedProject/cac:LegalMonetaryTotal/cbc:PayableAmount"){
            if (textreader.Value '= ""){
                set licitacion.ImporteGanador = textreader.Value
            }
        }       
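    }
    // The loop only inserts an entry when the next <entry> begins, so the last entry of the file is inserted here
    if (($DATA(licitacion)) && (licitacion.ImporteGanador '= "")) {
        set myquery = "INSERT INTO INQUISIDOR_Object.LicitacionOS (Titulo, Resumen, IdLicitacion, URL, Contratante, URLContratante, Estado, ValorEstimado, ImporteTotal, ImporteTotalSinImpuestos, Clasificacion, Localizacion, FechaAdjudicacion, Ganador, ImporteGanadorSinImpuestos, ImporteGanador) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)"
        set qStatus = tStatement.%Prepare(myquery)
        if $$$ISERR(qStatus) {
            do $System.Status.DisplayError(qStatus)
            return qStatus
        }
        set rset = tStatement.%Execute(licitacion.Titulo, licitacion.Resumen, licitacion.IdLicitacion, licitacion.URL, licitacion.Contratante, licitacion.URLContratante, licitacion.Estado, licitacion.ValorEstimado, licitacion.ImporteTotal, licitacion.ImporteTotalSinImpuestos, licitacion.Clasificacion, licitacion.Localizacion, licitacion.FechaAdjudicacion, licitacion.Ganador, licitacion.ImporteGanadorSinImpuestos, licitacion.ImporteGanador)
    }
    // set resultEmbeddings = ..GenerateEmbeddings()
    Quit $$$OK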
}
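%XML.TextReader streams the document, and for every node the method above tests the current Path against each path it is interested in, one comparison after another. For comparison, here is the same streaming, path-matching technique sketched in Python with `ElementTree.iterparse`. The current path is kept on a stack and resolved with a single dictionary lookup instead of a chain of comparisons. This is an illustration, not a benchmarked alternative. `stream_entries` and `PATHS` are hypothetical, and only three fields are mapped; the rest would follow the same pattern with their namespaced tags. The same idea could presumably be applied in ObjectScript by subscripting a local array with the path.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
ENTRY_PATH = f"{ATOM}feed/{ATOM}entry"

# Hypothetical path -> field table (three fields shown for brevity)
PATHS = {
    f"{ENTRY_PATH}/{ATOM}title": "titulo",
    f"{ENTRY_PATH}/{ATOM}summary": "resumen",
    f"{ENTRY_PATH}/{ATOM}id": "idlicitacion",
}

def stream_entries(source):
    """Yield one dict per <entry>, resolving each element's path with one lookup."""
    stack = []      # tags from the root down to the current element
    record = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
            if "/".join(stack) == ENTRY_PATH:
                record = {}
            continue
        # "end" event: the element's text is guaranteed to be complete here
        path = "/".join(stack)
        field = PATHS.get(path)
        if field is not None and record is not None:
            record[field] = elem.text or ""
        if path == ENTRY_PATH:
            yield record
            record = None
            elem.clear()  # free the finished entry
        stack.pop()
```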

Both methods only store tenders that have already been awarded, that is, those that report the winning amount (ImporteGanador).
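In Python, a check like this is clearer as a truthiness or `!=` test than as `is not ""`. The `is` operator compares object identity rather than value, and Python 3.8+ emits a `SyntaxWarning` for `is not` with a literal. Here is a minimal sketch of the filter, using the dictionary keys from `ReadXML`. Treating whitespace-only amounts as missing is an extra assumption of this sketch, and `is_awarded` is a hypothetical helper:

```python
def is_awarded(record):
    """True when the record carries a winning amount (not None, empty or blank)."""
    amount = record.get("importeganador")
    return amount is not None and amount.strip() != ""
```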

Production configuration

With both methods implemented in their Business Processes, all that remains is to configure the production that feeds them. We add two Business Services that pick up the XML files and send them to the Business Processes, one service per process, to avoid any interference when capturing and forwarding the files. The production looks like this:

For the test we load the public tenders for February: 91 files with a total of 1.30 GB of data. Let's see how both methods behave.

Ready...

Set...

Go!

XML parsing results using ObjectScript

Let's start with the time it took the ObjectScript code to map the 91 files:

The first file started at 21:11:15. Let's see when the last file was mapped:

Looking at the details of the last message, we can see when processing ended:

The end time is 21:17:43, which gives a processing time of 6 minutes and 28 seconds.

XML parsing results using Embedded Python

Let's repeat the same operation with the process that uses Python:

It started at 21:11:15, as in the previous case. Let's see when it ended:

Let's look at the message details to get the exact end time:

The end time was 21:12:03, so the total processing time is 48 seconds.
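The arithmetic behind both results can be checked from the timestamps. These are end-to-end times for the whole run (parsing, SQL inserts and, in the Python version, one event-log entry per tender), not parse time alone. Here is a small sketch; the throughput lines assume the 1.30 GB is decimal, that is 1,300 MB, and report averages over the run:

```python
from datetime import datetime

def elapsed_seconds(start, end, fmt="%H:%M:%S"):
    """Seconds between two same-day wall-clock timestamps."""
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()

objectscript_s = elapsed_seconds("21:11:15", "21:17:43")  # 388.0
python_s = elapsed_seconds("21:11:15", "21:12:03")        # 48.0
speedup = objectscript_s / python_s                       # about 8.08

FILES = 91
DATA_MB = 1300  # assumed decimal: 1.30 GB = 1,300 MB
for name, secs in (("ObjectScript", objectscript_s), ("Embedded Python", python_s)):
    print(f"{name}: {secs:.0f} s, {FILES / secs:.2f} files/s, {DATA_MB / secs:.1f} MB/s")
print(f"Speedup: {speedup:.1f}x")
```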

Well, we have a winner! In this round Embedded Python beat ObjectScript, finishing in 48 seconds versus 6 minutes and 28 seconds (roughly eight times faster), at least when it comes to XML parsing. If you have suggestions or improvements for either method, please leave them in the comments and I will rerun the tests to see whether they make a difference.

So, as for the supposedly obvious performance superiority of ObjectScript over Python...

Myth busted!
