Schematron is a rule-based validation language for making assertions about the presence or absence of certain patterns in XML documents. A schematron refers to a collection of one or more rules containing tests. Schematrons are written in a form of XML, which makes them relatively easy to inspect, understand, and write, even for non-programmers.
Essentially, a Schematron performs two actions in sequence:
1. Find the context nodes of interest in the document. A "context node" can be an element of a particular type, a specific element at a particular place in the document, an attribute, or an attribute value. For example, suppose you want to check that the <Percent> elements within each <Total> element add up to 100%. In this case, the context node is the <Total> element.
2. For each of those context nodes, check whether a specific statement is true or false. For example, you might have a rule written to answer the question "Is the sum total 100%?"
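To make the two steps concrete, here is a minimal sketch in plain Python that does by hand what a Schematron rule expresses declaratively. It uses only the standard library; the sample document and the function name `check_totals` are made up for illustration and are not part of any Schematron implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the first <Total> adds up to 100, the second does not.
SAMPLE = """
<Report>
  <Total><Percent>20</Percent><Percent>30</Percent><Percent>50</Percent></Total>
  <Total><Percent>60</Percent><Percent>30</Percent></Total>
</Report>
"""


def check_totals(xml_text):
    """Return one (index, percent_sum, ok) tuple per <Total> element."""
    root = ET.fromstring(xml_text)
    results = []
    # Step 1: find the context nodes (every <Total> element).
    for index, total in enumerate(root.iter("Total"), start=1):
        # Step 2: evaluate the assertion against each context node.
        percent_sum = sum(float(p.text) for p in total.findall("Percent"))
        results.append((index, percent_sum, percent_sum == 100.0))
    return results


for index, percent_sum, ok in check_totals(SAMPLE):
    print(f"Total #{index}: sum={percent_sum} -> {'passed' if ok else 'failed'}")
```

A Schematron schema replaces the loop and the comparison with a `context` expression and a `test` expression, so the rule can be changed without touching any code.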
For more detail on the subject, see https://www.schematron.com/. What matters to us is that we can validate an XML document against a Schematron definition. Several open source projects implement Schematron on top of XSLT. One of the most interesting is available at https://github.com/schxslt/schxslt.git.
This article aims to leverage the Python capabilities available in InterSystems IRIS (for Health) or HealthShare (Health Connect).
For this, we need an instance of InterSystems IRIS or HealthShare Health Connect. For our example, we will use a container with the latest community edition of InterSystems IRIS for Health. We start the instance by publishing the default ports and mapping the current directory to the /durable folder in the container.
Now that we have our instance running we can start a console in the container.
Now we can focus on the Python module. We will use lxml, a Python binding for the C libraries libxml2 and libxslt. It combines the speed and XML feature completeness of these libraries with the simplicity of a native Python API that is mostly compatible with the familiar ElementTree API. For more information about lxml, see https://lxml.de/index.html.
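As a quick illustration of that API, the sketch below parses a small, made-up <Total> fragment with lxml, reads it with ElementTree-style calls, and then evaluates an XPath expression on it. XPath is the language in which Schematron contexts and tests are written, so the second call is close to what a rule does internally.

```python
from lxml import etree

# Hypothetical fragment used only for this illustration.
root = etree.fromstring(
    "<Total><Percent>20</Percent><Percent>30</Percent><Percent>50</Percent></Total>"
)

# ElementTree-style access: findall() and .text behave as in the standard library.
values = [int(p.text) for p in root.findall("Percent")]

# XPath 1.0 evaluation: sum() returns a number, which lxml maps to a Python float.
percent_sum = root.xpath("sum(Percent)")

print(values, percent_sum)
```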
Assuming that Python 3 and the pip3 package manager are already installed on the instance, the lxml module needs to be installed.
The example method is written in Python. It parses the Schematron rules and validates an XML file against them. The code of the class is the following:
Class dc.schematron Extends %RegisteredObject
{

/// Validates /durable/test-file.xml against the Schematron rules in /durable/test-schema.sch
ClassMethod simpleTest() [ Language = python ]
{
    from lxml import etree
    from lxml import isoschematron

    print("Validating File...\n")

    # Parse the Schematron schema; store_report=True keeps the validation report
    sct_doc = etree.parse('/durable/test-schema.sch')
    schematron = isoschematron.Schematron(sct_doc, store_report=True)

    # Parse the XML file to check
    doc = etree.parse('/durable/test-file.xml')

    # Validate against the schema
    validationResult = schematron.validate(doc)
    # Use the public property instead of the private _validation_report attribute
    report = schematron.validation_report

    # Check the result
    if validationResult:
        print("passed")
    else:
        print("failed")
        print(report)
}

}
The method is fairly simple. It reads two files: the file with the rules (the Schematron schema) and an example XML file. The rule checks whether the sum of the <Percent> elements within each <Total> node is 100%. To execute it, run the following command from the console:
d ##class(dc.schematron).simpleTest()
The result is printed to the console. The same logic can be used in an interoperability production.
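The contents of the two files are not shown above, so the following is only a sketch of what they might look like, with the rules and the documents inlined as strings so that it can be run outside IRIS. The schema text, the sample documents, and the helper name `run_schematron` are assumptions for illustration; the lxml calls are the same ones the class method uses. Returning the result instead of printing it is one possible shape for reuse from other code.

```python
from lxml import etree, isoschematron

# Namespace of the SVRL report produced by the validation.
SVRL_NS = {"svrl": "http://purl.oclc.org/dsdl/svrl"}

# Hypothetical rules: the context is every <Total>, the test sums its <Percent> children.
RULES = """
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern id="sum_equals_100_percent">
    <title>Sum equals 100%.</title>
    <rule context="Total">
      <assert test="sum(Percent) = 100">Sum is not 100%.</assert>
    </rule>
  </pattern>
</schema>
"""

# Hypothetical documents: one that satisfies the rule and one that does not.
GOOD = "<Total><Percent>20</Percent><Percent>30</Percent><Percent>50</Percent></Total>"
BAD = "<Total><Percent>21</Percent><Percent>30</Percent><Percent>50</Percent></Total>"


def run_schematron(rules_xml, document_xml):
    """Validate document_xml against rules_xml; return (is_valid, failure messages)."""
    schematron = isoschematron.Schematron(etree.fromstring(rules_xml), store_report=True)
    is_valid = schematron.validate(etree.fromstring(document_xml))
    # Each failed assertion appears in the report as an svrl:failed-assert element.
    texts = schematron.validation_report.xpath(
        "//svrl:failed-assert/svrl:text/text()", namespaces=SVRL_NS
    )
    return is_valid, [str(t).strip() for t in texts]


for name, xml_text in (("good", GOOD), ("bad", BAD)):
    is_valid, messages = run_schematron(RULES, xml_text)
    print(name, "passed" if is_valid else "failed", messages)
```

Extracting only the assertion messages from the report keeps the output short; printing the full report, as the class method does, is more useful when debugging the rules themselves.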
The source code containing all the elements is available here.