Question Julian Matthews · Feb 10

How can I correlate XML that contains CDATA?

Hey everyone.

I have a web service that receives XML as a stream, and I intend to correlate the stream into an object. Unfortunately, I've hit a roadblock: the sender places a large chunk of the XML inside a CDATA block, which causes the Correlate() of %XML.Reader to fail.

The structure of my XML is along the lines of:

<abc><def>010203</def><ghi>040506</ghi><jkl><mno><pqr>070809</pqr><stu>101112</stu></mno></jkl></abc>

I have my ObjectScript classes built to match the above, and correlation works when the XML is exactly as above. However, sometimes it arrives like this instead:

<abc><def>010203</def><ghi>040506</ghi><jkl><![CDATA[<mno><pqr>070809</pqr><stu>101112</stu></mno>]]></jkl></abc>

I have seen other posts where the answer was to add CONTENT = "MIXED" to the property containing the CDATA, but that cannot be applied when the property references another class rather than being a simple string. Additionally, one of the values inside the CDATA will contain a base64-encoded file, so a string would be unsuitable for it.

Is there a simple way to receive XML and parse it with %XML.Reader (or a similar XML tool) without being stymied by CDATA-wrapped data?

Comments

DC AI Bot · Feb 10

To handle XML containing CDATA using %XML.Reader or similar tools, you can follow these solutions:

  1. If the CDATA section contains XML or complex data, you can treat it as a raw string and then process it further once extracted. To do this:

    • Correlate the parent element containing the CDATA and retrieve it as a string.
    • Use additional XML parsing tools to process the data within the CDATA separately. For instance:
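      Class MyClass Extends (%RegisteredObject, %XML.Adaptor)
      {
      /// Receives the CDATA content of <jkl> as plain text
      Property RawXML As %String(MAXLEN = "", XMLNAME = "jkl");
      
      Method ProcessCDATA() As %Status
      {
          // parse RawXML here, e.g. with a second %XML.Reader via OpenString()
          quit $$$OK
      }
      }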
      
  2. Modify your XML-enabled classes to use the CONTENT = "MIXED" property parameter if the structure allows. This enables the content of the CDATA to be read as raw XML. Note, however, that CONTENT = "MIXED" cannot generally be applied when the property references another class [1].

  3. If you have cases where specific outputs are expected from the CDATA, you can:

    • Create a custom datatype that processes the XML content of the CDATA after correlation. The datatype would have methods to parse the CDATA into an XML structure directly [2].

These approaches let you handle input where the same content may or may not be wrapped in CDATA. Where you need structured objects, parse the extracted content in a second step tailored to its type.
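A minimal sketch of approach 1 as a two-pass correlation. It assumes (my understanding of %XML.Adaptor, not something confirmed in this thread) that a %Stream.GlobalCharacter property receives the text of its element with the CDATA markers removed. `DC.XML.abcRaw` and `Parse` are hypothetical names; `DC.XML.mno` is the `mno` class from Julius Kavay's reply. Holding the payload in a stream rather than a %String keeps it clear of MAXSTRING, which matters for the base64 content mentioned in the question.

```objectscript
/// Hypothetical "raw" projection of <abc>: jkl is captured as text, not as an object
Class DC.XML.abcRaw Extends (%RegisteredObject, %XML.Adaptor)
{

Parameter XMLNAME = "abc";

Property def As %String;

Property ghi As %String;

/// CDATA payload of <jkl>, kept in a character stream so it is not bound by MAXSTRING
Property jkl As %Stream.GlobalCharacter;

/// Pass 1 correlates <abc> with jkl as text; pass 2 parses that text as <mno>
ClassMethod Parse(input As %Stream.Object, Output raw As DC.XML.abcRaw, Output mno As DC.XML.mno) As %Status
{
    set raw = "", mno = ""
    do input.Rewind()

    // Pass 1: the outer document
    set reader = ##class(%XML.Reader).%New()
    set sc = reader.OpenStream(input)
    if $$$ISERR(sc) quit sc
    do reader.Correlate("abc", "DC.XML.abcRaw")
    if 'reader.Next(.raw, .sc) {
        quit $select($$$ISERR(sc):sc,1:$$$ERROR($$$GeneralError,"No <abc> element found"))
    }
    if '$isobject(raw.jkl) {
        quit $$$ERROR($$$GeneralError,"<jkl> was empty")
    }

    // Pass 2: the CDATA text is itself XML, so hand it to a second reader
    do raw.jkl.Rewind()
    set inner = ##class(%XML.Reader).%New()
    set sc = inner.OpenStream(raw.jkl)
    if $$$ISERR(sc) quit sc
    do inner.Correlate("mno", "DC.XML.mno")
    if 'inner.Next(.mno, .sc) {
        quit $select($$$ISERR(sc):sc,1:$$$ERROR($$$GeneralError,"No <mno> element found in <jkl>"))
    }
    quit $$$OK
}

}
```

To try it, write the CDATA sample from the question into a %Stream.TmpCharacter and call ##class(DC.XML.abcRaw).Parse(stream,.raw,.mno); on success mno.pqr and mno.stu should hold the values from inside the CDATA. The sketch only covers the CDATA-wrapped variant; the plain variant still correlates with the original classes, so a caller could try one and fall back to the other. If the base64 field inside the payload is declared as %Stream.GlobalBinary, %XML.Adaptor should decode it on import, again avoiding MAXSTRING.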

Sources:

David Hockenbroch · Feb 10

You would probably be better served using XPath instead of the XML reader. When you evaluate an XPath expression, it will recognize the CDATA and parse it correctly, returning either a value or another XML object.
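A minimal two-pass XPath sketch. It assumes that `text()` on `<jkl>` comes back as a single value result (a string, or a stream for large content) rather than as elements, which matches what Julian reports further down; the elements inside the CDATA are then reached by loading that value as a second %XML.XPATH.Document. `DC.XML.XPathDemo` and `Show` are hypothetical names for illustration.

```objectscript
/// Hypothetical class for illustration only
Class DC.XML.XPathDemo Extends %RegisteredObject
{

/// Pass 1 reads the text of <jkl>; pass 2 parses that text as its own XML document
ClassMethod Show(input As %Stream.Object) As %Status
{
    do input.Rewind()
    set sc = ##class(%XML.XPATH.Document).CreateFromStream(input, .outer)
    if $$$ISERR(sc) quit sc

    // text() on <jkl> gives the CDATA content as one value, not as child elements
    set sc = outer.EvaluateExpression("/abc", "jkl/text()", .results)
    if $$$ISERR(sc) quit sc
    if (results.Count()=0) {
        quit $$$ERROR($$$GeneralError,"<jkl> has no text content")
    }
    set payload = results.GetAt(1).Value

    // Wrap a short string value in a stream so both cases take the same path
    if '$isobject(payload) {
        set text = payload
        set payload = ##class(%Stream.TmpCharacter).%New()
        do payload.Write(text)
    }
    do payload.Rewind()

    // Pass 2: the payload is well-formed XML, so it can be queried on its own
    set sc = ##class(%XML.XPATH.Document).CreateFromStream(payload, .inner)
    if $$$ISERR(sc) quit sc
    for field="pqr","stu" {
        set sc = inner.EvaluateExpression("/mno", field_"/text()", .values)
        if $$$ISERR(sc) quit
        // assumes short values come back as strings; a stream value would need Read()
        if (values.Count()>0) {
            write field, ": ", values.GetAt(1).Value, !
        }
    }
    quit sc
}

}
```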

Julian Matthews · Feb 12 to David Hockenbroch

Hey David.

I'll give XPath a try and see how I get on and update accordingly.

Edit:

It seems that using XPath for the elements within the CDATA block fails, and using XPath for <jkl> returns the content of the entire CDATA block as a stream object. However, I think this is a limitation of the XML I am working with rather than of XPath itself.

Julius Kavay · Feb 10

Assuming your XML stream never exceeds MAXSTRING characters, and if you can live with a "quick-and-dirty" solution, try this one:

Class DC.XML.abc Extends (%RegisteredObject, %XML.Adaptor)
{
Property def As %String;

Property ghi As %String;

Property jkl As jkl;

ClassMethod Test(str)
{
    // accept either a stream or a string
    if $isobject(str) { do str.Rewind() set xml=str.Read(str.Size) } else { set xml=str }
    // strip each "<![CDATA[" ... "]]>" wrapper but keep the markup inside it
    for { set i=$find(xml,"<![CDATA[",$get(i)) quit:'i
        set j=$find(xml,"]]>",i) ztrap:'j "XMLE"
        // remove the closing marker first so that position i is still valid
        set $extract(xml,j-3,j-1)="", $extract(xml,i-9,i-1)="", i=i-9
    }

    set rdr=##class(%XML.Reader).%New()
    if 'rdr.OpenString(xml) write "OpenErr",! quit
    do rdr.Correlate("abc","DC.XML.abc")
    while rdr.Next(.abc,.st) { zzdo abc }
}
}

Class DC.XML.jkl Extends (%RegisteredObject, %XML.Adaptor)
{
Property mno As mno;
}

Class DC.XML.mno Extends (%RegisteredObject, %XML.Adaptor)
{
Property pqr As %String;

Property stu As %String;
}

And some tests...

set s1="<abc><def>010203</def><ghi>040506</ghi><jkl><mno><pqr>070809</pqr><stu>101112</stu></mno></jkl></abc>"
set s2="<abc><def>010203</def><ghi>040506</ghi><jkl><![CDATA[<mno><pqr>070809</pqr><stu>101112</stu></mno>]]></jkl></abc>"

do ##class(DC.XML.abc).Test(s1)
def................................: 010203
ghi................................: 040506
jkl.mno.pqr........................: 070809
jkl.mno.stu........................: 101112

do ##class(DC.XML.abc).Test(s2)
def................................: 010203
ghi................................: 040506
jkl.mno.pqr........................: 070809
jkl.mno.stu........................: 101112

Note: the ZZDO command above takes an oref as its argument and prints it; you can replace it with a simple zw (zwrite) of the oref.

Julian Matthews · Feb 12 to Julius Kavay

Hey Julius, thank you for this. 

Unfortunately, there is a risk of hitting MAXSTRING, because one of the fields within the CDATA blocks will contain a base64-encoded document or two. But this does look interesting for other use cases.
