How can I correlate XML that contains CDATA?

Hey everyone.

I have a web service that receives XML as a stream, which I then intend to correlate into an object. Unfortunately, I've hit a roadblock: the sender places a large chunk of the XML inside a CDATA block, which causes the correlation in %XML.Reader to fail.

The structure of my xml is along the lines of:

<abc>
    <def>010203</def>
    <ghi>040506</ghi>
    <jkl>
        <mno>
            <pqr>070809</pqr>
            <stu>101112</stu>
        </mno>
    </jkl>
</abc>

I have built ObjectScript classes that conform to the above, and the correlate works when the XML looks like the above. However, what sometimes arrives is:

<abc>
    <def>010203</def>
    <ghi>040506</ghi>
    <jkl><![CDATA[<mno><pqr>070809</pqr><stu>101112</stu></mno>]]></jkl>
</abc>

I have seen some other posts where the answer was to add "CONTENT=MIXED" to the property where the CDATA occurs, but that cannot be applied when the property references another class rather than a simple string. Additionally, one of the values delivered inside a CDATA block will contain a base64-encoded file, so a string would be unsuitable.

Is there a simple way to receive XML and parse it using %XML.Reader or a similar XML tool without being stymied by CDATA-wrapped data?


To handle XML containing CDATA using %XML.Reader or similar tools, consider these options:

  1. If the CDATA section contains XML or complex data, you can treat it as a raw string and then process it further once extracted. To do this:

    • Correlate the parent element containing the CDATA and retrieve it as a string.
    • Use additional XML parsing tools to process the data within the CDATA separately. For instance:

      Class MyClass Extends (%RegisteredObject, %XML.Adaptor)
      {
      /// Raw CDATA content; MAXLEN = "" lifts the default length limit
      Property RawXML As %String(MAXLEN = "");

      Method ProcessCDATA() As %Status
      {
          // parsing code for RawXML
          Quit $$$OK
      }
      }
      
  2. Modify your XML-enabled classes to use the CONTENT = "MIXED" property parameter if the structure allows. This enables the content of the CDATA to be read as raw XML. Note, however, that CONTENT = "MIXED" cannot generally be applied when the property references another class [1].

  3. If you have cases where specific outputs are expected from the CDATA, you can:

    • Create a custom datatype that processes the XML content of the CDATA after correlation. The datatype would have methods to parse the CDATA into an XML structure directly [2].

These approaches cover cases where the data inside CDATA blocks varies. Where you need structured objects, follow the extraction with a parsing step tailored to the content type.
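
As a rough illustration of the "extract first, parse afterwards" idea in option 1, the sketch below runs a second %XML.Reader over text that was taken out of a CDATA section. The class name DC.XML.Inner and the method name Parse are hypothetical, and it assumes the CDATA text has already been read into a string (for example from a property declared as %String) and that an XML-enabled class exists for the inner element.

```objectscript
/// Hypothetical helper class; the name is illustrative only.
Class DC.XML.Inner Extends %RegisteredObject
{

/// Parse text taken from a CDATA section as an XML document of its own.
/// element and classname are assumptions supplied by the caller,
/// e.g. "mno" and the name of an XML-enabled class that models it.
ClassMethod Parse(raw As %String, element As %String, classname As %String, Output obj As %RegisteredObject) As %Status
{
    set obj=""
    set reader=##class(%XML.Reader).%New()
    // Open the inner XML as a separate document
    set sc=reader.OpenString(raw)
    return:$$$ISERR(sc) sc
    do reader.Correlate(element,classname)
    // Next() returns 0 when no (further) element matched; sc reports any error
    set found=reader.Next(.obj,.sc)
    if 'found,$$$ISOK(sc) {
        set sc=$$$ERROR($$$GeneralError,"No <"_element_"> element found")
    }
    return sc
}

}
```

A call might look like `set sc=##class(DC.XML.Inner).Parse(abc.jkl,"mno","DC.XML.mno",.mno)`, assuming jkl was correlated as a string. For content that may exceed the string limit, such as a base64-encoded file, the same pattern should work with a stream property and the reader's OpenStream() in place of OpenString().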

Sources:

Assuming your XML stream never has more than MAXSTRING characters, and if you can live with a "quick-and-dirty" solution, then try this one:

Class DC.XML.abc Extends (%RegisteredObject, %XML.Adaptor)
{
Property def As %String;
Property ghi As %String;
Property jkl As jkl;
ClassMethod Test(str)
{
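    // Accept a stream or a string; a stream is read in one go (hence the MAXSTRING limit)
    if $isobject(str) { set:str.Rewind() xml=str.Read(str.Size) } else { set xml=str }
    // Strip each CDATA wrapper in place, keeping its content
    for{set i=$f(xml,"<![CDATA[",$g(i)) q:'i
        set j=$f(xml,"]]>",i) zt:'j "XMLE"   // no closing "]]>": raise error XMLE
        set $e(xml,j-3,j-1)="", $e(xml,i-9,i-1)="", i=i-9
    }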
    
    set rdr=##class(%XML.Reader).%New()
    if 'rdr.OpenString(xml) write "OpenErr",! quit
    do rdr.Correlate("abc","DC.XML.abc")
    while rdr.Next(.abc,.st) { zzdo abc }
}
}

Class DC.XML.jkl Extends (%RegisteredObject, %XML.Adaptor)
{
Property mno As mno;
}

Class DC.XML.mno Extends (%RegisteredObject, %XML.Adaptor)
{
Property pqr As %String;
Property stu As %String;
}

And some tests...

set s1="<abc><def>010203</def><ghi>040506</ghi><jkl><mno><pqr>070809</pqr><stu>101112</stu></mno></jkl></abc>"
set s2="<abc><def>010203</def><ghi>040506</ghi><jkl><![CDATA[<mno><pqr>070809</pqr><stu>101112</stu></mno>]]></jkl></abc>"
do ##class(DC.XML.abc).Test(s1)
def................................: 010203
ghi................................: 040506
jkl.mno.pqr........................: 070809
jkl.mno.stu........................: 101112
do ##class(DC.XML.abc).Test(s2)
def................................: 010203
ghi................................: 040506
jkl.mno.pqr........................: 070809
jkl.mno.stu........................: 101112

Note: the ZZDO command above is a custom command that takes an oref as its argument and prints the object. You can replace it with a simple zw oref.
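
If the MAXSTRING assumption above does not hold (for instance when the CDATA carries a large base64-encoded file), the same delimiter stripping could be done chunk by chunk from one stream to another. The following is only a sketch under that assumption, not part of the original answer: the class DC.XML.Util and the method StripCDATA are hypothetical names, and the 32000-character chunk size is arbitrary.

```objectscript
/// Hypothetical helper class; the name is illustrative only.
Class DC.XML.Util Extends %RegisteredObject
{

/// Copy in to out while dropping the CDATA delimiters,
/// without ever holding the whole document in a single string.
ClassMethod StripCDATA(in As %Stream.Object, out As %Stream.Object) As %Status
{
    set sc=in.Rewind()
    return:$$$ISERR(sc) sc
    set buf="",inside=0
    while 'in.AtEnd {
        set buf=buf_in.Read(32000,.sc)
        return:$$$ISERR(sc) sc
        // Remove every complete delimiter currently in the buffer
        for {
            set marker=$select(inside:"]]>",1:"<![CDATA[")
            set pos=$find(buf,marker)
            quit:'pos
            // Write the text before the delimiter, then skip the delimiter itself
            do out.Write($extract(buf,1,pos-$length(marker)-1))
            set buf=$extract(buf,pos,$length(buf))
            set inside='inside
        }
        // Hold back a tail that might be the first part of a split delimiter
        set len=$length(buf),keep=$length(marker)-1
        if len>keep {
            do out.Write($extract(buf,1,len-keep))
            set buf=$extract(buf,len-keep+1,len)
        }
    }
    do out.Write(buf)
    return:inside $$$ERROR($$$GeneralError,"Unterminated CDATA section")
    return $$$OK
}

}
```

The output stream could be a temporary one, for example `##class(%Stream.TmpCharacter).%New()`, rewound and then handed to the reader's OpenStream() in place of OpenString(). As with the string version, this only makes sense when whatever sits inside the CDATA is itself well-formed XML or plain text without markup characters (base64 qualifies); otherwise removing the wrapper produces a document that no longer parses.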