How can I correlate xml that contains CDATA?
Hey everyone.
I have a web service which receives XML as a stream, and I intend to correlate that stream into an object. Unfortunately I have hit a roadblock: the sender places a large chunk of the XML inside a CDATA block, which causes the Correlate of %XML.Reader to stop working.
The structure of my xml is along the lines of:
<abc><def>010203</def><ghi>040506</ghi><jkl><mno><pqr>070809</pqr><stu>101112</stu></mno></jkl></abc>
I have my ObjectScript classes built to conform to the above, and the correlate works when the XML is as above. However, what is sometimes happening is:
<abc><def>010203</def><ghi>040506</ghi><jkl><![CDATA[<mno><pqr>070809</pqr><stu>101112</stu></mno>]]></jkl></abc>
I have seen some other posts where the answer was to add CONTENT = "MIXED" to the property where the CDATA occurs, but this cannot be applied when the property references another class and isn't simply a string. Additionally, one of the values presented inside a CDATA block is going to contain a base64 encoded file, so a string would be unsuitable for it.
Is there a simple way to receive xml and be able to parse the xml using %XML.Reader or a similar XML tool without being stymied by cdata wrapped data?
Comments
To handle XML containing CDATA using %XML.Reader or similar tools, you can follow these solutions:
If the CDATA section contains XML or complex data, you can treat it as a raw string and then process it further once extracted. To do this:
- Correlate the parent element containing the CDATA and retrieve it as a string.
- Use additional XML parsing tools to process the data within the CDATA separately. For instance:
Class MyClass Extends %RegisteredObject
{
Property RawXML As %String; // for CDATA content
Method ProcessCDATA() As %Status
{
    // parsing code for RawXML
}
}
Modify your XML-enabled classes to use the CONTENT = "MIXED" property parameter if the structure allows. This enables the content of the CDATA to be read as raw XML. Note, however, that CONTENT = "MIXED" cannot generally be applied when the property is referencing another class [1].
If you have cases where specific outputs are expected from the CDATA, you can:
- Create a custom datatype that processes the XML content of the CDATA after correlation. The datatype would have methods to parse the CDATA into an XML structure directly [2].
These approaches work when handling variations in the data contained within CDATA blocks. Where structured object generation would be useful, specific parsing tailored to the content type would follow afterward.
Sources:
You would probably be better served using XPath instead of the XML reader. When you evaluate an XPath expression, it will recognize the CDATA and parse it correctly, returning either a value or another XML object.
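To make the XPath suggestion concrete, here is a minimal sketch of evaluating an expression with %XML.XPATH.Document. The method name ShowJkl is hypothetical, the class that holds it is assumed to include %occSAX (for the $$$XPATHVALUE macro), and the context path /abc/jkl is taken from the sample XML in the question. It only reports what comes back, so it is a starting point rather than a full solution.

```objectscript
/// Hypothetical helper: evaluate an XPath expression against an XML stream
/// and report each value result. Assumes the class has "Include %occSAX".
ClassMethod ShowJkl(xmlStream As %Stream.Object) As %Status
{
    // Build an XPath document from the incoming stream
    set sc = ##class(%XML.XPATH.Document).CreateFromStream(xmlStream, .doc)
    if $$$ISERR(sc) { return sc }

    // Context node is /abc/jkl; "text()" asks for its character data
    set sc = doc.EvaluateExpression("/abc/jkl", "text()", .results)
    if $$$ISERR(sc) { return sc }

    for i=1:1:results.Count() {
        set result = results.GetAt(i)
        if result.Type = $$$XPATHVALUE {
            // Value can be a plain string or a stream object
            if $isobject(result.Value) {
                write "stream of size ", result.Value.Size, !
            } else {
                write "string: ", result.Value, !
            }
        }
    }
    return $$$OK
}
```

Note that the XML inside a CDATA section is plain character data as far as the parser is concerned, so an expression that tries to reach the elements wrapped in the CDATA would not be expected to match anything.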
Hey David.
I'll give XPath a try and see how I get on and update accordingly.
Edit:
It seems that trying to use XPath for the elements within the CDATA block fails, and attempting to use XPath for <jkl> returns the content of the entire CDATA as a stream object. However, I think this is a limitation of the XML I am working with rather than of XPath itself.
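One possible way forward from that observation, offered as an untested sketch rather than a confirmed fix: since the CDATA payload comes back as a value, it can be handed to a second %XML.Reader and correlated on its own. The method name CorrelateInner is hypothetical, and the target class DC.XML.mno is assumed to be an XML-enabled class matching the <mno> element (one with that name appears later in this thread).

```objectscript
/// Hypothetical helper: treat the text inside <jkl> (the CDATA payload)
/// as its own XML document and correlate it, starting at <mno>.
ClassMethod CorrelateInner(xmlStream As %Stream.Object, Output mno As DC.XML.mno) As %Status
{
    set sc = ##class(%XML.XPATH.Document).CreateFromStream(xmlStream, .doc)
    if $$$ISERR(sc) { return sc }

    set sc = doc.EvaluateExpression("/abc/jkl", "text()", .results)
    if $$$ISERR(sc) { return sc }
    if results.Count() = 0 {
        return $$$ERROR($$$GeneralError, "no text content under /abc/jkl")
    }

    // The value may be a stream or a plain string; open it accordingly
    set inner = results.GetAt(1).Value
    set reader = ##class(%XML.Reader).%New()
    if $isobject(inner) {
        set sc = reader.OpenStream(inner)
    } else {
        set sc = reader.OpenString(inner)
    }
    if $$$ISERR(sc) { return sc }

    do reader.Correlate("mno", "DC.XML.mno")
    if 'reader.Next(.mno, .sc) {
        // Next returns 0 at end of input as well as on error; sc tells them apart
        return $select($$$ISERR(sc): sc, 1: $$$ERROR($$$GeneralError, "no <mno> element found"))
    }
    return $$$OK
}
```

This only covers the CDATA-wrapped variant. When the sender delivers <mno> as real child elements, text() has nothing to return, and the normal single-pass correlate of the whole document would be the path to take instead.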
Assuming your XML stream never has more than MAXSTRING characters, and if you can live with a "quick-and-dirty" solution, then try this one:
Class DC.XML.abc Extends (%RegisteredObject, %XML.Adaptor)
{
Property def As %String;
Property ghi As %String;
Property jkl As jkl;
ClassMethod Test(str)
{
    // accept either a stream or a string
    if $isobject(str) { set:str.Rewind() xml=str.Read(str.Size) } else { set xml=str }
    // strip every "<![CDATA[" ... "]]>" wrapper, keeping the wrapped text
    for { set i=$f(xml,"<![CDATA[",$g(i)) q:'i
        set j=$f(xml,"]]>",i) zt:'j "XMLE" set $e(xml,j-3,j-1)="", $e(xml,i-9,i-1)="", i=i-9
    }
    set rdr=##class(%XML.Reader).%New()
    if 'rdr.OpenString(xml) write "OpenErr",! quit
    do rdr.Correlate("abc","DC.XML.abc")
    while rdr.Next(.abc,.st) { zzdo abc }
}
}
Class DC.XML.jkl Extends (%RegisteredObject, %XML.Adaptor)
{
Property mno As mno;
}
Class DC.XML.mno Extends (%RegisteredObject, %XML.Adaptor)
{
Property pqr As %String;
Property stu As %String;
}
And some tests...
set s1="<abc><def>010203</def><ghi>040506</ghi><jkl><mno><pqr>070809</pqr><stu>101112</stu></mno></jkl></abc>"
set s2="<abc><def>010203</def><ghi>040506</ghi><jkl><![CDATA[<mno><pqr>070809</pqr><stu>101112</stu></mno>]]></jkl></abc>"
do ##class(DC.XML.abc).Test(s1)
def................................: 010203
ghi................................: 040506
jkl.mno.pqr........................: 070809
jkl.mno.stu........................: 101112
do ##class(DC.XML.abc).Test(s2)
def................................: 010203
ghi................................: 040506
jkl.mno.pqr........................: 070809
jkl.mno.stu........................: 101112
Note: the above ZZDO command takes an oref as argument and prints it; you can replace it with a simple zw oref.
Hey Julius, thank you for this.
Unfortunately, there is a risk of hitting the MAXSTRING limit, because one of the fields within the CDATA blocks will hold a base64 encoded document or two. But this does look interesting for other use cases.