Article
· May 2 3m read

Minify XML in IRIS

In a project I'm working on, we need to store some arbitrary XML in the database. This XML has no corresponding class in IRIS; we just need to store it as a string (it's relatively small and fits in a string).
Since there are MANY (millions!) of records in the database, I decided to reduce the size as much as possible without compressing. Some of the XML to be stored is indented and some is not; it varies.
To reduce the size I decided to minify the XML, but how do I minify an XML document in IRIS?
I searched across all the classes/utilities and could not find ready-made code or a method, so I had to implement it. It turned out to be fairly simple in IRIS using the %XML.TextReader class, frankly simpler than I expected.

Since this can be useful in some other context, I decided to share this little utility with the Developer Community.
I've tested it with some fairly complex XML documents and it works fine. Here is the code:

/// Minify an XML document passed in the XmlIn Stream; the minified XML is returned in the XmlOut Stream.
/// If an XmlOut Stream is passed, the minified XML is stored in it; otherwise a %Stream.TmpCharacter is returned in XmlOut.
/// Collapse = 1 (default), empty elements are collapsed, e.g. <tag></tag> is returned as <tag/>
/// ExcludeComments = 1 (default), comments are not returned in the minified XML
ClassMethod MinifyXML(XmlIn As %Stream.Object, ByRef XmlOut As %Stream.Object = "", Collapse As %Boolean = 1, ExcludeComments As %Boolean = 1) As %Status
{
	#Include %occSAX
	Set sc=$$$OK
	Try {
		Set Mask=$$$SAXSTARTELEMENT+$$$SAXENDELEMENT+$$$SAXCHARACTERS+$$$SAXCOMMENT
		Set sc=##class(%XML.TextReader).ParseStream(XmlIn,.reader,,$$$SAXNOVALIDATION,Mask)
		#dim reader as %XML.TextReader
		If $$$ISERR(sc) Quit
		If '$IsObject(XmlOut) {
			Set XmlOut=##class(%Stream.TmpCharacter).%New()
		}
		While reader.Read() {
			Set type=reader.NodeType
			If ((type="error")||(type="fatalerror")) {
				Set sc=$$$ERROR($$$GeneralError,"Error loading XML "_type_"-"_reader.Value)
				Quit
			}
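			If type="element" {
				Do XmlOut.Write("<"_reader.Name)
				; capture the flag before the reader is moved onto the attribute nodes
				Set IsEmpty=reader.IsEmptyElement
				; add attributes (also for empty elements), escaping special characters in the values
				For k=1:1:reader.AttributeCount {
					Do reader.MoveToAttributeIndex(k)
					Do XmlOut.Write(" "_reader.Name_"="""_$zconvert(reader.Value,"O","XML")_"""")
				}
				If Collapse && IsEmpty {
					; collapse empty element
					Do XmlOut.Write("/>")
					Set ElementEnded=1
				} Else {
					Do XmlOut.Write(">")
				}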
			} ElseIf type="chars" {
				Set val=reader.Value
				Do XmlOut.Write($select((val["<")||(val[">")||(val["&"):"<![CDATA["_$replace(val,"]]>","]]]]><![CDATA[>")_"]]>",1:val))
			} ElseIf type="endelement" {
				If $g(ElementEnded) {
					; ended by collapsing
					Set ElementEnded=0
				} Else {
					Do XmlOut.Write("</"_reader.Name_">")
				}
			} ElseIf 'ExcludeComments && (type="comment") {
				Do XmlOut.Write("<!--"_reader.Value_"-->")
			}
		}
	} Catch CatchError {
		#dim CatchError as %Exception.SystemException
		Set sc=CatchError.AsStatus()
	}
	Quit sc
}
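
For anyone who wants to try it quickly, here is a minimal usage sketch. It assumes the method below is added to the same class as MinifyXML; the name DemoMinify and the sample document are made up for illustration and are not part of the original utility.

```objectscript
/// Hypothetical demo method; assumes it lives in the same class as MinifyXML
ClassMethod DemoMinify() As %Status
{
	// Build a small indented document in a temporary stream
	Set input=##class(%Stream.TmpCharacter).%New()
	Do input.WriteLine("<root>")
	Do input.WriteLine("  <!-- a comment -->")
	Do input.WriteLine("  <name>Test</name>")
	Do input.WriteLine("  <empty></empty>")
	Do input.WriteLine("</root>")
	Do input.Rewind()
	// XmlOut is not supplied, so MinifyXML creates a %Stream.TmpCharacter for the result
	Set sc=..MinifyXML(input,.output)
	If $$$ISERR(sc) Quit sc
	// Print the minified document to the current device
	Do output.Rewind()
	While 'output.AtEnd {
		Write output.Read(32000)
	}
	Write !
	Quit sc
}
```

Passing 0 as the third or fourth argument keeps empty elements expanded or keeps comments, respectively.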

P.S.: Does anyone know if there is another/simpler way to minify XML in IRIS?

Discussion (3)

You edited your post after my answer 😊

Am I missing something, or does canonicalization not minify the XML?

For other reasons (how the data is consumed) we cannot compress it, and the target property is a %String.

Creating a compressed string datatype might be an option in other situations, but in this case the target property/class is part of HealthShare (a Registry Slot).