How to convert from XML to JSON?

Question

Yuri Marx · Dec 8, 2020

#DTL #InterSystems IRIS

Does IRIS (DTL, an adapter, a production, or an ObjectScript class) have a function to transform arbitrary XML to JSON?

Product version: IRIS 2020.4

Discussion (6)

Dmitry Maslennikov · Dec 8, 2020

I am not sure there is anything simple out of the box. I would create a class for the particular XML schema that extends %XML.Adaptor, so it can be imported from XML, and that also extends %JSON.Adaptor, so it can be exported to JSON, and vice versa.

Vitaly Furman · Dec 8, 2020

I think Dmitry is right. You would probably need to create an intermediate object model: a class or classes that model your XML structure and another class that models your JSON structure. The XML model extends %XML.Adaptor and imports your XML using tools like ImportXMLFromStream, etc., while the JSON model extends %JSON.Adaptor. Then you would step through your XML model, copying your XML properties to the properties of your JSON-aware class.

Stephen Morrison · Dec 14, 2020

> Clearly it's a popular and valid use case, and I want to know more about it.

While it's been proven (mathematically) that JSON and XML are equivalent serializations (anything that can be expressed in one format can be expressed without data loss in the other), historically and philosophically they are different beasts. Like its parent SGML, XML was originally envisioned as a generic, robust, declarative 'save file' specification between authoring software (producers) and 'smart' renderers (consumers). Over time, as needs changed, it evolved to service other application areas; many mature, complex processing tools have arisen to enable XML to address these issues, especially in the transformation and database arenas. JSON started as a net-friendly, platform-independent, datagram serialization scheme. Its roles have also diversified, and its clean, flexible simplicity has led to a position of dominance across the internet and in far-reaching applications, ranging from document databases and web security to future-proofed remote procedure calls and robotics control.

From a data collection, persistence and projection point of view, XML came from a time when the prevailing opinion was that, due to limited resources and platform impedance mismatches, one should make the data match the schema and throw away anything that doesn't meet expectations or anticipated need. In contrast, JSON hails from the Wild West of the Web, where a typical cell phone has more computing power than the space shuttle did, data is everywhere and growing exponentially, and, perhaps most importantly, flexibility in the face of the unexpected is often a job requirement.

Veteran XML developers, especially in the DB world, often mistake JSON's lack of a formal, predefined schema for a lack of structure, consistency or means of validation; this is not a fair assumption. JSON can be quite organized and well structured, and conventions (though not yet standards) for "schema" validation exist. The key difference is that serializing _all_ the data (even if subsets of that data have radically different internal structures) is job one, and pruning, massaging and reformatting areas of that data pool to align with a particular schema is an optional step, late in the process. When 'validating', XML often takes an all-or-nothing approach in the name of total data integrity. Applications relying on JSON frequently embrace more of an "Is it good enough for me to do my job?" approach, and "my job" is often just one small lambda production in a larger pipeline that may never need to look at the data in its entirety.

The practical upshot of these two mindsets has direct bearing on why JSON took the web by storm (as well as contributing to the overall acceptance of JSON as the dominant serialization standard in use today, where XML once reigned supreme). When dealing with cross-platform communications, an XML schema is basically the lowest common denominator that all parties can agree upon; the application can't move forward unless all parties agree to move in lock-step. JSON's approach to a late-binding (if at all) schema and broad latitude with respect to what it means to be 'compliant' allows for natural evolution of the data streams; consumers of the JSON may elect to simply ignore unexpected key-value pairs and only raise exceptions if the provided data does not allow for completion of the job at hand.

Between the number of moving parts and the rate of change on the internet as a whole, XML proved to be simply too strict and too cumbersome to compete for anything other than its original purpose, a declarative mark-up language (HTML5) intended for a smart renderer. Over time, predictions that JSON's fast and loose approach to data checking would result in unmaintainable chaos have proven (in general; there are always exceptions) to be incorrect, and multiple software engineering studies have since found that application feature points built around JSON are both faster to develop initially and cheaper to maintain over the lifecycle of the application than equivalent feature points revolving around XML. This has been slowly eroding legacy web services like SOAP and SAML in favor of REST and OIDC, and efforts to interact with (or convert for retirement) older XML-based systems have created high demand for XML to JSON conversion utilities.

Due to this demand, there exist a number of third-party utilities that will do a blind conversion from XML to JSON. In many cases this can suffice. The downside to this approach is that you don't really control the process: it's serial in / serial out, without an exposed model in the middle where you could edit the pipeline. If the default output of the third-party tool doesn't cut it, people are forced to re-import the JSON into something they can manipulate, change it, and then export it again to produce the final product.

With respect to InterSystems' own technologies, we provide a number of options to address this issue (but not an out-of-the-box turnkey solution).

One approach is to make use of the XML and JSON Adaptors. You could: 1) Define (likely a series of) classes that align with the structure of your XML based on an XSD - all classes would need to be both XML and JSON enabled; 2) Import the XML document into your class structure using the XML adaptor; and, 3) Export the class tree using the JSON adaptor. This should work and is very much in keeping with XML's schema-driven mindset. It also allows for data validation, persistence, etc., but from a level-of-effort standpoint, you'll be paying for the ability to validate, the ability to persist, etc. whether you want or need those features or not.
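To illustrate the shape of that three-step pipeline, here is a minimal sketch in Python rather than ObjectScript. It is only an analogy: the dataclasses play the role of classes that would extend both %XML.Adaptor and %JSON.Adaptor in IRIS, and the names `Record`, `Unit`, `from_xml` and `to_json` are hypothetical, not part of any InterSystems API.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, asdict, field
from typing import List


@dataclass
class Unit:
    page: int
    text: str

    @classmethod
    def from_xml(cls, elem):
        # Step 2: typed import. int() raises if <page> is missing or not
        # numeric, which is the "validation" you pay for with this approach.
        return cls(page=int(elem.findtext("page")), text=elem.findtext("text"))


@dataclass
class Record:
    id: int
    name: str
    vital: bool
    units: List[Unit] = field(default_factory=list)

    @classmethod
    def from_xml(cls, elem):
        return cls(
            id=int(elem.get("id")),
            name=elem.get("name"),
            vital=elem.get("vital") == "true",
            units=[Unit.from_xml(u) for u in elem.findall("unit")],
        )

    def to_json(self):
        # Step 3: export the whole class tree; asdict() recurses into Unit.
        return json.dumps(asdict(self))


xml_doc = (
    '<record id="2" name="example" vital="false">'
    "<unit><page>0</page><text>Hello</text></unit>"
    "</record>"
)
record = Record.from_xml(ET.fromstring(xml_doc))
print(record.to_json())
```

Every new XML schema needs its own set of classes, which is the level-of-effort cost described above.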

The other approach is more heuristically based and 'JSON-esque' in its treatment of the data stream. Using %DynamicArray and %DynamicObject it is possible to build (blindly) an abstract entity tree in memory that represents the core elements of the XML document model and then simply call %ToJSON() on the root of the tree. This involves parsing the XML (we have utilities to help with that) and then applying a few heuristics to drive a consistent conversion:
1) A tag name/element becomes a key name in the parent of the current node
2) An attribute becomes a key name of the current node
3) A serially repeated element maps to a %DynamicArray to preserve lexical order
4) A non-repeating element with either attributes or children maps to a %DynamicObject
5) Attribute and element values of "true" and "false" map to JSON tokens true and false
6) Attribute and element values that parse as numbers map to JSON numerics
7) Non-boolean, non-numeric string values map to JSON strings
8) Simple, singular values embedded within elements map to scalars
9) Elements with a mix of text nodes and elements embedded in their children get special treatment
   a) A new key is added with a conflict-free name such as "0$Kids"
   b) This new key is a sequence of nodes (%DynamicArray)
   c) Array entries are either JSON strings (mapping from text nodes) or child objects (mapping from nested elements)

For example:
<record id="2" name="example" vital="false">
  <unit>
    <page>0</page>
    <text>Hello</text>
  </unit>
  <unit>
    <page>1</page>
    <text>World</text>
  </unit>
  <coda>
    End of <bold>sample</bold>!
  </coda>
</record>

Becomes:
{
  "record": {
    "id":2, "name":"example", "vital":false,
    "unit": [
      {
        "page":0, "text":"Hello"
      },
      {
        "page":1, "text":"World"
      }
    ],
    "coda": {
      "0$Kids": [
        "End of",
        {"bold":"sample"},
        "!"
      ]
    }
  }
}

By blindly and recursively applying these rules to an XML DOM, a simple class method can build up an abstract model of the data while dispensing with XML-specific syntax. The model can then be serialized to JSON simply by calling %ToJSON() on the root. If the blind conversion needs to be cleaned up, you can tweak the resulting tree using the API for %DynamicAbstractObject, adding, deleting and restructuring specific nodes as desired before making the final call to serialize.
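As a rough illustration of what such a recursive pass might look like, here is a minimal sketch in Python rather than ObjectScript, with plain dicts and lists standing in for %DynamicObject and %DynamicArray. The function names (`scalar`, `convert`, `xml_to_tree`) are hypothetical. Several details are assumptions of this sketch and are not stated in the rules above: empty elements become null, an element with attributes plus text puts its text under "0$Kids", repeated tags are grouped even when not adjacent, and attribute names are assumed not to clash with child tag names.

```python
import json
import re
import xml.etree.ElementTree as ET
from collections import Counter

KIDS_KEY = "0$Kids"  # conflict-free key for mixed content (rule 9a)
JSON_NUMBER = re.compile(r"-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?")


def scalar(text):
    """Rules 5-7: map a string to a JSON boolean, number, or string."""
    if text in ("true", "false"):
        return text == "true"
    match = JSON_NUMBER.fullmatch(text)
    if match is None:
        return text  # values such as "007" stay strings
    is_integer = match.group(2) is None and match.group(3) is None
    return int(text) if is_integer else float(text)


def convert(elem):
    children = list(elem)
    head = (elem.text or "").strip()
    tails = [(child.tail or "").strip() for child in children]

    # Rule 8: a leaf without attributes is a scalar (None if empty: assumption).
    if not children and not elem.attrib:
        return scalar(head) if head else None

    # Rule 2: attributes become keys of the current node.
    node = {name: scalar(value) for name, value in elem.attrib.items()}

    if head or any(tails):
        # Rule 9: mixed content becomes an ordered list of strings and objects.
        kids = [head] if head else []
        for child, tail in zip(children, tails):
            kids.append({child.tag: convert(child)})
            if tail:
                kids.append(tail)
        node[KIDS_KEY] = kids
        return node

    counts = Counter(child.tag for child in children)
    for child in children:
        if counts[child.tag] > 1:
            node.setdefault(child.tag, []).append(convert(child))  # rule 3
        else:
            node[child.tag] = convert(child)  # rules 1 and 4
    return node


def xml_to_tree(xml_text):
    root = ET.fromstring(xml_text)
    return {root.tag: convert(root)}


sample = """<record id="2" name="example" vital="false">
  <unit><page>0</page><text>Hello</text></unit>
  <unit><page>1</page><text>World</text></unit>
  <coda>End of <bold>sample</bold>!</coda>
</record>"""

tree = xml_to_tree(sample)
tree["record"]["source"] = "catalog-a"  # tweak the tree before serializing
print(json.dumps(tree, indent=2))
```

The tweak on the second-to-last line (adding a made-up `source` key) corresponds to editing the in-memory tree before the final serialization call.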

XML natively supports a more complex set of distinctions and constraints than basic JSON, and the conversion 'rules' above dispense with such XML-specific artifacts, so reversing the process is more difficult. Were one to write a %ToXML() method for %DynamicObject, one could start with a similar set of heuristics, but to get acceptable output from an XML perspective, there would have to be a schema (such as an XSD) supervising the process. This is quite doable, but is more complex and beyond the scope of what the original poster needs for his use case.

score 0 · Answer 1 · 2020-12-08T16:53:00-05:00

The key issue I see is: is there a related XML schema available?
If YES:
- you can generate a package with the existing tools
- import the file with %XML.Reader
- do a %JSON... export
The XML schema is necessary because straight XML is just text with no datatypes,
while JSON has data types. For XML, the type of the data is documented in the XML Schema.

If NO:
You may call any of the publicly available XML to JSON converters.
They can guess numerics vs. strings rather easily in most cases.
But how they detect Boolean (true/false) vs. Integer is somewhat mysterious to me.
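
To make the datatype point concrete, here is a small Python sketch (not IRIS code) contrasting a schema-less guess with a conversion driven by a declared type. The names `guess`, `XSD_CONVERTERS` and `convert_with_schema` are hypothetical, and the type map is only a stand-in for what a real XSD would declare. It relies on the fact that xs:boolean accepts "1" and "0" as well as "true" and "false", which is exactly where guessing becomes ambiguous.

```python
def guess(text):
    """Schema-less guess: booleans first, then integers, else a string."""
    if text in ("true", "false"):
        return text == "true"
    try:
        return int(text)
    except ValueError:
        return text


# Stand-in for type information that would come from an XML Schema.
XSD_CONVERTERS = {
    "xs:boolean": lambda t: t in ("true", "1"),
    "xs:integer": int,
    "xs:string": str,
}


def convert_with_schema(text, xsd_type):
    """Convert using the declared type instead of guessing."""
    return XSD_CONVERTERS[xsd_type](text)


# "1" is guessed as an integer, but a schema may declare it a boolean.
print(repr(guess("1")), repr(convert_with_schema("1", "xs:boolean")))
# A code with leading zeros loses them when guessed as a number.
print(repr(guess("00123")), repr(convert_with_schema("00123", "xs:string")))
```

Without the schema, a converter has no way to know which of the two readings the producer intended.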

My personal opinion: re-inventing this wheel is not worth the effort.
Writing an adapter makes sense.

score 0 · Answer 2 · 2020-12-09T04:12:19-05:00

It's a popular question:

Can anyone tell me your use case? I encounter XML and JSON fairly often in integrations, but it's always some specific schema, as defined by XSD for XML and plaintext for JSON (OpenAPI spec sometimes). So for me it's always XML ↔ Object ↔ JSON (where Object can be zero or more transformations between objects of different classes). But what's the case for a generic XML ↔ JSON with no regard for a particular schema? Clearly it's a popular and valid use case, and I want to know more about it.

score 0 · Answer 3 · 2020-12-09T05:44:50-05:00

I have a software catalog defined using XML, and these XML documents use different schemas because each software vendor defined its own schema. I want to transform them into JSON so I can use IRIS DocDB as a managed repository and expose the data as a REST API. This API will allow users to get statistical information.