Validating XML

Problem:

I want to be able to import XML (say, using %XML.Reader) to Caché objects permissively (ignoring invalid/unexpected tags/attributes), but also to validate the same XML and list any/all invalid tags/attributes that it contains.

Possible path to a solution:

XML-enabled classes (see documentation) inherit XMLIGNOREINVALIDTAG and XMLIGNOREINVALIDATTRIBUTE parameters that customize the treatment of unexpected tags/attributes in XML import. One heavy-handed option would be to have parallel sets of classes representing the same XML document, one with both parameters set to 0 and one with both parameters set to 1. (And different XML namespaces, of course.) But this seems like a lot of work, and I strongly suspect that %XML.Reader would give up at the first error, which doesn't help the user much if there are lots of problems in the XML document. Ideally there would be some runtime option to make a list of the times XMLIGNOREINVALIDTAG=1/XMLIGNOREINVALIDATTRIBUTE=1 made a difference.

Any thoughts/recommendations? Has anyone done this before in Caché?

Answers

I did not test this approach, but something along these lines may help. The general idea is that the SAX parser can validate against an XML schema, and we can use that.

1. Generate a schema from your XML-enabled classes.

2. Set the schema and flags on a %XML.Reader instance, then call one of its Open methods (OpenString here) with your XML data:

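#include %occSAX
set reader = ##class(%XML.Reader).%New()
// Schema to validate against; a URL form such as "file:///path/to/schema.xsd" may be needed
set reader.SAXSchemaSpec = "/path/to/schema.xsd"
// Enable validation, namespace, and schema processing
set reader.SAXFlags = $$$SAXFULLDEFAULT
set sc = reader.OpenString("<xml/>")
// Print any parse or validation errors
write $System.Status.GetErrorText(sc)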
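
To list each problem separately instead of printing one combined error text, the status can be split into its individual errors. This is a minimal sketch, assuming sc is the status returned by OpenString above. Whether the parser reports every validation error or stops at the first one is not confirmed in this thread, so the list may contain only a single entry.

```objectscript
// Split the status into its individual errors.
// After the call, errors holds the count and errors(i) the text of error i.
do $System.Status.DecomposeStatus(sc, .errors)
for i=1:1:$get(errors) {
    write i, ": ", errors(i), !
}
```

If sc is not an error status, the loop body never runs.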


3. sc should then contain the validation errors. $$$SAXFULLDEFAULT is defined in %occSAX, alongside the individual flag bits it combines:

#; ------------------------------------------------------------------------
#; Bit flags for %XML.SAX.Parser feature selection (flags argument)
#; ------------------------------------------------------------------------
 
#; Specify this value if you want to accept the SAX defaults (see below)
#;
#define SAXDEFAULTS 27
 
#; Specify this value if you want the SAX defaults plus namespace prefixes
#define SAXFULLDEFAULT 95
 
#;
#; Specify this bit if you want the parser to perform validation
 
#; http://xml.org/sax/features/validation 
#; On: Report all validation errors. (default) 
#; Off: Do not report validation errors. 
 
#define SAXVALIDATION 1
 
#;
#; Specify this bit if you want the parser to recognize namespaces
 
#; http://xml.org/sax/features/namespaces 
#; On: Perform Namespace processing (default) 
#; Off: Optionally do not perform Namespace processing 
 
#define SAXNAMESPACES 2
 
#;
#; Specify this bit if you want the parser to process namespace prefixes
 
#; http://xml.org/sax/features/namespace-prefixes 
#; On: Report the original prefixed names and attributes used for Namespace declarations 
#; Off: Do not report attributes used for Namespace declarations, and optionally do not report original prefixed names (default)
 
#define SAXNAMESPACEPREFIXES 4
 
#;
#; Specify this bit if you want the parser to perform validation dynamically
 
#; http://apache.org/xml/features/validation/dynamic 
#; On: The parser will validate the document only if a grammar is specified. (http://xml.org/sax/features/validation must be true) (default)
#; Off: Validation is determined by the state of the http://xml.org/sax/features/validation feature
 
#define SAXVALIDATIONDYNAMIC 8
 
#;
#; Specify this bit if you want the parser to recognize schemas
 
#; http://apache.org/xml/features/validation/schema 
#; On: Enable the parser's schema support. (default) 
#; Off: Disable the parser's schema support. 
#define SAXVALIDATIONSCHEMA 16
 
#; Specify this bit if you want the parser to perform full schema checking
 
#; http://apache.org/xml/features/validation/schema-full-checking
#; On: Enable full schema constraint checking, including checking which may be time-consuming or memory intensive. Currently, particle unique attribution constraint checking and particle derivation restriction checking are controlled by this option.
#; Off: Disable full schema constraint checking (default). 
 
#define SAXVALIDATIONSCHEMAFULLCHECKING 32
 
#; http://apache.org/xml/features/validation/cache-grammarFromParse
#; On: Cache the grammar in the pool for re-use in subsequent parses
#; Off: Do not cache the grammar in the pool (default)
#; If set to true, the http://apache.org/xml/features/validation/use-cachedGrammarInParse is also set to true automatically.
#define SAXVALIDATIONREUSEGRAMMAR 64
 
#; Flags to force SAX not to validate but DO recognize namespaces and prefixes
#define SAXNOVALIDATION $$$SAXNAMESPACES+$$$SAXNAMESPACEPREFIXES

 

SAXFULLDEFAULT = All flags except SAXVALIDATIONSCHEMAFULLCHECKING.
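
The arithmetic can be checked directly: 95 = 1 + 2 + 4 + 8 + 16 + 64, which leaves out only 32 (SAXVALIDATIONSCHEMAFULLCHECKING). The sketch below uses only the values quoted from %occSAX above. The class and method names (Demo.SAXFlags, Show) are made up for illustration.

```objectscript
Class Demo.SAXFlags
{

/// Print which of the %occSAX flag bits are set in a flags value.
ClassMethod Show(flags As %Integer = 95)
{
    // Flag names in bit order: 1, 2, 4, 8, 16, 32, 64
    set names = $listbuild("SAXVALIDATION", "SAXNAMESPACES", "SAXNAMESPACEPREFIXES", "SAXVALIDATIONDYNAMIC", "SAXVALIDATIONSCHEMA", "SAXVALIDATIONSCHEMAFULLCHECKING", "SAXVALIDATIONREUSEGRAMMAR")
    for i=1:1:$listlength(names) {
        set bit = 2 ** (i - 1)
        // Integer-divide by the bit value, then take the lowest bit (1 = set, 0 = clear)
        write $list(names, i), ": ", (flags \ bit) # 2, !
    }
}

}
```

Calling do ##class(Demo.SAXFlags).Show(95) prints 1 for every flag except SAXVALIDATIONSCHEMAFULLCHECKING. Passing 27 (SAXDEFAULTS) shows that the plain defaults set only SAXVALIDATION, SAXNAMESPACES, SAXVALIDATIONDYNAMIC, and SAXVALIDATIONSCHEMA.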

You could try changing the flags, but the defaults seem to do what you need.

If you do try this approach, could you please post whether it works?

Thanks Eduard, this looks promising. I'll try it out and post any interesting results.

Comments

Hi Tim,

I'm very interested to see the result.
I tried to do something similar recently, so that I could feed back to the sender what was wrong when something was wrong.
I failed to get the right combination of parameters for the Xerces parser. XMLIGNORE* worked OK but was not very useful.

You might have a more direct line to Marvin Tenner than I do.