How to write the home address right?

How Tax Service, OpenStreetMap, and InterSystems IRIS
could help developers get clean addresses

 

Pieter Brueghel the Younger, Paying the Tax (The Tax Collector), 1640

 

In my previous article, we just skimmed the surface of objects. Let's continue our reconnaissance. Today's topic is a tough one. It's not quite BIG DATA, but it's still data that isn't easy to work with: we're talking about fairly large amounts of it. It won't all fit into RAM at once, and some of it won't even fit on the drive (not due to lack of space, but because there's a lot of junk). The name of our subject is the FIAS DB: the Federal Information Address System database, the database of addresses in Russia. The archive is 5.5 GB, and it's a compressed XML file. After extraction, it will be a full 53 GB (set aside 110 GB for extraction). And when you start to parse and convert it, that 110 GB won't be enough. There won't be enough RAM either.

That would all be fine, but you could keep digging. There's already an international open-source project for collecting and systematizing address data: OpenAddresses. Their databases will be even bigger. There are currently a lot of blank spots on their maps. Russia, for example, is almost empty. The size of this archive is 10 GB.

But first things first: let's consider the database of the fairly well-known OpenStreetMap project. It's built by volunteers, following Wikipedia's example. It's pretty thorough and multilingual. And the project has just received a 2018 Free Software Foundation award. Right now, the size of the entire archive as a compressed XML file is 74 GB.

Speaking of addresses, there's been some unexpected news: DuckDuckGo, the best secure search engine to date, has announced its move to Apple Maps. More precisely, to Apple MapKit JS. The most interesting part of this for our purposes is the "improved address searches." Is Apple better than everyone else at meticulously collecting and protecting our data? We'll have to keep an eye out...

So, here's the challenge. How can we put all this address data into an easy-to-use repository, make it possible to imagine an easy API (in Python, of course), and prevent our beloved hardware from collapsing under this massive burden? Let's call this MicroBigData, mBD or µBD for short.

Just about every developer has heard the following quip. An address directory is a directory of placenames, which is a very useful thing. I don't know exactly how much of the address data in your projects is up to date. After all, there are so many regions, cities, and streets. But it seems that they're necessary for any project involving people. There are addresses where you can find someone or send them a package. Then there's the information needed for passports and other documents. And perhaps there's an address of some office or landmark that someone recommended to you. So what should you do? Where should you get it?

Without accounting for errors and duplicates, the simplest solution involves primitive objects that contain simple string literals (or string constants). Let users make additional entries. And objects know how to save themselves, as we've covered before.

Take the objects described in the class below as an example. It's a textbook case in the form of a US address, but with an adjustment for our Russian dataset: postalCode instead of ZIP. I would also change the postal code to a number, but I'll leave it as a string to keep things uniform. Anyone who recognized the language (ObjectScript) right away will get a complimentary "like."

Class Sample.Address Extends %Persistent {

   Property streetName As %String;

   Property cityName As %String;

   Property areaName As %String;

   Property postalCode As %String;

}

Of course, many people will cry foul, saying all the literals are sticking out of the object. Whoever heard of an object publicly airing its fields? Let's leave it like that for now. It's quite an eloquent example, and any student could understand it.

Actually, that's all we need. We filled in the fields. Stored them. Handed them off to other objects. Someone else will inherit them. It all works. And it's stored!

But I have to say a few words about why it shouldn't be done this way. What is our object Address? Why can't it just be a group of text strings? The most obvious objections that pop up come from the context: who is using this Address, what form are they using it in, and for what purpose? Try to put your programmer logic to the side and imagine how a "foreign tourist," "historian," "tax collector," or "lawyer" thinks.

I'm guessing you immediately came up with a bunch of additional questions: what language and encoding to use, what time period to consider, and what kind of documents are involved in this operation: legal or postal? And a city: is that a named locality, or what? Even a street could be a boulevard, lane, avenue, or something else. How should all these important details be handled?

Let's look at a real-life example. Google is now run by Sundar Pichai. He is from India. He was born in the city of Chennai. Or is it Madras? In 1996, the residents decided that the name of the city sounded too Portuguese and renamed the capital of the state of Tamil Nadu from Madras to Chennai. So what should Sundar and his 72 million compatriots enter in their electronic documents?

In fact, there's a whole science that studies this: applied toponymy.

So, there are some follow-up questions. How should the time and date be handled? What about the obvious, money? Or geographic coordinates? And how should you implement this in your code? And will you be able to transfer it to your DBMS of choice without lowering the abstraction layer? How do you avoid the downward spiral into atomic types of machine data and constant thoughts about their reconstruction? In this case, it's worth looking for the source of a primitive or, conversely, good-quality API. Think about this at your leisure.

In short, context is the most important thing. And the object model allows us to make direct use of it by encapsulating "machine data" and implementing context-dependent "real-life" behavior. It's not that low-level tuples are arranged in tables ;-)
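As a rough illustration of context-dependent behavior, here is a minimal Python sketch of a place whose name depends on the year you ask about. The class `Place`, its methods, and the year-only granularity are assumptions made up for this example; only the 1996 rename from Madras to Chennai comes from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Place:
    """A place whose official name can change over time (hypothetical model)."""

    # (first_year_valid, name) pairs, kept sorted by year.
    names: list = field(default_factory=list)

    def rename(self, year: int, name: str) -> None:
        """Record that `name` is the official name starting from `year`."""
        self.names.append((year, name))
        self.names.sort()

    def name_in(self, year: int) -> str:
        """Return the name that was official in the given year."""
        current = None
        for start, name in self.names:
            if start <= year:
                current = name
        if current is None:
            raise ValueError(f"no name known for year {year}")
        return current


# Assumption: year 0 stands in for "as far back as we care about",
# and the rename is treated as effective for the whole of 1996.
city = Place()
city.rename(0, "Madras")
city.rename(1996, "Chennai")

print(city.name_in(1990))  # Madras
print(city.name_in(2019))  # Chennai
```

A historian and a postal clerk would call `name_in` with different years and both get a correct answer, which a bare string field cannot offer.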

In the meantime, we'll return to "primitive" implementation and make things harder for ourselves. To start, we'll eliminate errors and duplicates. In other words, we'll look for a way to write addresses correctly the first time. At the same time, we'll help the UI developers provide hints for users when they're entering data into fields.
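To make the idea of input hints concrete, here is a small Python sketch of prefix lookup over a sorted list of names. It is only an illustration under assumptions: the function `suggest`, its `limit` parameter, and the sample names are invented here, and a real directory would be queried from the imported address objects rather than from a list in memory.

```python
import bisect


def suggest(sorted_names, prefix, limit=5):
    """Return up to `limit` names that start with `prefix` (case-insensitive).

    `sorted_names` must already be sorted by lowercase form.
    """
    keys = [name.lower() for name in sorted_names]
    wanted = prefix.lower()
    # Binary search for the first name that is >= the prefix.
    start = bisect.bisect_left(keys, wanted)
    hints = []
    for name in sorted_names[start:]:
        if not name.lower().startswith(wanted) or len(hints) == limit:
            break
        hints.append(name)
    return hints


# Made-up sample directory, sorted case-insensitively.
names = sorted(["Kaliningrad", "Kaluga", "Kazan", "Moscow", "Kalinina"], key=str.lower)
print(suggest(names, "kal"))  # ['Kalinina', 'Kaliningrad', 'Kaluga']
```

Because the list is sorted, all matches for a prefix sit next to each other, so the scan stops at the first name that no longer matches.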

When there are two things in one place – texts and the InterSystems IRIS data platform – a developer has a real opportunity to really turn things around without stepping away from the machine. By using the embedded object components iKnow and iFind, for example. These components are meant for working with unstructured data and full-text search, respectively.

Let's try to find the data schema for OpenStreetMap. It's not as easy as it might seem at first. I don't know the exact reason, but there is no official data schema for OSM, and one would help us a lot, as you'll see below. So, rather than reinventing the wheel, we'll use a suitable XSD that I found for you, with thanks to Oliver Schrenk. There are more pictures of it as well. I must say that it suits our purposes and corresponds to the internal structure of the XML files in OSM downloads. One important detail: the first line of the XSD file must start with “<?xml...”.

Elements are the basic components of OpenStreetMap's conceptual data model of the physical world. They consist of

  • nodes – defining points in space,

  • ways – defining linear features and area boundaries, and

  • relations – which are sometimes used to explain how other elements work together.

All of the above can have one or more associated tags (which describe the meaning of a particular element). A tag consists of two items, a key and a value. Tags describe specific features of map elements: nodes, ways, or relations.
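The element-and-tag model can be seen in a few lines of Python. This is a sketch on a made-up fragment shaped like an OSM XML download: the ids, coordinates, and tag values are illustrative, and the helper `tags_of` is a name invented for this example.

```python
import xml.etree.ElementTree as ET

# Made-up fragment in the shape of an OSM XML download (values are illustrative).
SAMPLE = """<osm version="0.6" generator="example">
  <node id="1" lat="54.71" lon="20.51">
    <tag k="name" v="Example Cafe"/>
    <tag k="amenity" v="cafe"/>
  </node>
  <node id="2" lat="54.72" lon="20.52"/>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>"""


def tags_of(element):
    """Collect the <tag k=... v=...> children of an element into a dict."""
    return {tag.get("k"): tag.get("v") for tag in element.findall("tag")}


root = ET.fromstring(SAMPLE)
for element in root:
    # Prints the element kind, its id, and its key/value tags.
    print(element.tag, element.get("id"), tags_of(element))
```

Note that a node without tags is just a point, while the way carries its meaning in its tags and refers to its nodes by id.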

Where are the streets and cities? It's a big secret! Were you taught geometry well? More about this next time. :)

In addition, we'll use the XSD schema wizard, graciously made for us by the IRIS developers, to generate classes based on %XML.Adaptor with the appropriate fields. The percent sign at the beginning just means that this is a class from the system library. More information on it can be found in the documentation. We will perform the operations in the terminal.
 

set xmlSchema = ##class(%XML.Utils.SchemaReader).%New()

do xmlSchema.Process("/path/to/OSMSchema.xsd")


You can get the same thing from Atelier IDE (in the menu, go to Tools > Add-Ins > XML Schema Wizard):


https://habrastorage.org/webt/ed/3k/8c/ed3k8cfd9t_jc8gklg_euthhihy.png



Since we ran the wizard without specifying the name of the resulting package or any other parameters, the classes ended up in the Test package. In my case, I served the schema file from a local Python server:
 

python3 -m http.server 80


You can use any other HTTP server you want. Or load the file onto your IRIS server and point to it.

As a result, we have eight classes that completely reflect the structure of our address XML. This is the main class, OSM.osm:

Class OSM.osm Extends (%Persistent, %XML.Adaptor) [ ProcedureBlock ] {

Parameter XMLNAME = "osm";

Parameter XMLSEQUENCE = 1;

Property bounds As OSM.bounds(XMLNAME = "bounds", XMLREF = 1) [ Required ];

Relationship node As OSM.node(XMLNAME = "node", XMLPROJECTION = "ELEMENT", XMLREF = 1) [ Cardinality = many, Inverse = osm ];

Relationship way As OSM.way(XMLNAME = "way", XMLPROJECTION = "ELEMENT", XMLREF = 1) [ Cardinality = many, Inverse = osm1 ];

Relationship relation As OSM.relation(XMLNAME = "relation", XMLPROJECTION = "ELEMENT", XMLREF = 1) [ Cardinality = many, Inverse = osm2 ];

Property version As %xsd.float(XMLNAME = "version", XMLPROJECTION = "ATTRIBUTE") [ InitialExpression = ".6", ReadOnly ];

Property generator As %String(MAXLEN = "", XMLNAME = "generator", XMLPROJECTION = "ATTRIBUTE") [ InitialExpression = "CGImap 0.0.2", ReadOnly ];

}

 

And  OSM.node:
 

Class OSM.node Extends (%Persistent, %XML.Adaptor) [ ProcedureBlock ] {

Parameter XMLNAME = "node";

Parameter XMLSEQUENCE = 1;

Relationship tag As OSM.tag(XMLNAME = "tag", XMLPROJECTION = "ELEMENT", XMLREF = 1) [ Cardinality = many, Inverse = node ];

Property id As %xsd.unsignedLong(XMLNAME = "id", XMLPROJECTION = "ATTRIBUTE");

Property lat As %xsd.double(XMLNAME = "lat", XMLPROJECTION = "ATTRIBUTE");

Property lon As %xsd.double(XMLNAME = "lon", XMLPROJECTION = "ATTRIBUTE");

Property user As %String(MAXLEN = "", XMLNAME = "user", XMLPROJECTION = "ATTRIBUTE") [ SqlFieldName = _user ];

Property uid As %xsd.unsignedLong(XMLNAME = "uid", XMLPROJECTION = "ATTRIBUTE");

Property visible As %Boolean(XMLNAME = "visible", XMLPROJECTION = "ATTRIBUTE");

Property version As %xsd.unsignedLong(XMLNAME = "version", XMLPROJECTION = "ATTRIBUTE");

Property changeset As %xsd.unsignedLong(XMLNAME = "changeset", XMLPROJECTION = "ATTRIBUTE");

Property timestamp As %TimeStamp(XMLNAME = "timestamp", XMLPROJECTION = "ATTRIBUTE");

Relationship osm As OSM.osm(XMLPROJECTION = "NONE") [ Cardinality = one, Inverse = node ];

}

 

As you can see, I have already disabled some of the options that are unnecessary for our solution.

The XML file for Russia alone is approximately 53 GB. You can't open it with the usual text-processing tools: they can't stomach files this large. You can take smaller samples to practice on; for example, the data for Russia is available by individual territory. The Kaliningrad region is small: 18 MB in compressed format and 203 MB as an uncompressed XML file.

By the way, the maximum length of a string literal in InterSystems IRIS is 3,641,144 characters. In other words, loading a file or URL directly into it won't work. You can see the other limits in the documentation. To work with large amounts of data, you can use data streams that don't have length restrictions.
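The idea behind streams, reading a large source piece by piece instead of as one giant string, can be sketched in Python. This is not the IRIS stream API; it assumes the download is bz2-compressed (the text only says "compressed"), and the function name `decompressed_size` and the chunk size are choices made for this example.

```python
import bz2
import io


def decompressed_size(stream, chunk_size=64 * 1024):
    """Decompress a bz2 stream chunk by chunk and return the total output size.

    Only one chunk of input is held in memory at a time.
    """
    decompressor = bz2.BZ2Decompressor()
    total = 0
    while not decompressor.eof:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(decompressor.decompress(chunk))
    return total


# Stand-in for a real download: a small in-memory bz2 "file" of repeated XML.
payload = b'<node id="1" lat="0" lon="0"/>\n' * 10_000
compressed = io.BytesIO(bz2.compress(payload))
print(decompressed_size(compressed), "bytes after decompression")
```

The same loop works on a file opened in binary mode, so the size of the source no longer dictates how much memory the process needs.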

Let's see what we get for the set of nodes.

Next, we'll do things by the book. We'll create an object that understands XML as a native language by using a class from the system library %XML.Reader:
 

set reader = ##class(%XML.Reader).%New()


We'll give it instructions about what to take, and to ignore the rest. We'll take a single class:
 

do reader.Correlate("node","OSM.node")


After that, there are various ways to get the original mBD file. If convenient, you can put it next to the storage repository, locally in the file system of the IRIS server. Or, as in my example, request it to be sent via HTTP. There's also a more universal option, which I'll talk a bit about below.

set url="http://localhost/kaliningrad-latest.osm"

write reader.OpenURL(url)

Important! At this point, most people who try this example for themselves will encounter something horrifying. Instead of a happy "1" (everything's fine), the system will return something starting with "0, STORE..." And that will be disappointing. In other words, the file that seemed to be mBD turned out to be not so micro and won't fit into our object: there wasn't enough memory allocated to the process. Can this be fixed? Absolutely. The IRIS data platform allows you to create objects up to 4 TB in RAM. So what went wrong? By default, the memory limit in the system settings is 256 MB. But we need much more than that. And remember, these are RAM requirements. Do you have enough room on your computer/server?

I experimented to determine the amount of memory we'll need to accommodate this giant: almost 170 GB. This will have to be specified in the settings (Menu > Configure memory > Max memory capacity per process (KB)) or through the system variable $ZSTORAGE (in kilobytes):

set $ZSTORAGE=170000000
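Since the setting is given in kilobytes, a quick unit check helps avoid an off-by-a-thousand mistake. The Python lines below are plain arithmetic on the two figures from the text; the use of binary units (1 MB = 1024 KB) is an assumption, and in decimal units the same value reads as 170 GB.

```python
KB_PER_MB = 1024
KB_PER_GB = 1024 * 1024

default_kb = 256 * KB_PER_MB   # the 256 MB default mentioned above
requested_kb = 170_000_000     # the value assigned to $ZSTORAGE above

print(default_kb)                          # 262144
print(round(requested_kb / KB_PER_GB, 1))  # 162.1
print(requested_kb // default_kb)          # 648
```

In other words, the requested limit is several hundred times the default, which is why the next option is worth considering.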

Did you run a new process with the right memory settings? Then the next part is easy: we just read and save.

There's also an alternative (and probably preferable) option: use the UsePPGHandler property of the %XML.Reader class, which allows you not to store the XML in memory and works with the standard memory settings.

set reader = ##class(%XML.Reader).%New()

set reader.UsePPGHandler = 1

Next... Correlate / Read, etc. …

do reader.Next(.object)

do object.%Save()

And so on, 1,180,849 times for each operation :-) It's tedious. That's why we'll add a class method for importing to our OSM.osm class, based on the same commands:

ClassMethod Import(url) {

   Set reader = ##class(%XML.Reader).%New()

   Set reader.UsePPGHandler = 1

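   Set status = reader.OpenURL(url)

   // Stop right away if the source could not be opened
   If $$$ISERR(status) { Do $System.Status.DisplayError(status) Quit }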



   Do reader.Correlate("node","OSM.node")

   While (reader.Next(.object)) {

       Do object.%Save()

   }

     

   //back to top of XML file    

   Do reader.Rewind()

   Do reader.Correlate("way","OSM.way")

   While (reader.Next(.object)) {

       Do object.%Save()

   }

   

   Do reader.Rewind()

   Do reader.Correlate("relation","OSM.relation")

   While (reader.Next(.object)) {

       Do object.%Save()

   }
}
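For comparison, the same read-and-process loop can be written as a single streaming pass. The Python sketch below is not the %XML.Reader API; it uses the standard library's `iterparse`, counts elements instead of saving them, and the names `count_elements`, `WANTED`, and `SAMPLE` are invented for this example.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

WANTED = {"node", "way", "relation"}


def count_elements(source):
    """Stream through an OSM XML source once and count top-level elements."""
    counts = Counter()
    depth = 0
    for event, element in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            continue
        depth -= 1
        # After the decrement, depth == 1 means a direct child of <osm> just closed.
        if depth == 1:
            if element.tag in WANTED:
                counts[element.tag] += 1  # a real importer would save the record here
            element.clear()  # drop the finished element's children and attributes
    return counts


# Made-up miniature document in the shape of an OSM download.
SAMPLE = (
    b'<osm><node id="1"/><node id="2"><tag k="a" v="b"/></node>'
    b'<way id="3"><nd ref="1"/></way><relation id="4"/></osm>'
)
print(dict(count_elements(io.BytesIO(SAMPLE))))  # {'node': 2, 'way': 1, 'relation': 1}
```

One pass trades the simplicity of three separate loops for not having to rewind a file that takes a long time to read.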


We'll use the power of our computer's exocortex with just one command in the terminal:

do ##class(OSM.osm).Import("http://localhost/kaliningrad-latest.osm")

And so, we've obtained address data from an open and perhaps not very reliable source. It's time to go through the same stages, but with data that can be trusted. Data that is also standardized, cleaned up, well documented, and produced by the right government body: that's the stuff of legends. To its credit, the Russian tax service is doing a good job with its digital product, to the extent that a good job is possible. To be sure, it does have its shortcomings, and data clean-up is ongoing. As for how to solve this, let the government leaders mull that over. They're making decisions for themselves that benefit us all.

And now let's get to the less obvious part: we'll teach Address to read the right data from our source. Fortunately, the Federal Tax Service data set comes with ready-made descriptions of the XML document structure. According to the description from the FIAS website that accompanies the data, we'll need the ADDROBJ data set, which in my case corresponds to the file AS_ADDROBJ_2_250_01_04_01_01.xsd

Next, let’s use the XSD schema wizard. We will perform the operations in the terminal:
 

set xmlScheme = ##class(%XML.Utils.SchemaReader).%New()

do xmlScheme.Process("/path/to/AS_ADDROBJ_2_250_01_04_01_01.xsd")


As a result, we have two classes that completely reflect the structure of our address XML:

Test.AddressObjects

///  Composition and structure of the file with classifier information for FIAS DB elements in address form

Class Test.AddressObjects Extends (%Persistent, %XML.Adaptor) [ ProcedureBlock ] {

   Parameter XMLNAME = "AddressObjects";

   Parameter XMLSEQUENCE = 1;


   /// Classifier for elements in address form

   Relationship Object As Test.Object(XMLNAME = "Object", XMLPROJECTION = "ELEMENT") [ Cardinality = many, Inverse = AddressObjects ];

}

Test.Object

/// Created from: http://localhost:28869/AS_ADDROBJ_2_250_01_04_01_01.xsd

Class Test.Object Extends (%Persistent, %XML.Adaptor) [ ProcedureBlock ] {

   Parameter XMLNAME = "Object";

   Parameter XMLSEQUENCE = 1;

 /// Global unique identifier of the address object

Property AOGUID As %String(MAXLEN = 36, MINLEN = 36, XMLNAME = "AOGUID", XMLPROJECTION = "ATTRIBUTE") [ Required ];

 /// Formal name

 Property FORMALNAME As %String(MAXLEN = 120, MINLEN = 1, XMLNAME = "FORMALNAME", XMLPROJECTION = "ATTRIBUTE") [ Required ];

/// Region code

Property REGIONCODE As %String(MAXLEN = 2, MINLEN = 2, XMLNAME = "REGIONCODE", XMLPROJECTION = "ATTRIBUTE") [ Required ];

/// Autonomy code

Property AUTOCODE As %String(MAXLEN = 1, MINLEN = 1, XMLNAME = "AUTOCODE", XMLPROJECTION = "ATTRIBUTE") [ Required ];

/// Area code

Property AREACODE As %String(MAXLEN = 3, MINLEN = 3, XMLNAME = "AREACODE", XMLPROJECTION = "ATTRIBUTE") [ Required ];

/// City code

Property CITYCODE As %String(MAXLEN = 3, MINLEN = 3, XMLNAME = "CITYCODE", XMLPROJECTION = "ATTRIBUTE") [ Required ];

/// Code of area within city

Property CTARCODE As %String(MAXLEN = 3, MINLEN = 3, XMLNAME = "CTARCODE", XMLPROJECTION = "ATTRIBUTE") [ Required ];

/// Locality code

Property PLACECODE As %String(MAXLEN = 3, MINLEN = 3, XMLNAME = "PLACECODE", XMLPROJECTION = "ATTRIBUTE") [ Required ];

/// Planning structure element code

Property PLANCODE As %String(MAXLEN = 4, MINLEN = 4, XMLNAME = "PLANCODE", XMLPROJECTION = "ATTRIBUTE") [ Required ];

/// Street code

Property STREETCODE As %String(MAXLEN = 4, MINLEN = 4, XMLNAME = "STREETCODE", XMLPROJECTION = "ATTRIBUTE");


   /// Code of additional element in address form

   Property EXTRCODE As %String(MAXLEN = 4, MINLEN = 4, XMLNAME = "EXTRCODE", XMLPROJECTION = "ATTRIBUTE") [ Required ];


   /// Code of subordinate additional element in address form

   Property SEXTCODE As %String(MAXLEN = 3, MINLEN = 3, XMLNAME = "SEXTCODE", XMLPROJECTION = "ATTRIBUTE") [ Required ];


   /// Official name

   Property OFFNAME As %String(MAXLEN = 120, MINLEN = 1, XMLNAME = "OFFNAME", XMLPROJECTION = "ATTRIBUTE");


   /// Postal code

   Property POSTALCODE As %String(MAXLEN = 6, MINLEN = 6, XMLNAME = "POSTALCODE", XMLPROJECTION = "ATTRIBUTE");


   /// Federal Tax Service - Private Individual code

   Property IFNSFL As %String(MAXLEN = 4, MINLEN = 4, XMLNAME = "IFNSFL", XMLPROJECTION = "ATTRIBUTE");


   /// Federal Tax Service - Private Individual territorial district code

   Property TERRIFNSFL As %String(MAXLEN = 4, MINLEN = 4, XMLNAME = "TERRIFNSFL", XMLPROJECTION = "ATTRIBUTE");

   /// Federal Tax Service - Legal Entity code

   Property IFNSUL As %String(MAXLEN = 4, MINLEN = 4, XMLNAME = "IFNSUL", XMLPROJECTION = "ATTRIBUTE");


   /// Federal Tax Service - Legal Entity territorial district code

   Property TERRIFNSUL As %String(MAXLEN = 4, MINLEN = 4, XMLNAME = "TERRIFNSUL", XMLPROJECTION = "ATTRIBUTE");


   /// Russian Classification on Objects of Administrative Division

   Property OKATO As %String(MAXLEN = 11, MINLEN = 11, XMLNAME = "OKATO", XMLPROJECTION = "ATTRIBUTE");


   /// Russian Classification of Territories of Municipal Formations

   Property OKTMO As %String(MAXLEN = 11, MINLEN = 8, XMLNAME = "OKTMO", XMLPROJECTION = "ATTRIBUTE");


   /// Date of record entry

   Property UPDATEDATE As %Date(XMLNAME = "UPDATEDATE", XMLPROJECTION = "ATTRIBUTE") [ Required ];


   /// Short name of object type

   Property SHORTNAME As %String(MAXLEN = 10, MINLEN = 1, XMLNAME = "SHORTNAME", XMLPROJECTION = "ATTRIBUTE") [ Required ];


   /// Address object level

Property AOLEVEL As %Integer(XMLNAME = "AOLEVEL", XMLPROJECTION = "ATTRIBUTE", XMLTotalDigits = 10) [ Required ];


   /// Object identifier of the parent object

   Property PARENTGUID As %String(MAXLEN = 36, MINLEN = 36, XMLNAME = "PARENTGUID", XMLPROJECTION = "ATTRIBUTE");


   /// Unique record identifier. Key field.

Property AOID As %String(MAXLEN = 36, MINLEN = 36, XMLNAME = "AOID", XMLPROJECTION = "ATTRIBUTE") [ Required ];


   /// Record identifier associated with previous historical record

Property PREVID As %String(MAXLEN = 36, MINLEN = 36, XMLNAME = "PREVID", XMLPROJECTION = "ATTRIBUTE");


   /// Record identifier associated with next historical record

Property NEXTID As %String(MAXLEN = 36, MINLEN = 36, XMLNAME = "NEXTID", XMLPROJECTION = "ATTRIBUTE");


   /// Address object code in one string with validity indicator from Russian Classifier of Addresses (KLADR) 4.0.

Property CODE As %String(MAXLEN = 17, MINLEN = 0, XMLNAME = "CODE", XMLPROJECTION = "ATTRIBUTE");


   /// Address object code from KLADR 4.0 in one string without validity indicator (last two digits)

Property PLAINCODE As %String(MAXLEN = 15, MINLEN = 0, XMLNAME = "PLAINCODE", XMLPROJECTION = "ATTRIBUTE");


   /// Validity status of FIAS address object. Current address as of today's date. Usually the last entry about the address object.

   /// 0 - Not current

   /// 1 - Current

   Property ACTSTATUS As %Integer(XMLNAME = "ACTSTATUS", XMLPROJECTION = "ATTRIBUTE", XMLTotalDigits = 10) [ Required ];


   /// Center status

   Property CENTSTATUS As %Integer(XMLNAME = "CENTSTATUS", XMLPROJECTION = "ATTRIBUTE", XMLTotalDigits = 10) [ Required ];


   /// Operation status on record - reason for record's appearance (see description of OperationStatus table):

   /// 01 – Activation;

   /// 10 – Addition;

   /// 20 – Change;

   /// 21 – Group change;

   /// 30 – Deletion;

   /// 31 - Deletion due to the deletion of the parent object;

   /// 40 – Attachment of the address object (merger);

   /// 41 – Reassignment due to the merger of the parent object;

   /// 42 - Termination due to the attachment to another address object;

   /// 43 - Creation of a new address object due to a merger of address objects;

   /// 50 – Reassignment;

   /// 51 – Reassignment due to the reassignment of the parent object;

   /// 60 – Termination due to segmentation;

   /// 61 – Creation of a new address object due to segmentation

   Property OPERSTATUS As %Integer(XMLNAME = "OPERSTATUS", XMLPROJECTION = "ATTRIBUTE", XMLTotalDigits = 10) [ Required ];


   /// KLADR 4 validity status (last two digits in the code)

   Property CURRSTATUS As %Integer(XMLNAME = "CURRSTATUS", XMLPROJECTION = "ATTRIBUTE", XMLTotalDigits = 10) [ Required ];


   /// Start of record operation

   Property STARTDATE As %Date(XMLNAME = "STARTDATE", XMLPROJECTION = "ATTRIBUTE") [ Required ];


   /// End of record operation

   Property ENDDATE As %Date(XMLNAME = "ENDDATE", XMLPROJECTION = "ATTRIBUTE") [ Required ];


   /// Foreign key to requirements document

   Property NORMDOC As %String(MAXLEN = 36, MINLEN = 36, XMLNAME = "NORMDOC", XMLPROJECTION = "ATTRIBUTE");


   /// Current address object indicator

   Property LIVESTATUS As %xsd.byte(VALUELIST = ",0,1", XMLNAME = "LIVESTATUS", XMLPROJECTION = "ATTRIBUTE") [ Required ];


   /// Address type:

   /// 0 - not defined

   /// 1 - municipal;

   /// 2 - administrative/territorial

   Property DIVTYPE As %xsd.int(VALUELIST = ",0,1,2", XMLNAME = "DIVTYPE", XMLPROJECTION = "ATTRIBUTE") [ Required ];

   Relationship AddressObjects As Test.AddressObjects(XMLPROJECTION = "NONE") [ Cardinality = one, Inverse = Object ];

}


Out of the entire list of XML files in FIAS, we will only be using the file with the names of regions, cities, and streets. When I was preparing for publication, I had this one:

AS_ADDROBJ_20190106_90809714-fe22-45b2-929c-52bd950963e0.XML

Let's cook up some FIAS stuffed pepper. This is just preparation for a great future ahead. First, we'll get the initial minimum set. These are the only ingredients we'll need:

Class FIAS.AddressObject Extends (%Persistent, %XML.Adaptor) [ ProcedureBlock ] {

   Parameter XMLNAME = "Object";

   Parameter XMLSEQUENCE = 1;

   /// Global unique identifier of the address object

   Property AOGUID As %String(MAXLEN = 36, MINLEN = 36, XMLNAME = "AOGUID", XMLPROJECTION = "ATTRIBUTE") [ Required ];

   /// Official name

   Property OFFNAME As %String(MAXLEN = 120, MINLEN = 1, XMLNAME = "OFFNAME", XMLPROJECTION = "ATTRIBUTE");

   /// Postal code

   Property POSTALCODE As %String(MAXLEN = 6, MINLEN = 6, XMLNAME = "POSTALCODE", XMLPROJECTION = "ATTRIBUTE");

   /// Short name of object type

   Property SHORTNAME As %String(MAXLEN = 10, MINLEN = 1, XMLNAME = "SHORTNAME", XMLPROJECTION = "ATTRIBUTE") [ Required ];

   /// Address object level

   Property AOLEVEL As %Integer(XMLNAME = "AOLEVEL", XMLPROJECTION = "ATTRIBUTE", XMLTotalDigits = 10) [ Required ];

   /// Object identifier of the parent object

   Property PARENTGUID As %String(MAXLEN = 36, MINLEN = 36, XMLNAME = "PARENTGUID", XMLPROJECTION = "ATTRIBUTE");

   /// Unique record identifier. Key field.

   Property AOID As %String(MAXLEN = 36, MINLEN = 36, XMLNAME = "AOID", XMLPROJECTION = "ATTRIBUTE") [ Required ];

}
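The length constraints in this class (36 characters for the identifiers, 6 for the postal code) are easy to check on the client side too. The Python sketch below parses one `Object` element into a small record; the dataclass, the function `parse_object`, and the sample values are all made up for illustration, and only the attribute names and lengths are taken from the class above.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class AddressObject:
    aoguid: str
    aoid: str
    shortname: str
    aolevel: int
    offname: Optional[str] = None
    postalcode: Optional[str] = None
    parentguid: Optional[str] = None


def parse_object(xml_text):
    """Build an AddressObject from one <Object .../> element, checking lengths."""
    attrs = ET.fromstring(xml_text).attrib
    for name in ("AOGUID", "AOID"):
        if len(attrs[name]) != 36:
            raise ValueError(f"{name} must be exactly 36 characters")
    postal = attrs.get("POSTALCODE")
    if postal is not None and len(postal) != 6:
        raise ValueError("POSTALCODE must be exactly 6 characters")
    return AddressObject(
        aoguid=attrs["AOGUID"],
        aoid=attrs["AOID"],
        shortname=attrs["SHORTNAME"],
        aolevel=int(attrs["AOLEVEL"]),
        offname=attrs.get("OFFNAME"),
        postalcode=postal,
        parentguid=attrs.get("PARENTGUID"),
    )


# Made-up record; the GUIDs and values are placeholders, not real FIAS data.
RECORD = (
    '<Object AOGUID="00000000-0000-0000-0000-000000000001" '
    'AOID="00000000-0000-0000-0000-000000000002" OFFNAME="Example" '
    'SHORTNAME="g" AOLEVEL="4" POSTALCODE="123456"/>'
)
print(parse_object(RECORD))
```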

We'll create an object that understands XML as a native language by using a class from the system library %XML.Reader:

set reader = ##class(%XML.Reader).%New()


We'll give it instructions about what to take, and tell it to ignore the rest. We'll take a single serving for testing.

Next... Correlate / Read, etc. …

do reader.Correlate("Object","FIAS.AddressObject")

set url="http://localhost/AS_ADDROBJ_20190106_90809714-fe22-45b2-929c-52bd950963e0.XML"

write reader.OpenURL(url)


Then the next part is easy: we just read and save.

do reader.Next(.object)

do object.%Save()


And so on, 3,722,548 times for each operation :-)

It's even more exhausting than before. That's why we'll add our FIAS.AddressObject class method for importing, based on the same commands:
 

ClassMethod Import() {

       // Create object to read XML

       Set reader = ##class(%XML.Reader).%New()


       // Get source XML for parsing

       Set status = reader.OpenURL("http://localhost/AS_ADDROBJ_20190106_90809714-fe22-45b2-929c-52bd950963e0.XML")

       // Stop right away if the source could not be opened
       If $$$ISERR(status) { Do $System.Status.DisplayError(status) Quit }


       // Join object with the right sample structure

       Do reader.Correlate("Object","FIAS.AddressObject")


       // Read and save the object in storage

       While (reader.Next(.object,.status)) {

           Set status = object.%Save()

           If $$$ISERR(status) { Do $System.Status.DisplayError(status) }

       }


      // If an error occurs during parsing, display a message

       If $$$ISERR(status) {Do $System.Status.DisplayError(status)}

   }

We'll use the power of our computer's exocortex with just one command in the terminal:

do ##class(FIAS.AddressObject).Import()


https://habrastorage.org/webt/bd/mh/t7/bdmht7k0urpyo07qcdcmkj5_nym.gif

Dinner's ready, everybody. It was mBD, and now it's a finished dish, a global with the verified names of Russian cities.

https://habrastorage.org/webt/1u/he/a6/1uhea6bdexgrpwlfpmtn2b2gdyy.png
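With AOGUID and PARENTGUID stored, a full address line can be assembled by walking up the parent links. Here is a minimal Python sketch of that walk. It assumes one current record per AOGUID held in a plain dict; the function `full_address`, the shortened keys, and the sample names are invented for this example.

```python
def full_address(objects, aoguid):
    """Walk PARENTGUID links upward and join names from the top level down."""
    parts = []
    seen = set()
    current = aoguid
    while current is not None:
        if current in seen:
            raise ValueError("cycle in PARENTGUID links")
        seen.add(current)
        record = objects[current]
        parts.append(f"{record['SHORTNAME']} {record['OFFNAME']}")
        current = record.get("PARENTGUID")
    return ", ".join(reversed(parts))


# Made-up records keyed by AOGUID; real keys are 36-character GUIDs.
objects = {
    "region-1": {"SHORTNAME": "obl", "OFFNAME": "Example Region"},
    "city-1": {"SHORTNAME": "g", "OFFNAME": "Example City", "PARENTGUID": "region-1"},
    "street-1": {"SHORTNAME": "ul", "OFFNAME": "Example Street", "PARENTGUID": "city-1"},
}
print(full_address(objects, "street-1"))
# obl Example Region, g Example City, ul Example Street
```

The cycle check is there because real data is still being cleaned up, and a loop in the parent links would otherwise never terminate.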

And, finally, a few words about what to do when 4 TB isn't enough. In that case, we use streams. Everything is laid out in the documentation. You can use binary or character streams, and storing them in a global is also possible. Here's the recipe: take the stream, cut it into pieces, and hand those pieces to the objects that need to consume them.

There wasn't enough room here for more on the lovely address ObjectScript objects and Python API. That will be another story.

Good news: Gartner has just completed its annual collection of real user ratings and feedback in the category of DBMS and used this information to publish its rankings of the best DBMSs of 2019. InterSystems Caché and InterSystems IRIS Data Platform received the highest rating for "Customers' Choice." You can check out which products were considered and how they were rated.

Best Operational Database Management Systems Software of 2019 as Reviewed by Customers.


https://habrastorage.org/webt/gw/jg/dd/gwjgdd1segqlvnnpfzgo9buww4y.png